Deep Neural Network (DNN)

1. Activation Function

2. When to Use Each Activation Function

3. Activation Function at the End of the Network

The choice depends on the task type:

4. Additional Advice


Example Code (Proof of Concept)

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the airline delay dataset and build a binary target:
# WDcase = 1 when the weather delay exceeds 100 (minutes), else 0.
data = pd.read_csv("Airline_Delay_Cause.csv")
print(data.info())
print(data.describe())

# Drop non-numeric identifier columns that a dense network cannot consume directly.
data = data.drop(['carrier', 'carrier_name', 'airport', 'airport_name'], axis=1)
data.dropna(inplace=True)

# Binarize the target from the weather-delay column.
data['WDcase'] = data['weather_delay'].apply(lambda x: 1 if x > 100 else 0)
print(data['WDcase'].value_counts())

# BUG FIX: the label is derived from 'weather_delay', so that column must be
# excluded from the features — otherwise the network trivially reads the answer
# off its input (target leakage) and the reported accuracy is meaningless.
X = data.drop(['WDcase', 'weather_delay'], axis=1)
y = data['WDcase']

from sklearn.model_selection import train_test_split

# Hold out 25% for validation; the fixed seed makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=44, shuffle=True
)
X_train.shape
import tensorflow as tf
import keras
# Binary classifier: tanh/sigmoid hidden stack, dropout, single sigmoid output.
kerasModel = keras.models.Sequential()
kerasModel.add(keras.layers.Dense(8, activation='tanh'))
kerasModel.add(keras.layers.Dense(128, activation='sigmoid'))
kerasModel.add(keras.layers.Dense(64, activation='tanh'))
kerasModel.add(keras.layers.Dense(32, activation='tanh'))
kerasModel.add(keras.layers.Dropout(0.2))
kerasModel.add(keras.layers.Dense(1, activation='sigmoid'))

# AdamW: Adam with decoupled weight decay. Every hyperparameter is spelled out
# explicitly so the training configuration is visible in one place.
adamw_config = dict(
    learning_rate=0.001,
    weight_decay=0.004,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    clipnorm=None,
    clipvalue=None,
    global_clipnorm=None,
    use_ema=False,
    ema_momentum=0.99,
    ema_overwrite_frequency=None,
    name="AdamW",
)
MyOptimizer = tf.keras.optimizers.AdamW(**adamw_config)

# Binary cross-entropy pairs with the single sigmoid output unit.
kerasModel.compile(optimizer=MyOptimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Train with early stopping on validation accuracy; the best weights seen
# during training are restored when patience runs out.
stopper = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    patience=10,
    restore_best_weights=True,
)
history = kerasModel.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    epochs=100,
    batch_size=1000,
    verbose=1,
    callbacks=[stopper],
)

print(kerasModel.summary())

# Round-trip the model through disk to prove serialization works.
kerasModel.save('KerasModelAirPlane.keras')
reconstructed_model = keras.models.load_model("KerasModelAirPlane.keras")
y_prd = reconstructed_model.predict(X_test)

modelLoss, modelAcc = reconstructed_model.evaluate(X_test, y_test)
print(modelAcc)

# Threshold the sigmoid outputs at 0.5 to obtain hard 0/1 labels.
y_prd = np.round(y_prd)
y_prd

# Learning curves: training vs. validation accuracy per epoch.
for curve in ('accuracy', 'val_accuracy'):
    plt.plot(history.history[curve])
plt.title('model Acc')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

#multi class
def Value(x, thresholds=(30, 100, 200)):
    """Bucket a delay value into an ordinal severity class.

    Parameters
    ----------
    x : numeric
        The weather-delay value (presumably minutes — confirm against the
        dataset's units).
    thresholds : sequence of numbers, optional
        Ascending inclusive upper bounds of the lower classes. The default
        (30, 100, 200) reproduces the original 4-class scheme:
        x <= 30 -> 0, x <= 100 -> 1, x <= 200 -> 2, otherwise 3.

    Returns
    -------
    int
        Class index in ``range(len(thresholds) + 1)``.
    """
    # Generalized: the hard-coded 30/100/200 cut points are now a parameter,
    # so other bucketings reuse the same function; defaults keep old behavior.
    for cls, bound in enumerate(thresholds):
        if x <= bound:
            return cls
    return len(thresholds)
# Multi-class target: 4 ordinal severity buckets of the weather delay.
# (apply(Value) directly — the lambda wrapper around Value was redundant.)
data['WDCase'] = data['weather_delay'].apply(Value)

print(data['WDCase'].value_counts())

# BUG FIX: besides the new label, both the earlier binary label ('WDcase') and
# the raw 'weather_delay' column the labels are derived from must be removed
# from the features; leaving them in leaks the answer into the input.
X = data.drop(['WDCase', 'WDcase', 'weather_delay'], axis=1)
y = data['WDCase']

from sklearn.model_selection import train_test_split

# NOTE(review): this split uses the integer labels `y`; it is redone below with
# the one-hot `y_cat` (same test_size/random_state), so these four arrays are
# overwritten before the model is trained. Kept only for the shape check.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=44, shuffle =True)

print('X_train shape is ' , X_train.shape)
print('X_test shape is ' , X_test.shape)
print('y_train shape is ' , y_train.shape)
print('y_test shape is ' , y_test.shape)

# NOTE(review): this Sequential instance is never compiled or trained — an
# identical model is rebuilt below (after the y_cat split) and shadows this
# name; only the second instance is used.
KerasModel = keras.models.Sequential([
        # keras.layers.Input(shape=(17,)),
        keras.layers.Dense(8,  activation = 'tanh'),
        keras.layers.Dense(128, activation = 'tanh'),
        keras.layers.Dense(64, activation = 'tanh'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(4, activation = 'softmax'),
        ])

# One-hot encode the 4-class integer labels for categorical cross-entropy.
y_cat = tf.keras.utils.to_categorical(y)
# Bare expression — a no-op in a script (peeks at the first labels in a notebook).
y[:20]

from sklearn.model_selection import train_test_split

# Re-split using the one-hot labels so targets match the 4-way softmax output.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_cat, test_size=0.25, random_state=44, shuffle=True
)

for label, arr in (('X_train', X_train), ('X_test', X_test),
                   ('y_train', y_train), ('y_test', y_test)):
    print(label + ' shape is ', arr.shape)

# Multi-class classifier: tanh hidden layers, dropout, 4-way softmax output.
KerasModel = keras.models.Sequential()
KerasModel.add(keras.layers.Dense(8, activation='tanh'))
KerasModel.add(keras.layers.Dense(128, activation='tanh'))
KerasModel.add(keras.layers.Dense(64, activation='tanh'))
KerasModel.add(keras.layers.Dropout(0.2))
KerasModel.add(keras.layers.Dense(4, activation='softmax'))

# Categorical cross-entropy pairs with the softmax output and one-hot targets.
KerasModel.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

# Stop early once validation accuracy stalls; restore the best epoch's weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',  # could also watch "val_loss"
    patience=5,
    restore_best_weights=True,
)
history = KerasModel.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    epochs=100,
    batch_size=10000,
    verbose=1,
    callbacks=[early_stop],
)