⚡ 1. Why Activation Functions Matter
Activation functions add non-linearity to neural networks. Without them, your model would just be a stack of linear transformations, and any such stack collapses into a single linear layer that cannot capture complex patterns (a quick demonstration follows).
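A minimal NumPy sketch (shapes and variable names are illustrative) showing that two linear layers with no activation in between are equivalent to a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4 inputs, 3 features each

# Two "layers" with no activation in between
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping collapses into a single linear layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True: stacking gained no expressive power
```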
🔢 2. Sigmoid Activation
- Maps input values to the range (0, 1): σ(x) = 1 / (1 + e^(-x)).
- Large negative inputs map close to 0; large positive inputs map close to 1.
- Problem: when inputs are far from zero in either direction, the gradient becomes tiny → vanishing gradients (see the sketch after this list).
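A small NumPy sketch of the sigmoid and its derivative, using the standard identity σ'(x) = σ(x) · (1 − σ(x)); the sample inputs are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative: sigma(x) * (1 - sigma(x))

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
# The gradient peaks at 0.25 when x = 0 and is roughly 0.00005 at |x| = 10.
```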
🚫 3. Vanishing Gradient Problem
- When activations saturate (output near 0 or 1), the local gradient is nearly zero.
- Backprop multiplies these small gradients layer by layer, so weights in earlier layers stop updating effectively and learning stalls (see the sketch after this list).
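A numeric sketch of the shrinkage, assuming a chain of five mildly saturated sigmoid units (the pre-activation value 4.0 and the depth are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Backprop multiplies one local gradient per layer; with sigmoid that factor
# is at most 0.25 and far smaller when the unit is saturated.
pre_activations = np.array([4.0, 4.0, 4.0, 4.0, 4.0])
local_grads = sigmoid(pre_activations) * (1 - sigmoid(pre_activations))

running = 1.0
for depth, g in enumerate(local_grads, start=1):
    running *= g
    print(f"after layer {depth}: local grad = {g:.4f}, accumulated = {running:.2e}")
# The accumulated gradient shrinks geometrically, so early layers barely learn.
```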
⚖️ 4. Zero-Centered Issue
- Sigmoid outputs aren’t zero-centered.
- Because the output is always positive, the inputs to the next layer are always positive → that layer's weight gradients all share the same sign on a given update.
- This forces zig-zag updates and makes convergence inefficient (see the sketch after this list).
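A small sketch (single neuron, illustrative values) of why all-positive inputs force every weight gradient to share the same sign:

```python
import numpy as np

rng = np.random.default_rng(1)

# Inputs to this layer come from a sigmoid, so they are strictly positive.
x = rng.uniform(0.1, 0.9, size=6)

# Chain rule for z = w . x + b gives dL/dw_i = (dL/dz) * x_i.
upstream_grad = -0.7                  # dL/dz from the layer above (illustrative value)
grad_w = upstream_grad * x

print(grad_w)                         # every component is negative here
print(np.unique(np.sign(grad_w)))     # a single sign: all weights are pushed the same way
```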
🔵 5. Tanh Activation
- Similar to sigmoid but outputs in the range (-1, 1), so its outputs are zero-centered (see the comparison sketch below).
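A quick NumPy comparison of sigmoid and tanh outputs over an illustrative range of inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 1001)

print(f"sigmoid range: ({sigmoid(x).min():.3f}, {sigmoid(x).max():.3f}),  mean = {sigmoid(x).mean():.3f}")
print(f"tanh    range: ({np.tanh(x).min():.3f}, {np.tanh(x).max():.3f}),  mean = {np.tanh(x).mean():.3f}")
# tanh is centered on 0, so downstream activations are not all-positive,
# which avoids the single-sign gradient issue described in section 4.
```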