⚡ 1. Why Activation Functions Matter
Activation functions add non-linearity to neural networks. Without them, your model would just be a stack of linear transformations, and any such stack collapses into a single linear layer that cannot capture complex patterns (a quick demonstration follows).
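A minimal NumPy sketch (shapes and variable names are illustrative) showing that two linear layers with no activation in between are equivalent to a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4 inputs, 3 features each

# Two "layers" with no activation in between
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping collapses into a single linear layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True: stacking gained no expressive power
```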
🔢 2. Sigmoid Activation
- Maps input values to the range (0, 1): σ(x) = 1 / (1 + e^(-x)).
- Large negative inputs map close to 0; large positive inputs map close to 1.
- Problem: when inputs are far from zero in either direction, the gradient becomes tiny → vanishing gradients (see the sketch after this list).
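A small NumPy sketch of the sigmoid and its derivative, using the standard identity σ'(x) = σ(x) · (1 − σ(x)); the sample inputs are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative: sigma(x) * (1 - sigma(x))

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
# The gradient peaks at 0.25 when x = 0 and is roughly 0.00005 at |x| = 10.
```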
🚫 3. Vanishing Gradient Problem
- When activations saturate (output near 0 or 1), the local gradient is nearly zero.
- Backprop multiplies these small gradients layer by layer, so weights in earlier layers stop updating effectively and learning stalls (see the sketch after this list).
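A numeric sketch of the shrinkage, assuming a chain of five mildly saturated sigmoid units (the pre-activation value 4.0 and the depth are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Backprop multiplies one local gradient per layer; with sigmoid that factor
# is at most 0.25 and far smaller when the unit is saturated.
pre_activations = np.array([4.0, 4.0, 4.0, 4.0, 4.0])
local_grads = sigmoid(pre_activations) * (1 - sigmoid(pre_activations))

running = 1.0
for depth, g in enumerate(local_grads, start=1):
    running *= g
    print(f"after layer {depth}: local grad = {g:.4f}, accumulated = {running:.2e}")
# The accumulated gradient shrinks geometrically, so early layers barely learn.
```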
⚖️ 4. Zero-Centered Issue
- Sigmoid outputs aren’t zero-centered.
- Because the output is always positive, the inputs to the next layer are always positive → that layer's weight gradients all share the same sign on a given update.
- This forces zig-zag updates and makes convergence inefficient (see the sketch after this list).
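A small sketch (single neuron, illustrative values) of why all-positive inputs force every weight gradient to share the same sign:

```python
import numpy as np

rng = np.random.default_rng(1)

# Inputs to this layer come from a sigmoid, so they are strictly positive.
x = rng.uniform(0.1, 0.9, size=6)

# Chain rule for z = w . x + b gives dL/dw_i = (dL/dz) * x_i.
upstream_grad = -0.7                  # dL/dz from the layer above (illustrative value)
grad_w = upstream_grad * x

print(grad_w)                         # every component is negative here
print(np.unique(np.sign(grad_w)))     # a single sign: all weights are pushed the same way
```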
🔵 5. Tanh Activation
- Similar to sigmoid but outputs in the range (-1, 1), so its outputs are zero-centered (see the comparison sketch below).
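A quick NumPy comparison of sigmoid and tanh outputs over an illustrative range of inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 1001)

print(f"sigmoid range: ({sigmoid(x).min():.3f}, {sigmoid(x).max():.3f}),  mean = {sigmoid(x).mean():.3f}")
print(f"tanh    range: ({np.tanh(x).min():.3f}, {np.tanh(x).max():.3f}),  mean = {np.tanh(x).mean():.3f}")
# tanh is centered on 0, so downstream activations are not all-positive,
# which avoids the single-sign gradient issue described in section 4.
```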