Gradient Descent & Newton's Method Notes

1. Gradient Descent Basics

The update rule for Gradient Descent is:

w_{t+1} = w_t - \alpha \cdot g(w_t)

where:

- w_t is the parameter vector at step t,
- \alpha is the learning rate (step size),
- g(w_t) is the gradient of the loss evaluated at w_t.

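As a minimal sketch (not part of the original notes), one gradient descent step can be written in NumPy; the quadratic loss used in the example and the helper name gradient_descent_step are hypothetical:

```python
import numpy as np

def gradient_descent_step(w, grad_fn, alpha=0.1):
    """One update: w <- w - alpha * g(w)."""
    return w - alpha * grad_fn(w)

# Hypothetical example: L(w) = 0.5 * ||w||^2, so g(w) = w.
w = np.array([2.0, -3.0])
for _ in range(100):
    w = gradient_descent_step(w, grad_fn=lambda v: v, alpha=0.1)
print(w)  # approaches the minimizer at the origin
```
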
2. AdaGrad (Adaptive Gradient)

AdaGrad adapts the learning rate for each parameter individually by dividing it by the square root of the accumulated squared gradients.

Update rule:

w_{t+1} = w_t - \frac{\alpha}{\sqrt{G_t + \epsilon}} \cdot g(w_t)

where:

- G_t is the element-wise sum of squared gradients accumulated up to step t,
- \epsilon is a small constant that prevents division by zero,
- the square root and division are applied element-wise, so each parameter gets its own effective learning rate.

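A minimal sketch of the AdaGrad update under the same assumptions as before (hypothetical quadratic loss, helper names invented for illustration):

```python
import numpy as np

def adagrad_step(w, G, grad_fn, alpha=0.1, eps=1e-8):
    """One AdaGrad update: accumulate squared gradients, then scale the step."""
    g = grad_fn(w)
    G = G + g ** 2                          # per-parameter accumulator G_t
    w = w - alpha / np.sqrt(G + eps) * g    # element-wise adaptive step
    return w, G

# Hypothetical example: L(w) = 0.5 * ||w||^2, so g(w) = w.
w, G = np.array([2.0, -3.0]), np.zeros(2)
for _ in range(500):
    w, G = adagrad_step(w, G, grad_fn=lambda v: v)
print(w, G)  # steps shrink as G grows
```
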
3. Newton's Method (Speeding Up Near Minima)

To accelerate convergence in flat regions, where the gradient is small and plain gradient descent takes tiny steps, Newton's Method rescales the update using second-order (curvature) information from the Hessian of the loss.
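
For reference, one Newton step is w_{t+1} = w_t - H(w_t)^{-1} \cdot g(w_t), where H(w_t) is the Hessian of the loss at w_t. Below is a minimal sketch, assuming the Hessian is available and invertible; the quadratic loss and helper names are hypothetical:

```python
import numpy as np

def newton_step(w, grad_fn, hess_fn):
    """One Newton update: w <- w - H(w)^{-1} g(w)."""
    g = grad_fn(w)
    H = hess_fn(w)
    # Solve H d = g rather than forming the inverse explicitly.
    return w - np.linalg.solve(H, g)

# Hypothetical quadratic loss L(w) = 0.5 * w^T A w with one very flat direction.
A = np.diag([10.0, 0.01])
w = np.array([1.0, 1.0])
w = newton_step(w, grad_fn=lambda v: A @ v, hess_fn=lambda v: A)
print(w)  # lands on the minimizer [0, 0] in one step, unlike plain gradient descent
```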