This idea leads to Support Vector Machines (SVMs), which maximize the margin.
Prove that the Perceptron algorithm converges in a finite number of updates if the training data is linearly separable.
Training Data:
D = { (x^(i), y^(i)) : i = 1, …, n }, with x^(i) ∈ R^d and y^(i) ∈ {−1, +1}
Assumption:
The data is linearly separable: there exists a weight vector w* such that y^(i) (w*^T x^(i)) > 0 for all i. (Any bias term is absorbed into w* by appending a constant feature of 1 to each x^(i).)
If misclassified, i.e. y^(i) (w_t^T x^(i)) ≤ 0:
w_(t+1) = w_t + y^(i) x^(i)
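
For concreteness, here is a minimal sketch of a training loop that applies this update, assuming NumPy arrays X of shape (n, d) and labels y in {−1, +1}; the function name perceptron_train and the max_epochs cap are illustrative choices, not part of the proof.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Apply the Perceptron update until no training example is misclassified.

    X : (n, d) array of inputs (a constant 1 column can absorb the bias)
    y : (n,) array of labels in {-1, +1}
    max_epochs : illustrative safety cap in case the data is not separable
    """
    n, d = X.shape
    w = np.zeros(d)                       # start from w_0 = 0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (w @ X[i]) <= 0:    # misclassified (or on the boundary)
                w = w + y[i] * X[i]       # w_(t+1) = w_t + y^(i) x^(i)
                mistakes += 1
        if mistakes == 0:                 # a full pass with no updates: converged
            break
    return w
```

On linearly separable data the convergence theorem guarantees this loop stops after finitely many updates, regardless of the order in which examples are visited.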
Margin (γ):
Minimum distance from any training point to the separating hyperplane defined by w*:
γ = min_i [ y^(i) (w*^T x^(i)) / ||w*|| ]
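
Under the same assumed array conventions as above, γ can be computed directly from this definition for any separator w* satisfying the assumption; the helper name margin and the variable w_star are illustrative.

```python
import numpy as np

def margin(X, y, w_star):
    """Compute gamma = min_i y^(i) (w*^T x^(i)) / ||w*|| for a given separator w*."""
    scores = y * (X @ w_star)             # signed, unnormalized quantities y^(i) w*^T x^(i)
    return scores.min() / np.linalg.norm(w_star)
```

For linearly separable data every entry of scores is strictly positive, so γ > 0; this strictly positive margin is what drives the finite-update argument.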