This idea leads to Support Vector Machines (SVMs), which maximize the margin.
Prove that the Perceptron algorithm converges in a finite number of updates if the training data is linearly separable.
Training Data:
D = { (x^(i), y^(i)) : i = 1, …, n }, with x^(i) ∈ R^d and y^(i) ∈ {−1, +1}
Assumption:
The data is linearly separable: there exists a weight vector w* such that y^(i) (w*^T x^(i)) > 0 for all i. (Any bias term is absorbed into w* by appending a constant feature of 1 to each x^(i).)
If misclassified, i.e. y^(i) (w_t^T x^(i)) ≤ 0:
w_(t+1) = w_t + y^(i) x^(i)
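
For concreteness, here is a minimal sketch of a training loop that applies this update, assuming NumPy arrays X of shape (n, d) and labels y in {−1, +1}; the function name perceptron_train and the max_epochs cap are illustrative choices, not part of the proof.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Apply the Perceptron update until no training example is misclassified.

    X : (n, d) array of inputs (a constant 1 column can absorb the bias)
    y : (n,) array of labels in {-1, +1}
    max_epochs : illustrative safety cap in case the data is not separable
    """
    n, d = X.shape
    w = np.zeros(d)                       # start from w_0 = 0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (w @ X[i]) <= 0:    # misclassified (or on the boundary)
                w = w + y[i] * X[i]       # w_(t+1) = w_t + y^(i) x^(i)
                mistakes += 1
        if mistakes == 0:                 # a full pass with no updates: converged
            break
    return w
```

On linearly separable data the convergence theorem guarantees this loop stops after finitely many updates, regardless of the order in which examples are visited.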
Margin (γ):
Minimum distance from any training point to the separating hyperplane defined by w*:
γ = min_i [ y^(i) (w*^T x^(i)) / ||w*|| ]
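
Under the same assumed array conventions as above, γ can be computed directly from this definition for any separator w* satisfying the assumption; the helper name margin and the variable w_star are illustrative.

```python
import numpy as np

def margin(X, y, w_star):
    """Compute gamma = min_i y^(i) (w*^T x^(i)) / ||w*|| for a given separator w*."""
    scores = y * (X @ w_star)             # signed, unnormalized quantities y^(i) w*^T x^(i)
    return scores.min() / np.linalg.norm(w_star)
```

For linearly separable data every entry of scores is strictly positive, so γ > 0; this strictly positive margin is what drives the finite-update argument.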