Semantic Gap
There’s always a semantic gap between how humans see an image (as meaningful content) and how computers see it — just a grid of numbers.
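To make the gap concrete, here is a minimal sketch of what the computer actually receives (the array values are random stand-ins for a real photo):

```python
import numpy as np

# A "photo" to a computer: just an H x W x 3 grid of integers in [0, 255].
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(image.shape)   # (32, 32, 3): height, width, RGB channels
print(image[0, 0])   # one pixel, e.g. [ 12 200  47]
```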
Data-Driven Approach
A classic ML approach: collect a dataset, label it, train a model, then evaluate its performance. The better the data, the better the results.
Nearest Neighbor Algorithm
Finds the most similar image in the training set and copies its label. Simple, but slow at test time: every prediction requires comparing against the entire training set, O(N) per query (training, by contrast, is free: just memorize the data). And relying on a single neighbor can fail when that neighbor happens to be an outlier.
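A minimal NumPy sketch of nearest-neighbor prediction, assuming flattened images and illustrative array names (`train_images`, `train_labels`):

```python
import numpy as np

def nn_predict(test_image, train_images, train_labels):
    """Copy the label of the single closest training image.

    train_images: (N, D) flattened training images
    train_labels: (N,)   labels
    test_image:   (D,)   flattened query image
    """
    # L1 distance to every training image: the O(N) cost per prediction
    distances = np.sum(np.abs(train_images - test_image), axis=1)
    return train_labels[np.argmin(distances)]
```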
K-Nearest Neighbors
Solves the outlier problem by looking at multiple neighbors (K), then taking a majority vote — more robust and less sensitive to noise.
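The same sketch extended to K neighbors with a majority vote (assumes integer class labels so `np.bincount` applies):

```python
import numpy as np

def knn_predict(test_image, train_images, train_labels, k=5):
    """Majority vote over the k closest training images."""
    distances = np.sum(np.abs(train_images - test_image), axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest
    votes = np.bincount(train_labels[nearest])   # count each label's votes
    return np.argmax(votes)                      # most common label wins
```

K is a hyperparameter: larger K smooths out noise but can blur class boundaries, so it is usually chosen on a validation set.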
L1 vs. L2 Distances
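For two flattened images I1 and I2 with pixels indexed by p, the two distances are:

```latex
d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|
\qquad\qquad
d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}
```

L1 sums absolute pixel differences (Manhattan distance); L2 is the ordinary Euclidean distance.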
When to Use L1 or L2
Use L1 when individual feature dimensions carry meaning on their own; because L1 depends on the coordinate frame, axis-aligned differences matter. Use L2 when the vector is a generic point in space and only overall similarity matters; L2 is unchanged by rotations of the coordinate frame.
Cross Validation
Instead of relying on a single fixed train/validation split, cross-validation divides the training data into k folds, trains on k-1 of them, and validates on the held-out fold, rotating through every fold and averaging the results. This gives a less noisy estimate of how well hyperparameters generalize (the test set still stays untouched until the end). Both approaches, a single validation split and cross-validation, are valid; cross-validation is more common when data is scarce.
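A compact sketch of k-fold cross-validation in NumPy; `train_and_eval` is a hypothetical user-supplied function that trains a model and returns validation accuracy:

```python
import numpy as np

def k_fold_score(X, y, train_and_eval, k=5):
    """Average validation accuracy across k folds.

    train_and_eval(X_tr, y_tr, X_val, y_val) -> accuracy (hypothetical)
    """
    folds = np.array_split(np.random.permutation(len(X)), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]                      # held-out fold
        tr_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_eval(X[tr_idx], y[tr_idx],
                                     X[val_idx], y[val_idx]))
    return np.mean(scores)
```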
Limitations of KNN
KNN struggles to capture complex differences between images because it relies purely on raw pixel distances, which track low-level appearance rather than content: a shifted, darkened, or slightly cropped copy of an image can end up far from the original (see the sketch below). It is also slow to predict, since every query is compared against all training samples.
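A tiny demonstration of the pixel-distance problem, using a toy 8x8 "image" of a white square:

```python
import numpy as np

img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0                    # a white square
shifted = np.roll(img, 1, axis=1)      # identical square, moved one pixel

print(np.sum(np.abs(img - shifted)))   # 8.0: half the square's total mass,
                                       # even though the content is the same
```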
Curse of Dimensionality
In high dimensions, nearest-neighbor search breaks down: to keep the space densely covered, the number of training points must grow exponentially with the number of dimensions. For example, keeping just 5 samples per axis takes 5 points in 1D, 25 in 2D, 125 in 3D, and nearly 10 million in 10D.
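The blow-up in numbers:

```python
# Training points needed to keep 5 samples per axis as dimensions grow.
for d in (1, 2, 3, 10):
    print(d, 5 ** d)   # 5, 25, 125, 9765625
```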
Parametric Approach
Linear models learn a parameter (weight) for each feature, plus a per-class bias. Prediction is then a single computation, f(x) = Wx + b, instead of a search through the training set, which makes it fast.
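A minimal sketch of the linear score function f(x) = Wx + b:

```python
import numpy as np

def linear_scores(x, W, b):
    """One score per class for a single flattened image.

    W: (C, D) weights, one row per class
    b: (C,)   per-class bias
    x: (D,)   flattened input image
    """
    return W @ x + b   # highest score is the predicted class
```

Everything learned from training is compressed into W and b, so prediction never touches the training set again.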
Parametric vs. KNN
Unlike KNN, a linear classifier doesn’t need the training dataset at test time. It has already distilled what it learned into the weights (W) — efficient and scalable.
Bias in Linear Models
The bias vector lets class scores shift independently of the input. If the training set has more cat photos than dog photos, the learned bias for "cat" will typically end up higher, encoding that data-independent preference. Note that this bias parameter is not the same thing as bias in the bias-variance sense: it does not by itself prevent overfitting or underfitting.
Problems with Linear Classification
Linear models can only draw straight lines (or hyperplanes) to separate classes, which is limiting for complex, non-linear data. Classic failure cases include XOR-style patterns (see the check below) and image categories whose pixel patterns are more abstract than any single linear template can capture.
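A brute-force check of the classic XOR failure case: no linear rule sign(w·x + b) labels all four points correctly, so even a large random search never exceeds 3 of 4 (the search itself is just an illustrative sketch, not a proof):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])             # XOR labels

rng = np.random.default_rng(0)
best = 0
for _ in range(100_000):
    w = rng.normal(size=2)             # random line orientation
    b = rng.normal()                   # random offset
    preds = (X @ w + b > 0).astype(int)
    best = max(best, int((preds == y).sum()))

print(best)   # 3: XOR is not linearly separable
```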
From KNN’s simple “find your friends” idea to linear models’ efficient parametric learning, both approaches laid the groundwork for today’s deep CNNs — pushing us closer to teaching machines to truly see.