đź§© K-Nearest Neighbors (KNN)

  1. Semantic Gap

    There’s always a semantic gap between how humans see an image (as meaningful content) and how computers see it — just a grid of numbers.
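
    A minimal sketch of that “grid of numbers” view, using a hypothetical randomly generated 32×32 RGB image in NumPy:

```python
import numpy as np

# A hypothetical 32x32 RGB image: to the computer it is just a
# 3-D array of integers in [0, 255], nothing more.
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

print(image.shape)   # (32, 32, 3): height, width, color channels
print(image[0, 0])   # the top-left pixel's [R, G, B] values
```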

  2. Data-Driven Approach

    A classic ML approach: collect a dataset, label it, train a model, then evaluate its performance. The better the data, the better the results.
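
    In code, that pipeline usually reduces to an interface like the sketch below (the class and method names are illustrative, not from any particular library):

```python
import numpy as np

class Classifier:
    """Illustrative data-driven classifier: learn from labeled data, then predict."""

    def train(self, images: np.ndarray, labels: np.ndarray) -> None:
        # Learn something from the labeled examples (memorize them, fit weights, ...)
        raise NotImplementedError

    def predict(self, images: np.ndarray) -> np.ndarray:
        # Return one predicted label per input image
        raise NotImplementedError

# Typical evaluation: train on labeled data, then measure accuracy on held-out data.
# model.train(train_images, train_labels)
# accuracy = np.mean(model.predict(test_images) == test_labels)
```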

  3. Nearest Neighbor Algorithm

    Finds the most similar training image and copies its label. Simple, but slow at test time: every prediction requires comparing against the entire training set (O(N)). And relying on a single neighbor can fail if that neighbor happens to be an outlier.
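
    A minimal NumPy sketch of the idea, using the L1 distance (the variable names are my own, not from a specific library):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # "Training" is just memorization: X is N x D (one flattened image per row)
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        preds = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i, x in enumerate(X):
            # L1 distance from this test image to every stored training image
            dists = np.sum(np.abs(self.X_train - x), axis=1)
            # Copy the label of the single closest training example
            preds[i] = self.y_train[np.argmin(dists)]
        return preds
```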

  4. K-Nearest Neighbors

    Solves the outlier problem by looking at multiple neighbors (K), then taking a majority vote — more robust and less sensitive to noise.
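
    Extending the sketch above, prediction becomes a majority vote over the K closest training examples (again illustrative; assumes non-negative integer class labels):

```python
import numpy as np

def knn_predict_one(x, X_train, y_train, k=5):
    # L1 distances from the query image to every training image
    dists = np.sum(np.abs(X_train - x), axis=1)
    # Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels
    return np.bincount(y_train[nearest]).argmax()
```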

  5. L1 vs. L2 Distances

    L1 (Manhattan) distance sums the absolute pixel-wise differences between two images: d1(I1, I2) = Σp |I1p − I2p|. L2 (Euclidean) distance takes the square root of the sum of squared differences: d2(I1, I2) = √(Σp (I1p − I2p)²). L1 depends on the choice of coordinate frame, while L2 does not.

  6. When to Use L1 or L2

    Use L1 when each feature’s individual contribution is meaningful on its own. Use L2 when you care about overall similarity rather than the contribution of any particular feature.
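
    As a quick sketch, both distances on flattened image vectors:

```python
import numpy as np

def l1_distance(a, b):
    # Manhattan distance: sum of absolute element-wise differences
    # (inputs assumed to be float arrays; uint8 images should be cast first)
    return np.sum(np.abs(a - b))

def l2_distance(a, b):
    # Euclidean distance: square root of the sum of squared differences
    return np.sqrt(np.sum((a - b) ** 2))
```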

  7. Cross Validation

    Instead of one fixed train/validation split, cross-validation divides the training data into folds, trains on all but one fold, validates on the held-out fold, and averages the results across folds. This gives a more reliable estimate of how well hyperparameters (like K) generalize. Both approaches are valid: cross-validation is preferred when data is scarce, while a single validation split is common for large datasets because it is cheaper.
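
    A sketch of 5-fold cross-validation for choosing K, reusing the knn_predict_one helper sketched earlier (the fold count and candidate values are illustrative):

```python
import numpy as np

def cross_validate_k(X, y, candidate_ks=(1, 3, 5, 7, 10), num_folds=5):
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    results = {}
    for k in candidate_ks:
        fold_accuracies = []
        for i in range(num_folds):
            # Fold i is the validation set; the remaining folds form the training set
            X_val, y_val = X_folds[i], y_folds[i]
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = np.array([knn_predict_one(x, X_tr, y_tr, k=k) for x in X_val])
            fold_accuracies.append(np.mean(preds == y_val))
        results[k] = np.mean(fold_accuracies)
    return results  # choose the k with the highest mean validation accuracy
```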

  8. Limitations of KNN

    KNN struggles to capture complex differences between images because it relies purely on distance metrics — and it’s slow to predict since it compares against all training samples.

  9. Curse of Dimensionality

    In high dimensions, nearest-neighbor search breaks down: to keep neighbors roughly as close together, the number of training examples needed to cover the space grows exponentially with the number of dimensions. For example, if densely covering a 1-D line takes 4 points, a 2-D square takes 4² = 16 and a 3-D cube takes 4³ = 64; image spaces have thousands of dimensions.


📏 Linear Classification

  1. Parametric Approach

    Linear models learn parameters (weights) for each feature — making predictions faster because you just use these weights instead of searching through the dataset.
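
    For a toy setup with 10 classes and flattened 32×32×3 images (3072 numbers each), the score function is just a matrix multiply plus a bias; the shapes and random values below are illustrative:

```python
import numpy as np

num_classes, num_pixels = 10, 32 * 32 * 3

W = np.random.randn(num_classes, num_pixels) * 0.01  # learned weights: one row per class
b = np.zeros(num_classes)                            # learned bias: one offset per class
x = np.random.rand(num_pixels)                       # a flattened input image

scores = W.dot(x) + b          # f(x, W) = Wx + b: one score per class
prediction = np.argmax(scores)  # predicted class = highest score
```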

  2. Parametric vs. KNN

    Unlike KNN, a linear classifier doesn’t need the training dataset at test time. It has already distilled what it learned into the weights (W) — efficient and scalable.

  3. Bias in Linear Models

    If you have more cat photos than dog photos, the learned bias for “cat” will tend to be higher. The bias term is an offset added to each class’s score independently of the input pixels, so it lets the classifier account for class imbalance. Note that this learned offset is not the “bias” of the bias-variance trade-off that governs over- and underfitting.

  4. Problems with Linear Classification

    Linear models can only draw straight lines (or hyperplanes) to separate classes, which is limiting for complex, non-linear data. They can struggle with images whose classes are not linearly separable in pixel space, where the distinguishing patterns are more abstract.



✨ Key Takeaway

From KNN’s simple “find your friends” idea to linear models’ efficient parametric learning, both approaches laid the groundwork for today’s deep CNNs — pushing us closer to teaching machines to truly see.