🧩 Dense vs. Sparse Representations
Dense Representation
- Every value in the feature vector is stored explicitly, and most entries are typically non-zero.
- Example: Pixel values in an image — each pixel contributes to the prediction.
Sparse Representation
- Most feature values are zeros — you only store the non-zero ones.
- Example: Text data (like emails for spam filtering). You don’t store every word in the English language, just the words that actually appear.
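The contrast above can be sketched in a few lines of Python. The vocabulary size, word indices, and counts below are illustrative assumptions, not from the notes:

```python
# Hypothetical sketch: the same feature vector stored densely and sparsely.
VOCAB_SIZE = 10_000  # assume a 10,000-word vocabulary

# Dense: one slot per vocabulary word; almost all zeros for a short email.
dense = [0] * VOCAB_SIZE
dense[42] = 3   # e.g. "free" appears 3 times (index chosen arbitrarily)
dense[977] = 1  # e.g. "winner" appears once

# Sparse: store only the non-zero (index, count) pairs.
sparse = {42: 3, 977: 1}

print(len(dense))   # 10000 values stored
print(len(sparse))  # 2 values stored
```

Both encode identical information, but the sparse form stores two entries instead of ten thousand, which is why it is the natural choice for text features.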
📉 Different Loss Functions
0/1 Loss
- Used for binary classification problems.
- The model incurs a loss of 0 if it predicts correctly and 1 if it is wrong, so the average 0/1 loss over a dataset is just the error rate.
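A minimal sketch of this (the function name and sample labels are my own, not from the notes):

```python
def zero_one_loss(y_true, y_pred):
    """0 if the prediction matches the label, 1 otherwise."""
    return 0 if y_true == y_pred else 1

# Averaging the 0/1 loss over a batch gives the error rate.
labels = [1, 0, 1, 1]
preds  = [1, 1, 1, 0]
error_rate = sum(zero_one_loss(t, p) for t, p in zip(labels, preds)) / len(labels)
print(error_rate)  # 0.5: two of the four predictions are wrong
```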
Squared Loss
- A common choice for regression tasks.
- Small errors contribute little, but the penalty grows quadratically, so large errors dominate the loss.
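The quadratic penalty is easy to see numerically (the function name and values here are illustrative):

```python
def squared_loss(y_true, y_pred):
    """Squared error: small misses barely register, big misses dominate."""
    return (y_true - y_pred) ** 2

print(squared_loss(10, 10.5))  # 0.25
print(squared_loss(10, 15))    # 25.0: a 10x larger error costs 100x more
```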
Absolute Loss
- Use when every unit of error matters equally: the penalty grows linearly with the size of the mistake.
- Good for regression when you want robustness to outliers, since a single extreme error does not dominate the loss the way it does under squared loss.
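The difference in outlier sensitivity can be shown by averaging both losses over the same residuals. The residual values below are made up for illustration:

```python
def absolute_loss(y_true, y_pred):
    """Absolute error: each unit of error costs the same."""
    return abs(y_true - y_pred)

# Three small residuals and one outlier.
residuals = [0.5, 0.5, 0.5, 20]

mean_abs = sum(abs(r) for r in residuals) / len(residuals)
mean_sq  = sum(r ** 2 for r in residuals) / len(residuals)

print(mean_abs)  # 5.375: the outlier shifts the average linearly
print(mean_sq)   # 100.1875: the outlier dominates the squared average
```

Under absolute loss the outlier contributes 20 of the 21.5 total; under squared loss it contributes 400 of 400.75, swamping everything else.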
🧪 Train-Test Split & Generalization