📊 1. MLE vs MAP — What’s the Difference?
- Maximum Likelihood Estimation (MLE):
  - Treats θ as a fixed but unknown parameter.
  - Maximizes the likelihood p(D | θ) → how likely is the data given my parameters.
  - Example: estimate the bias of a coin purely from the observed flips (see the sketch at the end of this section).
- Maximum A Posteriori (MAP):
  - Treats θ as a random variable with its own distribution.
  - Maximizes the posterior p(θ | D), obtained via Bayes’ Rule → how likely are the parameters given the data.
  - Incorporates a prior to reflect what you believe before seeing any data.
✅ Key point:
- MLE is purely data-driven.
- MAP balances the data and your prior beliefs.
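To make the contrast concrete, here is a minimal sketch of both estimators for the coin-bias example, assuming a Beta(α, β) prior for MAP; the prior choice and function names are illustrative, not from the original notes.

```python
# Sketch: MLE vs MAP estimates of a coin's heads-probability theta.
# Assumes a Beta(alpha, beta) prior for MAP (an illustrative choice).

def mle_coin(heads: int, flips: int) -> float:
    """MLE: maximize p(D | theta) -> simply the observed frequency of heads."""
    return heads / flips

def map_coin(heads: int, flips: int, alpha: float = 2.0, beta: float = 2.0) -> float:
    """MAP: maximize p(theta | D) under a Beta(alpha, beta) prior.
    The posterior is Beta(alpha + heads, beta + tails); this returns its mode."""
    return (heads + alpha - 1) / (flips + alpha + beta - 2)

print(mle_coin(7, 10))  # 0.7: driven by the data alone
print(map_coin(7, 10))  # ~0.667: pulled slightly toward the prior's belief of 0.5
```

Note how the MAP estimate sits between the raw frequency and the prior mean, which is exactly the balance described in the key point above.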
✅ 2. When MAP is Useful
- MAP is powerful when you have reasonable priors:
  - For example, in spam detection, you may already know that spam messages tend to contain certain word patterns.
  - Good priors help avoid overfitting when you have limited data (see the sketch below).
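Here is how that plays out with very little data; the flip counts and the Beta(2, 2) prior are made-up numbers for illustration.

```python
# 3 flips, all heads: an extreme but tiny dataset.
heads, flips = 3, 3
alpha, beta = 2.0, 2.0  # Beta(2, 2) prior: "the coin is probably roughly fair"

mle = heads / flips                                         # 1.0 -> claims tails is impossible
map_est = (heads + alpha - 1) / (flips + alpha + beta - 2)  # 0.8 -> prior tempers the estimate
print(mle, map_est)
```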
🔢 3. Basic Probability Formula
- Bayes’ Rule for a single feature:
  p(y | x) = (p(x | y) · p(y)) / p(x)
- But this alone doesn’t tell you how to handle multiple features, where x becomes a whole vector of interacting variables (a worked numeric example follows below).
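As a quick numeric check of the formula above, here is Bayes’ Rule applied to a single spam-filter feature; all the probabilities are invented for illustration.

```python
# Bayes' Rule: p(y | x) = p(x | y) * p(y) / p(x)
# y = "message is spam", x = "message contains the word 'free'".
p_spam = 0.30              # prior p(y)
p_free_given_spam = 0.60   # likelihood p(x | y)
p_free_given_ham = 0.05    # p(x | not y)

# Marginal p(x) via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.837: one word sharply raises the spam probability
```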
⚠️ 4. The Problem with Tied Features
- Applying Bayes’ Rule directly to the full feature vector is problematic:
  - It models all features jointly, so every feature can depend on every other.
  - This means you’d need to estimate a probability for every combination of feature values → impractical in high dimensions (see the sketch below).
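A back-of-the-envelope sketch of that blow-up, assuming binary features for simplicity; the feature counts are arbitrary examples.

```python
# If features are modeled jointly, you need one probability per joint
# configuration of the feature vector, per class (binary features assumed).
for d in (5, 20, 50):
    print(f"{d} features -> {2 ** d} joint configurations per class")
# 5 features  -> 32
# 20 features -> 1048576            (~1e6)
# 50 features -> 1125899906842624   (~1e15, hopeless to estimate from data)
```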
🧩 5. Enter Naive Bayes: The Simplifying Assumption