📊 1. MLE vs MAP — What’s the Difference?
- Maximum Likelihood Estimation (MLE):
  - Treats θ as a fixed but unknown parameter.
  - Maximizes the likelihood p(D | θ) → how likely is the data given my parameters.
  - Example: estimate the bias of a coin purely from the observed flips (see the sketch at the end of this section).
- Maximum A Posteriori (MAP):
  - Treats θ as a random variable with its own distribution.
  - Maximizes the posterior p(θ | D), obtained via Bayes’ Rule → how likely are the parameters given the data.
  - Incorporates a prior to reflect what you believe before seeing any data.
✅ Key point:
- MLE is purely data-driven.
- MAP balances the data and your prior beliefs.
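To make the contrast concrete, here is a minimal sketch of both estimators for the coin-bias example, assuming a Beta(α, β) prior for MAP; the prior choice and function names are illustrative, not from the original notes.

```python
# Sketch: MLE vs MAP estimates of a coin's heads-probability theta.
# Assumes a Beta(alpha, beta) prior for MAP (an illustrative choice).

def mle_coin(heads: int, flips: int) -> float:
    """MLE: maximize p(D | theta) -> simply the observed frequency of heads."""
    return heads / flips

def map_coin(heads: int, flips: int, alpha: float = 2.0, beta: float = 2.0) -> float:
    """MAP: maximize p(theta | D) under a Beta(alpha, beta) prior.
    The posterior is Beta(alpha + heads, beta + tails); this returns its mode."""
    return (heads + alpha - 1) / (flips + alpha + beta - 2)

print(mle_coin(7, 10))  # 0.7: driven by the data alone
print(map_coin(7, 10))  # ~0.667: pulled slightly toward the prior's belief of 0.5
```

Note how the MAP estimate sits between the raw frequency and the prior mean, which is exactly the balance described in the key point above.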
✅ 2. When MAP is Useful
- MAP is powerful when you have reasonable priors:
  - For example, in spam detection, you may already know that spam messages tend to contain certain word patterns.
  - Good priors help avoid overfitting when you have limited data (see the sketch below).
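Here is how that plays out with very little data; the flip counts and the Beta(2, 2) prior are made-up numbers for illustration.

```python
# 3 flips, all heads: an extreme but tiny dataset.
heads, flips = 3, 3
alpha, beta = 2.0, 2.0  # Beta(2, 2) prior: "the coin is probably roughly fair"

mle = heads / flips                                         # 1.0 -> claims tails is impossible
map_est = (heads + alpha - 1) / (flips + alpha + beta - 2)  # 0.8 -> prior tempers the estimate
print(mle, map_est)
```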
🔢 3. Basic Probability Formula
- Bayes’ Rule for a single feature:
  p(y | x) = (p(x | y) · p(y)) / p(x)
- But this alone doesn’t tell you how to handle multiple features, where x becomes a whole vector of interacting variables (a worked numeric example follows below).
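As a quick numeric check of the formula above, here is Bayes’ Rule applied to a single spam-filter feature; all the probabilities are invented for illustration.

```python
# Bayes' Rule: p(y | x) = p(x | y) * p(y) / p(x)
# y = "message is spam", x = "message contains the word 'free'".
p_spam = 0.30              # prior p(y)
p_free_given_spam = 0.60   # likelihood p(x | y)
p_free_given_ham = 0.05    # p(x | not y)

# Marginal p(x) via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.837: one word sharply raises the spam probability
```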
⚠️ 4. The Problem with Tied Features
- Applying Bayes’ Rule directly to the full feature vector is problematic:
  - It models all features jointly, so every feature can depend on every other.
  - This means you’d need to estimate a probability for every combination of feature values → impractical in high dimensions (see the sketch below).
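A back-of-the-envelope sketch of that blow-up, assuming binary features for simplicity; the feature counts are arbitrary examples.

```python
# If features are modeled jointly, you need one probability per joint
# configuration of the feature vector, per class (binary features assumed).
for d in (5, 20, 50):
    print(f"{d} features -> {2 ** d} joint configurations per class")
# 5 features  -> 32
# 20 features -> 1048576            (~1e6)
# 50 features -> 1125899906842624   (~1e15, hopeless to estimate from data)
```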
🧩 5. Enter Naive Bayes: The Simplifying Assumption