Joint Probability:
p(x, y) = p(y | x) p(x) = p(x | y) p(y)  (chain rule; either conditioning order works)
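A quick numeric check of both factorizations on a made-up 2x2 joint table (the probabilities are arbitrary, chosen only to sum to 1):

```python
import numpy as np

# Hypothetical 2x2 joint distribution over x in {0, 1} and y in {0, 1}.
p_xy = np.array([[0.1, 0.2],
                 [0.3, 0.4]])        # rows index x, columns index y

p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y = p_xy.sum(axis=0)               # marginal p(y)
p_y_given_x = p_xy / p_x[:, None]    # conditional p(y | x)
p_x_given_y = p_xy / p_y[None, :]    # conditional p(x | y)

# Both factorizations recover the same joint.
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)
assert np.allclose(p_x_given_y * p_y[None, :], p_xy)
```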
Discriminative Learning:
Models p(y | x) directly, i.e. the mapping from inputs to labels.
Generative Learning:
Models p(x, y) = p(x | y) p(y), i.e. how each class generates its data; classification then follows via Bayes' rule.
✅ Key difference:
Discriminative: direct decision boundary.
Generative: models full data distribution for each class.
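A minimal sketch of the contrast, assuming scikit-learn is available, with LogisticRegression standing in for the discriminative side and GaussianNB for the generative side; the toy dataset is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # discriminative: fits p(y | x) directly
from sklearn.naive_bayes import GaussianNB            # generative: fits p(x | y) and p(y)

rng = np.random.default_rng(0)
# Toy two-class data: one Gaussian blob per class.
X = np.vstack([rng.normal(-2, 1, size=(100, 2)),
               rng.normal(+2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

disc = LogisticRegression().fit(X, y)   # learns the decision boundary directly
gen = GaussianNB().fit(X, y)            # learns per-class distributions, applies Bayes' rule

x_new = np.array([[0.5, 0.5]])
print(disc.predict_proba(x_new))   # p(y | x), modeled directly
print(gen.predict_proba(x_new))    # p(y | x) proportional to p(x | y) p(y)
```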
Likelihood:
P(D | θ) (How likely is the data, given my model parameters?)
Posterior:
P(θ | D) (How probable are the parameters, given my data?)
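A small coin-flip sketch of both quantities, assuming hypothetical data of 7 heads in 10 flips and an assumed Beta(2, 2) prior on the coin's bias θ:

```python
import numpy as np
from scipy.stats import binom, beta

heads, total = 7, 10                 # hypothetical data D: 7 heads in 10 flips
thetas = np.linspace(0, 1, 1001)     # grid over the coin's bias θ
dtheta = thetas[1] - thetas[0]

likelihood = binom.pmf(heads, total, thetas)   # P(D | θ), as a function of θ
prior = beta.pdf(thetas, 2, 2)                 # assumed Beta(2, 2) prior on θ
unnorm = likelihood * prior
posterior = unnorm / (unnorm.sum() * dtheta)   # P(θ | D) via Bayes' rule, grid-normalized

print(thetas[np.argmax(likelihood)])   # likelihood peaks at 0.7, the raw frequency
print(thetas[np.argmax(posterior)])    # posterior peak is pulled toward the prior mean 0.5
```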
✅ Key idea:
MLE: maximize P(D | θ); purely data-driven.
MAP: maximize P(D | θ) P(θ); combines data and prior.
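A worked coin-flip comparison, using the same hypothetical data and assumed prior as above:

```python
heads, total = 7, 10        # hypothetical data: 7 heads in 10 flips
a, b = 2.0, 2.0             # assumed Beta(a, b) prior on θ

theta_mle = heads / total                           # argmax of P(D | θ)
theta_map = (heads + a - 1) / (total + a + b - 2)   # argmax of P(D | θ) P(θ)

print(theta_mle)   # 0.7, purely data-driven
print(theta_map)   # ~0.667, shrunk toward the prior mean of 0.5
```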
p(heads | D) = ∫ p(heads | θ) p(θ | D) dθ
✅ This is the Bayesian way: average the model's predictions over all parameter values, weighting each θ by its posterior probability given the data.
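A numeric sketch of that integral for the coin example, again assuming the Beta(2, 2) prior; since p(heads | θ) = θ, the integral reduces to the posterior mean:

```python
import numpy as np
from scipy.stats import beta

heads, total = 7, 10
a, b = 2.0, 2.0                          # assumed Beta prior
thetas = np.linspace(0, 1, 100_001)
dtheta = thetas[1] - thetas[0]

# Conjugacy: Beta(a, b) prior + binomial data -> Beta(a + heads, b + tails) posterior.
posterior = beta.pdf(thetas, a + heads, b + total - heads)

# p(heads | D) = ∫ p(heads | θ) p(θ | D) dθ, with p(heads | θ) = θ.
p_heads = np.sum(thetas * posterior) * dtheta
print(p_heads)   # ≈ (a + heads) / (a + b + total) = 9/14 ≈ 0.643
```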