Joint Probability:
p(x, y) = p(y | x) p(x) = p(x | y) p(y)  (chain rule; either conditioning order works)
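A quick numeric check of both factorizations on a made-up 2x2 joint table (the probabilities are arbitrary, chosen only to sum to 1):

```python
import numpy as np

# Hypothetical 2x2 joint distribution over x in {0, 1} and y in {0, 1}.
p_xy = np.array([[0.1, 0.2],
                 [0.3, 0.4]])        # rows index x, columns index y

p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y = p_xy.sum(axis=0)               # marginal p(y)
p_y_given_x = p_xy / p_x[:, None]    # conditional p(y | x)
p_x_given_y = p_xy / p_y[None, :]    # conditional p(x | y)

# Both factorizations recover the same joint.
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)
assert np.allclose(p_x_given_y * p_y[None, :], p_xy)
```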
Discriminative Learning:
Models p(y | x) directly, i.e. the mapping from inputs to labels.
Generative Learning:
Models p(x, y) = p(x | y) p(y), i.e. how each class generates its data; classification then follows via Bayes' rule.
✅ Key difference:
Discriminative: direct decision boundary.
Generative: models full data distribution for each class.
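A minimal sketch of the contrast, assuming scikit-learn is available, with LogisticRegression standing in for the discriminative side and GaussianNB for the generative side; the toy dataset is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # discriminative: fits p(y | x) directly
from sklearn.naive_bayes import GaussianNB            # generative: fits p(x | y) and p(y)

rng = np.random.default_rng(0)
# Toy two-class data: one Gaussian blob per class.
X = np.vstack([rng.normal(-2, 1, size=(100, 2)),
               rng.normal(+2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

disc = LogisticRegression().fit(X, y)   # learns the decision boundary directly
gen = GaussianNB().fit(X, y)            # learns per-class distributions, applies Bayes' rule

x_new = np.array([[0.5, 0.5]])
print(disc.predict_proba(x_new))   # p(y | x), modeled directly
print(gen.predict_proba(x_new))    # p(y | x) proportional to p(x | y) p(y)
```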
Likelihood:
P(D | θ) (How likely is the data, given my model parameters?)
Posterior:
P(θ | D) (How probable are the parameters, given my data?)
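A small coin-flip sketch of both quantities, assuming hypothetical data of 7 heads in 10 flips and an assumed Beta(2, 2) prior on the coin's bias θ:

```python
import numpy as np
from scipy.stats import binom, beta

heads, total = 7, 10                 # hypothetical data D: 7 heads in 10 flips
thetas = np.linspace(0, 1, 1001)     # grid over the coin's bias θ
dtheta = thetas[1] - thetas[0]

likelihood = binom.pmf(heads, total, thetas)   # P(D | θ), as a function of θ
prior = beta.pdf(thetas, 2, 2)                 # assumed Beta(2, 2) prior on θ
unnorm = likelihood * prior
posterior = unnorm / (unnorm.sum() * dtheta)   # P(θ | D) via Bayes' rule, grid-normalized

print(thetas[np.argmax(likelihood)])   # likelihood peaks at 0.7, the raw frequency
print(thetas[np.argmax(posterior)])    # posterior peak is pulled toward the prior mean 0.5
```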
✅ Key idea:
MLE: maximize P(D | θ); purely data-driven.
MAP: maximize P(D | θ) P(θ); combines data and prior.
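A worked coin-flip comparison, using the same hypothetical data and assumed prior as above:

```python
heads, total = 7, 10        # hypothetical data: 7 heads in 10 flips
a, b = 2.0, 2.0             # assumed Beta(a, b) prior on θ

theta_mle = heads / total                           # argmax of P(D | θ)
theta_map = (heads + a - 1) / (total + a + b - 2)   # argmax of P(D | θ) P(θ)

print(theta_mle)   # 0.7, purely data-driven
print(theta_map)   # ~0.667, shrunk toward the prior mean of 0.5
```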
p(heads | D) = ∫ p(heads | θ) p(θ | D) dθ
✅ This is the Bayesian way: average the model's predictions over all parameter values, weighting each θ by its posterior probability given the data.
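A numeric sketch of that integral for the coin example, again assuming the Beta(2, 2) prior; since p(heads | θ) = θ, the integral reduces to the posterior mean:

```python
import numpy as np
from scipy.stats import beta

heads, total = 7, 10
a, b = 2.0, 2.0                          # assumed Beta prior
thetas = np.linspace(0, 1, 100_001)
dtheta = thetas[1] - thetas[0]

# Conjugacy: Beta(a, b) prior + binomial data -> Beta(a + heads, b + tails) posterior.
posterior = beta.pdf(thetas, a + heads, b + total - heads)

# p(heads | D) = ∫ p(heads | θ) p(θ | D) dθ, with p(heads | θ) = θ.
p_heads = np.sum(thetas * posterior) * dtheta
print(p_heads)   # ≈ (a + heads) / (a + b + total) = 9/14 ≈ 0.643
```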