📚 1. Understanding the Naive Bayes Assumption
🧮 The Core Formula
For Naive Bayes:
p(x | y) = ∏ᵢ p(xᵢ | y)
Key idea:
- The model assumes that all features are conditionally independent given the class label y.
- This makes the joint probability tractable.
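The factorization above can be sketched with made-up numbers. The values below are illustrative assumptions, not estimates from any real dataset:

```python
# A minimal sketch of the Naive Bayes factorization.
# Assume two binary features x1, x2 and hypothetical per-feature
# conditional probabilities given the class y = "spam".
p_x1_given_spam = 0.8  # assumed P(x1 = 1 | spam)
p_x2_given_spam = 0.6  # assumed P(x2 = 1 | spam)

# Under conditional independence, the joint likelihood is simply
# the product of the per-feature conditionals:
p_x_given_spam = p_x1_given_spam * p_x2_given_spam
print(p_x_given_spam)  # ~0.48
```

Without the independence assumption, we would instead need one parameter per joint feature configuration, which grows exponentially with the number of features.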
⚡ Sometimes the Product is Bigger, Sometimes Smaller
✅ Example 1: When the product is bigger
- Imagine predicting spam emails:
- If an email contains several independently spammy words like "Free", "Offer", and "Winner", each factor p(wordᵢ | spam) is large relative to p(wordᵢ | not spam), so the product p(x | spam) ends up much bigger than p(x | not spam).
- Each word boosts confidence that the email is spam.
✅ Example 2: When the product is smaller
- If the features contradict each other:
- Suppose a word strongly associated with spam appears alongside a word that usually appears in genuine emails.
- Because Naive Bayes multiplies the per-word conditionals, the low p(word | spam) factor drags the whole product down, so p(x | spam) can end up smaller than p(x | not spam) despite the spammy word.
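Both cases can be sketched numerically. All probabilities here are made-up illustration values, and `posterior_spam` is a hypothetical helper, not a library function:

```python
def posterior_spam(word_probs_spam, word_probs_ham, prior_spam=0.5):
    """Naive Bayes posterior P(spam | words) from per-word conditionals."""
    like_spam = prior_spam       # running product for the spam class
    like_ham = 1 - prior_spam    # running product for the ham class
    for p_s, p_h in zip(word_probs_spam, word_probs_ham):
        like_spam *= p_s
        like_ham *= p_h
    # Normalize so the two class scores sum to 1.
    return like_spam / (like_spam + like_ham)

# Case 1: three independently spammy words -> posterior pushed near 1.
agree = posterior_spam([0.8, 0.7, 0.9], [0.1, 0.2, 0.05])

# Case 2: one spammy word paired with a word typical of genuine mail
# -> the conflicting factor pulls the posterior below 0.5.
conflict = posterior_spam([0.8, 0.05], [0.1, 0.6])

print(agree, conflict)
```

Running this, the agreeing words drive the posterior close to 1, while the conflicting pair lands below 0.5, matching the two examples above.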
💡 Insight:
In reality, features are often not perfectly independent, but Naive Bayes is still effective because the independence assumption simplifies calculations and works well in practice.
🎲 2. What is the Multinomial Distribution?
✅ Definition
- The Multinomial Distribution generalizes the Binomial Distribution to more than two possible outcomes.
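As a quick sketch, the multinomial probability mass function can be computed from factorials alone. `multinomial_pmf` is a hypothetical helper written here for illustration, not a library function:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(observing these category counts) for a multinomial distribution."""
    n = sum(counts)
    # Multinomial coefficient: n! / (c1! * c2! * ... * ck!)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    # Multiply in each category probability raised to its count.
    p = float(coef)
    for c, pr in zip(counts, probs):
        p *= pr ** c
    return p

# Rolling a fair die 6 times and seeing each face exactly once:
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1/6] * 6))  # 720 / 6**6

# With only two categories it reduces to the binomial pmf,
# e.g. 2 heads and 1 tail in 3 fair coin flips:
print(multinomial_pmf([2, 1], [0.5, 0.5]))  # 0.375
```

The two-category call shows the "generalizes the Binomial" claim directly: with k = 2 the formula collapses to the familiar binomial pmf.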