The Multinomial Naive Bayes model is well-suited for discrete feature counts, such as the word frequencies in spam detection.
We estimate the likelihood of a word w given a class c (e.g. spam or not spam) using:
θ_{w|c} = (count(w in class c) + 1) / (total words in class c + d)
Why +1 in the Numerator?
This is called Laplace smoothing, and it handles the zero-frequency problem. Without it, a word that never appears in the training data of class c would zero out the whole product of probabilities: a single unseen word would force the estimate for c to 0, no matter how strongly the other words point to that class.
Why +d in the Denominator?
d represents the total number of distinct words in the vocabulary. By adding d, we ensure the denominator reflects the adjusted total count: each of the d vocabulary words receives one pseudo-count from the smoothing, so the estimates still sum to 1 over the vocabulary.
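To make this concrete, here is a minimal Python sketch of the smoothed estimate above; the function smoothed_likelihoods and the toy spam documents are illustrative assumptions, not part of any library:

```python
from collections import Counter

def smoothed_likelihoods(class_docs, vocab):
    """Estimate θ_{w|c} = (count(w in c) + 1) / (total words in c + d)
    for every word w in the vocabulary, where d = len(vocab)."""
    counts = Counter(word for doc in class_docs for word in doc)
    total_words = sum(counts.values())
    d = len(vocab)
    return {w: (counts[w] + 1) / (total_words + d) for w in vocab}

# Toy training documents for the spam class (hypothetical data).
vocab = {"free", "win", "money", "meeting"}
spam_docs = [["free", "money", "win"], ["win", "free", "free"]]
theta_spam = smoothed_likelihoods(spam_docs, vocab)

print(theta_spam["free"])     # (3 + 1) / (6 + 4) = 0.4
print(theta_spam["meeting"])  # (0 + 1) / (6 + 4) = 0.1, unseen but not zero
```

Note that "meeting" never occurs in the spam documents, yet it still receives a small nonzero probability thanks to the smoothing.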
Gaussian Naive Bayes is used when features follow a normal (Gaussian) distribution, which makes it suitable for continuous data.
P(x_i | y) = (1 / √(2πσ²)) · exp(−(x_i − μ)² / (2σ²))
μ: mean of the feature values for class y
σ²: variance of the feature values for class y
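As a sketch of how this density is used in practice, we can fit μ and σ² for one feature from a class's training values and then score a new observation; the helper gaussian_likelihood and the toy numbers here are hypothetical:

```python
import math

def gaussian_likelihood(x_i, mu, var):
    """P(x_i | y): the normal pdf with the class's mean mu and variance var."""
    return math.exp(-((x_i - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy continuous feature values observed for class y during training.
x_train = [4.2, 5.1, 4.8, 5.5]
mu = sum(x_train) / len(x_train)                          # μ = 4.9
var = sum((x - mu) ** 2 for x in x_train) / len(x_train)  # σ² = 0.225
print(gaussian_likelihood(5.0, mu, var))                  # ≈ 0.82
```

In a full classifier, these per-feature densities are multiplied (or their logarithms summed) across features, together with the class prior, to score each class.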