The Multinomial Naive Bayes model is well-suited to discrete feature counts, such as word frequencies in spam detection.
We estimate the likelihood of a word w given a class c (e.g. spam or not spam) using:
θ_{w|c} = (count(w in class c) + 1) / (total words in class c + d)
+1 in the numerator?
This is called Laplace smoothing, which handles the zero-frequency problem. Without it, a word that never appears in the training data for class c would zero out the entire product of probabilities.
+d in the denominator?
Here d is the number of distinct words in the vocabulary. Since Laplace smoothing adds 1 to each of the d word counts, adding d to the denominator keeps the estimates θ_{w|c} summing to 1 over the vocabulary.
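To make this concrete, here is a minimal Python sketch of the smoothed estimate. The function name, the toy documents, and the vocabulary size are hypothetical stand-ins chosen for illustration, not part of any particular library.

from collections import Counter

def word_likelihood(word, class_docs, vocab_size):
    """Estimate theta_{w|c} with Laplace smoothing.

    class_docs: list of tokenized documents belonging to class c
    vocab_size: d, the number of distinct words in the vocabulary
    """
    counts = Counter(token for doc in class_docs for token in doc)
    total = sum(counts.values())  # total words observed in class c
    return (counts[word] + 1) / (total + vocab_size)

# Hypothetical toy data: two "spam" documents, d = 6 distinct words overall
spam_docs = [["win", "cash", "now"], ["win", "prizes"]]
print(word_likelihood("win", spam_docs, vocab_size=6))    # (2 + 1) / (5 + 6) = 3/11
print(word_likelihood("hello", spam_docs, vocab_size=6))  # unseen word: (0 + 1) / (5 + 6)

Note how the unseen word "hello" still gets a small nonzero probability instead of collapsing the product to zero.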
Gaussian Naive Bayes is used when features follow a normal (Gaussian) distribution, making it suitable for continuous data.
P(x_i | y) = (1 / √(2πσ²)) · exp(−(x_i − μ)² / (2σ²))
μ: mean of the feature values for class y
σ²: variance of the feature values for class y
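Below is a minimal Python sketch of this density, assuming μ and σ² have already been estimated from the training examples of class y; the function name and the example numbers are illustrative only.

import math

def gaussian_likelihood(x_i, mu, sigma_sq):
    """P(x_i | y) under a normal distribution with the class's mean and variance."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma_sq)
    exponent = -((x_i - mu) ** 2) / (2.0 * sigma_sq)
    return coeff * math.exp(exponent)

# Hypothetical continuous feature with class parameters mu = 4.2, sigma^2 = 0.3
print(gaussian_likelihood(4.0, mu=4.2, sigma_sq=0.3))

In practice, a library implementation such as scikit-learn's GaussianNB fits these per-class parameters from the training data for you.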