The Multinomial Naive Bayes model is well-suited for discrete feature counts, such as the word frequencies in spam detection.
We estimate the likelihood of a word w given a class c (e.g. spam or not spam) using:
θ_{w|c} = (count(w in class c) + 1) / (total words in class c + d)
Why +1 in the Numerator?
This is called Laplace smoothing, and it handles the zero-frequency problem. Without it, a word that never appears in the training data of class c would zero out the whole product of probabilities: a single unseen word would force the estimate for c to 0, no matter how strongly the other words point to that class.
Why +d in the Denominator?
d represents the total number of distinct words in the vocabulary. By adding d, we ensure the denominator reflects the adjusted total count: each of the d vocabulary words receives one pseudo-count from the smoothing, so the estimates still sum to 1 over the vocabulary.
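To make this concrete, here is a minimal Python sketch of the smoothed estimate above; the function smoothed_likelihoods and the toy spam documents are illustrative assumptions, not part of any library:

```python
from collections import Counter

def smoothed_likelihoods(class_docs, vocab):
    """Estimate θ_{w|c} = (count(w in c) + 1) / (total words in c + d)
    for every word w in the vocabulary, where d = len(vocab)."""
    counts = Counter(word for doc in class_docs for word in doc)
    total_words = sum(counts.values())
    d = len(vocab)
    return {w: (counts[w] + 1) / (total_words + d) for w in vocab}

# Toy training documents for the spam class (hypothetical data).
vocab = {"free", "win", "money", "meeting"}
spam_docs = [["free", "money", "win"], ["win", "free", "free"]]
theta_spam = smoothed_likelihoods(spam_docs, vocab)

print(theta_spam["free"])     # (3 + 1) / (6 + 4) = 0.4
print(theta_spam["meeting"])  # (0 + 1) / (6 + 4) = 0.1, unseen but not zero
```

Note that "meeting" never occurs in the spam documents, yet it still receives a small nonzero probability thanks to the smoothing.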
Gaussian Naive Bayes is used when features follow a normal (Gaussian) distribution, which makes it suitable for continuous data.
P(x_i | y) = (1 / √(2πσ²)) · exp(−(x_i − μ)² / (2σ²))
μ: mean of the feature values for class y
σ²: variance of the feature values for class y
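As a sketch of how this density is used in practice, we can fit μ and σ² for one feature from a class's training values and then score a new observation; the helper gaussian_likelihood and the toy numbers here are hypothetical:

```python
import math

def gaussian_likelihood(x_i, mu, var):
    """P(x_i | y): the normal pdf with the class's mean mu and variance var."""
    return math.exp(-((x_i - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy continuous feature values observed for class y during training.
x_train = [4.2, 5.1, 4.8, 5.5]
mu = sum(x_train) / len(x_train)                          # μ = 4.9
var = sum((x - mu) ** 2 for x in x_train) / len(x_train)  # σ² = 0.225
print(gaussian_likelihood(5.0, mu, var))                  # ≈ 0.82
```

In a full classifier, these per-feature densities are multiplied (or their logarithms summed) across features, together with the class prior, to score each class.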