Naive Bayes

What

Applies Bayes' theorem with the "naive" assumption that features are conditionally independent given the class.

P(class | features) ∝ P(class) × P(feature₁ | class) × P(feature₂ | class) × ...
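The proportionality can be worked through with made-up numbers. A minimal sketch (the spam-filter priors and per-word likelihoods below are illustrative, not from any real dataset):

```python
# Hypothetical spam-filter estimates: class priors and P(word | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.30, "meeting": 0.01},
    "ham":  {"free": 0.02, "meeting": 0.20},
}

def unnormalized_posterior(cls, words):
    # P(class) * P(word1 | class) * P(word2 | class) * ...
    score = priors[cls]
    for w in words:
        score *= likelihood[cls][w]
    return score

words = ["free", "meeting"]
scores = {c: unnormalized_posterior(c, words) for c in priors}
# Divide by the total so the scores become proper posteriors.
total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
```

Here "ham" wins (0.6 × 0.02 × 0.20 = 0.0024 beats 0.4 × 0.30 × 0.01 = 0.0012), and normalizing only rescales the scores without changing the ranking.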

Why it works despite the “naive” assumption

The independence assumption is almost always wrong, but the model often still makes good predictions because it only needs to get the ranking right (which class has higher probability), not the exact probabilities.
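This can be seen in a toy calculation (hypothetical numbers): counting the same feature twice, as if two perfectly correlated features were independent, distorts the posterior but not the winner.

```python
# One observed feature; equal priors. lik_* are P(feature | class).
p_spam, p_ham = 0.5, 0.5
lik_spam, lik_ham = 0.8, 0.2

def posterior_spam(times_counted):
    # Naive Bayes multiplies likelihoods, so a duplicated feature
    # simply raises its likelihood to a power.
    s = p_spam * lik_spam ** times_counted
    h = p_ham * lik_ham ** times_counted
    return s / (s + h)

correct = posterior_spam(1)  # feature used once: 0.8
naive = posterior_spam(2)    # duplicated "independent" copy: ~0.94
```

The naive posterior is overconfident (0.94 instead of 0.8), yet spam still outranks ham, so the predicted class is unchanged.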

Variants

| Variant | Feature type | Use case |
| --- | --- | --- |
| GaussianNB | Continuous | General purpose |
| MultinomialNB | Counts | Text classification (word counts) |
| BernoulliNB | Binary | Text (word presence/absence) |
```python
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
# X_train_counts: a document-term count matrix (e.g. from CountVectorizer)
model.fit(X_train_counts, y_train)
```
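For context, a minimal end-to-end sketch on toy data (assuming scikit-learn is installed; the texts and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["free money now", "free offer win money",
         "meeting at noon", "project meeting agenda"]
labels = ["spam", "spam", "ham", "ham"]

# Turn raw text into the word-count matrix MultinomialNB expects.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X_counts, labels)

pred = model.predict(vectorizer.transform(["win free money"]))
```

Note that the unseen document must be transformed with the same fitted vectorizer so its columns line up with the training vocabulary.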

Strengths

  • Extremely fast (training and prediction)
  • Works well with small data
  • Good baseline for text classification
  • Little hyperparameter tuning needed (mainly the smoothing term alpha)