Naive Bayes
What
Applies Bayes' theorem with the "naive" assumption that features are conditionally independent given the class.
P(class | features) ∝ P(class) × P(feature₁ | class) × P(feature₂ | class) × ...
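In practice this product is computed in log space so that multiplying many small probabilities does not underflow. A minimal sketch of the scoring rule above (the function name and the toy probabilities are illustrative, not from any library):

```python
import math

def nb_score(log_prior, log_likelihoods, features):
    """Unnormalized log-posterior: log P(class) + sum of log P(feature | class)."""
    return log_prior + sum(log_likelihoods[f] for f in features)

# Toy single-class example: prior 0.6, two word likelihoods 0.2 and 0.1
score = nb_score(math.log(0.6),
                 {"free": math.log(0.2), "win": math.log(0.1)},
                 ["free", "win"])
# score equals log(0.6 * 0.2 * 0.1) — the log of the product in the formula
```

The class with the highest such score is the prediction; the normalizing constant P(features) is the same for every class, so it can be dropped.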
Why it works despite the “naive” assumption
The independence assumption is almost always wrong, but the model often still makes good predictions because classification only needs the ranking to be right (which class has the highest posterior), not the probabilities themselves to be well calibrated.
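A toy illustration of this (all numbers invented): if a feature is perfectly correlated with another — here, literally counted twice — the naive product becomes overconfident, but the winning class does not change.

```python
# Two classes with priors 0.5 each; one binary feature with
# P(f|spam) = 0.8, P(f|ham) = 0.2.
p = {"spam": 0.5 * 0.8, "ham": 0.5 * 0.2}                   # correct model
p_dup = {"spam": 0.5 * 0.8 * 0.8, "ham": 0.5 * 0.2 * 0.2}  # feature counted twice

def posterior(scores):
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

print(posterior(p)["spam"])      # 0.8  — calibrated
print(posterior(p_dup)["spam"])  # ~0.94 — overconfident
# ...but the argmax, and therefore the prediction, is identical:
assert max(p, key=p.get) == max(p_dup, key=p_dup.get)
```

The violated independence assumption distorts the probability estimates, not the decision boundary between the two classes in this example.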
Variants
| Variant | Feature type | Use case |
|---|---|---|
| GaussianNB | Continuous | General purpose |
| MultinomialNB | Counts | Text classification (word counts) |
| BernoulliNB | Binary | Text (word presence/absence) |
```python
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
# X_train_counts: a matrix of word counts, e.g. from CountVectorizer
model.fit(X_train_counts, y_train)
```
Strengths
- Extremely fast (training and prediction)
- Works well with small data
- Good baseline for text classification
- Little hyperparameter tuning needed (mainly the smoothing parameter `alpha`)
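The `X_train_counts` in the snippet above is typically produced by a vectorizer. A minimal end-to-end sketch for text classification (the sentences and labels are invented toy data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win free prize now", "free money win",
         "meeting at noon", "lunch at noon today"]
labels = ["spam", "spam", "ham", "ham"]

# Turn raw text into the word-count matrix MultinomialNB expects
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X_counts, labels)

# New documents must go through the SAME fitted vectorizer
pred = model.predict(vectorizer.transform(["free prize money"]))
```

Note that `transform` (not `fit_transform`) is used at prediction time, so the test document is mapped into the vocabulary learned from the training texts.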
Links
- Bayes Theorem
- NLP Roadmap — common application
- Evaluation Metrics