If you are starting out in machine learning, the confusion matrix is one of the first tools you must understand. It sounds intimidating, but it is simply a table that shows how often your model was right and wrong — and in what way. Let’s break it down with a clear example.
The four boxes
Imagine a model that predicts whether an email is spam. For every email, there are four possible outcomes:
- True Positive (TP): spam correctly caught as spam.
- True Negative (TN): a normal email correctly left alone.
- False Positive (FP): a normal email wrongly flagged as spam (annoying!).
- False Negative (FN): spam that slipped into the inbox.
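To make the four outcomes concrete, here is a minimal Python sketch that sorts each email into one of the four boxes. The function name `count_outcomes` and the six sample emails are made up for illustration; `True` is assumed to mean "spam".

```python
def count_outcomes(actual, predicted):
    """Count TP, TN, FP and FN for a binary spam classifier.

    Both arguments are equal-length lists of booleans, where True means "spam".
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for is_spam, flagged in zip(actual, predicted):
        if is_spam and flagged:
            counts["TP"] += 1   # spam correctly caught
        elif not is_spam and not flagged:
            counts["TN"] += 1   # normal email correctly left alone
        elif not is_spam and flagged:
            counts["FP"] += 1   # normal email wrongly flagged
        else:
            counts["FN"] += 1   # spam that slipped through
    return counts


# Six made-up emails, purely for illustration.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

outcomes = count_outcomes(actual, predicted)
print(outcomes)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```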
A worked example
Suppose we test 100 emails. The model gives us: TP = 40, TN = 45, FP = 5, FN = 10.
| | Predicted: Spam | Predicted: Not spam |
|---|---|---|
| Actually spam | 40 (TP) | 10 (FN) |
| Actually not spam | 5 (FP) | 45 (TN) |
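The same table can be held as a small nested list, which makes the row and column totals easy to check. This is only a sketch: the layout (rows for the actual class, columns for the predicted class) follows the table above, and the variable names are illustrative.

```python
# Counts from the worked example.
tp, tn, fp, fn = 40, 45, 5, 10

# Rows are the actual class, columns are the predicted class,
# both in the order [spam, not spam].
matrix = [
    [tp, fn],  # actually spam
    [fp, tn],  # actually not spam
]

# Row sums: how many emails really were spam / not spam.
actual_totals = [sum(row) for row in matrix]
# Column sums: how many emails the model flagged / did not flag.
predicted_totals = [sum(col) for col in zip(*matrix)]
total = sum(actual_totals)

print(actual_totals)     # [50, 50]
print(predicted_totals)  # [45, 55]
print(total)             # 100
```

The row totals are the denominators for recall, and the column totals are the denominators for precision.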
The metrics that come from it
- Accuracy = (TP + TN) / total = (40 + 45) / 100 = 85%
- Precision = TP / (TP + FP) = 40 / 45 ≈ 89%: of all emails flagged as spam, how many really were spam.
- Recall = TP / (TP + FN) = 40 / 50 = 80%: of all real spam, how much we caught.
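The three formulas above can be checked with a few lines of Python. This is a minimal sketch: the function name `compute_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumption made here for simplicity, not a universal convention.

```python
def compute_metrics(tp, tn, fp, fn):
    """Return accuracy, precision and recall as fractions between 0 and 1."""
    total = tp + tn + fp + fn
    flagged = tp + fp      # everything the model called spam
    real_spam = tp + fn    # everything that actually was spam

    # Assumption: report 0.0 instead of failing when a denominator is zero.
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / flagged if flagged else 0.0,
        "recall": tp / real_spam if real_spam else 0.0,
    }


# Counts from the worked example.
results = compute_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in results.items():
    print(f"{name}: {value:.1%}")
# accuracy: 85.0%
# precision: 88.9%
# recall: 80.0%
```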
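Accuracy alone can be misleading, especially when one class is much rarer than the other. Precision and recall tell you what kind of mistakes your model makes, and that often matters more in practice: a false positive hides a real email, while a false negative lets spam through.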
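To see how accuracy can mislead, consider a hypothetical test set that is not part of the example above: 100 emails, only 5 of them spam, and a "lazy" model that never flags anything. The numbers below are invented purely to illustrate the point.

```python
# Hypothetical test set: 100 emails, only 5 of which are spam (True = spam).
actual = [True] * 5 + [False] * 95
# A "lazy" model that never flags anything as spam.
predicted = [False] * 100

pairs = list(zip(actual, predicted))
tp = sum(a and p for a, p in pairs)
tn = sum((not a) and (not p) for a, p in pairs)
fp = sum((not a) and p for a, p in pairs)
fn = sum(a and (not p) for a, p in pairs)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
# Precision is left out: the model flagged nothing, so TP + FP is zero
# and the ratio is undefined.

print(f"accuracy = {accuracy:.0%}")  # accuracy = 95%
print(f"recall = {recall:.0%}")      # recall = 0%
```

The model looks excellent on accuracy yet catches no spam at all, which is why recall (and precision) are worth reporting alongside it.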
