Model Evaluation Metrics in Python — ELI5

Discover why an honest answer to 'how good is my model?' takes more than one number.

Imagine your friend claims to be amazing at predicting rain. Every day they say “no rain” — and guess what, they are right 90 percent of the time because it only rains about one day in ten. Sounds impressive, but they have never actually predicted a rainy day. Their one trick is guessing the most common answer.

That is why you need more than one way to grade a computer’s predictions. A single “percent correct” score can hide big problems.
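
To make the rain story concrete, here is a small sketch in plain Python. The ten-day weather record is made up for illustration, and the variable names are just placeholders. It compares "percent correct" with a second score that asks how many rainy days were actually caught.

```python
# Hypothetical 10-day record (made-up data): 1 = rain, 0 = no rain.
actual = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# The friend's one trick: always predict "no rain".
predicted = [0] * len(actual)

# Percent correct (accuracy): how many days matched, out of all days.
correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)

# Second score: of the days it really rained, how many were predicted?
rainy_days = sum(actual)
rainy_days_caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
caught_fraction = rainy_days_caught / rainy_days

print(f"Percent correct:   {accuracy:.0%}")
print(f"Rainy days caught: {caught_fraction:.0%}")
```

With this toy data the friend scores 90 percent correct while catching none of the rainy days, which is exactly the problem a single number hides.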

Think of it like school report cards. Getting an A in art does not tell you how someone does in math. Different scores tell you different things:

Did it catch the important stuff? If you are looking for sick patients, you want to find as many truly sick people as possible, even if you accidentally flag a few healthy ones.
Was it careful when it said yes? When a spam filter says “this is spam,” you want it to be right, because a real email in the spam folder is annoying.
How far off was it? If you are predicting house prices, being wrong by a thousand dollars is fine; being wrong by a hundred thousand is not.
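
These three questions map onto three commonly used scores: recall, precision, and mean absolute error. The sketch below writes each one by hand in plain Python so the arithmetic is visible. The function names and all of the data are illustrative assumptions, not taken from any particular project.

```python
def recall(actual, predicted):
    """Did it catch the important stuff? Share of real 'yes' cases that were flagged."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    actual_pos = sum(actual)
    return true_pos / actual_pos if actual_pos else 0.0


def precision(actual, predicted):
    """Was it careful when it said yes? Share of flagged cases that were really 'yes'."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    predicted_pos = sum(predicted)
    return true_pos / predicted_pos if predicted_pos else 0.0


def mean_absolute_error(actual, predicted):
    """How far off was it? Average size of the miss, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up patient data: 1 = sick, 0 = healthy.
is_sick = [1, 1, 1, 0, 0, 0, 0, 0]
flagged = [1, 1, 1, 1, 1, 0, 0, 0]  # finds every sick patient, plus two healthy ones

patient_recall = recall(is_sick, flagged)        # 3 of 3 sick patients found
patient_precision = precision(is_sick, flagged)  # 3 of 5 flags were correct

# Made-up house prices in dollars.
true_prices = [200_000, 350_000, 500_000]
guessed_prices = [201_000, 349_000, 400_000]  # two small misses, one big one

price_error = mean_absolute_error(true_prices, guessed_prices)

print(f"Recall:    {patient_recall:.2f}")
print(f"Precision: {patient_precision:.2f}")
print(f"Average price miss: ${price_error:,.0f}")
```

In practice you would likely reach for a library rather than hand-rolled functions; scikit-learn, for example, provides `recall_score`, `precision_score`, and `mean_absolute_error` in `sklearn.metrics`. The hand-written versions here are only meant to show what each score counts.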

Each question needs its own score. Using just one number is like judging a restaurant only by the appetizer and skipping the main course and dessert.

One thing to remember: A model that looks great by one score can look terrible by another — always check more than one metric before trusting predictions.
