All metrics are wrong, but some are useful
George Box stated that:
"All models are wrong, but some are useful."
The aphorism primarily concerns statistical models, but it applies to scientific models in general.
The model, in the statistical sense and the general sense
A model, in the general sense, is created to approximately describe a system or a natural process. The approximate nature of the model must always be borne in mind, Box said.
A statistical model is a set of equations that describes the process generating the observed data. One statistical model can represent the data more accurately than another.
In the same sense, a map is wrong as a description of the world (it lacks height and depth), but it’s useful for navigating around.
–
How about metrics?
A metric is a numerical measurement computed from the properties of a system. A metric is used to rate how well a process is working and to compare systems with one another. Metrics are usually designed to be monotonic (a higher value consistently means better, or consistently means worse) so that they are useful and easy to understand.
–
A metric is inherently insufficient
A metric can be thought of as an attempt to distill the knowledge of how well the system works down to one single number. It’s a lossy process. It gives a biased view of how the system works.
A machine learning model that is optimized only for accuracy may neglect recall.
For example, suppose a disease occurs in 1% of the population. A machine learning model can achieve 99% accuracy just by predicting NO all the time. As you can see, such a model is useless (and reckless): it never detects a single case.
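The arithmetic behind this example can be checked with a small sketch. The data below is made up for illustration (1,000 hypothetical patients, 10 of whom have the disease), and the helper names `accuracy` and `recall` are mine, not from any library.

```python
# Hypothetical data: 1,000 patients, 1% of whom have the disease (label 1).
y_true = [1] * 10 + [0] * 990
# The "always predict NO" model: every prediction is 0.
y_pred = [0] * 1000


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that the model detected."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The accuracy looks excellent, while the recall shows that not one sick patient was found.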
The F1 score can be used to judge the system if the costs of a false negative and a false positive are roughly the same. F1 is blind to any asymmetry in the cost of being wrong. When the costs differ, one should build the cost of misclassification into the metric.
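To illustrate the point about asymmetric costs, here is a minimal sketch with two hypothetical models whose false positives and false negatives are swapped. The confusion-matrix counts and the cost values (a false negative costing ten times a false positive) are assumptions chosen purely for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 written in terms of counts: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def total_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of errors, weighting each error type separately."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical models: same true positives, swapped error counts.
model_a = {"tp": 80, "fp": 10, "fn": 20}
model_b = {"tp": 80, "fp": 20, "fn": 10}

# Assumed costs: missing a case is 10x worse than a false alarm.
COST_FP, COST_FN = 1, 10

f1_a = f1_score(**model_a)
f1_b = f1_score(**model_b)
cost_a = total_cost(model_a["fp"], model_a["fn"], COST_FP, COST_FN)
cost_b = total_cost(model_b["fp"], model_b["fn"], COST_FP, COST_FN)

print(f1_a == f1_b)  # True: F1 cannot tell the two models apart
print(cost_a)        # 210
print(cost_b)        # 120
```

Under these assumed costs, model B is clearly preferable even though F1 rates the two models identically.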
As another example, using a single statistic such as the mean, median or mode to represent a whole collection ignores its variance and variability.
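A short sketch of this, using Python's standard `statistics` module on two made-up samples that share the same mean and median but have very different spreads:

```python
import statistics

# Two hypothetical samples, invented for illustration.
tight = [49, 50, 50, 50, 51]
wide = [0, 25, 50, 75, 100]

for name, sample in (("tight", tight), ("wide", wide)):
    print(
        name,
        "mean:", statistics.mean(sample),
        "median:", statistics.median(sample),
        "population std dev:", round(statistics.pstdev(sample), 2),
    )

# Summary by a central statistic alone makes the samples look identical.
same_center = (
    statistics.mean(tight) == statistics.mean(wide)
    and statistics.median(tight) == statistics.median(wide)
)
print(same_center)  # True
```

Only the standard deviation reveals that the second sample is far more spread out than the first.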
–
But it’s useful
A metric is not perfect in describing the quality of a process or system, but in many cases it’s all that we have, due to the limitations of our analysis, our visualization, or even our mental capacity. One can inspect distribution plots, but distilling the whole picture into something digestible remains hard.
A metric designed for machines should aim for precision and good numerical properties. A metric designed for humans should be clear, intuitive and easy to understand. Whoever uses a metric should know its strengths and limitations in order not to be fooled.