We are better at statistics than we normally get credit for. Greater statistical literacy is still important, since more data is being used to persuade, but our overall understanding has gotten better. One example of this improvement is that most people reading analysis will know that the arithmetic mean can be shifted by excessively large or small values. In response, it is common to report the median (the number that evenly divides the sample into values above and below it) instead. This is progress. Readers are now familiar with at least some of the characteristics of the measures they are being given and are aware of alternatives to overcome the limitations.
One difficulty now is that the arithmetic mean is probably undervalued relative to the median. This article focuses on games data, where the arithmetic mean is often misleading, but any statistic can mislead. Analysis has a rhetorical component in addition to calculation. Often the calculations are correct but are used to support an argument that claims something the measure does not actually say. For example, the mean, median, or mode (the most frequently occurring number in a sample) can all be correctly calculated but used in a bad argument that says, “your game is guaranteed to sell N units every day.” This article will consider the problems with the mean and what they can teach us about finding appropriate measures for our analysis.
Before addressing the mean, it is worth noting that there are cases where the median is less informative than we might think. Consider the following sample: [5, 7, 7, 11, 50, 300, 600, 601, 1000]. The arithmetic mean for this sample is 286.78, and the median is 50. Is the arithmetic mean or the median more informative in this case? If our goal is to find a representative measure, neither of these seems very good. 50 is closer to the lower group, so we might consider it a vote for the lower of the two measures, but this ‘representative value’ is roughly 4.5 times the next lowest value (11). Furthermore, while the appeal of the median is that it is a robust statistic, the observation of 50 could be replaced with any value from 11 to 300 and still be the number that divides the sample in half. In contrast, when calculating the arithmetic mean, each value gets an equal ‘vote’, and so the change to the average over the same range is less pronounced. Put another way, the middle observation could go from 50 to 100 and the median would double, but the arithmetic mean would only change by about 5.56.
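The figures quoted for this sample can be checked with a few lines of Python. This is only a minimal sketch using the standard library's statistics module; the helper name summarize is made up for illustration.

```python
from statistics import mean, median

sample = [5, 7, 7, 11, 50, 300, 600, 601, 1000]

def summarize(values):
    """Return (arithmetic mean, median) for a list of numbers."""
    return mean(values), median(values)

base_mean, base_median = summarize(sample)

# Replace the middle observation (50) with 100 and recompute both measures.
shifted = [100 if v == 50 else v for v in sample]
shifted_mean, shifted_median = summarize(shifted)

print(round(base_mean, 2), base_median)        # 286.78 50
print(round(shifted_mean, 2), shifted_median)  # 292.33 100
print(round(shifted_mean - base_mean, 2))      # 5.56
```

The median follows the middle observation wherever it goes between 11 and 300, while the mean moves by only one ninth of the change, since each of the nine values carries the same weight.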
The example above is more for illustration, though in the case of a bimodal distribution like the one above, it is helpful to know this limitation. What is more instructive is how it forces us to think more carefully about what we mean when we call an estimate representative. One definition would be to pick the estimate that minimizes how far off it is across all of the observations. There are some good intuitions here. We know that an estimate is usually not going to be exactly correct for any given observation, but we can compare measures by how severe their errors are across the whole sample and say the one that minimizes those errors is the best one. The errors can be measured using the sum of squared deviations (obtained by subtracting the estimate from every observation, squaring the result, then adding all the squares together), and it can be shown that the arithmetic mean minimizes this number.
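As a rough numerical check of that last claim (not a proof), the sum of squared deviations can be computed for the mean, the median, and every whole number in the sample's range. The sample is the one from earlier in the article; the function name sum_squared_deviations is only an illustrative choice.

```python
from statistics import mean, median

sample = [5, 7, 7, 11, 50, 300, 600, 601, 1000]

def sum_squared_deviations(estimate, values):
    """Subtract the estimate from each observation, square, and add up."""
    return sum((v - estimate) ** 2 for v in values)

ssd_mean = sum_squared_deviations(mean(sample), sample)
ssd_median = sum_squared_deviations(median(sample), sample)

# Scan every whole-number candidate between the smallest and largest value.
candidates = range(min(sample), max(sample) + 1)
best = min(candidates, key=lambda c: sum_squared_deviations(c, sample))

print(ssd_mean < ssd_median)  # True
print(best)                   # 287
```

The best whole-number candidate is 287, the integer closest to the mean of 286.78, and no candidate in the scan does better than the mean itself.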
But this definition of representative is not entirely satisfactory, since minimizing how far the average is off for the unusually large or small values comes at the expense of more accurate measurements for the more typical cases. This is the reason why the median is more appealing, since dividing the sample into the upper and lower groups and choosing the number that defines that border is going to be more likely to fall among typical results (the example above notwithstanding). In the case of sales this can be a useful feature because the extreme cases are rare, and upward surprises are not perceived to be as bad as downward surprises (though people responsible for managing a sudden influx of players may have strong feelings on this matter). The sacrifice in total accuracy is justified because the resulting estimate will be closer for a larger number of people.
This isn’t a particularly bad solution, but it isn’t the only way of solving the problem. An alternative is to take the weighted arithmetic mean. Instead of every observation in a sample getting an equal vote on the final number, each observation is weighted, usually (but not exclusively) by its probability, and the final number reflects both the values and relative importance (the weight) assigned to those values. For sales, a weighted arithmetic mean would acknowledge that very large sales numbers are a reality, but an unlikely event. As with every other measure, the weighted arithmetic mean involves trade-offs. Weights, like any other kind of data, can be hard to come by. It may also be that the outliers are so large that, even after weighting, the estimate is still higher than what is useful for more common cases. Still, this alternative is appealing because it attacks the problem of the real but unlikely case of high sales directly, which is often the reason for preferring the median.
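A weighted arithmetic mean can be sketched in a few lines. The sales figures and day counts below are hypothetical, invented purely for illustration, and using the number of days each outcome was observed as the weight is just one assumption about where weights might come from.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(weight * value) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical data: three daily sales outcomes and how many days out of 100
# each one was observed. These numbers are made up for this example.
daily_sales = [10, 50, 1000]
days_observed = [60, 35, 5]

equal_vote = sum(daily_sales) / len(daily_sales)
weighted = weighted_mean(daily_sales, days_observed)

print(round(equal_vote, 2))  # 353.33
print(weighted)              # 73.5
```

Giving the three outcomes an equal vote yields about 353, while weighting by how often each occurs yields 73.5. Even so, 73.5 is still above what was sold on 95 of the 100 hypothetical days, which is the trade-off described above: a large enough outlier can keep the estimate high even after weighting.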
There is much more that can be written on this subject. In technical terms, the preference for the median stems from the fact that the distributions we are often working with are skewed. Skewness can be measured directly (formally it is the third standardized moment, where the mean is the first moment and the variance is the second central moment), but reporting it tends to provoke blank stares, and so comparing the mean and median makes the point instead. The arithmetic mean is one of three means known as the Pythagorean means, the other two being the geometric mean and the harmonic mean (the latter even gets used in Lars Doucet’s Game Data Crunch), and the others can be useful in cases where the arithmetic mean can be misleading. The point of this article isn’t really to advocate for the arithmetic mean, so much as to show how clear thinking about the problems we are trying to solve can lead us to find the measures that are appropriate. As with any skill, identifying the situations where a problem is beyond our current understanding lets us develop further and add more tools to our analytical toolkit.
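For readers who want to try these measures, the sketch below computes the three Pythagorean means and a skewness figure for the sample used earlier. It assumes Python 3.8 or later for statistics.geometric_mean, and the skewness helper is a hand-written population version written for this example, not a library function.

```python
from statistics import mean, median, geometric_mean, harmonic_mean, pstdev

sample = [5, 7, 7, 11, 50, 300, 600, 601, 1000]

def skewness(values):
    """Third standardized moment: the average cubed z-score (population form)."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

arithmetic = mean(sample)
geometric = geometric_mean(sample)
harmonic = harmonic_mean(sample)

# For positive data the three means always fall in this order.
print(harmonic <= geometric <= arithmetic)  # True

# Positive skewness signals a long right tail; the mean also exceeds the median.
print(skewness(sample) > 0, arithmetic > median(sample))  # True True
```

The geometric and harmonic means are pulled less by the large values than the arithmetic mean is, which is why they can be worth considering for right-skewed data.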