Planet Money and the New York Times’ Ben Casselman teamed up to try to demonstrate that “Mode, not average, is a better way to find the typical American.” They are wrong.
Here is how they set up the issue:
MALONE: My issue is with this thing that people talk about, the average American. …Who is average, really? …I think people don’t mean average American. I think what they actually mean is, who is the person – if I walk outside into America, who is the person I’m most likely to run into?
JACOB GOLDSTEIN: Yes, the person who there’s more of that person than any other person.
MALONE: That ain’t the average. It’s not the median… If you actually want to figure out what human beings exist outside your door, you need to run the mode.
Instead of demonstrating how useful the mode is, they actually do a pretty good job of demonstrating why nobody uses it: it sucks for describing most data.
What is the mode? It is simply the most common value in a set of data. For example, suppose you have a room full of people and their annual incomes are:
$1,000; $1,000; $30,000; $40,000; $50,000; $60,000; and $100 million
The mode is $1,000 because there are two people with that income and only one person at every other income. The median, $40,000, is the most useful statistic for understanding the central tendency of this group’s income because it is the value in the middle. The mean is about $14.3 million, which is much closer to the one rich outlier than to anyone else, so it is also a poor measure of central tendency for the group.
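If you want to check these numbers yourself, here is a minimal sketch using Python’s standard `statistics` module. The incomes are just the made-up room from the example above. I use `multimode` (Python 3.8+) rather than `mode` because it returns every value tied for most common instead of silently picking one.

```python
from statistics import mean, median, multimode

# Hypothetical incomes from the room example above, in dollars.
incomes = [1_000, 1_000, 30_000, 40_000, 50_000, 60_000, 100_000_000]

print(f"mode:   {multimode(incomes)}")     # most common value(s)
print(f"median: {median(incomes):,}")      # middle value after sorting
print(f"mean:   {mean(incomes):,.0f}")     # sum divided by the count
```

The mean works out to $100,182,000 / 7, or roughly $14.3 million.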
The median does the best here because it is the least sensitive to outliers. If another person with $100 million walks into the room, the mean would jump from about $14.3 million to about $25 million, and the mode would be both $100 million and $1,000. Both of those statistics are unrepresentative, whereas the median would still be near the center of the numbers and would barely move (from $40,000 to $45,000).
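Here is the same kind of sketch for the second $100 million earner walking in. Again, these are the hypothetical incomes from the example, not real data.

```python
from statistics import mean, median, multimode

# Hypothetical incomes from the room example, in dollars.
incomes = [1_000, 1_000, 30_000, 40_000, 50_000, 60_000, 100_000_000]
# A second $100 million earner walks into the room.
with_second_rich = incomes + [100_000_000]

for label, data in [("before", incomes), ("after", with_second_rich)]:
    # multimode lists every value tied for most common, in first-seen order.
    print(f"{label:>6}: mode={multimode(data)}, "
          f"median={median(data):,.0f}, mean={mean(data):,.0f}")
```

With eight people there is no single middle value, so the median becomes the average of the two middle incomes, $40,000 and $50,000. The mode splits into two unrepresentative values and the mean grows by more than $10 million.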
The mode is only the best measure for categorical data like color. For example, if we have seven colors:
red, red, orange, yellow, green, blue, ultraviolet
In this case, it is impossible to take the mean. If we don’t care about the order of the colors, we cannot take the median either, so the mode is the best possible measure simply because the other two options are impossible. The best you can do is say that red is the most common color, which is the same thing as saying that it is the modal color.
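For categorical data, finding the mode is just counting. Here is a minimal Python sketch with the color list above, using `collections.Counter` and `statistics.multimode` from the standard library.

```python
from collections import Counter
from statistics import multimode

colors = ["red", "red", "orange", "yellow", "green", "blue", "ultraviolet"]

# Counting only needs labels that can be compared for equality (hashable);
# no ordering and no arithmetic are involved.
counts = Counter(colors)
print(counts.most_common(1))  # [('red', 2)]
print(multimode(colors))      # ['red']

# There is no mean color: statistics.mean(colors) would fail, because
# color names cannot be added together and divided.
```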
But for quantitative data, we can calculate the mean or median, and the mode is rarely as useful as either of them. Planet Money demonstrated how ridiculous the mode is for numerical data when they determined that the modal income is “$30,000 to $75,000.” That is such a big range, it is laughable. It ranges from the poverty line for a family of five up to the upper-middle class. Nearly a third of US households (29%) are in this “modal” range.
Why did they get such a huge range? Well, there are probably very similar numbers of households at each income level between $30,000 and $75,000, so there might be multiple modes, just as in the example above with two modes at both $1,000 and $100 million. There is always some uncertainty in data collection because measurements aren’t always perfect. For example, if a household says they earn $30,000, they might be a little off and actually earn $31,000 or $29,000, so the real number could be off by a thousand dollars or more in either direction. Given even modest uncertainty in the data, there is usually a large range in the possible true value of the mode, and perhaps in this case the NPR authors could not give any more precision than that the true modal value must be somewhere between $30,000 and $75,000!
The median income is much easier to pin down. In this case, we would just find the observation in the middle, which was about $63,000 in 2018, and if there is uncertainty of plus or minus $1,000, then we would know that the true median is between $62,000 and $64,000. That is much more precise and useful.
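Here is a small, entirely made-up illustration of the uncertainty argument. Assume nine households report their incomes and two of those answers are off by $1,000. That tiny error is enough to move the mode from $30,000 to $75,000, while the median does not move at all. In general, if every answer is off by at most $1,000, the median is off by at most $1,000 too, because no value in the sorted list, including the middle one, can shift by more than that.

```python
from statistics import median, multimode

# Hypothetical survey answers (made-up numbers, not Census data).
reported = [30_000, 30_000, 41_000, 55_000, 63_000, 74_000, 75_000, 88_000, 120_000]

# Suppose two households misreported by $1,000: one $30,000 answer is really
# $29,000, and the $74,000 answer is really $75,000.
actual = [29_000, 30_000, 41_000, 55_000, 63_000, 75_000, 75_000, 88_000, 120_000]

print("reported mode:", multimode(reported), "median:", median(reported))
print("actual mode:  ", multimode(actual), "median:", median(actual))
```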
UPDATE: In their methodology description, Planet Money explains that they just arbitrarily lumped households into four income groups. This is even worse than I had originally thought. With this methodology, it would be possible to say that the modal income is nearly anything. For example, I could say that the modal income is between $1 million and $1 billion simply by splitting up all the lower-income categories into smaller buckets.
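To see how much the choice of buckets matters, here is a hypothetical sketch. The `modal_bucket` helper and the incomes are invented for illustration: 20 households between $20,000 and $400,000, plus three households at $5 million. With one wide bucket below $1 million, that bucket is modal. Slice the same lower incomes into $40,000 buckets and the “modal income” becomes $1 million to $1 billion.

```python
from bisect import bisect_right
from collections import Counter

def modal_bucket(incomes, edges):
    """Return the (low, high) bucket containing the most incomes.

    Hypothetical helper: `edges` is a sorted list of boundaries, and
    bucket b covers edges[b] <= income < edges[b + 1]. Incomes outside
    every bucket are ignored; ties go to the lowest bucket.
    """
    counts = Counter()
    for x in incomes:
        b = bisect_right(edges, x) - 1
        if 0 <= b < len(edges) - 1:
            counts[b] += 1
    best = max(sorted(counts), key=lambda b: counts[b])
    return edges[best], edges[best + 1]

# Made-up data: 20 households from $20,000 to $400,000, plus three at $5 million.
incomes = list(range(20_000, 400_001, 20_000)) + [5_000_000] * 3

# One wide bucket below $1 million: the lower incomes win easily.
coarse = [0, 1_000_000, 1_000_000_000]
# Same data, but everything below $1 million sliced into $40,000 buckets.
fine = list(range(0, 1_000_001, 40_000)) + [1_000_000_000]

print(modal_bucket(incomes, coarse))  # (0, 1000000)
print(modal_bucket(incomes, fine))    # (1000000, 1000000000)
```

Nothing about the households changed between the two calls; only the bucket edges did.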
Planet Money’s search for the modal American gets even more bizarre after that. They determine that, “The modal American, based on our criteria, is in fact, a child.” But then, because they really wanted the modal American to be an adult, they threw away their criteria. Even though their methodology determined that the modal American is a child, they just changed their criteria to get what they had wanted all along.
So they threw out their statistical methodology, looked only at adults, and then arbitrarily divided Americans into buckets according to ethnic categories. Those categories happened to lump people from Portugal, Ireland, Russia, and many people from Latin America into the “white” category, so that they could claim that identity as “central” to “average” America.
Then they arbitrarily decided that Gen X is more “central” to America even though Gen X is far from the mode. There are more Baby Boomers and more “echo boomers” (millennials) than members of Gen X, but they don’t really care about the mode; they wanted someone who is “middle age,” which would be the median age (38). They lamented that this is “one of the least common ages,” and instead of staying with the modal age (26), they just switched their criteria and picked someone from Gen X anyway, perhaps by arbitrarily arranging their age buckets to get what they wanted.
In the end, they seemed to arbitrarily pick what was probably their preconceived notion of the average American. Someone who is:
- Gen X (even though the modal American is age 26)
- “upper-middle-class income. The household income is between $75,000 and $165,000 a year.” (Even though this is way higher than either the modal income or the median and is closer to the mean household income.)
- married (Even though the modal American is unmarried)
- male (Although they admit that the modal American is female, they only interview men, talk about the “average guy,” and focus on the “homogeneous experience” of “white, Gen X men”)
- employed (Even though there are more Americans without a job than with a job)
- no college degree
- lives in the suburbs
- wears plaid shirts! (I think this part was their attempt at a joke.)
Although I only have minor quibbles with the last four criteria, if I didn’t know better, I’d think Planet Money started out with a picture of the kind of guy that they consider to be the most essentially American and then picked arbitrary categories and tortured the statistics until they got what they wanted. They completely ignored the modal value for most of their criteria.
This ‘analysis’ is a good demonstration of why nobody uses the mode with this kind of data. Even when Planet Money tried to use the mode, they couldn’t stick with it because it didn’t give them the preconceived results they were looking for, so they just kept changing their rules until they got… demographics that look pretty close to the median Trump supporter. You know, “real Americans.”