Don’t Get Snookered by Statistics with Arbitrary Ranges

The time frames of statistics are often chosen to mislead the listener

I was watching a baseball game with a mathematician friend, during which the announcer said of a batter “He’s been on a hot streak, with 6 hits in his last 19 at bats”.

My friend said “Which means he has 6 hits in his last 20 or more at bats”.

Of course this was true, because if the batter’s hot streak went back farther than his last 19 at bats, the announcer would have extended the range of his statistic accordingly. The announcer was trying to make a point, and a 7-for-20 or 8-for-21 run of hitting sounds hotter than a mere 6-for-19.

Pundits often use this trick to make ominous-sounding pronouncements that are in fact content-light, for example when forecasting who will or will not get elected president. Hillary Clinton will never be in the Oval Office, by the way, because no candidate whose first name starts with an H has been elected President in the past 64.5 years.

Arbitrary ranges can also be used to make the case for a historical trend that may not actually be substantive. At the end of the 1980s, some Republicans crowed that Democrats had lost “5 of the past 6” presidential elections. Today, some Democrats brag that Republicans have failed to win the popular vote in precisely the same arbitrary timespan.

Politicians also snooker voters with arbitrary ranges. If Mayor Jones says that crime is down for 5 straight months, s/he may well be covering over the fact that crime was up in the months before that. Be particularly sceptical when a politician’s pet initiative, launched long ago, is said to have “not really gotten off the ground” until the precise moment that the referenced time range of an upbeat statistic begins.

Obviously, range-based statistics have to have some starting point, and that’s fine as long as the starting point is meaningful: a new person comes into office, a new law is passed, the fiscal year’s budget comes into force, a war begins or ends, etc. If there isn’t a qualitative demarcation point like that at the start of a referenced time range, the person quoting the statistic is probably either fooling himself or trying to fool you.
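The announcer’s trick is easy to mechanize: given any hitter’s record, search backward for the trailing window that makes him look best. Here is a minimal Python sketch (the hitter and his at bats are invented for illustration):

```python
import random

def most_flattering_streak(results, min_len=10):
    """Among all trailing windows of at least min_len at bats, return the
    (average, length) pair with the highest batting average.
    results: list of 1 (hit) / 0 (out), oldest first."""
    best = None
    for n in range(min_len, len(results) + 1):
        avg = sum(results[-n:]) / n
        if best is None or avg > best[0]:
            best = (avg, n)
    return best

# A mediocre .250 hitter's last 40 at bats, generated at random.
rng = random.Random(2)
at_bats = [1 if rng.random() < 0.25 else 0 for _ in range(40)]

avg, n = most_flattering_streak(at_bats)
# The chosen window is never worse than the hitter's full record, because
# the range was selected after looking at the data -- the whole problem.
```

Run over many simulated hitters, the “best trailing window” average routinely beats the hitter’s true ability, which is exactly why a quoted 6-for-19 sounds hotter than it really is.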

Author: Keith Humphreys

Keith Humphreys is the Esther Ting Memorial Professor of Psychiatry at Stanford University and an Honorary Professor of Psychiatry at King's College London. His research, teaching and writing have focused on addictive disorders, self-help organizations (e.g., breast cancer support groups, Alcoholics Anonymous), evaluation research methods, and public policy related to health care, mental illness, veterans, drugs, crime and correctional systems. Professor Humphreys' over 300 scholarly articles, monographs and books have been cited over thirteen thousand times by scientific colleagues. He is a regular contributor to the Washington Post and has also written for the New York Times, Wall Street Journal, Washington Monthly, San Francisco Chronicle, The Guardian (UK), The Telegraph (UK), Times Higher Education (UK), Crossbow (UK) and other media outlets.

42 thoughts on “Don’t Get Snookered by Statistics with Arbitrary Ranges”

  1. Nate Silver has gone to ESPN and notes that while some sports pundits are hacks, nearly all political pundits fit the bill for that designation.

    The one guy who knows anything about how to make sense of political data has left an environment where he did not fit in.

    A non-hack writer observed that sports writers cannot get away with hackery as easily as political commentators because sports is a high feedback field of endeavor. A sports Karl Rove could tell his audience that the Houston Astros were a sure thing to win the pennant, but a glance at the League standings would quickly tell them what Rove was full of. Even if Rove told a Houston audience exactly what it wanted to hear, he would lose all credibility in a very short time.

    We see that some baseball announcers are prone to spout nonsense, but there are built-in constraints on their sophistry which are less operative when the effects of budget cuts during a recession are under discussion on Fox News.

    Nate Silver has voted with his feet for good reasons.

    1. And just for the teabaggers to get paranoid about: no black president has ever voluntarily relinquished power at the end of his second term.

      IIRC Rush Limbaugh, after last year’s election, was telling his dittoheads to worry about this.

  2. The all-time greatest example has to be global warming deniers counting the years from 1998. Since that was an anomalously warm year, we heard in 2004/5/6/7/8/9 “the global temperature has gone down over the last 6/7/8/9/10/11 years.”

    And they said it with such anger at the injustice of being called unscientific.

  3. Hillary Clinton will never be in the Oval Office, by the way, because no candidate whose first name starts with an H has been elected President in the past 72.5 years.

    I can think of one who was elected 65 years ago.

    1. We can always resort to insisting on the difference between “elected” and “reelected”, in which case we go back almost 85 years.

      1. @Warren: If you are trying to bail me out, I appreciate it, but Truman was never re-elected and Hillary would be running to be first time elected. So I will just have to cop to innumeracy on the 72.5 number.

    2. @bymotov: Of course you can, because I screwed up! Thanks for catching that — not sure how I did my sums wrong but there you go. I am correcting the post and giving you 10% of all future royalties.

  4. Invertebrate paleontologist and popular science writer Stephen Jay Gould wrote about the fallacy of “hot streaks”:

    http://www.nybooks.com/articles/archives/1988/aug/18/the-streak-of-streaks/?pagination=false

    Start with a phenomenon that nearly everyone both accepts and considers well understood—”hot hands” in basketball. Now and then, someone just gets hot, and can’t be stopped. Basket after basket falls in—or out as with “cold hands,” when a man can’t buy a bucket for love or money (choose your cliché). The reason for this phenomenon is clear enough; it lies embodied in the maxim: “When you’re hot, you’re hot; and when you’re not, you’re not.” You get that touch, build confidence; all nervousness fades, you find your rhythm; swish, swish, swish. Or you miss a few, get rattled, endure the booing, experience despair; hands start shaking and you realize that you shoulda stood in bed.

    Everybody knows about hot hands. The only problem is that no such phenomenon exists. The Stanford psychologist Amos Tversky studied every basket made by the Philadelphia 76ers for more than a season. He found, first of all, that probabilities of making a second basket did not rise following a successful shot. Moreover, the number of “runs,” or baskets in succession, was no greater than what a standard random, or coin-tossing, model would predict. (If the chance of making each basket is 0.5, for example, a reasonable value for good shooters, five hits in a row will occur, on average, once in thirty-two sequences—just as you can expect to toss five successive heads about once in thirty-two times, or (0.5)^5.)

      1. Not necessarily. He might have had an exoskeleton. Did you palpate the dorsal aspect of his body?

    1. “Moreover, the number of “runs,” or baskets in succession, was no greater than what a standard random, or coin-tossing, model would predict.”

      A statistics professor here has a nice demonstration of how bad people are at this stuff intuitively. He leaves the room for a few minutes during which time one group of students flips a coin a hundred times and writes the resulting string of “H[ead]”s and “T[ail]”s on the board while another group of students just writes down a string of “H”s and “T”s that they make up themselves. When they’re done, he reenters the room and magically tells them which is which with superb accuracy.

      The trick being that if you flip a coin 100 times then you have a 97% chance of at some point having a run of at least five heads or tails in a row but the students will almost never have a run that long in the string that they make up because it doesn’t look random enough.
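That classroom demonstration is easy to reproduce by simulation. A minimal sketch (the 20,000-trial Monte Carlo below is my own, not the professor’s actual exercise):

```python
import random

def longest_run(flips):
    """Length of the longest run of identical consecutive outcomes."""
    best = cur = 1
    for prev, nxt in zip(flips, flips[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

def prob_run_of_at_least(k, n_flips=100, trials=20_000, seed=1):
    """Monte Carlo estimate of P(longest run >= k) in n_flips fair tosses."""
    rng = random.Random(seed)
    hits = sum(
        longest_run([rng.randrange(2) for _ in range(n_flips)]) >= k
        for _ in range(trials)
    )
    return hits / trials

p5 = prob_run_of_at_least(5)  # lands near the quoted 97%
```

Made-up strings, by contrast, rarely contain a run of five, so the professor only has to scan each board for a long run.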

  5. A good example is the “No global warming for 15 years!” meme been pushed assiduously in the media.

    What is so special about 15 years?

    – 1998 was an abnormally warm year relative to what went before, so everything since then looks a bit tame (even though there have been slightly warmer years).
    – 15 “looks right” if you want to cherry-pick a period where the annual global average temperature has been roughly constant. The 2000s were still the warmest decade ever recorded.

    I think all climate scientists would agree the annual global warming rate has slowed since about 2003 or so, but no one assigns much significance to that. There is no reason to say it will not increase again, or that the decadal rate has changed. They would also look to a period of 25 to 30 years as being much more meaningful.

    You could also take a Bayesian approach here. No matter where you start the clock for a “streak”, you still have prior knowledge of the rate and your new data should be analysed relative to that.

    1. Oh boy – where to start?

      1) Yes, 15 years is an arbitrary window. But ANY time period selected for the purposes of time series comparison of variations in a naturally varying system is arbitrary. You yourself make a point of saying that the 2000s were the “warmest decade ever recorded” which begs the question of why exactly “decade” is a useful construct at all.

      2) 15 years does have the advantage of lining up roughly with a period in which about 1/3 of all the CO2 ever put into the atmosphere by human beings has been emitted. Or put another way – since 1998 there has been a 50% increase in the CUMULATIVE total of human CO2 emissions. It does seem curious then that there has been little to no rise in global temperatures in that period. Yes yes – ocean heat sinks, Chinese smog, blah blah blah. The “story” being pushed for years was that “the science is settled” and that “there is overwhelming consensus” that human CO2 emissions are causing the planet to warm up rapidly. We have now had a multi-year period in which CO2 emissions went up faster than climate scientists’ models predicted while global temperatures went up more slowly (indeed – just about not at all) than those models predicted. Putting on the Bayesian hat that you recommend would surely cause me to be less, rather than more, confident in the ability of climate scientists to accurately forecast the climate’s response to CO2 emissions. For several years now actual global temperatures have been on the very low end of the predicted range of “consensus” models from 10-15 years ago. Absent significant warming in the next 2-3 years we will see global temperatures that fall outside of the predicted range from those same models. Again – any “Bayesian” approach to the question would cause one to downgrade one’s confidence in those models.

      3) The point about the 2000s being the warmest decade on record is a spectacularly bad argument. Say I take a walk of 100 steps. The first 90 steps are up a hill with a 30 degree grade. The last 10 steps are across a plateau at the top of the hill. It is of course true that the last 10 steps of my walk were taken at the highest altitude of the journey. But that does not demonstrate that for the last 10 steps of my walk I was still ascending. Nobody denies that global temperatures went up sharply from roughly 1980 until roughly 2000 (give or take). But since then there has been little to no observed warming despite – again – a massive increase in cumulative human CO2 emissions. The prediction that surely I will end up higher if I take another 30 steps forward because my last 10 steps have all been taken at a higher altitude than the altitude of the start of my walk is about as compelling as it sounds.

      1. Yet if you look at a graph of, say, the past century, what you will see is periods of rising temps, periods of pause, followed by more warming. A bit like stairs. 1998 is cherry-picked. It was the top of the last step, so to speak.

        http://www.skepticalscience.com/graphics.php?g=47

        We’ll find out who is right fairly soon. But in the meantime, the merchants of doubt delay action. Which means that when action is finally taken, it will be more draconian. Great job!

          1. The graph you link to isn’t from the past century – it’s from the period 1970-2012. Within that range one need not cherry-pick a few periods that show a localized cooling trend to doubt the remarkably loaded conclusion that a best-fit line (the red line that toggles in and out of the image) implies. The vast majority of the warming in that period occurred between 1980 and 1998. You can force-fit a linear function to any s-curve. And if you look at a plot of noisy data for which the proper underlying “signal” is an s-curve, and then plot such a specious best-fit line on the graph, the unaided eye will immediately conclude that there is far greater linearity to the data series, and thus a far greater rate of change at the tail end of the graph, than is justified by the underlying data.

          And again – the amount of CO2 released into the atmosphere by human beings from the start of the industrial revolution until roughly 1998 (a couple of centuries, give or take) is only 2X the amount released from 1998-2012 (15 years). If the relationship between CO2 emissions and global temperature were as tight as is implied by “consensus” climate models (i.e. if the sensitivity of the climate to CO2 were as high as is commonly assumed in such models), then it’s awfully curious indeed that there has been little to no discernible warming in that period. Yes yes yes – signal and noise. But at some point the assumptions about the “noise” that is “temporarily” depressing the otherwise strong upward trend in the data become so heroic that a Bayesian might be inclined to reassess how accurate the model that implied a continuing upward trend was in the first place.

          It is true that delaying “action” until the climate is better understood means that that same “action” will be more expensive if we collectively conclude that it is necessary later on (that is, if one assumes that technological advances between “now” and “then” don’t dramatically lower the cost of mitigation – which is a doubtful assumption to say the least). But said “action” would be enormously costly now, and if it’s committed to in response to a threat that ends up being much less severe than was hypothesized by the “consensus” of climate scientists roughly 10-15 years ago, then a huge amount of human wealth will have been squandered. Wealth that otherwise could have been directed at improving standards of living in numerous ways. And I’m not talking about getting yuppies to drive Priuses instead of Land Rovers (which is rounding error in humanity’s global carbon footprint). I’m talking about telling the Chinese that they can’t make nitrogen fertilizer anymore and telling Indians that unelectrified village life is really rather charming when you think about it.

          1. I understand the point about growth, particularly in the undeveloped/developing world. I also hear you on the question of technological progress, though it seems to me that some judicious investment in specific areas is called for (me, I’d like to have the DOE funding some serious pilot tests of various next-gen nuclear reactor designs, fast-tracked to the extent possible).

            If the “consensus” folks are right and you are wrong, delay almost surely means more intrusive/harsh/whateverwordyouwantouse governmental responses down the road. Yes, technology will hopefully mitigate that somewhat, but I don’t really think it’s wise to plan on that.

            Like I said, we’ll see who is right. I *really* hope you are! But I don’t think you are.

          2. According to statistician Grant Foster, who blogs at tamino.wordpress.com, one usually needs 30 years of data to be sure that one is extracting changing climate signal from the noise of weather. He also argues, with actual data, that the global warming signal is so strong that the 30-year rule is overly conservative.

            As to the tightly-coupled issue, you are aware of these things called “oceans”, yes? Measuring just the temperature at the surface over land is convenient and useful, but it’s hardly the whole picture.

          3. The number of flaws in your reasoning is impressive. Francis points out that you’re ignoring a lot of the temperature data. You forget that there is time lag between carbon emissions and the effect on climate. And you argue that any discrepancies in constant temperature increases are just dismissed as “noise” without addressing the fact that the models often predict periods of slowed temperature increases as various processes play out.

      2. My argument (based on the UK Met Office, but since Stephen provided no links, neither will I) is that the global temperature is rising on the scale of decades rather than years. Hence “the warmest decade ever recorded” is significant, and the “streak” should be counted in decades, not years. The end of the Little Ice Age and the massive use of fossil fuels will do nicely as a starting point.

        Argument (2) depends on the expectation of an instantaneous response of the atmosphere to CO2, and no other influences like aerosols (for cooling, from anthropogenic sources or volcanoes). That is just bad physics, so cease embarrassing yourself. Atmospheric response to CO2 needs a time constant to be incorporated in the equations.

        Argument (3) is just a strawman.

        Unfortunately, we are not walking on a plateau – we are watching energy entering a system which our thermometer says is warming. The thermometer flat-lines – but the physics says that if energy keeps going in (and the energy imbalance at the top of the atmosphere is still there) then the system MUST continue warming according to the laws of physics. Your real problem is with physics, not with statistics.

        Besides, your plateau is not even flat – go here and you will find that the planet warmed by 0.088C/decade since 1999, according to the GISS records. So what do you deduce if your gradient drops from 30 degrees to 10 degrees? If you are a Bayesian, assume a normal distribution for the gradient; your new observation is 10 (or even 0) degrees – what is your posterior distribution for the gradient? “Climate models” are irrelevant.

      3. Actually, you have defeated yourself with your own argument.

        You set out to walk uphill in the dark, without knowing how far you have to go.

        You spend the early part of the journey on a gradient of 30 degrees, which changes to 0 – your next steps are on a level plain.

        You can say “Well, I can make no expectation that I am going to hit a 30 degree gradient again”. Fine, but equally you can make no assumption that your plateau will last forever, which seems to be exactly what you want to do.

        A Bayesian approach is feasible – you have a prior of a 30-degree gradient and an observation of some time at 0 gradient, so your posterior distribution should surely give a new expectation somewhere between the two. OK, the longer you stay on the level, the more your expectation will change. But to claim you know “everything” after an arbitrary period is bad statistics, not to mention contradicting the physical data.

        A lot of the work going on in climate science right now is taking account of a (possibly temporary) slower rate of warming – unfortunately, it gives the human race no better than 10 or 15 years extra to stave off serious climate disruption.
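The hill-walking posterior sketched in this thread is the standard conjugate normal-normal update. A minimal Python sketch (the 30-degree prior and the equal uncertainties are illustrative assumptions, not anything taken from an actual climate model):

```python
def normal_update(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a normal prior combined with one
    normal observation of known variance (conjugate update)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Prior belief about the gradient: 30 degrees, quite uncertain.
# One noisy observation of a flat stretch: 0 degrees, equally uncertain.
mean, var = normal_update(30.0, 100.0, 0.0, 100.0)
# Equal precisions put the posterior mean exactly halfway, at 15 degrees;
# each further "flat" observation drags it lower, but never instantly to 0.
```

The design point is the one the commenters are circling: new flat observations should temper, not erase, the prior belief in a rising gradient.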

  6. It’s especially true of baseball. Just about anybody in MLB can do just about anything in ~100 at-bats. You will see slap hitters hitting for average and power (Iglesias on the Sox). You will see some of the game’s best hitters down by the Mendoza line (.200 average). Terrible players will have a good month, fooling some (Vernon Wells rejuvenated since trade to Yankees! Yeah, for a month. Since then, he’s been unspeakably awful, because Vernon Wells is bad at baseball).

    Following baseball, and more importantly the statistical analysis thereof (starting with Baseball Prospectus back when PECOTA was run by… Nate Silver), did a great deal to hammer some basic statistical wisdom into my head. I still should have paid attention to that college stats class I took, though. 😉

    1. It’s a market. People are exchanging a slice of a finite resource, their lifespan, for the benefit of being at a certain location at a certain time.

  7. I tell the students in my Data Analysis class that any statement of the form “between times X and Y, the Z [did something]” contains no useful information and is likely intended as propaganda. If you want information about Z, look at a plot of the time series over as long a range as possible.

    I think there has been a longstanding journalistic prejudice that plots (as opposed to words) are for dummies. Fortunately this is fading.
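The classroom rule above is easy to illustrate: fit a slope to the full series and to short windows of it, and watch the windows tell contradictory stories. A minimal sketch with invented data (a true trend of +0.5 per step, buried in noise):

```python
import random

def ols_slope(ys):
    """Least-squares slope of ys regressed on their index 0..n-1."""
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den

rng = random.Random(0)
# 200 steps of a genuine +0.5-per-step trend, plus heavy noise.
series = [0.5 * t + rng.gauss(0, 8) for t in range(200)]

full_slope = ols_slope(series)  # close to the true +0.5
# Every 15-step window has its own slope, and the spread is wide,
# so a propagandist can pick whichever window suits the argument.
window_slopes = [ols_slope(series[i:i + 15]) for i in range(len(series) - 14)]
```

Plotting the whole series makes the trend obvious; quoting “between step X and step Y” lets the speaker pick from a grab bag of wildly different slopes.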

  8. Suppose you are a baseball sportscaster, and the current batter is in fact on a significant hot streak; 6/19 is pretty lukewarm, let’s make it 12/19. Do you refuse to mention it as a matter of principle? Refuse to mention it unless you are allowed to say each time, “But of course, he doesn’t always hit that well?” Insist on adding every time, “Past results are not a guarantee of future performance?” Refuse to mention those statistics that could conceivably be misunderstood, which is likely all of them? One possibility would be just to say, “He is on a hot streak, 12 for his last 19,” and trust that most of your audience is almost as smart as you are, but I gather that is a non-starter.

    1. = = = and the current batter is in fact on a significant hot streak; 6/19 is pretty lukewarm, let’s make it 12/19. = = =

      The problem isn’t with the 6 number, it is the 19. Why 19? Why not 21, 31, 41? This is crystal clear with basketball broadcasts: despite Stephen Jay Gould’s analysis referenced above I have no problem with an announcer stating that Team Good is on a “13-0” run – emphasize the 0 (zero). But then they’ll start talking about a “15-2” run or even more nonsensically a “17-4” run. Um, why didn’t the counter reset to 0-0 when the run of continuous unopposed baskets ended?

      Cranky

  9. Stephen Jay Gould did acknowledge one genuine streak that he called the “greatest accomplishment in modern sport” (well, he was a baseball fan) – in 1941, Joe DiMaggio got at least 1 hit in 56 successive games.

    It’s in the essay “The Streak of Streaks” from the collection Bully for Brontosaurus.

    1. Joe DiMaggio’s feat is no doubt impressive. But as to sports records that are least likely to ever be broken, I would place Richard Petty’s 200 victories in NASCAR’s highest division at the top of the list.

    2. Yeah, that’s the article I’m linking to, above.

      http://www.nybooks.com/articles/archives/1988/aug/18/the-streak-of-streaks/?pagination=false

      This is what he really says:

      Among sabremetricians—a contentious lot not known for agreement about anything—we find virtual consensus that DiMaggio’s fifty-six–game hitting streak is the greatest accomplishment in the history of baseball, if not all modern sport.

      He’s stating this as the conventional wisdom. Then, as is his wont, he goes on to challenge some of the underlying assumptions.

      1. My impression is that he defended DiMaggio’s streak from its critics! 🙂

        I just re-read the essay; it is as good as anything Gould ever wrote, so let’s leave the conundrum to new readers as an exercise.

        1. Dr. Gould always liked to give his readers something to think about. It’s sad that he died at such a young age.

    3. Gould also wrote about why there are no more major league batters hitting .400 in recent decades. Ted Williams hit .406 in 1941 and is the last man to do so.

      Gould calculated the mean of major league batting averages and found that they had not changed over the decades, but the standard deviations had shrunk, so that the tails of the distribution were closer to the mean.

      Some people (including Ted Williams) thought that the disappearance of the .400 hitter had something to do with the “fact” that in those days, men were men and not the wimps they are today. Gould’s explanation was much more grounded in fact. It is the variance and not the average which accounts for the observation.

      A lot of the time people will think that the null hypothesis of a t test is that the sample means are equal. The null hypothesis is actually that the samples are drawn from the same population, meaning that the standard deviations are also equal (plus the skewness and kurtosis). The lack of major league batting averages at or better than .400 is a reminder that parameters other than the mean have influence over what we observe in real life.

      Like rachelrachel, I miss Gould’s essays.
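Gould’s variance argument is easy to put in numbers: hold the league mean fixed, shrink the spread, and a .400 season goes from rare to essentially impossible. A minimal sketch (the means and standard deviations below are illustrative stand-ins, not Gould’s actual figures):

```python
import math

def prob_at_least(x, mean, sd):
    """P(X >= x) for X ~ Normal(mean, sd), computed with the
    complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Same league-wide mean batting average; only the spread differs by era.
p_wide = prob_at_least(0.400, 0.260, 0.040)    # wide early-era spread
p_narrow = prob_at_least(0.400, 0.260, 0.025)  # compressed modern spread
# Shrinking the standard deviation slashes the tail probability by
# orders of magnitude even though the average hitter is unchanged.
```

With a few hundred regulars per season, the first rate yields the occasional .400 hitter over the decades; the second yields essentially none, with no change in the league average at all.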

Comments are closed.