What the numbers (don’t) say

“Let the data speak for themselves”? If you start hearing the data, seek professional help.

Sander Greenland, speaking today on the oft-heard demand of classical statisticians to “let the data speak for themselves”:

If you hear the data speaking to you, seek professional help.

There’s no such thing as a-theoretical data analysis. There’s always a model lurking there somewhere. All statistical analysis can tell you is how consistent the data are, or aren’t, with some set of parameter values for that model. If the model is wrong – if, for example, you’ve assumed away an interaction effect or a sample bias – the results will be equally wrong. Properly used, statistics is an aid to thought, not a substitute for it.

Author: Mark Kleiman

Professor of Public Policy at the NYU Marron Institute for Urban Management and editor of the Journal of Drug Policy Analysis. Teaches the methods of policy analysis, drug abuse control policy, and crime control policy, working out the implications of two principles: that swift and certain sanctions don't have to be severe to be effective, and that well-designed threats usually don't have to be carried out. Books: Drugs and Drug Policy: What Everyone Needs to Know (with Jonathan Caulkins and Angela Hawken); When Brute Force Fails: How to Have Less Crime and Less Punishment (Princeton, 2009; named one of the "books of the year" by The Economist); Against Excess: Drug Policy for Results (Basic, 1993); Marijuana: Costs of Abuse, Costs of Control (Greenwood, 1989). Contact: Markarkleiman-at-gmail.com

22 thoughts on “What the numbers (don’t) say”

  1. Thank you Mark!!!!! I say this all the time, that data never simply speaks for itself. There are too many people in the applied research arena that run around talking about "evidence-based" approaches to this or that, but when you question their evidence they say "just look at the data, the data speaks for itself". No it doesn't! You interpret that data through a lens. For example, as you pointed out in a previous post Mark, it's clear that you and I pretty much see most things as differently as night and day. We could both be looking at the same statistical "facts" but read them in very different ways when it comes to the practical policy implications. Everyone holds to paradigms, theories, and world-views. I just wrote a piece entitled "A Primer on Theory and Why It Is Important to Policy" in which I argued for this exact point, that policy-makers should care about theory because statistics never simply speak for themselves. Remember the old quote "there are lies, there are damn lies, and there are statistics". I'm a heavy quant guy so I don't discount statistical analysis at all, but I'm also a huge proponent of theory and of sorting through underlying assumptions.

  2. Greenland is a great epidemiologist for certain. In addition to reminding us that data must be interpreted in the setting of a model of some kind, he also makes crucial distinctions between measures derived from models and measures that are useful for making policy projections.

    For example, in basic epidemiology, every student is introduced to the concept of attributable fractions, which represent the proportion of disease that would be eliminated from a population if an exposure were completely removed. They are seductively easy to calculate once you have measures of relative risk and prevalence of exposure in a population. Greenland points out that attributable fractions can be highly misleading for formulating policy; he says, "Feasible interventions can rarely achieve anything near complete exposure removal, may have untoward side effects (including adverse effects on quality of life or resources available for other purposes), and may affect the size of the population at risk. Hence, intelligent policy input requires consideration of the full spectrum of intervention limitations and side effects, rather than just traditional estimates." (Am J Epidemiol 2004;160:301–305)

    In other words, not only do data require models in order to "speak" to anyone; the models themselves, even when problems of measurement error and selection bias are carefully taken care of, remain very limited (not always able, for example, to anticipate side effects).

    Some of the contentiousness in discussions over things like climate change may derive from unrecognized or unacknowledged limitations of even the best climate models as guides to policy formulation. Not all of the controversy is due to imbeciles like James DeMint. (Note: I'm not calling him an idiot; I know for a fact that he can go to the bathroom and get dressed all by himself.) Quantities that are seductively easy to calculate (once you have the software on your computer) may yet be only very imperfect guides when the time comes to make policy. Even if all the Yahoos could be eliminated from every public discussion of matters of science, our "attributable risk" models could still overestimate the benefits of their removal.
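To make concrete how "seductively easy" the attributable-fraction calculation in comment 2 is, here is a minimal sketch that assumes Levin's formula for the population attributable fraction, AF = p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of exposure and RR is the relative risk. The function name and the example inputs (20% prevalence, relative risk of 10) are hypothetical illustrations, not figures from the thread. The formula presumes complete removal of the exposure and no confounding, which is the very limitation the Greenland passage warns about.

```python
def attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Population attributable fraction via Levin's formula (an assumption here).

    prevalence    -- proportion of the population exposed, between 0 and 1
    relative_risk -- risk in the exposed divided by risk in the unexposed
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    # Excess risk contributed by exposure, weighted by how common exposure is.
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical inputs: 20% of the population exposed, relative risk of 10.
    print(round(attributable_fraction(0.20, 10.0), 3))  # prints 0.643
```

Two inputs and one line of arithmetic yield a number that looks like a policy target. Nothing in the calculation says whether any feasible intervention could remove the exposure, or what else would change if it did.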

  3. Ed: "Some of the contentiousness in discussions over things like climate change may derive from unrecognized or unacknowledged limitations of even the best climate models as guides to policy formulation."

    Do you really think so? The people who build the models are the ones who recognize and state their limitations, and welcome other models with different approaches, strengths and limitations. That is why climate scientists long ago reached a qualitative consensus about the trends based on established theory, observations, and multiple runs of multiple models, and are now just narrowing the numbers. The IPCC conclusions are not based on any one model. The contention comes IMHO largely from people who don't understand the theory, the data, or the models, and challenge them randomly from a personal agenda: look at the comment threads on a climate site like Deltoid and judge for yourself.

  4. To say that theory is always present is far from controversial, and doesn't tell the whole story. When we decide to count the number of students of different races or ethnicities, or measure the amount of particulate matter in the air we breathe, those decisions are based on theories — after all, we don't include the eye color or shoe size of the students, so we must have some idea as to what is relevant — which is a theory. In fact, the measurement process is a political act, because when the government decides to measure A and not B it is tantamount to making allocation choices.

    But to me a "model" means something else. Too many of our colleagues don't look at the data (and thus "let the data speak for themselves"), but delegate the model-building to one of the 4 Ss (SAS, SPSS, Systat, Stata) regardless of the appropriateness of the assumptions inherent in these "models" — often linearity, independence, and/or normality. The modeling exercise then just becomes looking for an appropriate p-value.

    I'm not saying that Greenland is wrong so much as I'm saying that his aphorism is a bit too cute, and (to my mind) somewhat off the mark.

  5. James says of climate models, "The people who build the models are the ones who recognize and state their limitations, and welcome other models with different approaches, strengths and limitations."

    The same thing is true for epidemiology; the true professionals are always the first to point out the limitations of their studies. Sander Greenland was assuming that his readers know this as well.

    Mark's point was that the data do not "speak for themselves" but must be incorporated, sometimes successfully and sometimes unsuccessfully, into a model, and that many pitfalls must be avoided along the way. Trying to impose estimates from a normal distribution onto a phenomenon whose distribution in nature is lognormal, for example, means that the model will probably miscarry. Failure to account for confounders and for effect modifiers also results in model failure.

    I interpret Greenland's article as extending the caveat Mark cited to a higher level. Maybe you could say, college entrance exam style, data:model::model:policy formulation. Even when you have the world's best data, collected without bias and measured without error, you can still blow it if you expect the perfect data to speak for themselves. Similarly, even if you have a rigorous, optimally specified model, that model does not speak for itself when you need to make policy.

    Suppose that you have an excellent model relating cigarette smoking to lung cancer. You have relative risks with narrow confidence intervals for various levels of smoking duration and intensity, and well-validated estimates of smoking prevalence in the population. You can make a pretty good estimate of the attributable fraction of lung cancer in the population which can be attributed to smoking. Your model allows you to do that.

    But the model does not speak for itself in deciding what to do. Suppose you have a model that estimates how many deaths from lung cancer would be prevented if there were no cigarette smoking. But this does not mean that you could prevent that number of deaths by passing a law that made possession and sale of tobacco a felony. The model does not tell you how many deaths would occur as desperate smokers sought illicit supplies of cigarettes, how many deaths would result from battles between different mobster families to control the Chicago tobacco distribution markets, or how many people would die during the ensuing civil war as the army invaded North Carolina to plow under the tobacco plantations.

    My intended message was fairly simple. If you hear the data speaking to you, seek professional help, just as Greenland says. If you hear the model telling you how to make policy, you should also go back on your meds. Climate models are way above my head, but I still feel fairly confident in extending Greenland's caveat to their translation into most kinds of policy making. I hope this clarifies, or at least does not muddy, the message.

  6. If I understand the prolixity correctly, shorter Ed Whitney:

    "This too I can use to support my man-made climate change denialism!!!!!!!!!!!!!!!!!!!!!"

  7. Properly used, statistics is an aid to thought, not a substitute for it.

    You mean we still have to think? WTF? The promise of all this statistical analysis and computer power was so that we could go all Parrot-Head and ignore the complex stuff.

  8. Venice nailed it. Knowing that climate change models are valid does not get us off the hook when it comes to the need to think about what to do about it. We need help from economists and sociologists and foreign policy experts and engineers of all stripes to plan the policy. Data requires much careful thought to translate into models; models require much careful thought to translate into policy.

    If that does not clarify it, I give up. I will confine further blog comments to http://www.dailykitten.com/.

  9. "Knowing that climate change models are valid does not get us off the hook when it comes to the need to think about what to do about it. "

    Knowing that wouldn't get us off the hook. Unfortunately, we're not even that far along. There isn't a climate model out there that's fully determined by the physics; they're all substantially reliant on fitted parameters for critical features such as cloud formation. Then there's the issue of the actual data being much coarser than the models would require to function properly, so the models end up being driven by extrapolated data.

    Meaning it's proposed to make policy based on nested models…

  10. Just a few more transparent words around 'we shouldn't make any policy', 'tis all. Fluffing it up with polysyllaby doesn't hide it.

  11. Israeli writer Avraham Burg, author of The Holocaust Is Over; We Must Rise From Its Ashes, has people assuming that he is a Holocaust denier. We all depend on the ability of supposedly educated persons to read. I may have misread Sander Greenland's essay, but my posting was concerned with a relationship between statistical models and coherent policy making that resembles the relationship between raw data and coherent statistical models. If I have a chance to meet Prof. Greenland, I may ask him if this analogy constitutes a misreading of his article. My confidence in my own ability to read is cautious and provisional. We all misinterpret perfectly clear prose from time to time.

    The acerbic Dorothy L. Sayers lamented the misreading of her prose; http://www.worldinvisible.com/library/dlsayers/mi… gives a taste of what her pen could produce on this matter.

  12. "All models are wrong; some models are useful." George E.P. Box and Norman Draper, Empirical Model Building and Response Surfaces.

    When statisticians ask people to let the data speak for themselves, what they often mean is for the listener to ignore what he thinks he knows and consider the data. For a good example, look at the vaccination non-controversy. The "vaccination causes (pick your favorite set of nasty adverse consequences)" crowd knows to a moral certainty that vaccines are bad. As a consequence, no amount of data showing otherwise can satisfy them. (By the way, this is not to say that there are no adverse outcomes of vaccination. I am saying that we don't have data that allow us to predict them: if they occur essentially at random, no such data can exist.)

    To put it another way, variation is the general norm in nature. The distribution is the reality, statistical summaries are artifacts. Thinking that the statistics are real and the distribution is the artifact is a serious error. Perhaps that sort of thinking is an example of a model that is both wrong and not useful.

  13. “The distribution is the reality, statistical summaries are artifacts.” This brings up one of the difficulties of “evidence-based medicine” and other enterprises to bring research models into the practical policy arena. Researchers may be attracted to outcomes which are easy to measure and happen to have numerical distributions that lend themselves to statistical hypothesis testing. Patients and doctors like outcomes that have something to do with what is actually bothering the patient. For rheumatoid arthritis, for example, laboratory levels of serological markers of disease activity have some advantages of the first kind, but it may happen that what is bothering the patient is overwhelming fatigue; this is more difficult to measure but more pertinent to the actual goal of the research.

    In other arenas as well, the same principle applies. In organizations, seniority is easy to measure, but leadership is difficult to measure. It is convenient to use the first as a surrogate for the second, but the substitution may not serve the goals of the organization very well. Even though you can measure seniority accurately to within one day of service, the numerical precision of the data should not be allowed to “talk to you.”

  14. "Let the data speak for themselves" is the earmark of a fool, admittedly.

    But "give the data a chance to disagree with you" is a path of wisdom.

  15. Researchers may be attracted to outcomes which are easy to measure and happen to have numerical distributions that lend themselves to statistical hypothesis testing.

    A lot of that depends on who's advising them. Hypothesis testing is all well and good, but it's a very limited way of looking at statistical inference. Ages ago, Frank Graybill said, "Define interesting parameters and generate appropriate interval estimates." It's always more useful to know that the relative risk is probably somewhere between 1.8 and 2.3 than it is to know that it's not 1. But again, this is a model-based inference. Even so-called nonparametric inferences have assumptions that underlie them, as Mark notes.
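The contrast in comment 15 between an interval estimate and a bare "it's not 1" can be sketched as follows. This is a minimal illustration, assuming the common large-sample interval built on the log scale, where the standard error of log(RR) is taken as sqrt(1/a - 1/n1 + 1/c - 1/n0). The function name and the cohort counts are hypothetical, and, as the comment says, the interval is itself a model-based inference: it assumes independent binomial outcomes and no confounding.

```python
import math


def relative_risk_ci(a: int, n1: int, c: int, n0: int, z: float = 1.96):
    """Relative risk with an approximate interval computed on the log scale.

    a, n1 -- events and total among the exposed
    c, n0 -- events and total among the unexposed
    z     -- normal quantile (1.96 gives an approximate 95% interval)
    """
    if a <= 0 or c <= 0 or a > n1 or c > n0:
        raise ValueError("counts must satisfy 0 < events <= total")
    rr = (a / n1) / (c / n0)
    # Large-sample standard error of log(RR); an assumption of this sketch.
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical cohort: 200 of 1000 exposed and 100 of 1000 unexposed fall ill.
    rr, lower, upper = relative_risk_ci(200, 1000, 100, 1000)
    print(f"RR = {rr:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

A test of RR = 1 would only report that the interval excludes 1. The interval also shows how large the effect plausibly is, which is the part a policy maker needs.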

  16. Certainly Greenland and Kenneth Rothman want confidence intervals, but there are regrettably many p values still being published. Dr. Rothman will not publish p values, but they seem to hold on in journals that ought to know better.

    Making the numbers transparent is often difficult. Somewhere a story was told of a junior researcher who was studying interventions to improve the bowel habits of nursing home residents, describing the power of his study to detect a difference in proportions between treatment groups. A senior advisor asked him, "What are you going to tell the doctors, that your statistic represents the square root of the angle whose sine in radians is the proportion of the patients who shat?"

  17. The problem of transformations has been a plague for a long time. The advisor in your (apocryphal — I've heard that one a number of times placed in different institutions) tale should have told the grad student to get his confidence interval on the arc-sine scale and then back-transform to the probability scale. The transformations in this case are invertible over the range of interest, and there is a single parameter involved, so the confidence statement on the arc-sine scale carries across to the probability scale.

    Of course, your senior researcher was hung up on observed significance levels rather than parameters. Here's another apocryphal story, this time about R.A. Fisher. It's said that he was lying on his death bed when a young statistician was given a brief audience with Fisher. The statistician was told he could ask Fisher one question only. After thinking about it, he decided the right single question was, "What action do you most regret in your life as a scientist?" Fisher is supposed to have answered, "Ever saying anything about 0.05."
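The back-transformation recommended in comment 17 can be sketched in a few lines. This is a minimal illustration, assuming the usual variance-stabilizing form phi = arcsin(sqrt(p)) with approximate variance 1/(4n), so the interval on the arc-sine scale is phi plus or minus z / (2 sqrt(n)). The function name and the counts are hypothetical. The endpoints are clamped to [0, pi/2] because that is the range over which the transformation is invertible.

```python
import math


def arcsine_ci(successes: int, n: int, z: float = 1.96):
    """Approximate interval for a proportion via the arc-sine square-root scale.

    The interval is built on the transformed scale and then mapped back to
    the probability scale with p = sin(phi) ** 2.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    phi = math.asin(math.sqrt(successes / n))
    half_width = z / (2.0 * math.sqrt(n))  # sqrt of the approximate variance 1/(4n), times z
    # Clamp so the endpoints stay where the transformation is one-to-one.
    phi_low = max(0.0, phi - half_width)
    phi_high = min(math.pi / 2.0, phi + half_width)
    return math.sin(phi_low) ** 2, math.sin(phi_high) ** 2


if __name__ == "__main__":
    # Hypothetical data: 30 of 100 residents respond to the intervention.
    low, high = arcsine_ci(30, 100)
    print(f"proportion between {low:.3f} and {high:.3f}")
```

The doctors never need to see the angle. They get an interval for the proportion itself, and the transformation stays an internal detail of the calculation.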

  18. In some of this discussion, it seems that the status quo is being uniquely privileged as being the only course of action that doesn't have any hidden unintended consequences.

  19. The status quo is uniquely privileged because it already exists, and so doesn't require massive interventions to arrive at.
