Chuck Colson and the Starfish Principle

Before getting down to serious business regarding my Slate essay (*), three corrections on factual points:

1. The original version described the program as “fundamentalist.” It turns out that the Prison Fellowship is on the “evangelical” side of the fundamentalist/evangelical divide, a distinction I was aware of but don’t quite fully understand. Many of those who call themselves “evangelicals” regard “fundamentalist” as a term of abuse. I should have been more careful, and I apologize to anyone who was offended. The error has been corrected on Slate.

2. The document I quoted about faith appears in the King James Version as “The Epistle of Paul the Apostle to the Hebrews.” But modern scholars, and more recent Bible translations, apparently reject that traditional attribution. I’m happy to be corrected on that point.

3. The difference between the experimentals and the controls isn’t statistically significant, so saying that the experimentals did “somewhat worse” can’t really be supported. All one can say with confidence is that the Penn study does not support the claim, made by Prison Fellowship and the White House, that IFI (the InnerChange Freedom Initiative) reduced crime among its participants.

Most of the correspondence from my Slate essay that wasn’t merely vituperative concerned the issue of selection bias. The objections came in two forms, equivalent statistically but not emotionally.

Several of my pen-pals used the parable of the starfish: On a beach where several thousand starfish lie stranded after a storm, a little boy is picking them up, one by one, and throwing them back into the water. A grown-up says to him, “What you’re doing is very nice, but it can’t possibly make a difference to all these starfish.” The boy nods, picks up another starfish, throws it into the ocean, and says, “Made a difference to that one.” If IFI helped some prisoners, why criticize it for not helping other prisoners?

The other form of the same objection is that, since no treatment works on those who don’t get it, it’s logical to measure results on completers only rather than all attempters. (One of those making this objection was, rather frighteningly, an MD engaged in cardiology research.) Actually, the medical analogy, which several of my correspondents invoked, is a good one, and perhaps a numerical example will help:

Imagine a disease, and a proposed treatment for that disease. We want to know whether the treatment works. What experiment should we do, and how should we interpret the results?

Take 2000 people with the disease. Randomly select 1000 of them as “experimentals,” leaving the other 1000 as “controls.” The experimentals get offered the treatment; the controls we just observe.

Now say that, of the 1000 controls, 100 get better. That’s a recovery rate of 10%. That’s the target the treatment has to beat to convince us that it works.

Assume that half of the experimentals accept the treatment and follow through to the end. So we have 500 “completers” (or “graduates,” in the IFI context). The other 500 are “drop-outs.”

Now imagine that 75 of the 500 completers recover. That’s a recovery rate of 15%. The treatment worked!

Wait. Not so fast. We need to look at the dropouts. If 50 of them recovered, the same rate as the control group, then we can say the treatment effect was real: 125 of the experimentals, but only 100 of the controls, got better. But what if only 25 of the drop-outs recovered? Then what would we say?

If we look at the whole group of 1000 people who were offered the treatment, 100 of them, or 10%, recovered: the same rate as in the control group. So being offered the treatment did nothing to improve the recovery rate. Something’s wrong here. We’d have to conclude that whatever made some members of the experimental group likelier to recover also made them likelier to stick with the treatment. (Perhaps people who start to feel better have more energy to continue.)
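For readers who like to see the arithmetic laid out, here is a minimal sketch in Python, using only the numbers from the example above, comparing the completers-only rate with the intent-to-treat rate under both dropout scenarios:

```python
# Numbers from the hypothetical experiment above.
controls            = 1000
control_recovered   = 100        # 10% recover with no treatment offered

offered             = 1000       # the experimental group: offered the treatment
completers          = 500        # accepted it and followed through
completer_recovered = 75         # 15% of completers recover

print(f"Control recovery rate: {control_recovered / controls:.1%}")      # 10.0%
print(f"Completers-only rate:  {completer_recovered / completers:.1%}")  # 15.0%

# Intent-to-treat: count everyone who was offered the treatment,
# dropouts included, and compare against the controls.
for dropout_recovered in (50, 25):
    itt_rate = (completer_recovered + dropout_recovered) / offered
    print(f"With {dropout_recovered} dropout recoveries, "
          f"intent-to-treat rate = {itt_rate:.1%}")

# With 50 dropout recoveries: 12.5%, genuinely better than the 10% control rate.
# With 25 dropout recoveries: 10.0%, no better than doing nothing; the completers'
# apparent 15% advantage is selection, not treatment effect.
```

The completers-only figure is 15% in both scenarios; only the intent-to-treat comparison tells us whether offering the treatment actually improved anyone’s odds.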

It’s not that we’re blaming the treatment for the bad outcomes of the dropouts: it’s just that we’re noticing that its seemingly higher cure rate came from cherry-picking its participants rather than actually curing them.

The only other possibility, assuming that the original randomization succeeded in producing similar groups, is that the treatment helped some people and actually hurt others. That’s possible in the medical context. I suppose it’s conceivable that IFI actually made some of its participants worse, but it’s not obvious how that would happen. Selection effects are a much more likely explanation for the actual pattern of results. Anyway, how happy would we be with a program that hurt more people than it helped?

[One sophisticated correspondent suggested that the volunteers for IFI might have included a higher proportion of manipulative inmates, and that therefore the two groups weren’t really matched. That could be true, though at first guess you’d think that people who volunteered would be disproportionately those who really wanted to turn their lives around. But at best that “negative selection effect” theory means that we’re not sure the program failed; as mere speculation, it can’t justify a claim that the program succeeded. The same applies to claims that it might lead to better outcomes after the study period, or to gains other than reductions in crime. They’re all conceivable, but there’s no evidence for them, and the claim made was that the program was a proven method of crime control.]

If IFI had succeeded according to “just-one-starfish” rules — helping some while not hurting others — it could reasonably claim success. But it didn’t, according to the numbers its advocates put out.

Several people asked, rather huffily, if UCLA uses all its matriculants, rather than only its graduates, when it advertises the success rates of its students. Good question. I probably don’t want to know the answer.

Many of the extreme claims made for the job-market benefits of higher education, and especially of elite higher education, are merely applications of selection bias. (One of my colleagues at Harvard used to claim that the institution’s operating principle was to select the best, get out of the way while they educated one another, and then claim credit for their accomplishments.) Scholars who have worked hard to overcome the selection-bias problem report that, even correcting for that, higher education still has a respectable rate of return in financial terms.

I’m still skeptical, because some of those increased earnings for graduates presumably come at the expense of the people they beat out for jobs; the social rate of return must therefore be less than the private rate, and it’s conceivable that the marginal social rate of return to higher education, in purely financial terms, is negative. Whether the non-financial benefits of higher education are large enough to compensate would be a different, and even harder, question.
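To make the private-versus-social distinction concrete, here is a toy calculation; the dollar figures are invented purely for illustration and are not drawn from any study:

```python
# Invented, purely illustrative figures -- not drawn from any study.
private_gain      = 10_000   # extra annual earnings captured by the graduate
displacement_loss =  4_000   # earnings lost by the non-graduate beaten out for the job

social_gain = private_gain - displacement_loss
print(f"Private return to the graduate:      ${private_gain:,}")
print(f"Social return, net of displacement:  ${social_gain:,}")

# If displacement exceeded the private gain, the marginal social return
# would be negative even though the private return stayed positive.
```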

But, yes, an honest study of the effects of higher education would have to look at drop-outs as well as graduates.

I was also accused of “hypocrisy” for liking literacy programs despite the absence of true random-assignment studies. There are good logical reasons to think that boosting reading scores improves job-market opportunity, and job-market success reduces recidivism. There is no doubt that adults with low reading scores can be taught to read better at relatively low expense. So I have fairly good confidence that prison literacy programs work, and are cost-effective compared to other means of crime control.

But note that my conclusion was not that we should launch a massive program of prison education, but that we ought to run the experiment. If it came out negative, on a true random design including the dropouts, I’d be disappointed and a little surprised, but I’d have to say either “I give up” or “Back to the drawing-board.” I wouldn’t tell fairy-tales about how it really worked, if you only look at the people it worked for.

Some of my readers wanted to take this as a matter of perspective, or of opinion. Sorry. It’s not. It’s a matter of black-letter statistical method, something that could be on the exam in any first-year methods course. Each of us gets to choose a viewpoint, but we all have to work from the same facts.

Author: Mark Kleiman

Professor of Public Policy at the NYU Marron Institute for Urban Management and editor of the Journal of Drug Policy Analysis. Teaches the methods of policy analysis as applied to drug abuse control and crime control policy, working out the implications of two principles: that swift and certain sanctions don't have to be severe to be effective, and that well-designed threats usually don't have to be carried out.

Books:
Drugs and Drug Policy: What Everyone Needs to Know (with Jonathan Caulkins and Angela Hawken)
When Brute Force Fails: How to Have Less Crime and Less Punishment (Princeton, 2009; named one of the "books of the year" by The Economist)
Against Excess: Drug Policy for Results (Basic, 1993)
Marijuana: Costs of Abuse, Costs of Control (Greenwood, 1989)

Contact: Markarkleiman-at-gmail.com