Judging Defendants Who Defy Stereotypes

Evidence matters more in criminal cases when the defendant fits the juror’s stereotype of the typical offender

New research by Blake McKimmie and colleagues (abstract here, full article paywalled) suggests that evidence matters more in criminal cases when the person accused of the crime fits the stereotype of the typical offender.

Mock jurors were asked to determine the guilt or innocence of a defendant accused of robbing a gas station at gunpoint. Half of the mock jurors read a description of the case that relied on weak evidence of the defendant’s guilt (e.g., vague eyewitness accounts) while the other half read a description of the case that included strong evidence (e.g., detailed eyewitness accounts, DNA evidence).

As one would expect, the jurors were much more likely to vote to convict when the evidence of guilt was strong than when it was weak. But this effect only held when the defendant was male and therefore fit the stereotypical image of an armed robber. When the defendant was female, the strength of the evidence was unrelated to jurors’ judgments of guilt.

The authors speculate that when a defendant does not fit a stereotype, jurors attend to the mismatch instead of the specific facts of the case. In the study example, even though women as a whole were not on trial, the jurors tended to focus on the question “Would any woman really be capable of robbing a gas station at gunpoint?” This apparently led them to focus mainly on the defendant’s characteristics relative to the variable that didn’t fit their stereotype (e.g., “Are there signs that she is more aggressive than the average woman?”). This allocation of attention and cognitive effort distracted jurors from something that is supposed to be influential in assessments of guilt: the strength of the evidence in the particular case being tried.

It’s an ingenious demonstration of a principle that the psychologists Amos Tversky and Daniel Kahneman explicated so well: When asked a question, our minds will often substitute a different question before giving the answer, without us being aware of it.

Author: Keith Humphreys

Keith Humphreys is the Esther Ting Memorial Professor of Psychiatry at Stanford University and an Honorary Professor of Psychiatry at King's College London. His research, teaching and writing have focused on addictive disorders, self-help organizations (e.g., breast cancer support groups, Alcoholics Anonymous), evaluation research methods, and public policy related to health care, mental illness, veterans, drugs, crime and correctional systems. Professor Humphreys' over 300 scholarly articles, monographs and books have been cited over thirteen thousand times by scientific colleagues. He is a regular contributor to The Washington Post and has also written for the New York Times, Wall Street Journal, Washington Monthly, San Francisco Chronicle, The Guardian (UK), The Telegraph (UK), Times Higher Education (UK), Crossbow (UK) and other media outlets.

17 thoughts on “Judging Defendants Who Defy Stereotypes”

  1. “…detailed eyewitness accounts…” as “strong evidence”? Really?

    I guess I’d throw up some caution signs right there. While much is often made of eyewitness accounts in court, their reliability is surprisingly low. If people assume that “details” in an eyewitness account make it reliable, they are sadly mistaken.

    Memory is a surprisingly malleable thing. Simply recalling an event can reconstruct a memory and alter it. Rather like the phenomenon in physics where the act of measuring a tiny object is enough to alter it. Details are just details, and any defense lawyer worth his/her salt should vigorously remind jurors that eyewitnesses may be honest in their memories — and also completely wrong.

    1. You raise a side point; whether it is correct or not, it does not threaten the validity of this study. There was a manipulation check in the study: subjects considered the evidence stronger in the specific eyewitness + DNA condition (and note that you also left the DNA out).

  2. “detailed eyewitness accounts” as “strong evidence?” Really?

    And DNA. Yes, eyewitness testimony by itself is pretty dubious. If it’s backed by physical evidence that points in the same direction, then it’s not carrying the same load.

  3. DNA evidence for a gas station holdup?? Am I out of touch, or has someone been watching too much TV?
    This question of verisimilitude is not relevant to the main point of the study or to Keith’s general point (which should also make us cautious about survey research, especially one-off surveys). But it does make me curious about the possible CSI-ification of public perceptions of what cops do. I was once a juror in a gun-possession case. Several of my fellow jurors were surprised that the police hadn’t taken fingerprints from the gun. We acquitted.

    1. Hi Jay: IIRC the robber had on a mask and sweater. The police find the sweater and it has a hair on it that matches the DNA of the accused.

      DNA may have been over the top in this case, but of course strength of evidence was one of the key independent variables and you want as much variance as possible (a rule once described to me as “If you are going to shock a white rat, shock the shit out of it”). If the subjects had not seen one of the conditions as stronger evidence than the other, then the experiment could not have been conducted. So they created a set of conditions in which subjects clearly felt that the evidence was much stronger in one of them than in the other.

  4. This sounds to me like a fairly straightforward application of Bayesian reasoning. If your prior probability is low enough, it’s going to take a terrifically sensitive and specific set of tests to raise the posterior probability to “beyond a reasonable doubt”. Otherwise the likelihood of a false positive is still too high.

    You can argue that the estimates of prior probability are all wrong, but that’s a different part of the reasoning process.
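    A minimal Bayes’-rule sketch (with made-up numbers, not figures from the study) makes the point: the same piece of evidence that is near-conclusive against a defendant who fits the prior can leave a low-prior defendant at roughly even odds.

    ```python
    def posterior(prior, sensitivity, false_positive_rate):
        """P(guilty | evidence) via Bayes' rule."""
        numerator = sensitivity * prior
        return numerator / (numerator + false_positive_rate * (1 - prior))

    # The same evidence (95% sensitive, 5% false-positive rate) against two priors:
    print(round(posterior(0.50, 0.95, 0.05), 3))  # stereotype-consistent defendant: 0.95
    print(round(posterior(0.05, 0.95, 0.05), 3))  # stereotype-inconsistent defendant: 0.5
    ```

    With a 5% prior, even evidence with a likelihood ratio of 19 only gets you to a coin flip — nowhere near “beyond a reasonable doubt.”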

  5. I’m not sure that these studies indicate what the authors think they do, and I’m even more skeptical about seeing them as supporting Kahneman’s thesis (with the caveat that I haven’t been able to read the actual paper; none of the universities I have access to are subscribed to the journal).

    Consider the hypothetical case of a DA who, over the years, prosecutes approximately equal numbers of men and women for robbery, and juries vote to convict approximately equal numbers of men and women. This would mean that (in America) about 90% of the convicted women would be victims of a miscarriage of justice (because about 90% of robbers are male).
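    The arithmetic behind that figure can be sketched with round numbers (my assumptions for illustration: the convicted men are essentially all guilty, and guilty convicts mirror the roughly 90%-male offender pool):

    ```python
    convicted_men = convicted_women = 100   # equal convictions, per the hypothetical

    # If the 100 guilty men make up ~90% of all guilty convicts,
    # only ~11 of the 100 convicted women can actually be guilty.
    guilty_total = convicted_men / 0.9
    guilty_women = guilty_total - convicted_men

    wrongful_share = 1 - guilty_women / convicted_women
    print(round(wrongful_share, 2))  # 0.89, i.e. about 90%
    ```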

    There appears to be an unspoken assumption that the prosecutor is bringing a strong case (where he or she is at least convinced of the guilt of the defendant), but the evidence provided hardly matches that assumption (even the supposedly “strong” evidence). The problem, from what I am seeing, is not that the jurors were substituting one question for another, but that in the case of the female defendant, the jurors were more likely to (correctly) presume innocence, while in the case of the male defendant, they allowed themselves to be sold some questionable goods by the prosecution (or, at least, that would be one reasonable interpretation, given the limited facts I have available).

    NB: Because Kahneman’s book keeps being mentioned, I’d like to note that I’m not quite as impressed by it as a lot of other people appear to be.

    1. @Katja:

      the jurors were more likely to (correctly) presume innocence

      One other useful bit of info I probably should have put in the post: The jurors rated the woman defendant’s likely guilt between the other two conditions, i.e., more likely to be guilty than the man accused with weak evidence, less likely to be guilty than the man with strong evidence against him. Make of that what you will.

      There is no assumption that the prosecutor is bringing a strong case; there is an empirical test of whether the jurors thought it was stronger than the weak case, and the test showed that they did.

      NB: Because Kahneman’s book keeps being mentioned, I’d like to note that I’m not quite as impressed by it as a lot of other people appear to be.

      If only someone would write a guest blog post on that topic….

      1. Keith: One other useful bit of info I probably should have put in the post: The jurors rated the woman defendant’s likely guilt between the other two conditions, i.e., more likely to be guilty than the man accused with weak evidence, less likely to be guilty than the man with strong evidence against him. Make of that what you will.

        That does not seem to be compatible with the assumption that jurors substitute one question for another in their minds, though.

      2. Keith: If only someone would write a guest blog post on that topic….

        My critique of Kahneman’s book would not make for a good blog post; I approached the book the way I would approach a conference or journal paper I was given to review. This does not create a critique that’s a coherent narrative, but a long list of largely independent notes. Worse, on the internet a recommendation of “accept with corrections” can easily be misunderstood as “reject” by people who aren’t familiar with typical peer review procedures.

        I’ll give you an example: an excerpt from Kahneman’s book that I found problems with:

        The power of random anchors has been demonstrated in some unsettling ways. German judges with an average of more than fifteen years of experience on the bench first read a description of a woman who had been caught shoplifting, then rolled a pair of dice that were loaded so every roll resulted in either a 3 or a 9. As soon as the dice came to a stop, the judges were asked whether they would sentence the woman to a term in prison greater or lesser, in months, than the number showing on the dice. Finally, the judges were instructed to specify the exact prison sentence they would give to the shoplifter. On average, those who had rolled a 9 said they would sentence her to 8 months; those who rolled a 3 said they would sentence her to 5 months; the anchoring effect was 50%.

        That is certainly dramatic, though the result is, on the face of it, not all that surprising. German judges do have a fair amount of discretion, and an error of three months does not seem particularly outlandish in this context.

        However, the description raises two questions. A mathematician will wonder how exactly you can load a pair of dice so that they can come up either 3 or 9 (it’s impossible). And for anyone familiar with the German criminal justice system, the description does not make any sense at all.

        First of all, shoplifting charges, like most de minimis offenses, are typically diverted under §153 or §153a of the Criminal Procedure Code. It would have to be a very serious case of shoplifting to call for a prison sentence.

        Second, sentences of six months or less are to be awarded only in exceptional circumstances under §47 of the Penal Code. The German Criminal Justice system strongly prefers fines over short prison terms, because they are comparatively effective as a deterrent, but do not have the negative effect on further criminalization and recidivism that prison terms have.

        Third, when a prison sentence of one year or less is awarded, §56 of the Penal Code encourages the court to suspend this sentence.

        These questions raised, I went and looked up the original study. This resolved some of the questions I had, but also showed that Kahneman had misreported parts of the study.

        First of all, the judges had to “find a sentence in a fictitious shoplifting case concerning a woman who had stolen some items from a supermarket for the 12th time”. Okay, so we’re not just dealing with a shoplifter, but with a multiply recidivist shoplifter. This explained why a prison sentence was considered (though, as the paper explains, it’s still a suspended sentence).

        Further problems arose, though: The judges in question did not have an average of fifteen years of experience on the bench (that was for the immediately preceding study), but “were junior lawyers from different German courts who had recently received their law degree and had acquired their first experiences as judges in court”. As it turned out, there were two sets of dice: one was loaded to produce a result of three all the time, and the other to produce a result of nine all the time. Furthermore, “[a]fter the dice had been thrown, participants were instructed to calculate the sum of the two dice and to fill in this sum as the prosecutor’s sentencing demand in the questionnaire. Participants then worked on the sentencing questionnaire, which consisted of the same questions that were used in Study 2.” This is a lot less dramatic than what Kahneman describes, and there are apparently a number of steps associated with the questionnaire that had to be taken before determining the sentence, and it’s not very clear how aware the judges were of the randomness of the prosecutor’s sentencing demand by the time they had worked their way through it. (The study doesn’t seem to address why six months per §47 of the Penal Code doesn’t serve as a third anchor.)

        So, there’s a lot of dramatization and simplification going on (and even an actual mistake, though that appears to be an honest one, mixing up two very similar studies in the same paper). Nothing that’s outlandish for a pop science book (and that’s what it is), but not matching the rigor I’d expect of an academic work. It is, as far as I can tell, not typical of the book, but not unique, either. As a result, I’ve come to treat the book with a bit of caution, assuming that there may occasionally be simplification and dramatization involved.

        1. Katja: You raise problems with the example (the dice loading is truly challenging — can’t figure that out either) but not the principle. The same effect has been shown by spinning a roulette wheel and then asking people what percentage of African nations are in the UN. Despite the lack of connection between the wheel and the truth, answers cluster around where the roulette ball stopped. The principle hardly depends on one example, one study or one team of investigators.

          As for “pop science”…well, he has a Nobel Prize in Economics, and the original classic papers (from Science, I believe) that have been cited so many thousands of times are in the text at the end. It’s an explication of hard-core science for a general audience, true, but pop science covers Erich von Däniken, Dr. Oz etc., and I think he deserves a bit more credit than those sorts.

          1. Keith, your response illustrates pretty much perfectly why I didn’t want to turn this into a blog post. 🙂 It’s just not suitable for this kind of forum [1]. Plus, as you get into the nitty-gritty details of it, it can easily be read as “this book sucks!”, which I most definitely don’t think.

            As I said, I don’t have a problem with the result, but with how the underlying story is narrated; i.e. for effect, rather than in an attempt to correctly describe the results of the underlying study. Weak methodology can still yield correct results, but reduces confidence in the results all the same.

            More importantly, this was just an example of a problem I found and that cannot easily be generalized. It is not representative of my (hypothetical, yet to be written) critique, and I have difficulties finding an example that is. There’s no coherent story to my notes that I could turn into a blog post.

            If I had to give my criticism a theme (i.e., gun to the head, and accuracy be damned), I’d have to say that there’s too much of a focus on mono-causal explanations and corroborating over contradictory evidence (e.g., the part on loss aversion, where there’s a much larger body of research than you’d think after reading it, a fair amount of which is critical of cumulative prospect theory). But this strays into criticizing the book for failing to be something it never aspired to be (kind of like criticizing an introductory math book for omitting the Peano axioms), and thus would be a bit unfair. On the other hand, it makes the book limited in value as more than an introductory text.

            Also, I didn’t mean to use “pop science” as a derogatory term, either (and in my opinion, von Däniken already fails at the “science” part, hard; what he does is pseudo-science, mythology wrapped in some of the external trappings of science, just as young earth creationists do). Pop science books make science more accessible, often (necessarily) by reducing the complexity so that average people can understand it. There’s nothing inherently wrong with that; such books are genuinely valuable. I am still a bit uncomfortable with the degree of dramatization that Kahneman sometimes uses, but that may have been the result of cuts in the editing process, too.

            [1] Related: Why peer review being anonymous and not public is generally a good thing.

          2. Keith, your response illustrates pretty much perfectly why I didn’t want to turn this into a blog post

            Oh dear, did I play this one wrong! If I renounce my view entirely, will I get a guest blog from you?

    2. Katja,
      I think you drew too simple an inference from your hypo. If a DA prosecuted the same number of men and women for robbery, and got equal conviction rates, I think it would be just as easy to assume some kind of individual or institutional sexism. Sort of like black folk and drug possession. I don’t think that prosecutors are imprisoning innocent black people on possession–just differentially charging guilty black people.

      But in any case, I agree with you and Olof and paul–Bayesian reasoning is a strong alternative hypothesis. Although good Bayesian reasoning always allows for adjusting priors in the face of strong contrary evidence. And most people think that DNA evidence is stronger than it is.

      And you should take Keith up on his offer to post on the Kahneman book. I’m not particularly impressed by Cass Sunstein’s attempted applications of framing, but I was impressed by the book. I’d love to see a strong voice to the contrary.

  6. I think this just might explain why we answer “yes” to the question “Would you throw a switch to send a train down a track where it will kill one person, in order to save five?” but answer “no” to “Would you push a fat man off a bridge in front of a train to save five people?” We change the second question to “Is it really possible to stop a train by pushing a fat man in front of it?” And our answer to that question is “I think not.”

Comments are closed.