The trouble with twin studies

Twin studies claim to factor out the influence of environment on behavior, leaving a purely genetic residual. But twins spend nine crucial months together.

Kevin Drum is right to say that studies of separately adopted twins have long been considered the gold standard in research on the heritability of IQ. But I’ve never understood why. Each twin in a pair spends nine crucial months in precisely the same environment, and an environment whose quality varies strongly with social class.


Just to be clear: I’m not an IQ denier, a viewpoint with about as much scientific cred as global warming denial. Of course there are quantitatively measurable differences among humans (and in the distributions among population groups, even imprecise ones such as “race”) in cognitive capacity and personality traits. And of course some of those differences relate to the genetic constitution of individuals and groups. (And of course the overall human gene pool, including the parts of it with large cognitive/behavioral impacts, has been determined by Darwinian processes, but that’s the topic of a different denial.)

But the political question, as opposed to the scientific question, is how much of the disadvantage suffered by poor people generally and by particular ethnic groups specifically in the United States is due to immutable personal characteristics, and how much to potentially mutable social conditions. If poor people are poor mostly because they’re stupid due to genetic deficiencies, and if black people are disproportionately poor because they’re disproportionately subject to those genetic deficiencies, then it would seem that poverty and racial disadvantage are, as Aristotle might have said, “by nature” rather than “by convention.” I know the defenders of The Bell Curve deny that Herrnstein and Murray meant to say that, but I know too that the book was a best-seller because people who wanted to read it as saying that could and did.

(I say “it would seem” that genetic determinism justifies inequality, because even that argument depends on the belief that it’s fully just to distribute rewards according to social contributions, no matter how unequal the result, and on the further belief that Ken Lay and Ann Coulter make larger social contributions than do schoolteachers and social workers.)

Insofar, however, as those “innate” human characteristics are the product of social conditions (insofar, for example, as poor prenatal environments help transmit disadvantage from generation to generation), the disparities we see are partly the product of social choice rather than of unvarying natural law, and the question arises whether we ought to make different social choices (which of course requires, among other things, changing the behavior of today’s disadvantaged groups in the interest of their descendants, for example by discouraging smoking among poor pregnant women).

Now I would be for greater equality of income and of social and political power even if I thought that we lived in a pure meritocracy and also that intergroup differences in cognitive skills were the product of genetic variation. (Similarly, I’d be against discrimination on the basis of sexual orientation even if I thought that male homosexuality was mostly chosen rather than innate.) But lots of voters will make the link I reject, so it matters what they think about the underlying scientific question, even though only a few percent of the population actually knows enough on the question to be entitled to an opinion.

Just as it would be an offense against the reality principle to shape one’s opinions about human psychology to one’s preferred political outcomes, it would be naive to ignore the political implications of scientific controversy. The more human inequality is taken to be natural rather than conventional, the better for the haves in their age-long struggle to continue to oppress the have-nots with a clear conscience.

Author: Mark Kleiman

Professor of Public Policy at the NYU Marron Institute for Urban Management and editor of the Journal of Drug Policy Analysis. Teaches about the methods of policy analysis and about drug abuse control and crime control policy, working out the implications of two principles: that swift and certain sanctions don't have to be severe to be effective, and that well-designed threats usually don't have to be carried out. Books: Drugs and Drug Policy: What Everyone Needs to Know (with Jonathan Caulkins and Angela Hawken); When Brute Force Fails: How to Have Less Crime and Less Punishment (Princeton, 2009; named one of the "books of the year" by The Economist); Against Excess: Drug Policy for Results (Basic, 1993); Marijuana: Costs of Abuse, Costs of Control (Greenwood, 1989).

38 thoughts on “The trouble with twin studies”

  1. Your point is completely obvious, so obvious I can't figure out why I never thought of it myself.
    So: if we divide the "inputs" of IQ into genetics, environment in utero, and environment in "life", twin studies should be able to tell us about the role of environment in life, holding environment in utero and genetics constant. But now that you've brought it up, it seems to me like we could actually run reasonable experiments on the role of in utero environment on IQ. You can't assign well-off people to drink, smoke, and eat poorly, of course, but you could choose some treatment group of pregnant poor/low-class/unhealthy women and give them extra hospital care, more visits with nurses, more instruction on how to take care of themselves and their babies, that sort of thing.
    Maybe some of these studies have already been done — after all, we do empirically test prenatal intervention programs, don't we? I'd imagine the focus would tend to be on health and education outcomes, but there's no reason not to track IQ as well. Is this something that IQ researchers have studied?

  2. That is indeed a crucial point. Lots of people treat "innate" and "genetic" as synonymous, but they aren't. You can see that even in such a simple example as cats' fur. A cat and her clone have exactly the same genes, but they don't look the same.

  3. Bingo, we have a winner. In one sentence Mark has rewritten the map of the discussion.

  4. How, exactly, does the womb differ for developing children based on "social class?" Your analysis is so off as to be ridiculous at its unspoken central belief, which would appear to be that some "lower class" of people have less-smart children because of their "class status."
    First off, what are the "classes" in US society?
    Second, if a "lower class" woman doesn't smoke, drink, or over-exert, and eats healthy, how is her womb less well off than that of a "high class" woman who smokes, drinks, exercises and eats poorly?

  5. Ah, yes, the poor undernourished unborn. This must be why so few professional athletes and other top performers come from poor backgrounds.

  6. Surely William Young isn't seriously asking whether the prenatal environment affects fetal development. I realize he's never been pregnant, but doesn't he know anyone who has? Hasn't he noticed all of the medical advice mothers get, all of the ways in which they're supposed to watch themselves?
    And surely he isn't trying to suggest that there's no correlation between prenatal care, nutrition, consumption of alcohol, tobacco, and other drugs, and social class. None of this is even slightly controversial.
    If he has ever met someone who has been pregnant, surely he has noticed how much you think these things matter when the child in question is your own. Why insist that they must be irrelevant in the aggregate?

  7. You've raised an interesting _possible_ objection to twins studies without providing any evidence whatsoever that it is a _practical_ limitation on drawing conclusions from such work.
    Yes, today poor women are more likely to smoke and drink during pregnancy, and less likely to have access to pre-natal medical care than middle class and upper class women. But it seems quite likely to me that the effects of pre-natal toxicity, nutrition and medical care are threshold effects.
    In other words, the difference between a truly poor woman who receives little to no medical care and a middle or upper class woman who does receive medical care is probably meaningful around the margins in determining her likelihood of giving birth to a child who has developed to his or her full genetic potential pre-birth. But it seems highly unlikely that there are statistically meaningful differences between a blue collar woman who gets pre-natal care at a community hospital and a wealthy woman who gets care at a world-renowned academic medical center.
    For the most part, not smoking (much), not drinking (much), taking pre-natal vitamins, seeing an OB/GYN every few weeks and getting adequate sleep are all that is necessary to make sure a pregnancy is healthy. These things are sometimes out of reach to the truly poor, but not at all to even the lowest rungs of the middle class ladder.
    To show that twins studies are invalid you'd have to show that any genetic differences yield in intelligence disappear from the analysis completely if truly poor birth mothers were excluded. This is possible, but it's an empirical question – and one that you haven't begun to answer.

  8. I see others have beat me to it: the womb is probably one of the more egalitarian environments available, although probably not perfect.
    But, there are many twin studies that are designed to show differences in outcome or traits between identical and fraternal twins, and thus do control for the potential impact of a suboptimal gestation. I think some of the intelligence studies cited tried to do that.

  9. As I've been arguing over at bitchphd for the last few days, heritability has almost no relevance to what little is known about IQ, let alone how we ought to shape policies or choose individuals for certain programs or favors. The best overview of this debate, with bookoo background on what heritability actually means (as opposed to what people think it means, you'll likely be surprised) and how it is misinterpreted is this oldy but goody:

  10. sd said: "To show that twins studies are invalid you'd have to show that any genetic differences yield in intelligence disappear from the analysis completely if truly poor birth mothers were excluded."
    Here ya go, sd. Wearing skirts is 50% heritable. In fact. It is also ~100% genetically determined. Despite this, it is neither innate nor genetic. Any study that measures heritability is fundamentally flawed when you try to interpret it outside its agronomic roots. Farmers trying to decide what strain to plant in which field and how much fertilizer to use will find some utility in measuring heritability of crop yields. People studying human behavior will find no such utility. See the Lewontin example I linked to above.

  11. You're somewhat wrong too, in that identical twins can have fairly different in utero environments. Things like Twin to Twin transfusion can significantly affect birth weights, and other preemie conditions, such as intracranial hemorrhaging. My twins were born with a mild case of TTTS, but had a birth weight disparity of almost 300g, or 20%.

  12. What has always puzzled me about twin studies is how often does it happen that 1) there are twins who 2) are raised separately and 3) included in studies? There can't be thousands, can there?

  13. Connie, there's that, and recruitment, and adoption. I don't have the sources handy, but I encountered some things during the debates on 'The Bell Curve'. To wit:
    Twin studies rely on twins being separated at or shortly after birth, but also being both locatable. This is far from a random thing.
    Twin studies use adoption in a manner similar to randomization in a randomized experiment. Again, not a good assumption.
    The Minnesota twin studies 'proved' very high heritability in almost everything; IIRC, even whether somebody is Catholic vs Lutheran. The obvious question is how adoptions are being done. For example, if they're done from one part of an extended family into other parts, there'd be a social/environmental linkage between the twins.
    When other researchers questioned these matters, the researchers in Minnesota were unwilling to hand over data.
    In the end, it comes down to: twin studies are very useful, but only in the hands of ethical researchers; 'Bell Curvists' will produce only junk science and propaganda.

  14. CalDem is correct, that these problems have been well known (though seldom believed, to judge by the number of such studies that still get published) for many, many years. The twins share their uterine environment and have very similar postnatal environments. Despite this, even monozygotic twins raised in the same household have never shown 100% heritability for any personality or behavioral traits. That is the big fat hairy deal that no one talks about. If twins reliably showed similar personalities etc. when raised in similar environments, we might be willing to believe that similar traits over different environments had similar "genetic" causes.
    But twins aren't identical. 40% heritability is considered a large effect. I'd call it very small and silly, relative to what the folk psychology says about the role of genetics in behavior.

  15. @ Barry:
    The link I provided above is THE definitive refutation of the Bell Curve. Chex it out.

  16. Without a way to go straight to the DNA and compare genes, twins were the best that could be done – until now. It is now feasible to compare the similarities of genomes directly, for example in this paper ( ) which looks at the heritability of height – and, by the way, gets results that are entirely consistent with the old-fashioned twin studies.

  17. @ tc:
    The problem is that heritability doesn't measure "genes" or "genetic programs" or "genetic effects." Heritability is a statistical method, not a statement about causality. Heritability is the percentage of a trait related to genetic *variance* out of total variance: H = Vg / Vtot. Total variance is genetic variance plus environmental variance: Vtot = Vg + Ve.
    So, H = Vg / (Vg + Ve)
    It follows that in any experiment where you measure heritability, you are making statements about *variance* (a population measure), and to infer anything at all about "genes" per se, you have to hold the environment constant, or limit its range to known values. The thinking behind twin studies is that you have monozygotic twins with identical or different environments, and dizygotic twins with identical or different environments, so you can make accurate measures of Vg and Ve. As we know, the Ve for these studies is actually quite narrow, so any measure of "heritability" is really a measure of Vg in a normal, upper middle class white neighborhood. We also know that, on an individual scale, noise and individual history are far more important than genotype alone. Hence, no monozygotic twins actually have identical personality or behaviors.
    So, yeah, twins were the best that could be done, but the studies don't tell you what you think they're telling you. For instance, you can measure IQ at age 7 and again at age 15, and the heritability calculated will be different at each age.
    The study you cite, for example, would show a very different heritability for height if it was conducted in 1906 rather than 2006, as nutrition has increased and (hopefully) the variation in the environment has decreased. Heritability != genes or "gene effects." If it were, why can we measure heritability for wearing earrings? Heritability is a statistical method, not a statement about causality. See the link I provided above, for a very good example from RC Lewontin on measuring heritability in corn plants; plus this one:

  18. So, H = Vg / (Vg + Ve)
    It's worth expanding on this point slightly.
    If Ve is narrow, as with twin studies (and the height study you mention; I doubt there were many low income families in the study) then the statistic H will be an overestimate of the true heritability. So the best twin studies have Ve set too low by default (kids don't get adopted into crack homes, on average), so no matter how good your estimator is for Vg, the output statistic is a vast overestimate. Add the confounding effects of instability of heritabilities across age, and you are left scratching your head about why anyone uses these stats outside agronomy.
    When you can define the total Ve across your front and back 40 acres, an estimate of H for your corn crop is meaningful. When you try to plant that corn in the next state, your estimate is purely bogus. The same is true for any study that relates heritability to public policy. We have no idea what effect rearranging the environment will have on IQ, since our estimate of Ve has such high bogosity.
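
The arithmetic behind the last two comments can be sketched in a few lines. This is only an illustration of the ratio H = Vg / (Vg + Ve) as quoted above: the function name `heritability` and all of the variance values are made up for the example, not taken from any study.

```python
def heritability(v_g: float, v_e: float) -> float:
    """Return H = Vg / (Vg + Ve), the ratio quoted in the comments above."""
    total = v_g + v_e
    if total <= 0:
        raise ValueError("total variance must be positive")
    return v_g / total


# Hypothetical variances, chosen only for illustration.
V_G = 100.0  # genetic variance, held fixed

# Same genetic variance, three different ranges of environment.
for v_e in (25.0, 100.0, 400.0):
    print(f"Ve = {v_e:5.1f}  ->  H = {heritability(V_G, v_e):.2f}")
```

With these invented numbers H comes out as 0.80, 0.50 and 0.20: nothing about the genes changes between the three lines, only the spread of environments in the sample. A sample whose environments are unusually similar therefore reports a higher H than the wider population would.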

  19. Thank you for the link No Nym. I want to add something. In the last two centuries, mass production and the economies of scale have thrown masses of people into lower social strata with concomitant trauma. Being a farmer in European/American society required a considerable degree of skilled knowledge. You had to coordinate taking care of domestic animals and knowledge of crops; this involved quite a bit of coordination and planning as tasks can only be done at certain times of year. Likewise there were many craftspeople with specialized skills as carpenters, fishermen, sailmakers, craftsmen, blacksmiths, shoemakers, weavers and the like. Modern industry (standing on an assembly line) does not require this amount of skill and whole generations of people have been forced to take lower skilled work. This is still going on in rural areas, where the only jobs are now in construction, working at McDonalds, or garbage hauling. The white collar sectors are only beginning to feel the effects of this technological unemployment, and I imagine the effects on the self-respect of younger generations will be as devastating as it has been in the blue collar sectors.

  20. I forget where I've heard this (I didn't make it up), but in the spirit of No Nym's posts here's a great example demonstrating that "heritability" doesn't always mean what you'd think it means.
    If you look at the number of fingers a person has, you'll find that almost everybody is born with 10 fingers. The variance is nonzero, but it's extremely small. Some number of people, however, lose fingers through the course of their lives — amputations, accidents, etc. So the "heritability" of #Fingers is almost 0: it's nearly 100% "environmental".

  21. I'd love to know what's supposed to be "reality based" about thinking poverty in America is a problem of "oppression". That must be one damned weird definition of what constitutes "oppressing".

  22. The thing that has always amused me about these nature/nurture discussions is that in my observation the vast majority of human beings use at most 33% of their potential "brainpower" (for lack of a better word). When put in a very challenging do-or-die environment just about everybody can and does do better, often by factors of 2x or even 3x.
    So if IQ is a measure of potential brainpower it doesn't mean much, because the vast majority of people (including the smug Bell Curve readers) don't use anywhere near their potential. And if IQ is a measure of _current_ brainpower, it doesn't mean much because just about anyone /can/ stretch their brain by at least 2x if properly motivated.
    Either way, it don't mean much.

  23. Cranky, it does, in the same sense that how fast you can run for your life is not how fast you'll run in daily living.

  24. "So the "heritability" of #Fingers is almost 0: it's nearly 100% "environmental"."
    Posted by Alex F
    Not only that, but the "heritability" of #Fingers changes with age. It'd be higher among newborns than among 70-year olds.
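
The fingers example can be put into numbers with a toy model. Everything here is an assumption made for illustration: finger count is modeled as 10 minus two independent 0/1 indicators (born with a different count, treated as the "genetic" part following the comment above, and lost a finger later, the "environmental" part), and the rates are invented.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 indicator that equals 1 with probability p."""
    return p * (1.0 - p)


def finger_heritability(p_born_different: float, p_lost_finger: float) -> float:
    """H = Vg / (Vg + Ve) for finger count under the toy model described above.

    Assumes p_born_different is strictly between 0 and 1, so Vg > 0.
    """
    v_g = bernoulli_variance(p_born_different)  # variance present at birth
    v_e = bernoulli_variance(p_lost_finger)     # variance from later accidents
    return v_g / (v_g + v_e)


# Hypothetical rates, chosen only for illustration.
P_BORN_DIFFERENT = 0.001
P_LOST_BY_AGE = {0: 0.0, 30: 0.005, 70: 0.02}

for age, p_lost in P_LOST_BY_AGE.items():
    h = finger_heritability(P_BORN_DIFFERENT, p_lost)
    print(f"age {age:2d}: H = {h:.3f}")
```

With these made-up rates H is 1.0 among newborns and falls below 0.1 by age 70, even though the "genetic" variance is identical in every age group.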

  25. @ Barry and Alex F:
    yeah, heritabilities are nonstationary, and that is a major problem with their interpretation. If we measure height among wealthy 8 year olds, we'll find high heritability. If we measure it again when they are adults, it will be lower. The same is true of reading ability, IQ, etc.
    The policy implications of reading comprehension being "only" 50% heritable at age 10 are enormous: environment is overwhelmingly important (since we know the study likely underestimates Ve). What's odd to me is that people look at the number of minority X in career Y, and say "oh, that's 50% heritability, so it's destiny." Um, the only way it can be destiny is if it's 100% heritability.

  26. @ Brett:
    if the thesis that anyone in America can work their way out of poverty is true, social mobility and income inequality should either remain constant or shrink.

  27. > fast you can run for your life is not
    > how fast you'll run in daily living.
    The difference between brainpower and physical exercise, at least in my anecdotal experience, is that once a person cranks the brain up to a higher level it stays there without much effort (which isn't to say zero effort), and each subsequent step gets easier not harder. In fact at a certain point the improvement becomes self-sustaining. If this is true of the exercise bike I haven't noticed it yet 🙁
    Again, most people are so far from using the ultimate potential of their brains that any talk of setting categories in stone is ridiculous IMHO.

  28. Crazy thing I discovered yesterday: wealthier, better educated, older, and white women are all MORE likely to drink during pregnancy, at least in CA:
    I bring this up to illustrate that many of us (including me) have certain prejudices that assume everything poor women do is worse. Meanwhile, a prenatal factor that is not as widely known is stress:
    Prenatal stress, which is arguably more common among the poor can actually impact the baby's development. Moreover, it can impact that child's ability to cope with stress later in life, which implies a rather disturbing cycle.
    In summary, yes, prenatal environment matters. However, making assumptions about the prenatal environment based merely on socioeconomic class is dicey.

  29. "if the thesis that anyone in America can work their way out of poverty is true, social mobility and income inequality should either remain constant or shrink."
    A conclusion which would not logically follow even if there were no distinction between "can" and "will".

  30. Excellent comments out there on what is actually indicated by "heritability". I just want to add one more example. If we were not talking about intelligence, but rather visual acuity then it would be easily seen that we could just get glasses (an environmental effect) for the disadvantaged group. Even if the heritability of intelligence is as high as for near sightedness, there is still the possibility that an environmental effect exists that would eliminate performance differences.
    As for environmental differences in utero, I believe the problem is likely to be related to things like the higher probability of heavy metals and other pollutants in the mother's body among the poor compared to the well off.

  31. How can it be the same environment when each twin's environment (is in large part) the other twin?

  32. No Nym has made the right point about heritability — it just doesn't have the social implications that people think. In the Jim Crow South, the ability to vote was almost 100% genetically heritable. R-squared was pretty close to 1 since almost all the variance in ability to vote could be explained by genetic factors.
    However, I think Mark has unfairly maligned the science. A number of studies compare fraternal twins with identical twins precisely because both groups share the same pre-natal environment without sharing the same number of genes.

  33. One thing that I'd add to your article, No Nym, is that the black-white IQ gap is not stable. IIRC, it's dropped by 7 points over the past 30 years – roughly a generation. That suggests that it'll be gone by the third generation born after the end of Jim Crow.

  34. A word of correction on the Bell Curve: Murray and Herrnstein specifically point out that if all differences in IQ were genetic (a conclusion that they reject), one possible social policy conclusion would be that the government should do ALL THE MORE to help poor people (who bear no blame for their genetic endowment).

  35. Brief googling suggests the black-white IQ gap may have narrowed 5.4 points in about 30 years — or it might not have, according to other people.
    At any rate it can't necessarily be assumed the trend will continue in the future.
