Simple problem, simple solution

What do you do when a researcher reports that someone is cooking the data? Fire the complainer and move on.

When a researcher at a heavily-funded biomedical research lab reports that his team-mates are cooking the data, the solution is straightforward: fire the whistleblower, and keep moving.

Of course I have no competence to judge whether Daniel Yuan is right, and I have only limited confidence in mass-media reporters to grasp what’s going on. (Why not ask some people in the field to review the paper and Yuan’s criticism and say whether he’s on to something?) But the absence of a quote from someone senior at Hopkins saying “We’ve checked this over, and Yuan had it wrong” seems telling.

All money corrupts, and big money corrupts big-time. In the Middle Ages we had corrupt Church officials; today we have dodgy scientists. And the business model of the grant-funded parts of universities – and most of all of the medical schools – means that losing your funding means becoming a former scientist. There couldn’t possibly be more pressure to come up with something publishable, whether it’s accurate or not.

It seems to me that every university needs a research ombudsman, to whom a researcher with concerns about integrity can go and get an arm’s-length adjudication of his claims, with protection from retaliatory job action. If I were running NIH, I might want to make that mandatory for the top 100 grant recipients.

Author: Mark Kleiman

Professor of Public Policy at the NYU Marron Institute for Urban Management and editor of the Journal of Drug Policy Analysis. Teaches about the methods of policy analysis about drug abuse control and crime control policy, working out the implications of two principles: that swift and certain sanctions don't have to be severe to be effective, and that well-designed threats usually don't have to be carried out. Books: Drugs and Drug Policy: What Everyone Needs to Know (with Jonathan Caulkins and Angela Hawken); When Brute Force Fails: How to Have Less Crime and Less Punishment (Princeton, 2009; named one of the "books of the year" by The Economist); Against Excess: Drug Policy for Results (Basic, 1993); Marijuana: Costs of Abuse, Costs of Control (Greenwood, 1989).

22 thoughts on “Simple problem, simple solution”

  1. presents an interesting perspective on some of these issues. John Ioannidis is the best known authority currently researching the problems of non-reproducibility of medical research, but in this article he is joined by an economist who provides perspectives from auction theory, oligopoly, artificial scarcity, and economic uncertainty. The example of getting published in Nature falls under artificial scarcity in the era of the internet where the limitations of column-inches of printed page space do not apply.

    Having two medical experts team up with an economics expert yields a fruitful discussion.

  2. Areas in the University where there is huge money are very vulnerable to lies and deception. Big time university sports (Mike O’Hare, we are calling out to YOU here) is another example! And the UCs have had some dreadful incidents of nepotism in their administration, and remarkably high salary levels for jobs which when I was a pup would have been compensated at, maybe, twice or three times the salary of a journeyman teacher in a public high school. An ombudsman is one way to provide scrutiny. Some transparent mechanism for competition for grants, with the granters required to justify their choices? Maybe designate Chico State or Cal State Hayward as places where, instead of doing third rate me-too research, people get rewarded for checking other people’s work and taking shots at them if they can’t reproduce it?

  3. But would an ombudsman actually be free from pressure from the university to look the other way? I’d been under the impression that universities benefit from the grants and prestige of research that their professors do. Having it known that someone in their faculty has been falsifying data is going to give the school a black eye, so coverup will still be the first impulse.

    I think the only real check is making the raw data available to others, for them to confirm findings.

    1. Ditto. Data sharing is an absolute minimum. If the researcher won’t make their data available, the presumption should be fraud.

      And if the research is in any way government funded, they shouldn’t have any choice in the matter. Keeping the data from the public, the people funding the research?

  4. I’m not familiar with any of the people involved or the details of the case, but there are some clear warning signs there – about Yuan. You’re not meant to be a postdoc for ten years (half that is considered a more normal maximum), and Yuan was not publishing well at all (was in fact barely even publishing – he’s second or third author on a number of high-profile papers, but second or third authorship counts for little). The bit where Yuan didn’t put his PI (Principal Investigator) on a paper is a major violation of custom and expectation, and it’s essentially unforgivable that he’d submit a paper based on his work in the PI’s lab without the PI’s knowledge – and it’s not an important paper, and not in a prestigious journal (indeed, it’s a journal specifically created to combat the idea of prestigious journals). Yuan claims he was blackballed for subsequent employment; based on what’s in the article and in PubMed, Yuan was probably unemployable, even if all of his criticisms were correct – and I’m familiar enough with the general area of genome-wide studies both to believe Yuan’s basic criticism and to think that he somewhat overstated it, and massively overstated it when he alleged criminal defrauding of the granting agency.

    More generally, I’m not sure that the track record of inquisitors charged to expose research fraud is encouraging for your notion. See the Imanishi-Kari case, for example; people charged to investigate fraud are liable to destroy lives when personal tensions give rise to baseless allegations. Fraud does happen in science, sometimes on an appalling level, but it tends to be exposed. When exposed, it does tend to be hushed up fairly effectively; I could name a couple of disgraced offenders I’ve met personally, who have paid for their crimes against science with their careers and in one case even with a federal fraud conviction – and I suspect few here would ever have heard of them. But I think that wrongdoing on the level that Yuan alleges or at least implies by seeking federal prosecution is vanishingly rare, and is largely self-correcting within the scientific community.

    1. 2nd’ing Warren’s conclusions for the most part. Allegations of research fraud without immediate, indisputable proof are often very difficult to adjudicate because they usually come hand in hand with interpersonal problems, mental health issues, etc. It’s very similar to how sexual/racial discrimination claims end up playing out in practice, but without decades of legal precedent at all levels of the private and public sector to provide a framework for hashing out claims. Like discrimination claims, if a researcher hints at suspicions of fraud the situation escalates very very quickly. Poor social skills amongst scientists and science administrators usually result in delaying action and ultimately ignoring problems.

      The best way to combat this doesn’t really start with better mechanisms to deal with research misconduct. Hopkins has some options available and they don’t really work. It’s much more important to attack funding/compensation/publication mechanisms that incentivize poor research design and statistical analysis. Take away the pressure to publish all-encompassing research “stories” and I think you will see these problems reduced substantially. Of course, this is not easy to do. I think self publication of research on the web (and subsequent review by the public) might become more accepted soon and this may help. In a system in which peer review functioned properly, the concerns in this article would not matter as there are no allegations that I see of behind-the-scenes misconduct; it seems that, if there is a problem in this paper, the allegation is that it should be clear to anyone having the right training who reads the paper.

    2. If you read the WaPo article linked above, “He was demoted in 2011 from research associate to an entry-level position.” That’s not spending 10 years as a postdoc, that’s spending 9 years as a research associate before the politics started. I’m not as familiar with the med school culture, but I’ve known people to make a career of being a research associate in engineering and considered it myself; there, at least, it’s a reasonable route for somebody who doesn’t expect to succeed on the tenure track but wants to stay in academia.

      1. The poster example (all right, a great exception) for the perpetual student as unappreciated genius is of course Francis Crick, the co-discoverer of the genetic code.

      2. Another correction: he is 1st author on an NAR paper which is a rather good journal. Also, “whether Yuan should have asked Boeke if he wanted a byline on a paper” is unfortunately ambiguous. It may mean what Warren said (that Yuan submitted a paper himself), that Yuan was a secondary author on another submission w/out involving Boeke, or that Yuan thought he should’ve been an author on a paper from the lab but was left out.

        Also agree that long postdocs turning into quasi-permanent research gigs is quite common.

        Building on my earlier comments, I think research misconduct of this nature would be best addressed in two ways:
        1. Require deposit of raw data and description of analysis sufficient to reproduce statistical claims in paper. This is already being done to some extent, but unfortunately commercial trends are going in the opposite direction with companies selling devices coupled with analysis software; researchers never see raw data and have no familiarity with how it is processed.
        2. Much more difficult: require experimental designs to be submitted to journals (or, better, a centralized repository) before labs conduct experiments. These can/should be kept confidential, but will be opened to reviewers to check whether labs are reporting statistics properly. This need not be mandatory; reviewers and journal editors could just use it as a tool and be much more picky about asking for independent experimental replicates otherwise.

        1. 1) We disagree about how good NAR is, and methods papers (as this one is) are generally not high-profile. Also, it’s from 2005, meaning there’s another half-dozen years without any first-author publication.
          2) Yuan has a single-author paper in PLoS One, which I assume to be the paper from which he excluded Boeke (he doesn’t seem to appear on any contemporaneous multiple-author papers without Boeke). It was clearly work done in Boeke’s lab, and while I agree that the Washington Post text was a bit ambiguous (“[a] disagreement over whether Yuan should have asked Boeke if he wanted a byline on a paper”), I think the far more likely interpretation is that Yuan should have asked Boeke if he wanted a byline – otherwise the reporter should have written “Yuan should have asked Boeke for a byline on a paper”.

      3. That’s not spending 10 years as a postdoc, that’s spending 9 years as a research associate before the politics started

        There’s an awful lot of ambiguity in job titles. Given his history (he published a lot of papers in graduate school), it is likely Yuan was hired as a Postdoc, rather than as a Research Scientist (positions of the latter sort are far less common in academic science, and are treated very differently, socially and career-wise if perhaps not administratively). The difference has to do with career path: a “Postdoc” is expected to progress in their career within a relatively few years, ideally to a faculty position (somewhere), while a “Research Scientist” expects to work at a high but still subordinate level within the labs of other PIs for the rest of their career. Despite occasional attempts by the NIH to make “Research Scientist” a more viable career (there being many fewer faculty jobs than aspirants), “Research Scientist” positions are very rare. “Research Associate” is a job title consistent with either position, and it is likely the job title has to do with how Yuan’s job was funded more than anything else. A “demotion” likely also has to do with a change in how his position was being funded, rather than being an explicit judgment about his merits. Changes in the source of his funding (and consequent changes in his title) would be extremely normal as his PI tired of supporting him in his position (after nine years!), and also as a mechanism to move him out the door, by shifting him onto a funding source that was more clearly time-limited.

      4. I don’t know about medicine myself, but “career research associate” (modulo the idiosyncrasies of the British system) is exactly what I’ve been doing for the past years (as a computer scientist). Admittedly, in my case the reason was that it was much easier to reconcile with being a mom, but even aside from that there are distinct benefits: for example, while I work only halftime and my husband is a full time professor, he’s not doing much more actual research (especially the hands-on variety) himself than I do; too much of his time is tied up in teaching [1] and doing administrative stuff. And I know of colleagues who have essentially been doing the same full-time (both in the US and the UK).

        The downside is of course lower pay and lack of tenure; but there are countervailing benefits that can make a career as a research associate very attractive, especially now that we have a Ph.D. glut and a comparative lack of tenure track positions. Universities are generally attractive employers (especially if you have family) and many positions come with a considerably higher degree of autonomy than comparable positions in the industry.

        [1] I actually enjoy teaching myself, but there are plenty of researchers who don’t.

    3. As a person somewhat familiar with both the people and the work involved, I can say that Warren Terra’s assessment of the situation is pretty much spot-on. Sadly, Yuan’s view of his time and treatment at Hopkins is very distorted and his motives in this story are not pure, no matter how much he may have deluded himself (and the WaPo reporter) that they are.

  5. Very helpful, Warren. Still, when a paper’s methods are challenged and the first author’s response is to commit suicide, it seems to me the balance of probabilities shifts rather dramatically.

    1. As I’ve said elsewhere, I’d not be surprised if (maliciously or naively) Lin was insufficiently stringent in his data analysis and use of statistics (conversely, Yuan is excessively concerned that any given candidate to constitute a positive interaction might be wrong; that’s just how these genome-scale statistical analyses go). But I don’t think we can see into the soul of a despairing and now deceased individual, certainly not to the degree that we can know whether the intent to defraud was ever there, possibly not even to know whether he in fact did anything wrong. Nor do we know what else was going on in his life. We know he was in great emotional pain, but we can’t assume anything else – and note that it doesn’t appear Yuan accused him of actually altering data.

  6. Yuan has a Hopkins M.D. and apparently board certification in pediatrics. So, he should not have to drive a cab. However, whistleblowing largely forecloses a subsequent research career, even if eventually vindicated. A Ph.D would have fewer options; hence more pressure to keep quiet. An ombudsman wouldn’t change these incentives.

    1. I don’t know whether unambiguously justified whistleblowing forecloses a subsequent research career; in one of the cases I know of, the offender was a professor turned in by their own postdocs when those postdocs realized they couldn’t account for preliminary data the professor was claiming to have in a grant application. You see, their subsequent careers depended on their publishing papers that would be believed and on their having mentors whose recommendation would matter, and they felt the need to speak out against their boss as expeditiously as possible, as sticking with a fraudster boss would preclude both of those conditions being met. They were respected within the institution for their stance, and there was an effort across the institution to help find them new labs to work in, their current one having imploded.

      On the other hand, as Omar says, if you come against the king, you best not miss. Any denunciation will consume significant time and energy, and if you make trouble for the person on whose recommendation your later career depends without being clearly vindicated by later events, you likely are screwed. Everyone in science has merely muttered and gossiped about instances of questionable conclusions and overwrought claims, because the costs, risks, and rewards don’t balance to justify doing more. But mutterings and gossip matter in a peer-driven social enterprise such as science (as with many others), and reasonable suspicions that data had actually been faked or suppressed (rather than overenthusiastically misinterpreted) might well lead to a much stronger response.

      In that vein, it’s important to note that (at least as reported in the Post) Yuan didn’t accuse Lin or Boeke of either concocting or hiding data, but rather of incorrectly analyzing their data so as to draw connections and inferences that Yuan felt were not sufficiently statistically significant (I haven’t checked whether the raw data are available online, but they will be available on request, as a standard condition of publication). This becomes much more of a grey area. In theory, if the peer reviewers selected by Nature were conscientious and appropriately qualified, they will have properly considered the questions Yuan raises prior to publication. There is no reason Nature shouldn’t consult them now about exactly this question, which it can do without disrupting their anonymity (whether peer review should necessarily be anonymous is another question).

      1. Warren:
        Raw data online? How does that work? Many a time I wish I had the raw data, stripped of identifiers, so that I could check out some things not reported in the published report. Do authors willingly give up their raw data if they have plans to use the same data set for future publications? Do very many journals in fact require the availability of raw data as a condition of publication?

        Generally, the best you can get is an occasional online data supplement to an article, containing different tables and figures that summarize study data; I cannot think of when I have seen actual data files downloadable for journal readers to torture and force to confess.

        If raw data is not too difficult to come by, that is a great thing.

        1. It depends on the sort of data, of course, but most genome-scale datasets are basically large digital files of signal intensities at each point in a very large array (for microarrays) or of millions of individual short sequence reads (for high-throughput sequencing). The Lin et al paper is the former case (it includes a collection of microarray data), and their raw data were deposited with the NCBI here. It hasn’t always been the case that these large digital datasets were publicly deposited as a matter of course, but we’re getting better; the fact that they are large, discrete digital files helps enormously. Providing access to all of the gels, blots, photomicrographs, notebook scribbles, etcetera would be extremely difficult even over the course of an extended one-on-one interaction (difficult both to compile them and to exclude items not germane to the project at hand, items not made liable for sharing by the publication of the project), and so far as I know there are no efforts to publicly deposit the total accumulated data from any projects (supposedly representative images or tabular data for the most important experiments are in the paper or in the online supplemental data).

          As to using the raw data for future publications: sure, you can do this, but so can anyone else. You have the advantage of knowing it’s there, knowing its significance, and knowing the field – but the simple effort of having carefully collected the raw data contributes significantly to the perceived importance of that first paper. You don’t get that same credit the second time you describe the dataset, and you can’t get a second paper simply for re-analyzing the same data unless you come up with interesting, ideally testable insights from that re-analysis. This does happen – but it also happens that outsiders using publicly disclosed datasets sometimes have similar successes. Mind you, those outsiders publishing a re-analysis of publicly available data would have a similarly high bar to clear because their proposed publication also wouldn’t involve the contribution of a new source dataset. In either case of re-analysis, the quality of the new insight and the extent of additional experiments to test the new insight and to extend the original work would matter greatly.

          Mind you, the public release of a dataset when you publish conclusions from analyzing it means that there’s an incentive to suck the dataset as dry as possible before publishing anything that would require its disclosure. Still, it’s likely that the samples in the datasets were chosen so as to be highly relevant to the researchers’ own particular field of inquiry, and less so to most possible competitors.
