Deming, thou should’st be living at this hour

Sharon Otterman reviews the failing, flailing efforts of New York City to improve teaching the easy way.  The story, and the management it describes, is all about trying to get quality by incentives and personnel, in particular, sorting teachers into good, bad, and indifferent bins by observing student test score changes over a year.  Presumably, those in the good bin get more pay, or something, and we fire the bad ones. The implicit model is that teaching is an irremediable trait among bad teachers, and purely a matter of incentives and motivation among all the others, and a positional arms race is just the ticket to make teachers really want to do a good job, right?

There’s so much wrong with this it’s hard to know where to start, even though test score increases are among the useful information that should be collected and analyzed, mostly to direct attention to teacher practice that seems to work and would reward analysis and discussion. But attributing outcomes to the teacher is nuts, as Deming demonstrated years ago.  First, the instrument is very noisy; the article begins with some anecdotes of teachers omitted from the system, teachers scored in years when they didn’t teach, and teachers given the wrong scores.  Second, stuff happens; every teacher knows that a class takes on a personality early in the year on the basis of unobserved or random events, or just the luck of the draw in student selection, that seems to be refractory to what a teacher does.  Some years have lots of snow days, some years there’s a shooting in the school, and on and on.  So the teacher effect is going to pick up correlations, spurious and real, with all the variables that are not observed.

Finally, other stuff happens: Deming – brilliant, tough-minded, and humane – demonstrated that if you reward individual workers for performance, you are going to be rewarding random variation a lot of the time, with poisonous effects.  Right away, when the top salesman among twenty gets a trip to Hawaii with his wife, the response of the other nineteen is not to emulate him (and how could they, if they don’t see what he does, which is the case for teachers in spades), but to be pissed off and jealous, which is, like, really great for collaborative enterprise.  Next year, regression toward the mean sets in and he is only number five, or ten, so he looks like a slacker, coasting on his laurels. Even his wife starts giving him the fisheye; don’t be surprised if his lunch martini count starts to go up.
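The salesman story can be made concrete with a toy simulation. What follows is only a sketch under a deliberately extreme assumption that is mine, not Deming's: all twenty salesmen are equally skilled, and each year's results are pure noise. The function name and parameters are invented for illustration.

```python
import random

def simulate_top_performer(n_workers=20, n_trials=10_000, seed=0):
    """Toy model: every worker has identical skill, so yearly results
    differ only by random noise (an assumption made for illustration).

    Returns (average year-2 rank of the year-1 winner,
             fraction of trials in which the winner wins again)."""
    rng = random.Random(seed)
    total_rank = 0
    repeat_wins = 0
    for _ in range(n_trials):
        year1 = [rng.gauss(0, 1) for _ in range(n_workers)]
        year2 = [rng.gauss(0, 1) for _ in range(n_workers)]
        # The "trip to Hawaii" goes to the best year-1 result.
        winner = max(range(n_workers), key=lambda i: year1[i])
        # Year-2 rank: 1 plus the number of colleagues who did better.
        rank2 = 1 + sum(1 for score in year2 if score > year2[winner])
        total_rank += rank2
        repeat_wins += (rank2 == 1)
    return total_rank / n_trials, repeat_wins / n_trials

avg_rank, repeat_rate = simulate_top_performer()
print(f"average year-2 rank of the year-1 winner: {avg_rank:.1f}")
print(f"share of winners who repeat: {repeat_rate:.1%}")
```

Under this no-skill assumption the year-1 winner lands, on average, in the middle of the pack the next year, which is the "slacker" illusion described above. Real salesmen and teachers do differ in skill, so the real effect is weaker, but any noise in the measure produces some of it.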

It is a universal, desperate, desire of lazy or badly trained managers to find a mechanistic device you can wind up like a clockwork, loose upon the organization, and go play golf. Like testing and firing to get people to do good work.  Please, Lord, show me the way to manage without any actual heavy lifting!  But many desires, no matter how desperately we cleave to them, are not fated to be fulfilled, and this is one.  Teaching, like any complex production process, will get better when teachers watch each other work and talk about what they are doing, why, and how it works; what to watch is usefully indicated by statistical QA methods.  Period.  It was true in Smith’s pin factory, it was true in the opera staging master class at the New England Conservatory I sat in on back in the day, it is true in Toyota and, finally, GM factories, and it’s true in schools.  Deming was right, but as long as we’re afraid to admit it (one of his 14 principles is, ironically, “Drive out fear”) we will continue to leave value on the table, absurdly and tragically.

Author: Michael O'Hare

Professor of Public Policy at the Goldman School of Public Policy, University of California, Berkeley

25 thoughts on “Deming, thou should’st be living at this hour”

  1. One point – "the luck of the draw in student selection" is already a political process in schools of today, without test scores being tied to pay or tenure. The number of 'inclusion students' is strictly controlled, and adding fear of firing for taking on a difficult child would ruin the classroom.

  2. Another of Deming's principles is #3: Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place.

    We currently rely on high-stakes testing (an inspection procedure, because the tests are not intended to further learning in any way) to evaluate students. And we're moving this into evaluating teachers. Unfortunately, the measurement is specious. I am suspicious of its utility for assessing students, and I'm sure it's of almost no use for evaluating teachers.

    The best single predictor of aggregate student scores (classroom, school or district — doesn't much matter where in the hierarchy you're looking) is the fraction of students eligible for free/reduced price lunch. The (partial) slope coefficient is negative: more students eligible for food aid means lower mean test scores. This is a proxy variable, and it's standing in for socioeconomic status. So the most important component of variation in these test scores is the student's economic class. There are easier ways to measure that than testing the student.

    We need more teachers, and we need better teachers. Once upon a time we had a large supply of able people who were limited (mostly) to being teachers, secretaries or nurses. Freeing women from those ghettos is simply fair. We haven't figured out how to replace the skilled teachers we lost when other opportunities opened up for them.

    Teaching is the art of larceny, finding good ideas and making them yours. One way to encourage this sort of beneficial larceny is to encourage communication among teachers. Another is to figure out how to reduce our reliance on high-stakes testing. As long as politicians insist on so-called objective measures of achievement we're apparently stuck with some forms of testing.

  3. There was something in the Otterman article that I could not understand; maybe someone can explain. The article says that the “average confidence interval” was 34 points for the ranking of math teachers over three years. (The following sentence says that this is only 95% certain, so I am assuming this means a 95% confidence interval.) This suggests that 17 points equals two standard errors. What is the sample size? If 17 points equals two standard errors, then 1 standard error is about 8.5 points. If (for a scale of 0-100) the maximum possible standard deviation is 50 points, then, with a standard error being the standard deviation divided by the square root of N, the maximum possible sample size would be about 34 teachers.

    My point of confusion is that confidence intervals that wide suggest a very small sample size. What did I miss?

  4. I have always wondered whether we might not reduce citizen complaints about police officers if we made a big bonus pool every year, to be split equally, that went down with the number of justified complaints. I thought it might work because it would give officers a non-namby-pamby-seeming reason to speak up on the occasion that a colleague went awry. Telling someone they're costing you money might make the situation feel less personal. I'm not sure why I thought of this now. I guess because I do think the unspoken factor of all this "teacher improvement" talk is the fact that many of them are still women. And groups of women do not behave the way economists think they should, IMHO. And anyone who becomes a teacher to begin with, male or female, probably fits economist expectations less than average too. Another way of saying, I think a lot of these "reformers" are simply high. Maybe we should measure two things: how well the child reads and writes, and whether or not they like to learn. (Or think that they do/don't.) I might even let the math scores slide, in the short term, if the child can explain clearly what they do and do not understand about the problem they're getting wrong. And the point of all this is supposed to be to build citizens, anyway. I guess the reformers aren't so high after all, since they all seem well right of center.

  5. @ Ed

    This is a complex hierarchical structure, so the Statistics I formulas may not apply (depending on what parameter the interval is being placed on). Additionally, I question the appropriateness of a confidence interval in this context. I can't discern from the article what the sampling (or randomization) structure is. Without one or the other, computing a confidence interval amounts to arithmetic masturbation.

  6. Diane Ravitch (The Death and Life of the Great American School System) slices and dices this stuff into wisdom. For every measure pointing to A, there's another pointing to -(A). The numbers speak loudly in New York – a loud dial tone. Teaching to the test – all reading and math, and still nothing to show for it. NYC has invested billions more under Bloomberg for schools with performance that vanished once a party not controlled by the Mayor (academic, testing organization, consulting firm) actually counted. Meanwhile, language, arts, geography disappear. The damage done, an ex-Chancellor now boasts oddly that he shuttered a hundred schools, and opened 400 new ones! Gates, Broad, and other foundations have stumbled badly. And now, ladies and gentlemen, Cathie Black, Schools Chancellor extraordinaire – at least the billionaire boys club, as Ravitch calls it, admits women. That's a step forward, I guess.

  7. The nonsense about children's education has been around for a long time.

    There is at least one state or country that is head and shoulders above all of the other US states in educating their young. Find the highest performing state or country and copy it. No BS about "we never did that before." None of the nonsense about "we have special circumstances." Copy the best. Fire the rest.

    This is not rocket science.

  8. "Another of Deming’s principles is #3: Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place."

    I'm all in favor of that, but we're not quite up to genetically engineering children to be better students yet.

  9. Thanks, Dennis; I figured that the analysis was probably pretty complex, but still the uncertainty in the estimates seems pretty large, suggesting that they may have been based on small numbers. I didn’t expect the Times to have a Methods section, but the reporting seems a bit vague.

    Brett: Don’t worry, Bokanovsky's Process is very reliable and can be expected to be in general use within a year or two.

  10. Point being that schools don't HAVE control over the incoming "materials"; They can't reject a pallet of students as out of spec, or return this year's fourth graders because parental contributions aren't capable. Schools aren't a manufacturing industry, they can't be run according to Deming's principles. It's rely on inspection, or NOTHING. And nothing isn't acceptable.

    Now, I will grant that, given the inherent variability of the inputs, judging teachers by the performance of their students is fraught. But judge teachers we must. Sometimes you've just got to make the best of a bad situation.

  11. @ Brett

    But some schools (the ones often being held up as models, parochial schools and other, even more selective private academies like Punahou or Andover) do have control over the incoming "materials." They can (and do) actively choose which students they will accept and which they will take a pass on. So don't tell me public schools are "failing" but everything would be hunky-dory if we just ran them like St Judes or Andover, until you make Andover accept a full class of inner-city ESL minority kids. If those kids come out of Andover (or Punahou or Albuquerque's Menaul School) as world-beaters, I'll listen more carefully. Until the experiment is done, I don't want to hear apples compared to pomegranates.

    Some Catholic schools do a great job with lower SES kids (the Christian Brothers of LaSalle come immediately to mind, because a former student of mine is a Brother) others don't (most notably the Jesuits) because they don't even try.

    When the best predictor of a group's composite performance is a proxy for SES, judging teachers by the performance of their students isn't fraught (by the way, fraught with what?), it is just bullshyte. You have to be able to adjust for the "quality" of the "incoming material". In case I haven't mentioned it, I'm a card-carrying Statistician. People would like to tell you that they can use covariance methods to "adjust" for those differences. The thing they either don't know (or ignore) is that the adjustment process is likely to increase the uncertainty in the estimate under these conditions. So you take something that's already pretty uncertain, and increase the uncertainty. That sounds like a great idea for deciding who to keep and who to fire.

    When I cited Deming's third principle, what I was suggesting was that we significantly reduce the amount of testing we do and reduce the reliance on test results as the end-all and be-all of assessment. If you want to know how good a job a primary school is doing, determine how well their "product" performs in secondary schools. If you want to know about secondary schools, find out how good a job their "product" does in post-secondary education and in the workforce, and so forth. Difficult to do? Absolutely. But it's the only really meaningful measure of success. The program I teach uses this sort of "assessment." It's very helpful in modifying our curriculum, but it's only possible because this is a small program and we track our students carefully enough that we can contact their employers (or doctoral program for those who go elsewhere after a Masters) to do this.

    There's no reason schools can't do the same sort of thing (on a sampling basis rather than doing a Census), except that we'd rather pay through the nose for private interests to waste school time with meaningless tests.

  12. Dennis, have you seen this site – ? It demonstrates your point in dramatic fashion.

    Just about every model for school reform relies on improved teacher performance. That's a fine goal in and of itself, but it will only ever have marginal effects on the achievement gap. Looking at the maps link I posted above, it's obvious that "good teaching" isn't magically happening in all the affluent areas of town.

    If we're really serious, we'll move towards a means testing model of education provision. Such that the poorest neighborhoods will get spending that's 3-4x that of the most affluent. Parent outreach will start at birth, and nurses will make regular home visits to monitor child welfare. Networks of social services will be coordinated so that parenting, health, cognitive and behavioral growth will be supported with managed scaffolding.

    I know all of this is a libertarian nightmare, but the bottom line is that the achievement-gap is being almost entirely driven by lack of human and social capital, something neither the market nor existing social and cultural norms is capable of addressing. I think if one is truly concerned with liberty, they'll see that our current "hands-off" approach is a dismal failure: we wait until the child is school aged and then offer a small, uncoordinated smattering of services to supplement the classroom teacher with a 30/1 ratio who we expect to somehow make up for vastly lower levels of human and social capital. So kids have high absentee rates, experience incredible stress at home, lack proper cognitive and emotional stimulus, poor nutrition, health problems, etc., etc.

    To me, this is what real reform would look like. The current liberal/conservative consensus model is garbage.

  13. What if we raised teacher salaries and working conditions to the point where lots of highly qualified, smart, effective, motivated people entered and stayed in the profession, and the lousier candidates couldn't get into the field? Wouldn't that take care of the bad-teacher problem?

    Am I missing something? No, really? I claim no expertise whatsoever. If anyone knows what I'm missing, I want to hear about it.

  14. I'd say that what you're missing is that raising pay, and improving working conditions, attracts poorly qualified, stupid, ineffective, lazy people, just as much as it does highly qualified, smart, effective, motivated people. And there's absolutely nothing about doing that which would somehow keep lousier candidates out of the field. So it's no substitute at all for actually finding some way to judge the merits of individual teachers.

  15. OK, that premise (what I missed) makes sense enough.

    Next step — I would like to see how it tests. Is there any school system where the teacher pay and conditions are quite good, that can be compared to a system where the pay and conditions are quite bad? And other things equal enough to make a comparison? Then, how do they compare? Brett or anyone, care to provide a possible example?

  16. Betsy, if you see my previous post I obviously disagree with your premise that bad teaching is central to the problem.

    But let me expand it to address your questions. Many "good" teachers are only good because they teach at relatively nice, affluent schools where kids are prepared and supported at home. At poor, disadvantaged schools, they would fail miserably. On the other hand, many "good" teachers at poor schools might fail at more affluent schools. Although my guess is that there are fewer such teachers – by far the turnover rate flows away from poor schools, not toward them.

    The reason for this is that different school environments require different skill sets. Most teachers are simply from a different world than poor students, as few poor students are on a college-trajectory. It also just takes a certain type of individual to withstand the conditions at poor schools, where the constant stress of behavioral/emotional issues, disinterest, absenteeism, etc. is never-ending.

    I'll just point again to my previous link to demographic/academic mapping as evidence that SES has more to do with "good" teaching than anything else.

  17. "In April, 1999, the Wall Street financiers at Merrill Lynch published a 193 page “In-depth Report” titled “The Book of Knowledge, Investing in the Growing Education and Training Industry.” Early in the report they noted: “The K-12 market is the largest segment of the education industry with approximately $360 billion spent annually or over $6,500 per year per child. Despite the size, the K-12 market is the most problematic to invest in today. Entrenched bureaucracies and personal and political interests contribute to the challenges facing this sector.”

    Public schools HAVE to fail in order to crack open this egg and give these financiers access to the $360 billion they are after (estimates are that it is around $700 billion today). No matter what logic you use to explain the problems or successes of public education, it will be of no avail: public schools HAVE to fail. Whatever it takes."

    (from… sorry the link didn't take properly.)

    Yes, there are teachers who shouldn't be teaching, just like there are doctors who shouldn't be doctoring, lawyers who shouldn't be lawyering, etc. Every field has its incompetents, and we'd all be better off if there weren't. But this panic about incompetent teachers is framing/propaganda put out by the other side. It's essentially not much different than the panic about Social Security. You haven't fallen for that, have you?

    Think the quote above is overblown? Who would have thought colleges could be one of the most profitable sectors? Well, as a group, Kaplan, Phoenix, etc. are, and have been for years now. Just think how much the profit from K-12 schools could dwarf that. And while you're thinking about that, you'll only be a dozen years behind the authors of the Merrill Lynch report.

    And everything Eli said. But there's no private profit in ending poverty.

  18. @ Eli

    I wasn't familiar with that particular site, but I'm happy to have the link. I've done some work with colleagues in our College of Education on studies peripheral to this debate. You can't analyze any of the data without adjusting for SES. If you try to, all you find are SES effects and everything else is in the weeds.

    @ Ohio Mom

    Yeah, Kaplan and U of Phonies are highly profitable. And even though they are among the more reputable of the for-profit higher ed sector, they're still (at best) marginally reputable. U of Phonies had to invent its own accrediting agency: none of the existing accrediting bodies would touch them. Some things are too important or too morally dangerous to allow investor profits to enter the equation. Among these things are education, health care and incarceration. And yeah, Brett, I think we'd be better off without private publishers selling textbooks at grossly inflated prices.

    Textbooks are a racket worthy of their own set of postings.

  19. "Textbooks are a racket worthy of their own set of postings."

    Won't get any argument from me on that. Or from anybody who's had to buy the new edition instead of picking up a 2nd hand copy of the previous, because of changes made just so that the page numbering wouldn't match.

    I will agree that bad teachers are not the *primary* problem with the education system. They could become a big enough problem, though, if we really did abandon efforts to determine which teachers were bad.

    The primary problem with the education system is, as it must be, with the parents. With sufficiently good parenting, you don't NEED an education system; The top educational performances continue to be on the part of home schooled students, year after year. Conversely, with bad enough parenting, no teacher can successfully teach a student who either won't be present, or won't value learning.

    Teachers, the whole k-12 education system, are there to assist the parents, not substitute for them, and certainly not to counter them.

  20. Paying teachers better would certainly help. More important, IMO, is training them like professionals, rather than as glorified baby-sitters. Consider a first-year engineer or lawyer or any other professional: they'll be doing minor things, closely supervised by a senior engineer/lawyer, getting regular feedback on their work, and have regular opportunities to observe senior workers. As years go by, they get more autonomy, but still get regular oversight and are almost always working in some kind of team, eventually becoming the team leader and trainer of new engineers/lawyers. Evaluation and raises are almost never based on a single statistic, but on the judgements of supervisors (and team members) who actually see the employees work.

    Contrast with teaching: nearly universally a first-year teacher has exactly the same responsibilities, oversight and on-the-job-training as a 30 year veteran. Would you willingly travel over a bridge built by an engineering firm that ran itself that way? Or trust your wealth and freedom to a law firm run that way?
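
For readers who want to check the back-of-envelope arithmetic in comment 3, here is a short sketch. It follows that comment's own assumptions (a 34-point interval spans plus or minus two standard errors, the standard deviation is at most 50 on a 0-100 scale, and the standard error is SD divided by the square root of N). As comment 5 points out, these simple-random-sample formulas may not apply to the hierarchical model actually used, so treat the result as a check on the arithmetic, not on the city's method. The variable names are invented for illustration.

```python
# Figures taken from comment 3; everything else is derived from them.
ci_width = 34.0   # "average confidence interval" reported in the article
max_sd = 50.0     # largest possible SD on a 0-100 scale

half_width = ci_width / 2          # 17 points on either side of the estimate
standard_error = half_width / 2    # comment 3's rule of thumb: 95% CI is about +/- 2 SE

# SE = SD / sqrt(N)  =>  N = (SD / SE) ** 2
implied_max_n = (max_sd / standard_error) ** 2

print(f"standard error: {standard_error} points")
print(f"implied maximum sample size: {implied_max_n:.1f}")
```

The result falls between 34 and 35, which matches the commenter's "about 34".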

