–Using high-stakes tests to reward and punish schools and their staffs encourages cheating.
–If testing is to be done on a census rather than a sample, the tests have to be relatively cheap on a per-student basis, and such tests measure only a subset of what we want students to know and to be able to do, which is likely to distort curricular decisions.
–Even accepting what the tests test for as a valid reflection of educational performance, sheer measurement error makes it hard to distinguish signal from noise in year-to-year variations. (Doing sample rather than census testing may improve validity, but it increases sampling error.)
–And now the largest study of the actual results of high-stakes testing programs suggests that they boost scores on the tests used, but actually reduce performance on nearly every externally validated measure. [Study by Audrey Amrein et al. at Arizona State. Report in today’s New York Times.]
The critiques of the study by testing proponents, including Chester Finn, are pretty pitiful; that suggests that, despite its funding by a coalition of teachers’ unions, the study is methodologically sound. (Finn is reduced to suggesting that the other educational “reforms” in the state-level packages that included high-stakes testing must be at fault.)
So what we have here is a policy that won’t work in theory and fails in practice. Why, exactly, are we supposed to be for it?
These results put the proponents of high-stakes testing in what ought to be an inescapable rhetorical box: a dilemma in the proper sense of that term. If trying something out, measuring its results, and acting accordingly is the right thing to do, then having tried out high-stakes testing, measured its results, and found them to be bad, we ought to dump it, or at least fundamentally redesign it. If trying, measuring, and responding isn’t the right thing to do, then what’s the argument for high-stakes testing in the first place?
I have a very strong prejudice for managing by the numbers, especially in an area such as education where the non-quantitative theorizing is so wooly and our knowledge of the underlying processes so inadequate. (How to produce high-quality research in a field where the relevant university units engage mostly in training for a poorly paid, low-status profession is a different problem.) So I’d be inclined to strengthen the testing regime by broadening the base of knowledge and skill tested for and by making aggressive use of sampling, rather than just dumping the whole thing and letting the education establishment vapor on about how every child is different, every teacher is a skilled professional, and therefore nothing can be measured.
But there is now no case whatever for continuing to combine high stakes with low measurement quality. Been there, done that, got the T-shirt. Stinks. Next case, please.