Imagine that I told you with great excitement that I had flipped a coin “repeatedly” and it came up heads 100% of the time. You’d probably be surprised and wonder what could explain such an unlikely event: Was the coin two-headed? Was it of unbalanced weight? Did I have some weird method of flipping a coin such that it only came up heads?

But then imagine I told you that by “repeatedly” I meant that I flipped it twice. It came up heads both times, so that’s 100% heads. You’d be exasperated that I was excited about something so trivial because clearly such a result is a million miles from rare: 100% of only two coin flips coming up heads is at best yawn-inducing.

Hold that example in your head and then consider the fact that whenever schools are ranked on student performance, small schools are always over-represented at the very top of the list. If you are a proponent of small schools, you might explain this as due to the fact that such schools are more home-like, that the teachers give students more individual attention, and that the kids have a sense of community. If you’re an opponent of small schools, you might argue that small schools do well because they tend to be exclusive places that screen out kids with disabilities and expel kids who pose behavior problems.

But if you knew the brilliant work of Wainer and Zwerling, you would recognize that the over-representation of small schools at the high end of performance is exactly as impressive as flipping a coin twice and having it come up heads 100% of the time, i.e., not at all.

The link between the two examples is small sample size. Small samples are more prone to extreme scores. Even though a fair coin will come up heads 50% of the time in the long run, having it come up heads on 100% of only two flips is common: it happens one time in four. In contrast, flipping a coin 20 times and having it come up 100% heads would be shocking, since the odds are less than one in a million. The larger the sample of coin flips gets, the closer the result tends to be to the boring old true score of 50% heads.
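If you want to check the coin arithmetic yourself, here is a minimal Python sketch. It assumes a fair coin and independent flips, and the function name `prob_all_heads` is just a label I made up for illustration.

```python
from fractions import Fraction

def prob_all_heads(n_flips: int) -> Fraction:
    """Exact probability that a fair coin lands heads on every one of n_flips flips."""
    # Assumption: each flip is independent and heads has probability 1/2.
    return Fraction(1, 2) ** n_flips

for n in (2, 5, 10, 20):
    p = prob_all_heads(n)
    print(f"{n:>2} flips: {p} (about {float(p):.7f})")
```

Two flips give 1/4, while 20 flips give 1/1048576, which is why the first result deserves a yawn and the second would deserve an investigation of the coin.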

Small schools by definition have fewer students than big schools. That means they will be more prone to extreme scores on any measurement, whether it’s academic performance, shoe size or degree of enjoyment of pistachio ice cream. The average test scores of 50-kid schools will be more likely to be very high than are the average scores of 500-kid schools *even if the students in the two types of schools are perfectly identical in terms of ability*.

Not incidentally, the tendency toward extreme scores in small samples is bidirectional. That’s why small schools are also over-represented at the very bottom of the list when schools are ranked on performance measures. As Wainer and Zwerling point out, the Gates Foundation was one of many charities that invested in small schools after looking only at the top of achievement lists. If they’d looked at both ends of the distribution and seen all the small schools with extremely poor scores, they would have known that there was nothing more complex at work than small samples being prone to extreme scores.
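The school version of the argument can be illustrated with a rough simulation. This is only a sketch under assumptions of my own, not Wainer and Zwerling's analysis: every student's score is drawn from the same bell curve (mean 100, standard deviation 15), there are 500 schools of 50 kids and 500 schools of 500 kids, and we count how many of the 25 best and 25 worst school averages belong to small schools. All the names and numbers in the code are hypothetical.

```python
import random

def school_average(n_students, rng, mu=100.0, sigma=15.0):
    """Average score of one school; every student comes from the same distribution."""
    return sum(rng.gauss(mu, sigma) for _ in range(n_students)) / n_students

def simulate(n_schools_each=500, small=50, big=500, k=25, seed=0):
    """Return how many small schools land in the top k and the bottom k by average score."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    schools = [("small", school_average(small, rng)) for _ in range(n_schools_each)]
    schools += [("big", school_average(big, rng)) for _ in range(n_schools_each)]
    # Rank all schools from best average to worst.
    ranked = sorted(schools, key=lambda s: s[1], reverse=True)
    top = sum(1 for size, _ in ranked[:k] if size == "small")
    bottom = sum(1 for size, _ in ranked[-k:] if size == "small")
    return top, bottom

top, bottom = simulate()
print(f"Small schools in the top 25: {top}")
print(f"Small schools in the bottom 25: {bottom}")
```

Because the students are identical by construction, any crowding of small schools at both extremes of the ranking comes from sample size alone. Small schools make up half of the simulated schools, so anything well above half of the top and bottom slots is over-representation.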