Another Demonstration that Smaller is Better. And Worse Too.

Small countries, like small schools, are better and worse

No one can master the substance of every single public policy area, but it is possible to learn analytic rules that can be applied to almost any issue. I will be forever grateful to Wainer and Zwerling for teaching me one such rule, which jumped into my mind when I read Christopher Caldwell’s analysis of the economics of small countries.

Caldwell’s case for smaller countries having economic advantages rests on his noting, accurately, that small countries are well-represented at the top of the list of per capita GDP. He cites Qatar, Norway, Brunei, Singapore and Luxembourg as examples of top-ranking countries which have small populations.

Enter Wainer and Zwerling, who documented how many educational reformers came to believe that small schools are best because they often appeared at the top of rankings (e.g., on reading scores). But what these reformers forgot to do was look at the bottom of the same lists, where small schools were also over-represented. The reason is not substantive but statistical: Small samples are more prone to extreme scores.

Caldwell triggered my “Wainer and Zwerling reflex”, which led me to investigate further. Guess what? Many of the countries at the bottom of the list of GDP per capita also have small populations: Central African Republic, Burundi, Liberia and Togo for example.

I don’t intend this observation to invalidate Caldwell’s whole article, which has some good points to make. Consider it more an endorsement of an analytic tactic for critical reading: When someone tells you that the small are clustered at the top of some ranking list, check out the bottom of the list before being convinced that something substantive is at work.
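For readers who want to see the statistical point in action, here is a minimal sketch. It is my own illustration, not anything from Wainer and Zwerling, and every number in it (unit counts, sample sizes, the score distribution) is an assumption made up for the demonstration. All individuals are drawn from the same distribution, so no unit is genuinely better or worse; the only difference between units is how many individuals each one averages over.

```python
import random


def small_unit_share_in_tails(n_small=200, n_large=200, small_size=20,
                              large_size=500, tail=20, seed=0):
    """Rank units by their average score and report what fraction of the
    top `tail` and bottom `tail` units are small ones.

    Hypothetical setup: every individual score comes from the same
    normal distribution (mean 100, sd 15), so any difference between
    units is pure sampling noise.
    """
    rng = random.Random(seed)
    units = []
    for label, count, size in (("small", n_small, small_size),
                               ("large", n_large, large_size)):
        for _ in range(count):
            scores = [rng.gauss(100, 15) for _ in range(size)]
            units.append((sum(scores) / size, label))

    units.sort()  # ascending by average score
    bottom, top = units[:tail], units[-tail:]

    def share_small(group):
        return sum(1 for _, label in group if label == "small") / len(group)

    return share_small(top), share_small(bottom)


if __name__ == "__main__":
    top_share, bottom_share = small_unit_share_in_tails()
    print(f"small units among the top 20:    {top_share:.0%}")
    print(f"small units among the bottom 20: {bottom_share:.0%}")
```

Small units make up half of the population here, yet they should dominate both ends of the ranking, which is the pattern to check for before reading anything substantive into the top of a list.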

A confession to the Reverend Bayes and a modest proposal

In which a blogger sees the Bayesian light.

A footnote (154) in Daniel Kahneman’s great book Thinking, Fast and Slow had me puzzled.

Consider a problem of diagnosis. Your friend has tested positive for a serious disease. The disease is rare: only 1 in 600 of the cases sent for testing actually has the disease. The test is fairly accurate. Its likelihood ratio is 25:1, which means that the probability that the person who has the disease will test positive is 25 times higher than the probability of a false positive. Testing positive is frightening news, but the odds that your friend has the disease …

(Before clicking to the jump for the answer, make your own estimate.) Continue reading “A confession to the Reverend Bayes and a modest proposal”

A Primer on Outlier Polls

Sometimes a single poll diverges from the pack of results generated by everyone else. How can you tell when the pollster is doing a better job of picking up a new trend versus simply being wrong?

Peter Kellner offers an educative account of how these events occur, using as an example a poll that shows two political parties in a neck and neck race while every other poll has one party ahead. It’s a UK example, but that doesn’t matter: the value of the essay is its discussion of how to weigh poll respondents who say they have no preference, how small samples affect conclusions and the like.

The article is worth reading both for its clarity and its modesty. Kellner’s own polling firm disagrees with the outlier poll, but he remains balanced and gentlemanly throughout his critique.

Don’t Get Snookered by Statistics with Arbitrary Ranges

The time frame for statistics is often edited to mislead the listener

I was watching a baseball game with a mathematician friend, during which the announcer said of a batter “He’s been on a hot streak, with 6 hits in his last 19 at bats”.

My friend said “Which means he has 6 hits in his last 20 or more at bats”.

Of course this was true, because if the batter’s hot streak went back farther than his last 19 at bats, the announcer would have extended the range of his statistic accordingly. The announcer was trying to make a point, and a 7-for-20 or 8-for-21 run of hitting sounds hotter than a mere 6-for-19.

Pundits often use this trick to make ominous-sounding pronouncements that are in fact content-light, for example when forecasting who will or will not get elected president. Hillary Clinton will never be in the Oval Office, by the way, because no candidate whose first name starts with an H has been elected President in the past 64.5 years.

Arbitrary ranges can also be used to make the case for a historical trend that may not actually be substantive. At the end of the 1980s, some Republicans crowed that Democrats had lost “5 of the past 6” presidential elections. Today, some Democrats brag that Republicans have failed to win the popular vote in precisely the same arbitrary timespan.

Politicians also snooker voters with arbitrary ranges. If Mayor Jones says that crime is down for 5 straight months, s/he may well be covering over the fact that crime was up in the months before that. Be particularly sceptical when a politician’s long-ago launched pet initiative is said to have “not really gotten off the ground” until the precise moment that the referenced time range of an upbeat statistic begins.

Obviously, range-based statistics have to have some starting point, and that’s fine as long as the starting point is meaningful: A new person comes into office, a new law is passed, the fiscal year’s budget comes into force, a war begins or ends etc. If there isn’t a qualitative demarcation point like that at the start of a referenced time range, the person quoting the statistic is probably either fooling himself or trying to fool you.
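To put a rough number on how much a hand-picked range can flatter, here is a small sketch. It is an illustration under assumptions of my own (a hitter with a constant .250 chance of a hit, an announcer free to quote any trailing window of 10 to 40 at bats), not a model of any real broadcaster. The batter never actually gets hot or cold; the only thing that changes is where the quoted range begins.

```python
import random


def best_trailing_window(results, min_len=10, max_len=40):
    """Return (hits, at_bats) for the most flattering trailing window.

    `results` is a list of 1 (hit) or 0 (out), oldest first. Only windows
    that end at the most recent at bat are considered.
    """
    best = None
    for length in range(min_len, min(max_len, len(results)) + 1):
        hits = sum(results[-length:])
        if best is None or hits / length > best[0] / best[1]:
            best = (hits, length)
    return best


def average_quoted_rate(true_avg=0.250, trials=2000, history=40, seed=1):
    """Compare a cherry-picked trailing window with a fixed 'last 20'.

    Returns the mean quoted average under each rule across many simulated
    batters who all share the same (hypothetical) true average.
    """
    rng = random.Random(seed)
    cherry_total = 0.0
    fixed_total = 0.0
    for _ in range(trials):
        results = [1 if rng.random() < true_avg else 0 for _ in range(history)]
        hits, length = best_trailing_window(results)
        cherry_total += hits / length
        fixed_total += sum(results[-20:]) / 20
    return cherry_total / trials, fixed_total / trials


if __name__ == "__main__":
    cherry, fixed = average_quoted_rate()
    print(f"fixed last-20 window:  {fixed:.3f}")
    print(f"cherry-picked window:  {cherry:.3f}")
```

The fixed window should hover near the true average, while the cherry-picked one should sit noticeably above it, even though nothing about the batter differs between the two.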

Chuchito Valdés and regression towards the mean

Chuchito Valdés is the grandson of the late Bebo and son of Chucho, Cuban jazz royalty.  The short version of my rule for music is that African + Iberian = Proof of God’s Infinite Love, so I listen to a fair amount of this stuff.

Tonight we caught Chuchito at Yoshi’s. I was playing hooky from a talk by Theda Skocpol at our annual Wildavsky Forum, and if all my colleagues had done as I did, it would  have been a Bad Thing, categorical imperative time, but they didn’t, and I got to hear one of those evenings of music I’ll remember all my life. Unfair, I know, but seriously, up there with The Doors live at the Fillmore East on my short list of times.

Valdés had a minimal band (percussion, timbales, bass and a trumpet player whose name I missed), so every note counted and you could hear them all.  He played boleros and son to break your heart, mambos and rhumbas to wake the dead and restart the universe.  In the middle he did a medley of A Train and Satin Doll for which Edward Kennedy Ellington and William Strayhorn came down from heaven and spoke to us in Spanish.  If the horn guy gets the “last trumpet” gig for the apocalypse I want to be in the front row and will die happy, especially because at that time I will also see the wretched Florida Cuban reactionaries and their Republican toadies going to hell for keeping us from this glorious music for decades.

About regression towards the mean?  The statistics lesson for tonight is that the principle, normally sound, apparently has exceptions.

[Free extra, not worth a post by itself: the joropo (Venezuelan/Colombian cowboy music) group Cimarrón has a 2011 self-titled album that I just came upon. If you don’t know them, check it out.]

 

 

Waiting in line

Among the quantitative models I enjoy teaching, my favorite is probably queuing theory because of its ratio of practical-benefit-plus-surprising-insights to effort required to ‘get it’ (and use it).  Alex Stone has a nice op-ed on queuing in the NYT today, which begins with an example of thinking outside the box that I always use in class:  briefly, your plane always arrives at the far end of the finger, and not by chance. This is partly because (i) it minimizes aircraft taxiing time in and out and the risk of getting stuck in apron traffic (airplanes queue too, and Stone misses this one) and partly because (ii) maximizing your walk to the baggage carousel, past as many distractions as possible, shortens what feels to you like ‘the wait for your bag’.

I am mystified, given the batch of interesting and useful psychology Stone packs into his article, that he fails to provide the invaluable, priceless, life-altering secret with which you can never again wait in line, for the rest of your life.  I have up to now only shared this secret, imparted to me by the late Edith Stokey along with a lot of other wisdom, with my students, but you can learn it after the jump: Continue reading “Waiting in line”

Sampling footnotes

Harold’s nice post about sampling populations who spend varying spells in the state you observe reminds me of some other contexts in which the same principle sheds some light. But first, a puzzle that you can skip if you know “the one about the guy with the two girlfriends and the subway”:

One of my students was greatly in love, simultaneously, with a  girl who lived in Orinda and another who lived in the Mission in San Francisco.  He was pretty sure he wanted to marry one of them, but not sure which, so he decided to roll the dice of life.  Learning that inbound and outbound BART trains strictly alternated, first a westbound, then an eastbound, at the Rockridge station near his house (where trains in both directions stop on opposite sides of the same platform) during the relevant times of day, he carefully randomized his departure times from home, took his bike to BART, and got on the next train whichever way it was going, visiting the girl it took him to.  After a few weeks of this, he was astonished to find he had had five times more dates with the girl in the city and was engaged to her. Continue reading “Sampling footnotes”