In today’s WSJ, Bill Gates writes about the power of data collection and analysis. He writes: “In the past year, I have been struck by how important measurement is to improving the human condition. You can achieve incredible progress if you set a clear goal and find a measure that will drive progress toward that goal …”
I have been told that before Gates dropped out of Harvard, he audited first-year Ph.D. econ classes there. If he takes his goal seriously, he will need to grapple with some frontier research questions. All micro econ students are taught the production function. In the typical boring example, a firm hires labor and capital to produce pizza. The professor stands at the board, writes pizza = f(L,K), where f(L,K) is the production function, and talks about the demand for inputs. What makes this boring is that the firm knows f(L,K) and inputs are homogeneous. All workers (L) and all robots (K) are identical.
But diversity and uncertainty are the spice of life and of modern economics! For a taste of this, read Sherwin Rosen’s AEA Presidential Address. What Gates’ article is really about is this: from observing some noisy measures of output and inputs, how do we infer what the production function f() is? Gates doesn’t care about “pizza production”. He cares about healthy kid formation and human capital formation, but on some level it is the same question. Consider the following example:
Suppose that we want to rank doctors with respect to their “value added” in saving patients’ lives. The research nerd observes whether a given patient survives, and observes some coarse attributes such as age, zip code, and ethnicity. The researcher also observes which doctor was assigned to the patient. Suppose the researcher assumes that doctors are randomly assigned to patients, while the truth is that the best doctors are assigned to the sickest patients. Note the asymmetry of information here. The hospital recognizes the diversity of patient types and doctor types, but the naive statistician does not. Once the data nerd crunches the data, he will falsely conclude that the best doctors are the worst doctors, because on the performance criterion (dead patients) they will have a high share. Explicitly addressing this nasty challenge of self-selection on unobserved attributes requires an economic model of how doctors and patients differ and how doctors are assigned to patients. What is the data generating process? If this interests you, you should follow the work of Jim Heckman. You can see that he is well cited.
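To make the doctor example concrete, here is a toy simulation. It is only a sketch under made-up assumptions: two doctor types ("good" and "bad"), a patient severity drawn uniformly between 0 and 1 that the hospital sees but the researcher does not, and a death probability of 0.2 + 0.6 × severity that a good doctor lowers by 0.2. The function name `simulate` and all of the numbers are hypothetical and chosen purely for illustration.

```python
import random


def simulate(n_patients, sorted_assignment, seed=0):
    """Return the raw death rate per doctor type in a toy hospital.

    All parameters below are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    deaths = {"good": 0, "bad": 0}
    counts = {"good": 0, "bad": 0}
    for _ in range(n_patients):
        # Severity is seen by the hospital but NOT by the researcher.
        severity = rng.random()
        if sorted_assignment:
            # The hospital sends the sickest half to the good doctors.
            doctor = "good" if severity > 0.5 else "bad"
        else:
            # Coin-flip assignment, which is what the naive researcher assumes.
            doctor = "good" if rng.random() < 0.5 else "bad"
        # Assumed truth: a good doctor lowers the death probability by 0.2.
        p_death = 0.2 + 0.6 * severity - (0.2 if doctor == "good" else 0.0)
        counts[doctor] += 1
        if rng.random() < p_death:
            deaths[doctor] += 1
    return {d: deaths[d] / counts[d] for d in counts}


if __name__ == "__main__":
    print("sorted assignment:", simulate(100_000, sorted_assignment=True))
    print("random assignment:", simulate(100_000, sorted_assignment=False))
```

Under these assumptions, the sorted hospital shows a higher raw death rate for the good doctors than for the bad ones, even though every good doctor lowers each patient's risk. With coin-flip assignment the naive comparison gets the ranking right. The only thing that changed between the two runs is the assignment rule, which is exactly the part of the data generating process the naive statistician ignored.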