Innovative Education: Probability as a reality show

Teaching probability and statistics: the world as a classroom.

By Arnold Barnett


When I entered New York’s 8th Avenue subway station at 34th Street, I wasn’t sure what to do. I was going downtown, and the travel time was three minutes shorter on the express than on the local. But the express was scheduled to run every 10 minutes, while locals ran every five. Because expresses and locals leave from different platforms, I had to make a choice. So which platform should I choose?

The answer would depend on the probability distributions for the times until the next express and the next local. If the trains are assumed to operate on a fixed schedule (one I didn’t know), then the time until the next express arrival would be uniform on the interval (0,10), while the corresponding time for the local would be uniform on (0,5). (Lacking information to the contrary, I would treat the two random variables as independent.) But given the high level of operational randomness that afflicts the New York subway, it might be more realistic to assume that downtown express trains reach 34th Street under a Poisson process with rate six per hour, while locals do so at rate 12 [1]. Does the answer depend on which distribution is assumed?

In any case, what was my “objective function” for the trip? To minimize average total time in transit? To maximize the chance of choosing the train that arrives downtown first? To maximize the chance that total journey time did not exceed (say) 15 minutes?
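
A minimal Monte Carlo sketch makes the comparison concrete. The 15-minute local running time below is purely an illustrative assumption (only the three-minute express advantage is specified above):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000
    LOCAL_RIDE = 15.0                  # assumed local running time (minutes)
    EXPRESS_RIDE = LOCAL_RIDE - 3.0    # the express is three minutes faster

    # Model 1: trains on schedule -- waits are uniform over the headway.
    # Model 2: Poisson arrivals -- waits are exponential with mean = headway.
    models = {
        "scheduled": (rng.uniform(0, 10, N), rng.uniform(0, 5, N)),
        "Poisson":   (rng.exponential(10, N), rng.exponential(5, N)),
    }
    for label, (wait_exp, wait_loc) in models.items():
        t_express = wait_exp + EXPRESS_RIDE
        t_local = wait_loc + LOCAL_RIDE
        print(f"{label:>9}: mean express {t_express.mean():.1f} min, "
              f"mean local {t_local.mean():.1f} min, "
              f"P(express arrives downtown first) = {(t_express < t_local).mean():.2f}")

Under the schedule model the express wins on average (17 minutes versus 17.5); under the Poisson model the local does (20 versus 22); and yet, under the Poisson model, the express still reaches downtown first slightly more than half the time. The answer depends on both the distribution and the objective.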

I use this example in my introductory course in applied probability, because it raises many important issues in an unobtrusive way. We need a realistic probability model of the uncertainty that attends a problem. Once we’ve chosen a suitable probability distribution, we need to know the details about how to use it. And we need to have a specific objective in mind if we are to work with the model to make a decision. In short, we have to look at the problem the way an operations researcher/management scientist would.

Or consider a very real problem I pose:

The last huge earthquake on California’s southern San Andreas Fault occurred in 1857. Based on historical patterns, geologists believe that the times between huge earthquakes on that fault line are normally distributed with mean 160 years and standard deviation 30 years. Given these circumstances, what is the chance that the next decade will see a huge earthquake on the southern San Andreas?

Conditional Probability

This problem illustrates vividly why conditional probability is such an important concept. Clearly, the fact that it’s been 158 years (1857-2015) since the last huge earthquake is relevant to the chance that another one is imminent. And it’s necessary to combine the normal distribution for X – the time between huge earthquakes – with the knowledge that X ≥ 158. What a splendid advertisement for Bayes’ Theorem, which offers a straightforward way to find the chance that X is between 158 and 168 (i.e., that calamity strikes in the next decade) given that X ≥ 158.
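
The arithmetic is short enough to carry out on the spot. A minimal sketch with scipy, using the figures given above:

    from scipy.stats import norm

    mu, sigma = 160.0, 30.0   # mean and s.d. of X, the gap between huge quakes (years)
    elapsed = 158.0           # years since 1857, as of 2015
    horizon = 10.0            # "the next decade"

    # P(elapsed <= X <= elapsed + horizon | X >= elapsed)
    p = (norm.cdf(elapsed + horizon, mu, sigma)
         - norm.cdf(elapsed, mu, sigma)) / norm.sf(elapsed, mu, sigma)
    print(f"P(huge quake within a decade) = {p:.2f}")   # about 0.25

The conditional chance works out to roughly one in four, which helps explain why the Californians in the class pay close attention.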

In my experience, the vast majority of students (and all Californians) care greatly about the numerical answer to this problem. And when there is genuine curiosity about a problem, there is a corresponding interest in the method by which the problem is solved. The world offers endless interesting settings where probability can be illuminating, and I think they should be central to the pedagogy in an introductory course. Too many introductory books and courses, in my view, suggest that the most interesting applications of probability concern coins, dice and balls picked from urns. Some such examples have educational value, and I do not avoid them, but we have to avoid any impression that probability is more frivolous than practical.

I also believe that intuitive explanations are essential if students are genuinely to understand probability. Like instructors everywhere, I cover the classic birthday problem, in which calculations reveal that birthday overlap in a small group is far more likely than one might expect. But if that finding is to be more illuminating than unnerving, it is important to conduct a “post-mortem” and make clear how – if viewed from the right perspective – the result is not really surprising. I offer such a post-mortem and, to stress the point that the birthday problem is more than fodder for a party game, I present an exercise based on my experience at a paper mill in Wisconsin. The manufacturers were dismayed at what seemed like an uncanny tendency of small random tears in the paper to cluster, thereby reducing paper strength to an unacceptable degree. The explanation for this pattern arose from a direct analogy to the birthday problem.
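
For readers who want to check the numbers, here is a minimal version of the birthday computation; in the paper-mill analogy, tears play the role of people and positions along the roll play the role of days (the mill’s actual counts are not reproduced here):

    import numpy as np

    def p_shared(n, cells=365):
        """P(at least two of n items land in the same of `cells` equally likely cells)."""
        # Complement: all n items land in distinct cells.
        return 1.0 - np.prod(np.arange(cells, cells - n, -1) / cells)

    for n in (10, 23, 40, 60):
        print(f"n = {n:2d}: P(some shared cell) = {p_shared(n):.3f}")

With n = 23 the probability already exceeds one half, which is why “uncanny” clustering is in fact routine.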

Sometimes, I try to introduce a concept informally before turning to rigorous math. Consider the following dialogue between Mendel and Minerva, the two “main characters” in my course:

“I don’t get it,” Mendel complained to Minerva. “In the morning, the time it takes me to get from my home to the freeway varies a lot from day to day. Once I get on the freeway, the time it takes to reach my office downtown also varies a lot. Yet the total time to get from home to work is practically the same every day. How can that be?”

“Well, I get it,” Minerva responded. “Because of my probability course, I understand the concept of correlated random variables.”

The explanation for the apparent paradox is that travel times on the two segments of Mendel’s journey exhibit sharp negative correlation. (I explain to the students why that pattern could be realistic.) Once they grasp the general concept, the students are more motivated to study the distribution of a sum of two random variables.
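
Because Var(A + B) = Var(A) + Var(B) + 2Cov(A, B), a strongly negative covariance can nearly cancel the two individual variances. A minimal simulation with hypothetical numbers (each leg averaging 15 minutes with standard deviation 4, correlation -0.9):

    import numpy as np

    rng = np.random.default_rng(1)
    mean = [15.0, 15.0]        # hypothetical mean minutes per leg
    sd, rho = 4.0, -0.9        # hypothetical s.d. and correlation
    cov = [[sd**2, rho * sd**2],
           [rho * sd**2, sd**2]]

    legs = rng.multivariate_normal(mean, cov, size=100_000)
    total = legs.sum(axis=1)
    print("s.d. of each leg:  ", legs.std(axis=0).round(2))     # about [4.0, 4.0]
    print("s.d. of total trip:", round(float(total.std()), 2))  # about 1.79
    # Var(A + B) = 16 + 16 + 2 * (-0.9 * 16) = 3.2, so s.d. is about 1.79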

Same for Statistics

I use the same “reality-based” approach to teaching introductory statistics that I use for probability. For example, I introduce Fisher’s Exact Test by describing a study about manhole covers in Manhattan, undertaken by a team headed by Cynthia Rudin [2]. Using machine-learning methods, the researchers had identified the 1,000 manhole covers (out of 51,213) they considered in the greatest danger of exploding in the near future because of cable ruptures below. Over the next year, 44 Manhattan manhole covers exploded, five of which were in the Rudin team’s “top 1,000.” Does that outcome, I ask the students, offer convincing evidence that the Rudin scheme was better than sheer guesswork at identifying dangerous manholes?

[Photo caption: Was the machine-learning method better than sheer guesswork at identifying dangerous manholes?]
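
The formal comparison is a 2×2 table and a one-sided test. A minimal sketch with scipy, using the counts above:

    from scipy.stats import fisher_exact

    total_covers, top = 51_213, 1_000
    exploded_total, exploded_top = 44, 5

    # Rows: in the top 1,000 or not; columns: exploded or not, over the year.
    table = [[exploded_top, top - exploded_top],
             [exploded_total - exploded_top,
              total_covers - top - (exploded_total - exploded_top)]]

    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    print(f"one-sided p-value = {p_value:.2e}")
    # Sheer guesswork would place about 44 * 1000/51213 = 0.86 explosions
    # in the top 1,000 on average; the Rudin list caught 5.

The p-value falls far below any conventional threshold, so five hits in the top 1,000 would be very hard to attribute to luck.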

To motivate the Chi-squared test, I turn to a study by Ed Kaplan and co-authors about in vitro fertilization, which appeared in Management Science [3]. I start with the following description (based on an actual conversation):

The couple was concerned that in vitro fertilization would not work for them. Especially because their insurance company wouldn’t cover the procedure, they wanted to know whether they should stop if they didn’t succeed in (say) the first two attempts. They went to speak with a doctor at the hospital where they would undergo the procedure.

“There’s no reason at all to be discouraged with two failures,” the doctor responded. “All couples have the same chance of success, and that chance is the same on every trial.”

Turning to the man, he said, “You’re an engineer, and you know that getting three heads in a row when you toss a coin doesn’t change the chance of tails on the next toss. Same thing here. If you keep at it, sooner or later you’re bound to succeed.”

The engineer was skeptical, and wondered whether early failures of in vitro fertilization really said nothing about the chances of later ones. So he decided to review the research literature about success patterns with the procedure.
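
The natural machinery here is a test of whether success rates are the same across attempt numbers. The counts below are invented purely for illustration (the actual data are in the Kaplan et al. paper); the mechanics are what matter:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical outcomes of IVF attempts, grouped by attempt number.
    # Under the doctor's claim, success rates should match across columns.
    #             attempt 1  attempt 2  attempt 3+
    successes = [      120,        60,        25]
    failures  = [      480,       340,       275]

    chi2, p_value, dof, expected = chi2_contingency(np.array([successes, failures]))
    print(f"chi-squared = {chi2:.2f}, dof = {dof}, p-value = {p_value:.2e}")
    # A small p-value contradicts "the chance is the same on every trial."
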
I am not among those who believe that “statistics for engineers” is inherently different from “statistics for business.” The underlying principles are the same and, if my experience is any guide, the students do not demand “instant gratification” in the form of examples that relate directly to their major fields. A regression analysis about finance can be of just as much interest to the chemist as to the economist, and students recognize that what is discussed in one context can be generalized to others. Beyond exploding manholes and fertilization techniques, the examples above relate to the broader issue of model validation. Students in all fields should find that issue of great importance.

This being the 21st century, I recognize that students need to know about modern statistical methods that were all but inconceivable only decades ago. For that reason, I discuss at length the ingenious computer-based method of bootstrapping, which assigns margins of error to parameter estimates in settings where traditional theory is of little use. One such example arose in a consulting project in Chicago in which I was engaged. We were asked to estimate what fraction of cars parked at meters at (say) 1 p.m. were in violation of the meter at that time. In theory, one could have chosen a random sample of Chicago’s metered parking spaces, gone to each selected space at 1 p.m., and recorded whether a car was parked there and, if so, whether it was in violation. But that scheme was not feasible: The first space chosen might have been three miles south of the downtown Loop, while the second was nine miles north of the Loop. In deference to reality, we selected a limited set of blocks at random (e.g., Dearborn Street, between Madison and Monroe Streets), and the total number of parked cars and total number in violation were recorded for each. If one estimated the citywide “scofflaw” rate by the ratio “cars in violation/cars at meters” for all the blocks combined, what level of sampling error attends the result? That is a fairly straightforward problem with bootstrapping, but quite an intractable one without it.
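
Here is a sketch of that bootstrap for the ratio estimator, resampling whole blocks (the actual sampling units) with replacement; the per-block counts are invented, since the Chicago data are not reproduced here:

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical per-block counts: cars in violation, cars parked at meters.
    violations = np.array([3, 0, 5, 2, 7, 1, 4, 2, 6, 0, 3, 5])
    parked     = np.array([9, 6, 12, 8, 15, 5, 10, 7, 14, 4, 9, 11])

    point_est = violations.sum() / parked.sum()   # ratio estimator of the scofflaw rate

    B = 10_000                                    # bootstrap replications
    n = len(parked)
    boot = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)               # resample blocks with replacement
        boot[b] = violations[idx].sum() / parked[idx].sum()

    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"estimate {point_est:.3f}, 95% bootstrap interval ({lo:.3f}, {hi:.3f})")

Resampling the blocks, rather than individual cars, respects the way the data were actually collected.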

And, as often happens, it can be less than obvious how to formulate a statistical problem from its initial description. I illustrate that point with the following passage, which arises from a true story:

The older workers at the high-tech factory were grim and angry. Their employer had laid off five of the 33 workers in their division, and all of them were old. They told the lawyer that the pattern was not a coincidence and that they wanted to file an age-discrimination suit against the company. The lawyer asked them a few questions, including whether they had any non-statistical reason to believe that the company had focused its layoffs on older workers. When they reluctantly said no, he said that he was terribly sorry but they had no case.
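
One way to begin formalizing the workers’ claim is a hypergeometric calculation: if the five layoffs had been made at random among the 33 workers, how likely is it that all five would be old? The story does not say how many of the 33 counted as “old,” so the sketch below treats that number as a hypothetical input:

    from scipy.stats import hypergeom

    workers, laid_off = 33, 5
    for old in (10, 15, 20, 25):      # hypothetical counts of older workers
        # P(all 5 randomly chosen layoffs fall among the `old` workers)
        p = hypergeom.pmf(5, workers, old, laid_off)
        print(f"{old:2d} old workers: P(all 5 laid off are old) = {p:.4f}")

The answer swings from striking to unremarkable as the hypothetical share of older workers grows; formulating the problem is as delicate as solving it.
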
With the arrival of “big data” and of ever more powerful statistical software, more statistical analyses are being performed now than ever before. But I am convinced that a large number of these analyses are performed by people who do not even know the mathematical assumptions of the models they are using, let alone whether those assumptions are satisfied in the case at hand. My goal is to give students enough insight into statistics that they do not find themselves applying techniques that they understand only hazily and regard rather warily. The first course in statistics can only do so much, but it can offer a good start in making statistics a congenial and lifelong traveling companion.

Final Remark

My message is simple: At a time when student interest in probability and statistics is not just reviving but outright flourishing, we should exploit that circumstance to expose students to the penetrating perspective of operations research and management science. Northwest Airlines used to advertise that “the world is going our way.” The world is indeed going our way, but it is up to us to harness its richness and fascination so that a great opportunity does not pass us by.

Arnold Barnett (abarnett@mit.edu) is the George Eastman Professor of Management Science and Statistics at MIT. He is the author of the new textbooks “Applied Probability: Models and Intuition” and “Applied Statistics: Models and Intuition” (2015, Dynamic Ideas Press), from which the examples above were drawn.

Notes & References

  1. At some New York City bus stops, the transit authority has stopped posting schedules and simply displays the average interval between buses.
  2. C. Rudin, R. J. Passonneau, A. Radeva, H. Dutta, S. Ierome, and D. Isaac, 2010, “A Process for Predicting Manhole Events in Manhattan,” Machine Learning, Vol. 80, pp. 1-31.
  3. E. Kaplan, A. Hershlag, A. DeCherney and G. Lavy, 1992, “To Be or Not to Be? That is Conception: Managing in vitro Fertilization Programs,” Management Science, Vol. 38, No. 9, pp. 1217-1229.