Did Nate Silver beat the tortoise?

New York Times blogger vs. Real Clear Politics: Which forecaster made the better political predictions?

By Arnold Barnett

Nate Silver uses a statistical model that is subtle, sophisticated and comprehensive. Real Clear Politics uses a shallow approach to forecasting which could have been devised by a statistical Forrest Gump. But which forecaster better predicted the results in the 2012 U.S. presidential election? Did the intellectual tortoise hold its own against the hare?

From a conceptual standpoint, it should have been no contest. In an approach that would make statisticians shudder, Real Clear Politics (RCP) estimated the Obama/Romney difference in a given state by the simple average of differences in recent polls. Differences in sample sizes were ignored, the word “recent” was defined differently in different states, undecided voters were simply excluded, and evidence that some polls skew toward Republicans and others toward Democrats got no weight. Silver’s FiveThirtyEight (538) model, in contrast, avoided all these limitations, and also took account of correlations among outcomes in similar states and of the demographic makeup of each state.

From Theory to Practice

But how did the final state-by-state predictions under the two approaches compare in accuracy? RCP made forecasts in only 30 of the 51 states (including the District of Columbia), but these included all swing states and all large states. Table 1 identifies these 30 states and presents Obama’s victory margin over Romney in each one, as projected by 538, as projected by RCP, and as actually occurred. It provides the basis of the analysis here.

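Table 1: Obama’s victory margin over Romney in each of the 30 states, as projected by 538, as projected by RCP, and as actually occurred.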

Before turning to any actual results, we benefit by creating a succinct summary of the relation between the 538 forecast in a particular state and the forecast by RCP. Simple linear regression on the cross-state data yields the linear approximation:

  • MO538 = 1.00*MORCP + 1.53 (R² = 0.99; both coefficients highly significant)

    where
    MO538 = Projected 538 difference between Obama and Romney vote share
    MORCP = Projected RCP difference between Obama and Romney vote share

In words, this equation implies that a good way to approximate 538’s estimate of Obama’s victory margin in a given state is simply to add 1.53 percentage points to the corresponding RCP estimate. (Note that the slope estimate for MORCP is literally one to the nearest hundredth.) Nate Silver was therefore forecasting that Obama would systematically outperform his standing in the pre-election state polls that RCP averaged together, and on average by about 1.5 points. But did Silver’s adjustments of the RCP estimates yield better predictions?
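
A fit of this kind can be reproduced with ordinary least squares on the paired state margins. The sketch below is only an illustration, not the author's actual computation: the function name `fit_line` is hypothetical, and the three input margins are placeholders constructed so that each 538 value sits exactly 1.53 points above the RCP value. They are not figures from Table 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder Obama-minus-Romney margins in percentage points (NOT from Table 1).
rcp_margins = [-10.0, 0.0, 10.0]
margins_538 = [m + 1.53 for m in rcp_margins]

slope, intercept = fit_line(rcp_margins, margins_538)
print(round(slope, 2), round(intercept, 2))
```

With the real 30-state data, a slope near 1 and an intercept near 1.53 would correspond to the equation reported above.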

This question inevitably raises the issue of how we should compare the accuracy of the two sets of predictions. The most obvious dimension for comparison is the bottom line: Did the forecast in a given state correctly identify the winner there? By that standard, both methods did very well: In 29 of the 30 states, they agreed who the winner would be and that candidate actually won. In Florida, neither forecaster called the winner: RCP erroneously projected a narrow Romney victory (1.5 percentage points), while 538 projected an exact tie (and thus abstained from forecasting). In the event, Obama carried Florida by 0.9 percentage points. We can say, therefore, that 538 scored a partial victory over RCP in one of 30 states, but that is hardly a decisive advantage.

Other informative comparisons concern the absolute forecast errors in individual states. Table 1 reveals that the 538 projection was more accurate in 19 of the 30 states, the RCP prediction was more accurate in 10, and the two methods tied in one state (Washington). Thus, 538 did have an edge. But that edge was not overwhelming for two reasons.

For one thing, the edge was not statistically significant. One simple standard for equal accuracy in the long run is that, except when the two forecasters make exactly the same forecast, there is a 50 percent chance that the RCP forecast is more accurate and a 50 percent chance that the 538 forecast is. Under an independence assumption, outcomes in the 29 states in which RCP and 538 differ should behave like 29 tosses of a fair coin. If a fair coin is tossed 29 times, the “average split” is 14.5/14.5, but binomial calculations show that the probability is 14 percent that the heads/tails split is at least as lopsided as 19 to 10 (either way). To reject the “equal accuracy” hypothesis under usual standards, the probability that the observed result would arise if the hypothesis were true would have to fall below 5 percent. Because that did not happen here, 538 does not emerge from the comparison as “significantly” superior to RCP.
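
The 14 percent figure can be checked with an exact two-sided binomial calculation. The following is a minimal sketch under the same fair-coin and independence assumptions stated above; the function name `two_sided_split_probability` is a hypothetical helper, not something from the article.

```python
from math import comb


def two_sided_split_probability(n, k):
    """Chance that n fair-coin tosses split at least as unevenly as k vs. n - k,
    counting lopsided results in either direction."""
    larger_side = max(k, n - k)
    if 2 * larger_side == n:
        return 1.0  # an even split: every outcome is at least this lopsided
    # Number of outcomes with at least `larger_side` heads; doubled for tails.
    tail_count = sum(comb(n, i) for i in range(larger_side, n + 1))
    return 2 * tail_count / 2 ** n


p_value = two_sided_split_probability(29, 19)
print(f"{p_value:.3f}")  # about 0.136, i.e. roughly 14 percent
```

Because 0.136 is well above the conventional 0.05 threshold, the 19-to-10 split does not reject the equal-accuracy hypothesis.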

More importantly, the 30 states in which comparisons were possible are disproportionately blue states: Mitt Romney only carried eight of them. His 27 percent proportion of victories (8/30) in 2012 is considerably lower than his national proportion of 47 percent (24/51). That circumstance is especially relevant because, as Table 2 shows, RCP did far better than 538 in Romney states, while 538 did far better in states that went for Obama.

Table 2: Which forecaster was more accurate, in states won by Romney and in states won by Obama.

Extrapolating the results in Table 2 to the 21 states not part of the initial comparison, we reach the approximation that YRCP, the national proportion of states in which RCP would have outperformed 538, might plausibly have followed:

  • YRCP ≈ (24/51)*(.75) + (27/51)*(.20) = .46

    (In the states that Obama won, we are giving RCP half-credit for its tie with 538 in Washington.)

In other words, had the 30 states in which RCP and 538 were compared been a random sample of states rather than a sample that was unusually pro-Obama, RCP might have prevailed in roughly 30*.46 ≈ 14 of them. A 16-14 edge for 538 is considerably less impressive than 19-10.
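
The reweighting arithmetic can be laid out step by step. In the sketch below, the win rates .75 and .20 are taken from the formula above; they are consistent with RCP being more accurate in 6 of the 8 Romney states and in 4 of the 22 Obama states plus a half-credit for the Washington tie, though Table 2 remains the authority on the underlying counts. The variable names are illustrative.

```python
# National split of states (including D.C.) by 2012 winner, as cited above.
romney_states_national = 24
obama_states_national = 27
total_states = romney_states_national + obama_states_national  # 51

# Share of compared states in which RCP beat 538, by winner (from the formula).
rcp_win_rate_romney_states = 0.75
rcp_win_rate_obama_states = 0.20

# Weight each rate by that group's share of all states.
y_rcp = (
    (romney_states_national / total_states) * rcp_win_rate_romney_states
    + (obama_states_national / total_states) * rcp_win_rate_obama_states
)

# Expected RCP "wins" if the 30 compared states mirrored the national mix.
expected_rcp_wins = 30 * y_rcp
print(round(y_rcp, 2), round(expected_rcp_wins))  # 0.46 14
```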

What about comparing the sizes of forecast errors in a state rather than simply asking which was larger? As Table 3 shows, the mean absolute forecast error over the 30 states was 2.87 percentage points for RCP and 2.25 for 538. But an adjustment for “blue state bias” – which weights the errors in the states Romney won by .47 and in those that Obama won by .53 – yields a mean absolute error of 2.57 points for RCP and 2.33 for 538. This revised difference of one-quarter of one percentage point is hardly decisive.
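
The "blue state bias" adjustment amounts to swapping the sample's group weights (8/30 Romney states, 22/30 Obama states) for the national ones (24/51 and 27/51, or about .47 and .53). The sketch below shows the mechanics only. The two group-level errors are placeholder values chosen for illustration, not the figures behind Table 3, and `weighted_mae` is a hypothetical helper.

```python
def weighted_mae(mae_romney_states, mae_obama_states, weight_romney, weight_obama):
    """Combine group-level mean absolute errors using the given weights."""
    total_weight = weight_romney + weight_obama
    return (
        weight_romney * mae_romney_states + weight_obama * mae_obama_states
    ) / total_weight


# Placeholder group errors in percentage points (NOT taken from Table 3).
mae_romney_states = 2.0
mae_obama_states = 3.0

# Unadjusted: weights reflect the 8 Romney and 22 Obama states in the sample.
sample_mae = weighted_mae(mae_romney_states, mae_obama_states, 8, 22)

# Adjusted: weights reflect the 24 Romney and 27 Obama states nationally.
national_mae = weighted_mae(mae_romney_states, mae_obama_states, 24, 27)

print(round(sample_mae, 2), round(national_mae, 2))
```

Whenever a forecaster's error differs between the two groups, the adjusted figure moves toward the error in the group that the sample underrepresents, here the Romney states.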

Table 3: Mean absolute forecast errors of RCP and 538, before and after the adjustment for “blue state bias.”

These various comparisons do not demonstrate that RCP suffered greatly relative to 538 because of its theoretical limitations. And while it is easy to describe RCP’s approach as simplistic, its sheer simplicity is one of its virtues. Everyone understands what RCP is doing, and gets to see all the polls that entered its calculation. The situation is quite different at the 538 blog: Nate Silver is not especially secretive, but his 538 methodology is as mysterious to most visitors to his website as the Coca-Cola formula. One could conclude that RCP more than survived an empirical test of its effectiveness: In a direct comparison with the gold standard, RCP turned in a performance that was (forgive the pun) far better than silver.

Hold on, however. There is another way to interpret the 2012 data.

On the Other Hand

In the abstract, nothing is wrong with the analysis presented thus far. Out of the abstract, however, the analysis can be faulted for its obliviousness to the central dynamic of the 2012 election. Given the realities of the Electoral College, the candidates and everyone else recognized that the outcome would be determined by what happened in about a dozen “swing states” that either candidate could plausibly win. In the other states, the winner was a foregone conclusion so there was little campaigning and little interest in polling results.

Under the circumstances, a comparison between RCP and 538 should focus primarily if not exclusively on their accuracy in swing states. It hardly matters, after all, which method more precisely specified the landslide by which the winner took New York or Texas. RCP identified 11 states as “toss up” just before the election: Colorado, Florida, Iowa, Michigan, New Hampshire, Nevada, North Carolina, Ohio, Pennsylvania, Virginia, Wisconsin.

If we return to Table 1 and concentrate on these 11 states, we see something sharply different from the pattern for all 30 states. As Table 4 indicates, 538 outperformed RCP in absolute forecast accuracy in all but one of the swing states (Ohio). Both forecasters were on average more favorable to Romney than the actual voters, but the net “bias” was only 0.76 percentage points for 538 over the 11 states, as opposed to 2.44 points for RCP. That difference of 1.68 (2.44-0.76) points is especially noteworthy because the regression formula relating 538 to RCP indicated that 538 raised the estimates about Obama’s performance by about 1.5 points relative to RCP. Again and again, this adjustment was vindicated by the swing-state results: Not only did 538 correctly predict that Obama would outperform the last polls in those states, but Table 4 shows that it closely identified the extent to which he would do so. (Its mean absolute error in swing states was only 1.45 percentage points, as opposed to 2.62 points for RCP.)
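
The two summary measures used here differ in one respect: net bias averages signed errors, so misses in opposite directions cancel, while mean absolute error averages their sizes. The sketch below defines both, taking the error as the actual Obama margin minus the forecast margin, which is an assumed sign convention chosen so that a positive value means the forecast understated Obama. Only the Florida figures quoted earlier in the article are used as input. The full calculation would run over all 11 swing states in Table 4, and the function names are hypothetical.

```python
def forecast_errors(forecasts, actuals):
    """Actual minus forecast Obama margin for each state
    (positive means the forecast understated Obama)."""
    return [actual - forecast for forecast, actual in zip(forecasts, actuals)]


def net_bias(errors):
    """Average signed error; misses in opposite directions cancel."""
    return sum(errors) / len(errors)


def mean_absolute_error(errors):
    """Average size of the miss, ignoring its direction."""
    return sum(abs(e) for e in errors) / len(errors)


# Florida only, as quoted in the article (Obama margin in percentage points).
actual_margins = [0.9]
rcp_forecasts = [-1.5]   # RCP projected Romney by 1.5
forecasts_538 = [0.0]    # 538 projected an exact tie

rcp_errors = forecast_errors(rcp_forecasts, actual_margins)
errors_538 = forecast_errors(forecasts_538, actual_margins)

print(round(net_bias(rcp_errors), 2), round(net_bias(errors_538), 2))
print(round(mean_absolute_error(rcp_errors), 2), round(mean_absolute_error(errors_538), 2))
```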

Table 4: Relative and absolute accuracy of RCP and 538 forecasts in swing states and other states.

Under a simple binomial test on the question “who did better?”, 538’s victory in 10 out of the 11 swing states was statistically significant. But such a test entails an independence assumption, and Nate Silver would be the first to agree that his state-by-state forecasts are correlated. In effect, he made an all-or-nothing bet on the premise that the polls underestimated Obama’s strength in swing states: had this premise been wrong, his 10-1 victory over RCP could easily have been an 11-0 defeat. Yet uncertainties about how to define statistical significance cannot obscure the fundamental point: 538 did extremely well in 2012 in those states where accuracy was most important.
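
For reference, the simple binomial test mentioned above can be worked out by hand. Treating the 11 swing states as independent fair-coin tosses (the very assumption questioned above) and counting splits at least as lopsided as 10 to 1 in either direction gives, as a sketch:

```latex
P \;=\; 2 \cdot \frac{\binom{11}{10} + \binom{11}{11}}{2^{11}}
  \;=\; 2 \cdot \frac{11 + 1}{2048}
  \;=\; \frac{24}{2048}
  \;\approx\; 0.012
```

That is well below the usual 5 percent threshold, which is the sense in which the result is significant under the test's assumptions.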

Table 4 compares RCP and 538 in not only the 11 swing states but also the 19 other states for which both methods generated 2012 forecasts. It shows that RCP and 538 were closely matched in the 19 states without close contests, which is why earlier statistics based on all 30 states yielded less disparity between RCP and 538 than the swing states alone. It is tempting to speculate that Obama outperformed the swing-state polls on which RCP relied because of the major voter-turnout drives that his campaign undertook in those states, which brought many people to the voting booths whom pollsters had not counted in their tabulations about “likely” voters. In the other states, the Obama campaign may not have waged such efforts, so no comparable “surge” occurred.

Final Remarks

So how does it all add up? Under the Occam’s Razor principle, there is a clear starting preference for simple models over more complicated formulations. A complex model must justify its intricacy by offering more accurate information than a simpler counterpart; moreover, this added information should arise in places where it is most needed. In the present setting, the question is whether Nate Silver’s 538 model outperformed the straightforward RCP method to an extent that makes 538 the wiser choice, even if the less transparent one.

Readers can reach their own judgments but, because of the results in the swing states, the author believes that 538 met the test for superiority just posed. While the tortoise catches up with the hare in the nursery stories, it seems here that the hare won hands down. But the outcome does not contradict Aesop’s fable because, far from being lazy, the 538 hare ran the race as hard as it could. And, if the evidence is any guide, it is very much a world-class runner.

Arnold Barnett (abarnett@mit.edu) is the George Eastman Professor of Management Science at the MIT Sloan School of Management. His research specialty is applied mathematical modeling with a focus on problems of health and safety. An abridged version of this article appeared in the January/February issue of Analytics magazine.

Data Sources

  1. Five Thirty Eight Presidential Election Forecast, http://fivethirtyeight.blogs.nytimes.com/ (as updated at 10:10 a.m. on 11/6/12, Election Day).
  2. Real Clear Politics State Averages, Presidential Race, http://www.realclearpolitics.com/epolls/2012/president/2012_elections_electoral_college_map.html (as of 10 a.m. on 11/6/12).
  3. Election 2012 President Results, http://elections.nytimes.com/2012/results/president/big-board (as of 11/29/12).