Election analytics

Student-driven STEM learning lab’s election forecasting website predicts presidency and Senate races.

By Wenda Zhang, Jason J. Sauppe and Sheldon H. Jacobson

The 2016 presidential race between Hillary Clinton and Donald Trump is unlike any other election. Source: Election Analytics

Numerous political pundits believe that the 2016 presidential election will be unlike any other election in the past. It is not because of the prevalence of social media or the wall-to-wall coverage from cable news organizations, both of which have already played significant roles in the past several elections. Rather, the 2016 election is unique in the amount of contention generated by the two major party candidates even before the general election kicked off. The word “unprecedented” has been employed many times to describe how often people have been surprised over the course of the last year, and no matter what the outcome is on Nov. 8, this election almost certainly deserves its own chapter in the history books.

The “unprecedented” nature of this election means that people are even more curious about election forecasts. National polling usually offers only limited insight into the odds of a candidate winning the White House, but state-by-state polling can be used to create more powerful predictions using advanced analytics. The Election Analytics (electionanalytics.cs.illinois.edu) team, housed in the Department of Computer Science at the University of Illinois, has a wealth of experience in doing just that. In 2008, the team first introduced its election forecasting website. The model behind its forecasts makes use of all state-level polling data to compute the probability of each candidate winning each state; these probabilities are then used to determine how many electoral votes each candidate will receive. In 2008, the model correctly predicted the outcomes for 50 out of 51 states (with the District of Columbia included).
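The article does not spell out how state-level win probabilities are turned into electoral vote totals, so the following is only a minimal sketch of one common approach, not the Election Analytics model itself. It assumes that state outcomes are independent and winner-take-all, and it builds the full distribution of a candidate's electoral votes with a simple dynamic program. The function names, state names and probabilities are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def electoral_vote_distribution(states: Dict[str, Tuple[int, float]]) -> List[float]:
    """Return dist, where dist[k] = P(candidate wins exactly k electoral votes).

    `states` maps a state name to (electoral_votes, win_probability).
    Assumption (not from the article): states are independent and
    winner-take-all.
    """
    total = sum(ev for ev, _ in states.values())
    dist = [0.0] * (total + 1)
    dist[0] = 1.0  # before any state is counted, the candidate has 0 votes
    for ev, p in states.values():
        updated = [0.0] * (total + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            updated[k] += mass * (1.0 - p)  # candidate loses this state
            updated[k + ev] += mass * p     # candidate wins this state
        dist = updated
    return dist


def expected_votes(dist: List[float]) -> float:
    """Expected number of electoral votes under the distribution."""
    return sum(k * p for k, p in enumerate(dist))


def prob_at_least(dist: List[float], threshold: int) -> float:
    """Probability of winning at least `threshold` electoral votes."""
    return sum(dist[threshold:])


# Hypothetical three-state example (made-up numbers, not real forecasts).
toy_states = {
    "State A": (10, 0.9),
    "State B": (20, 0.5),
    "State C": (5, 0.2),
}
toy_dist = electoral_vote_distribution(toy_states)
print(round(expected_votes(toy_dist), 2), round(prob_at_least(toy_dist, 20), 2))
```

With real inputs, the same two summary functions would yield figures of the kind a forecast reports: an expected electoral vote count and the probability of reaching a given threshold, such as the 270 votes needed to win.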

In 2012, the team launched a new and improved website to predict national elections, focusing on the presidency and the Senate. More components, such as a trend chart that showed predictions over time, were added to the website in order to present information regarding the races and forecasts more clearly. On Election Day, the model was able to correctly predict the outcomes for 50 out of 51 states (District of Columbia included) for the presidential race, as well as the outcomes for 31 out of 33 Senate races.

The website received another major update in 2014: a complete redesign that incorporated modern web design concepts to make it streamlined and responsive, with all past prediction data readily available for viewing. For the Senate races that year, the model correctly predicted 35 of 36 races.

In anticipation of the 2016 election cycle, the Election Analytics team has added several new features to the website. One significant addition is forecast customization, which allows users to modify the existing forecasts in various ways. These custom forecasts still rely on the same mathematical model for constructing candidates’ probabilities of winning, but they allow a user to experiment with additional assumptions regarding the polling data itself, providing a means to conduct sensitivity analysis on that data.

First, the customization options allow a user to exclude polling data from certain sources. By default, each forecast is constructed using all available polling data. This includes polls from sources that a user may consider to be biased; by excluding these sources, the user can get a forecast that they consider to be more accurate. Another customization option allows the user to control the impact of undecided voters on the election. By default, the undecided voters are assigned evenly to the major party candidates; however, users can now shift these percentages toward either the Democratic or the Republican candidate if they believe that these undecided voters will break one way or the other on Election Day.
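The two customization options described above can be illustrated with a short sketch. This is not the website's code: the `Poll` record, the `adjust_polls` function, its parameters and the sample pollsters are all hypothetical. It assumes each poll reports Democratic, Republican and undecided percentages, drops polls from excluded sources, and splits each remaining poll's undecided share between the two major party candidates (an even split by default).

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List


@dataclass
class Poll:
    source: str       # hypothetical pollster name
    dem: float        # percent supporting the Democratic candidate
    rep: float        # percent supporting the Republican candidate
    undecided: float  # percent undecided


def adjust_polls(
    polls: Iterable[Poll],
    excluded_sources: AbstractSet[str] = frozenset(),
    dem_share_of_undecided: float = 0.5,
) -> List[Poll]:
    """Drop excluded sources and reassign undecided voters.

    `dem_share_of_undecided` is the fraction of undecided voters given to
    the Democratic candidate; the rest go to the Republican candidate.
    The default of 0.5 mirrors the even split described in the article.
    """
    if not 0.0 <= dem_share_of_undecided <= 1.0:
        raise ValueError("dem_share_of_undecided must be between 0 and 1")
    adjusted = []
    for poll in polls:
        if poll.source in excluded_sources:
            continue  # user considers this source unreliable
        dem = poll.dem + poll.undecided * dem_share_of_undecided
        rep = poll.rep + poll.undecided * (1.0 - dem_share_of_undecided)
        adjusted.append(Poll(poll.source, dem, rep, 0.0))
    return adjusted


# Made-up polls for illustration only.
sample_polls = [
    Poll("Pollster A", dem=45.0, rep=43.0, undecided=12.0),
    Poll("Pollster B", dem=40.0, rep=48.0, undecided=12.0),
]
# Exclude one source and lean the undecided voters toward the Republican.
custom = adjust_polls(sample_polls, {"Pollster B"}, dem_share_of_undecided=0.25)
print(custom)
```

Rerunning a forecast on the adjusted polls with several different splits is one simple way to carry out the kind of sensitivity analysis the customization feature is meant to support.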

This is also a year in which third-party candidates are positioned to have a noticeable impact on the final results. Both Libertarian Party candidate Gary Johnson and Green Party candidate Jill Stein have been polling above normal levels for their parties (around 9 percent for Johnson and 3 percent for Stein, as of mid-September) in national polling averages, indicating that their effect on the election cannot be ignored. This is particularly true for Johnson, who will be appearing on the ballot in all 50 states; he also has the potential to reach 15 percent in polls, which would grant him a spot in the presidential debates. Should he succeed in doing that, he may even be able to win electoral votes, although this is unlikely. [Editor’s note: Johnson did not qualify for the first debate.]

The more likely scenario is that he will alter the results in some states and hence, possibly, the final result for who wins the White House. Given this, another customization option allows users to construct forecasts using polling data that either includes or excludes Johnson and Stein.

Hence the word “unprecedented” appears justified. At the time of this writing (mid-September), the Election Analytics forecast gives Clinton 304 expected electoral votes, with a probability of 0.53 of winning at least 300 electoral votes. When a very strong Republican lean is applied to the undecided voters, Clinton’s expected electoral votes drop to 286, with a probability of 0.15 of winning at least 300 electoral votes. Clearly, Clinton (at the time of this writing) has the lead, though this lead has been eroding since mid-August. Also at the time of this writing, the Election Analytics forecast gives the Republicans a probability of 0.87 of retaining control of the Senate. Any number of unpredictable events, domestic or international, could result in these leads shrinking or widening.

All these factors suggest that it will be a challenging election to predict. At the same time, it is also a great opportunity for the Election Analytics team to test operations research and advanced analytic methodologies on a real-world application. Election Analytics is a student-driven STEM learning lab, providing a unique opportunity for students to transition their classroom knowledge into a practical tool that draws widespread national interest. The goal is to demonstrate, once again, that through careful data analysis and analytics, even the future is not completely obscured from the eyes of the population.

Wenda Zhang is a graduate student at the University of Illinois at Urbana-Champaign. Jason J. Sauppe is an assistant professor at the University of Wisconsin-La Crosse. Sheldon H. Jacobson is a professor at the University of Illinois at Urbana-Champaign. Their website (electionanalytics.cs.illinois.edu) provides daily forecasts for the presidential and Senate races. The University of Illinois at Urbana-Champaign students involved in the design, execution and updating of Election Analytics during the 2016 election cycle are Siddhartha Duri, Rishi Jain, Niraj Pant and Victor Jarosiewicz.