Analytics penetrates deeper into politics

Meet the number-crunchers who helped take down House Majority Leader Eric Cantor of Virginia in the primary, using big data and big computation – and old-fashioned pattern recognition. Are more surprises in store this November?

By Douglas A. Samuelson

Targeting voters and motivating turnout is crucial in tight elections.

The old adage that “politics makes strange bedfellows” was never more evident than in Virginia’s 7th District earlier this year, when a team of left-leaning analysts helped a tea party-backed primary candidate upset an entrenched conservative Republican who just happened to be the House majority leader. To put that into perspective, no sitting House majority leader had ever lost a primary.

On June 10, Eric Cantor, majority leader of the U.S. House of Representatives, lost his party’s primary despite his huge name recognition, his seniority and outspending his opponent by more than 10 to one. Pundits promptly offered a variety of theories about how this happened, ranging from failures of messaging to ideological shifts to possible fraudulent votes. The real explanation, however, has more to do with predictive analytics, and it may portend more election surprises in November.

In 2010, Cantor won re-election in a three-way race. Although the year was generally very good for Republicans, Cantor’s win was closer than analysts had predicted: for the first time in his career, he was held below 60 percent of the vote in a general election. The Democrats’ issue polls indicated, as former Democratic campaign manager Brian Umana recently put it, that “Cantor’s support was a mile wide and an inch deep”; in other words, much of Cantor’s broad support lacked enthusiasm. Some of the Democratic analysts, in particular Umana and Jonathan Stevens, a political and media consultant who was a political director and polling guru in 2010, formed working friendships with their counterparts on the third-party candidate’s campaign. When some of the third-party candidate’s analysts decided to get heavily involved in a tea party Republican’s challenge to Cantor in this year’s 7th District Republican primary, those operatives used targeting lists and strategies that Umana and Stevens had shaped for them in prior years.

“We saw no harm in mentioning strategies that tea partiers might use to reach sporadic Republicans and far-right ‘independents’ who were less likely than other Republicans to support Cantor,” Umana explains. “We shared data-science techniques for voter targeting and for evaluating the relative cost of earning the votes of different types of voters. What we did in terms of data science would not necessarily seem high-tech to OR/MS analysts. It was principally a matter of a SQL database, getting and cleaning State Board of Elections (SBOE) data and enriching that data with helpful supplementary information.”
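
Umana describes the core of the work as a SQL database built from cleaned SBOE records and enriched with supplementary data. The sketch below illustrates that general pattern with Python’s built-in sqlite3 module. The table names, columns, cleaning rules and sample rows are all hypothetical, not the campaign’s actual schema or data.

```python
import sqlite3

# Hypothetical schema: voter-file rows from the State Board of Elections (SBOE)
# plus survey responses gathered by callers and volunteers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sboe_voters (
    voter_id TEXT PRIMARY KEY,
    name     TEXT,
    precinct TEXT
);
CREATE TABLE survey_responses (
    voter_id   TEXT,
    likelihood INTEGER,  -- self-reported likelihood of voting, e.g. 1-10
    preference TEXT      -- e.g. 'strong_dem', 'undecided', 'lean_rep'
);
""")

# Placeholder raw records, deliberately messy.
raw_rows = [
    ("  va001 ", " Jane Doe ", "101"),
    ("VA002", "John Roe", " 102 "),
    ("va001", "Jane Doe", "101"),  # same voter as the first row once cleaned
]

# Cleaning: normalize IDs and whitespace; keying on the ID drops duplicates.
cleaned = {}
for voter_id, name, precinct in raw_rows:
    key = voter_id.strip().upper()
    cleaned[key] = (key, name.strip(), precinct.strip())
conn.executemany("INSERT INTO sboe_voters VALUES (?, ?, ?)", cleaned.values())

conn.executemany(
    "INSERT INTO survey_responses VALUES (?, ?, ?)",
    [("VA001", 4, "undecided")],
)

# Enrichment: a LEFT JOIN keeps every registered voter; voters with no
# survey data come back with NULL (None) in the survey columns.
rows = conn.execute("""
    SELECT v.voter_id, v.precinct, s.likelihood, s.preference
    FROM sboe_voters AS v
    LEFT JOIN survey_responses AS s ON s.voter_id = v.voter_id
    ORDER BY v.voter_id
""").fetchall()

for row in rows:
    print(row)
```

The voters whose survey columns come back empty are exactly the “no preference data” group that Umana goes on to discuss.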

Personal Polling

According to Umana, the “supplementary information” – gathered throughout the campaign by paid staff and volunteers – was critical to the analytic work. “Many questions were asked, and many of them yielded helpful data,” Umana says. “A simple one of key value was a series of questions along the lines of: How likely do you consider yourself to cast a ballot on November 2? Options would be given with a numeric scale of likelihood. Whom do you support in the election? Strongly support dem, lean dem, undecided, lean Republican, strongly support Republican or support third-party candidate?”

The information could be collected by robo-call or by human callers, but Umana notes that human callers are much preferred, particularly if there is any chance it can become a persuasion call.

“Sometimes you have pre-ID’d voter preferences by looking at whether they routinely voted in Democratic primaries in the past or routinely and exclusively voted in Republican primaries in the past,” Umana says. “Sometimes, however, there are voters out there who do not vote in primaries, but you do see from the SBOE record that they have voted in past general elections, and they have not answered survey questions in the past. You know they are a voter – either a consistent general-election-only voter or a sporadic general election voter – but you don’t necessarily know anything about their preferences. Gathering information on these voters could sometimes, in limited ways, be useful to the Democratic Party campaign – and we did some of this. But consider the strategic implications.”
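
The pre-identification Umana describes can be sketched as a simple rule-based labeling of each voter’s participation history. This is an illustration only: the field names, the “routine” threshold of two primaries and the 75 percent cutoff for a “consistent” general-election voter are hypothetical choices, not the campaign’s actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoterHistory:
    # Hypothetical summary of one voter's SBOE participation record.
    dem_primaries: int       # Democratic primaries voted in
    rep_primaries: int       # Republican primaries voted in
    generals_voted: int      # general elections voted in
    generals_eligible: int   # general elections the voter was eligible for
    survey_answer: Optional[str] = None  # preference from a past survey, if any


def pre_id(v: VoterHistory, routine: int = 2, consistent: float = 0.75) -> str:
    """Label a voter from participation history; thresholds are illustrative."""
    if v.survey_answer is not None:
        return v.survey_answer  # a direct answer beats any inference
    if v.dem_primaries >= routine:
        return "likely_dem"     # routinely votes in Democratic primaries
    if v.rep_primaries >= routine and v.dem_primaries == 0:
        return "likely_rep"     # routinely and exclusively Republican primaries
    if v.generals_eligible == 0 or v.generals_voted == 0:
        return "no_history"
    rate = v.generals_voted / v.generals_eligible
    if v.dem_primaries == 0 and v.rep_primaries == 0:
        # Known voter, unknown preference: the group Umana singles out.
        return "unknown_consistent" if rate >= consistent else "unknown_sporadic"
    return "unknown_mixed"      # some primary history, but no clear pattern


print(pre_id(VoterHistory(0, 0, 4, 4)))  # general-election-only voter
```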

Turnout is Critical

Party turnout is a big factor in congressional elections, particularly for a first-time nominee, and Umana notes that it is not completely beyond a campaign’s control. “It is pretty easy to get a good idea of who are sporadic Democratic voters, who are Democratic voters who only come out in presidential election years, and who are Democratic voters who – even when they come out in a Democratic election year – might be at risk of an under vote; i.e., they vote for the top of the ballot, but do not vote farther down the ballot for candidates whom they have not heard of or who they feel have insufficiently ‘asked’ for their vote. Look at any losing congressional candidate in a district that was won (or even nearly won) by a presidential or senatorial candidate from the same party, and chances are that the congressional candidate could have won if she or he had sufficiently addressed the risk of an under vote.

“In 2010, there was the strategic difference that we were at the top of the ballot – no presidential election, no U.S. Senate seat up for election. Yet some of the same principles still applied. How did we outperform the major tracking polls’ prediction of how a Democrat would do in the district, and how did we manage this in a terrible year for Democrats?”
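
Because individual ballots are secret, the under vote Umana describes is usually visible only in aggregate returns, as drop-off between the top race on the ballot and the House race. A minimal sketch of that calculation follows; the precinct names and vote counts are placeholders, not real returns, and ranking precincts by drop-off is an assumed way of prioritizing contact, not a method the source confirms.

```python
from typing import Dict, Tuple

# Placeholder precinct returns: (votes in the top-of-ballot race,
# votes in the House race). These numbers are invented for illustration.
returns: Dict[str, Tuple[int, int]] = {
    "precinct_A": (1200, 1140),
    "precinct_B": (900, 810),
    "precinct_C": (1500, 1485),
}


def undervote_rate(top_race: int, house_race: int) -> float:
    """Share of top-of-ballot voters who cast no vote in the House race."""
    if top_race <= 0:
        return 0.0
    return (top_race - house_race) / top_race


# Highest drop-off first: precincts where voters most often skip the House
# race may need more contact so they feel "asked" for their vote.
ranked = sorted(returns, key=lambda p: undervote_rate(*returns[p]), reverse=True)
for p in ranked:
    print(p, round(undervote_rate(*returns[p]), 3))
```
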
Umana points to five strategies:

  1. making sure the base was aware of the candidate and did not feel taken for granted;
  2. finding Democrats who were sporadic or inconsistent congressional voters and were very likely to support President Obama (usually because they said so directly, but sometimes based on demographics such as age and race) and explaining that a Democratic congressional vote would show support for the president’s agenda;
  3. finding Democrats who were sporadic or inconsistent congressional voters and had some dissatisfaction with Obama and with the government as a whole, explaining the importance of showing their opposition to Cantor regardless, and positioning the campaign’s candidate as a dissatisfied outsider like them;
  4. the usual get-out-the-vote work: offering rides to the polls, reminder calls and so on; and
  5. reaching into the pool of poll respondents who described themselves as independents or likely Republicans but also said they were undecided.

Better Targeting

Asked what he means by “strategic implications,” Umana explains: “Based on general neighborhood characteristics, and because the Obama campaign had done so much voter ID of likely Democrats in 2008, we often knew whom we needed to go to for our own Democratic campaign. If there was no data on a person’s preferences, there was some chance they were just very private – and so they could be valuable for our campaign to contact when we had the resources – but there was a much greater chance that this would be a person of value to a third-party candidate.

“Remember, the average voter in most neighborhoods of this district already has a greater than 50 percent chance of being conservative. Add to that some other factors, and you have a likelihood that this is a person of value to a third-party candidate or a right-wing outsider. Any person who identified himself or herself to us as a strong Republican while also saying he or she was undecided was likely someone who was dissatisfied with Cantor and might support a right-wing challenger. This was our own data, and we knew it had value to us, and that it could potentially be even more valuable to others.”
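
The signal Umana points to, a self-described strong Republican who is nonetheless undecided, amounts to a simple filter over survey responses. The sketch below assumes two hypothetical survey fields, party identification and vote choice, with made-up labels and voter IDs; it is an illustration, not the campaign’s actual coding scheme.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    voter_id: str
    party_id: Optional[str]     # hypothetical labels: "strong_rep", "independent", ...
    vote_choice: Optional[str]  # hypothetical labels: "incumbent", "undecided", ...


def possibly_dissatisfied(r: Respondent) -> bool:
    """Strong Republican who still has not committed to the incumbent."""
    return r.party_id == "strong_rep" and r.vote_choice == "undecided"


respondents = [
    Respondent("VA001", "strong_rep", "undecided"),
    Respondent("VA002", "strong_rep", "incumbent"),
    Respondent("VA003", "independent", "undecided"),
]

# Voters a right-wing challenger might most want to contact.
targets: List[str] = [r.voter_id for r in respondents if possibly_dissatisfied(r)]
print(targets)
```

Keeping the rule as a separate function makes it easy to add the neighborhood-level factors Umana mentions as further conditions.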

The Umana team conducted its own polls – partly because they could not afford expensive pollsters, but primarily because, as Umana puts it, “We knew that we could design a poll even better than they could; we just wouldn’t have their brand name and appearance of outsider objectivity in presenting poll results to the media.”

Umana also notes that the improved polling required larger sample sizes and repeated iterations, refining questions to home in on issues the poll-takers had noted were important to respondents. The better-known standard polls correct raw response numbers using models of turnout that may be significantly off for the current election. In any case, such turnout models simply assume away the question of the extent to which turnout can be influenced by different campaign tactics. “Those ‘likely voters’ models are shaky,” Umana says.

Media and political consultant Stevens amplifies: “The best way to model the electorate for turnout purposes is demographic. Turnout is correlated with income. And we were able to redraw the domain: pro- vs. anti-establishment, not just party registration. Traditional polling methods that rely heavily on party registration to stratify the population tend to miss other issues.”

Stevens criticizes the analyses most pollsters do, as well. “No pollster ever got fired for doing a LOESS curve-fitting,” he laughs. “But the part that demands cleverness is determining the terms and tools of data analysis and manipulation rather than fine-grained facility and expertise with one particular mathematical tool or piece of software.” In short, mathematical and computational sophistication is a poor substitute for careful, creative and detailed data analysis. People still see complex patterns more readily than computer algorithms do. The analyst must then choose the appropriate tools and expressions to characterize the patterns formally and repeatably.

Advising anti-Cantor activists, Umana says, brought other qualitative factors into play. Geographically, where would right-wing outsiders likely be organizing? If they could be persuaded and needed additional organization help, who could most easily reach them? Chesterfield and Henrico counties are near Richmond and have many suburban conservatives who were likely to feel ignored by Cantor. Hanover County is very rural, very conservative and heavily Republican, but not necessarily trustful of the Republican establishment. “There are differences between rural conservatives and suburban conservatives,” Umana adds. “Basically, what we did was find a congressional district that was used to comparatively non-competitive elections.”

More Effective Tactics

Given Cantor’s long and seemingly unbeatable reign, no one had looked deeply into the detailed voting patterns in the 7th District in a while. “What this meant,” Umana elaborates, “was that a lot of the political activists were relying on techniques that weren’t that different from those of the 1950s. For example, yard signs. Literally just setting out yard signs. Well, yard signs don’t vote. Yard signs may actually be unhelpful in terms of getting volunteer help, because someone might feel like they’ve put out a yard sign and don’t need to do anything else. Yard signs can help with boosting name recognition, potentially, but they are not very good at driving people to a website or getting them to learn more about a candidate. They mostly matter to people who are already tuned in to an election and have their minds made up anyway.”

As an example of a tactic that would have been foreign to right-wing outsiders, Umana cites having people hold signs that say “Vote Today” on Election Day in a suburban Republican stronghold.

“So,” Umana sums up, “we’re talking about a region without modern campaign mechanics being put into use, a region where the received wisdom was that the district was noncompetitive and that it didn’t matter what you did there. To analogize politics and strategic data work to another mechanical realm, it was like we brought a Model T to an area where people had pretty much only seen horse-drawn carriages. Four years later, with additional data science tools in politics, if I were running a campaign now it could be like driving a Ferrari into regions that have just gotten excited about the Model T.”

Lessons for Other Campaigns

Today, many more analysts are, indeed, putting new methods and insights to work in current races. Does this mean that one party is likely to surprise the other in a multitude of contests in November? Stevens assesses that as unlikely. House districts, in particular, are generally drawn to be mostly safe for one party or the other, so they do not change hands often. “Incumbent retention is above 85 percent,” he states, “even in ‘wave’ elections.”

And some states have, as Stevens puts it, more “elastic” electorates than others. New Hampshire, for instance, has had many election races with considerable party switching. North Carolina and Virginia, in contrast, are less elastic, and the campaign process often simplifies to turnout. This, too, means that “redrawing the domain,” as the analysts did in the Virginia 7th, may not be widely possible.

“There are a couple of interesting Senate races,” Stevens notes. “That Mitch McConnell in Kentucky and Pat Roberts in Kansas are having trouble is unusual. Maybe this anti-establishment sentiment we saw in the Virginia 7th is looking like a movement around the country.”

Also, as OR/MS Today reported in early 2013, the Obama campaign developed the best databases anyone had seen, along with new methods for analyzing the data. In 2010, while it was still focused on Obama’s 2012 re-election campaign, the Obama organization was reluctant to share this information with anyone, including Democratic congressional campaigns. Since then, however, it has been more willing to share not only data and methods, but also analysts, and this was a key factor in the Democratic sweep of the races for governor, lieutenant governor and attorney general in Virginia in 2013. As Stevens notes, “Getting good data early is critical. Every few weeks without data or the money to make use of it hurts your campaign.”

So even if major polls seem to be trending one way or the other, there may be a few surprises on Election Day, especially in close Senate races, where targeted-issue appeals and get-out-the-vote efforts could be decisive – and unexpected by the opposing campaigns. Remember this when pundits who are unfamiliar with analytics begin “explaining” those results.

Douglas A. Samuelson, a contributing editor of OR/MS Today, is president and chief scientist of InfoLogix, Inc., a small R&D and consulting company in Annandale, Va., and senior operations research and systems analyst for Group W, Merrifield and Triangle, Va., another small analytics company, supporting defense applications. Earlier in his career, he worked professionally for several political campaigns, including some voter targeting work in the 1970s.

Brian Umana works for Illumina Consulting Group, also known as ICG, a Maryland corporation that serves government and commercial clients addressing complex analytics problems. He is no longer doing political campaign analytics and is not currently politically involved.

Jonathan Stevens was a staffer on Barack Obama’s 2008 campaign before becoming a political and media consultant running campaigns for candidates, political groups and private institutions.


  2. Douglas A. Samuelson, “Analytics Keys Obama’s Victory,” OR/MS Today, February 2013.
  3. Brian Umana, “I’m a Democrat and I Helped the Tea Party Unseat Eric Cantor,” The Washington Post, June 13, 2014.