Forum: The data science challenge

“Charlatans” claim to be doing traditional statistics with little evidence for their claims. Image © Thinkstock

Is a new movement against conventional approaches a threat to OR/MS as we know it and how might the profession best respond?

By Douglas A. Samuelson

There is a specter haunting OR/MS – the specter of data science. All the powers of old OR/MS have united to exorcise this specter.

Many readers may recognize that the preceding paragraph is a rephrasing of the opening of the “Communist Manifesto.” In both cases, an established political, social and intellectual order faced a challenge from a new way of looking at things. In both cases, as in many other such instances in history, the reaction was first to dismiss the new approach as unimportant, then to attack it when it didn’t go away. And such a response generally helps the new approach succeed, at considerable cost to everyone.

So, it is timely to look at some evidence of whether OR/MS is, in fact, endangered by another discipline, and how the OR/MS profession might best respond.

The Adversaries are Coming

Peter Bruce, president of the Institute for Statistics Education at Statistics.com and co-author (with Galit Shmueli, Nitin Patel and others) of one of the most successful books on data mining [1], is a well-positioned observer of the situation and a longtime supporter of OR/MS. “Look at the job announcements,” Bruce says. “Companies want to hire data scientists. Statisticians, not so much, and operations research analysts even less.”

In Washington, D.C., the Data Science DC Meetup group, started in 2011, typically draws 150 to 200 attendees to its not-quite-monthly programs. That is more than the membership, to say nothing of the meeting attendance, of any INFORMS chapter, including WINFORMS, the Institute’s Washington, D.C., chapter. Ten years ago, WINFORMS had around 300 members; today, according to a recent check of INFORMS records, it has fewer than 100. Is the market telling us something?

No doubt there are many factors at work here, but Bruce points to one: “The development of data science coincided with the social media movement. Suddenly you didn’t need a professional society to have meetings.” And this lack of traditional structure seems to have facilitated going in new programmatic directions.

Brenda Dietrich, a former president of INFORMS, retired at the end of 2017 after a long and distinguished career at IBM. She now holds an endowed faculty position at Cornell University – in operations research, she proudly points out. But her title for the last few years at IBM was “data scientist.”

“IBM was defining job roles with data science in the title,” Dietrich says, “and there was disagreement over database manipulation versus statistical and mathematical content. A query against distributed data isn’t data science, but some people think that’s all there is to it. Good data science practice includes deriving and checking coefficients. If the model looked right when you developed it, does it still look right after more data come in? O.R. has always focused on decisions or actions. Data science often stops at having found interesting phenomena. Good analysis requires you to tease out the decision system that underlies the data.”
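
To make Dietrich’s point concrete, the following is a minimal sketch, in Python, of what “checking whether the model still looks right after more data come in” can amount to in practice. The data, the simple linear model and the drift threshold are all hypothetical illustrations, not anything drawn from IBM’s practice.

    # Minimal sketch: refit a model as new data arrive and check whether the
    # coefficients have drifted from what was estimated at development time.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Hypothetical development-time data: y depends roughly linearly on x.
    x_old = rng.normal(size=(500, 1))
    y_old = 2.0 * x_old[:, 0] + rng.normal(scale=0.5, size=500)
    model = LinearRegression().fit(x_old, y_old)
    coef_at_development = model.coef_.copy()

    # Later, more data come in; refit on everything and compare coefficients.
    x_new = rng.normal(size=(500, 1))
    y_new = 2.0 * x_new[:, 0] + rng.normal(scale=0.5, size=500)
    refit = LinearRegression().fit(np.vstack([x_old, x_new]),
                                   np.concatenate([y_old, y_new]))

    drift = np.abs(refit.coef_ - coef_at_development)
    print("coefficient drift:", drift)
    if (drift > 0.1).any():  # threshold is arbitrary, for illustration only
        print("Model no longer looks right -- investigate before acting on it.")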

This is not a trivial distinction. Randy Bartlett, an acclaimed practitioner of both applied OR/MS/analytics and applied statistics, and author of one of the most popular recent books about data analysis [2], is blunt: In blogs and other fora, he has repeatedly denounced “charlatans” who claim to be doing something as good as traditional statistics with little evidence for their claims. Bill Luker Jr., author of another popular book on data analysis [3], is also scathing, writing in his blog [4] about big data analytics “failing to deliver the information goods.” In spurning all theory, both methodological and subject-based, in the name of “just letting the data tell us what’s there,” he notes, serious context gets lost. “The focus remains on managing – collecting, sorting, cleaning, searching and…setting [the results] aside, for a future day, when someone can analyze it,” he adds. “Yet that day never comes.”

Some Background

On closer examination, data science appears to be the latest and possibly most threatening manifestation of a desire by people from other disciplines to get the benefit of OR/MS and applied probability without having to learn much mathematical statistics. Fuzzy set theory and data mining can trace their origins to this desire. “Non-statistical” quantitative methods in other sciences show similar foundations in distaste for traditional statistics. For example, it is not difficult to find textbooks on survey research methods that promise to stick to “just the calculations you need” without having to consider the theoretical justifications for those calculations – or how to check whether the assumptions underlying the analysis are defensible.

And, in fact, such discord exists within the OR/MS and statistics professions as well. The embrace of “analytics” as a more practice-oriented view of OR/MS, for example, has not taken place without controversy, and the conflict over “how much theory is too much” persists. As long ago as the late 1970s, OR/MS giant Russ Ackoff warned that excessive insistence on complicated models, devaluing simpler but effective solutions, would relegate OR/MS analysts to out-of-the-way corners of organizations: increasingly isolated specialists, brought in occasionally for calculations, while others replaced them as the trusted advisors of decision-makers.

In statistics, the established view as of 40 or 50 years ago was “frequentist” statistics. Bayesians have largely taken over the highest positions in the profession, but “frequentist doctrine” remains extremely well entrenched in many places. In the last few years, challenges to conventional wisdom have grown to the point that the American Statistical Association (ASA) held a conference last September, apparently planned as the first of many, to reconsider the role of statistical significance. ASA even issued a statement warning against overuse and misuse of the concept of statistical significance. Among the reasons is a tendency, which this reporter has personally experienced, of doctrinaire statisticians refusing to consider interesting analyses because the presenting analyst had not “proven” statistical significance. A claim of “this is preliminary, but it seems there might be a pattern worth looking into here” had no place, all too often to the detriment of all concerned.

In one instance, about 30 years ago, a group of physicians had visited refugee camps in Turkey. They examined about 300 people who had what appeared to be chemical burns on their hands and faces and told fairly consistent stories of Iraqi planes flying over their villages spraying a yellow liquid. The doctors submitted their findings to the Journal of the American Medical Association, only to have it rejected forcefully by the statistical reviewer: “Your number of subjects is too small, and you didn’t specify your control group.” Fortunately, the physicians eventually found an unorthodox OR/MS analyst who was able to recast the analysis in Bayesian terms and provide a more persuasive explanation of why classical clinical trials methods were inappropriate for this problem.

Clearly the statistical reviewer was misusing standard methods in a nonstandard case, but the fact remains that he or she was this distinguished journal’s reviewer of choice for this article – probably not just some hack who had taken a couple of statistics courses. One can easily imagine this sort of rigid, stultifying doctrinaire thinking prevailing in other analyses and fueling an “anti-theory” rebellion.

Key Issues

So, the OR/MS profession, and its sister professions such as statistics, may soon be compelled to devote serious study to how much methodological doctrine is appropriate to good analysis, and what good doctrine would look like. In Dietrich’s view, “Tools like Hadoop are good for shallow analysis of vast amounts of data – counting, sorting, ‘bucketizing’ – but not nearly as good on deeper analysis. Now we’re looking for predictive signals, and that’s where statistical techniques and machine learning are needed.”
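
A small, hypothetical Python sketch of the distinction Dietrich draws: counting and bucketizing describe what happened, while a fitted statistical model estimates a predictive signal that can support a decision. The data, column names and model choice below are illustrative assumptions, not anything from the article.

    # Shallow analysis vs. deeper analysis on a tiny, made-up customer table.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.DataFrame({
        "region":  ["east", "west", "east", "west", "east", "west"] * 50,
        "spend":   [10, 80, 15, 75, 12, 90] * 50,
        "churned": [1, 0, 1, 0, 1, 0] * 50,
    })

    # Shallow analysis: counting and bucketizing (what Hadoop-style tools do well).
    print(df.groupby("region")["churned"].agg(["count", "mean"]))

    # Deeper analysis: estimate how spend predicts churn, which supports a
    # decision (e.g., whom to target for retention), not just a description.
    model = LogisticRegression().fit(df[["spend"]], df["churned"])
    print("estimated effect of spend on churn (log-odds per unit spend):",
          model.coef_[0][0])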

Douglas W. Hubbard, who has made a franchise out of his “How to Measure Anything” books, noted many years ago another weakness of superficial analysis. How often, he asked, do decision-makers and analysts go back and measure how well their analyses did in improving decisions? His answer, based on survey research and in-depth follow-up with several dozen organizations: not often and not well [5]. He derides “analysis placebos” that make decision-makers feel more confident without demonstrably improving decisions. The growth of the data science movement threatens to take matters further in the wrong direction.

Worse yet, the methods of data science threaten to make the data harder to use. Dietrich noted, “If you’ve done a superb job of ingesting vast amounts of streaming data, so it’s basically in time series format, and then you want to analyze locations, the structure of the stored data can make that extremely difficult.” One particularly disturbing “improvement” in big data methods is “sharding,” which means breaking incoming data into small chunks to store the data more efficiently – often stripping away relationships and metadata that could greatly aid later analysis [6].
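
The access-pattern problem is easy to illustrate. The toy Python sketch below, with hypothetical events and an arbitrary shard size (it is not drawn from [6]), stores a stream sharded by arrival time: a time-window query touches one shard, while a location query must open and scan every shard, because the storage layout preserved nothing about locations.

    # Toy illustration: data sharded by arrival time are cheap to query by time
    # but force a scan of every shard for a query keyed on anything else.
    from collections import defaultdict

    events = [  # (timestamp, location, value) -- hypothetical stream
        (1, "Austin", 10), (2, "Boston", 7), (3, "Austin", 12),
        (4, "Chicago", 5), (5, "Boston", 9), (6, "Austin", 11),
    ]

    shards = defaultdict(list)
    for ts, loc, val in events:
        shards[ts // 2].append((ts, loc, val))  # shard by time window of size 2

    print(shards[1])  # time query: exactly one shard is touched

    # Location query: every shard must be opened and scanned.
    austin = [e for shard in shards.values() for e in shard if e[1] == "Austin"]
    print(austin)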

Dietrich elaborated, “Big data was all about storing it all, then maybe using software like MapReduce to find patterns. People from that IT-centric background assume that huge numbers of compute agents can find anything. But it’s a lot easier to use a phone book if it’s alphabetical.”
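
Dietrich’s phone-book remark is, at bottom, about indexing. The deliberately tiny Python sketch below, with made-up names, contrasts a brute-force scan (what a swarm of compute agents effectively does when the data have no useful structure) with a lookup that exploits alphabetical order.

    # Brute-force scan vs. a lookup that exploits sorted order.
    import bisect

    names = sorted(["Iris", "Omar", "Lena", "Chen", "Ana", "Raj"])  # the "phone book"

    def linear_lookup(names, target):
        # Touch every record; cost grows linearly with the size of the book.
        return any(n == target for n in names)

    def sorted_lookup(names, target):
        # Binary search; cost grows only with the logarithm of the size.
        i = bisect.bisect_left(names, target)
        return i < len(names) and names[i] == target

    print(linear_lookup(names, "Lena"), sorted_lookup(names, "Lena"))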

Next Steps

So, the OR/MS profession has some important decisions to make. Dietrich asserted, “We found a reasonably happy marriage with business analytics. In contrast, OR/MS didn’t have much say in how enterprise resource planning (ERP) systems were built, so they ended up with some serious omissions, such as dealing with stochastic demand. We need to work with data science as we did with analytics. We can help make better sense of it and support better decision-making.”

There are precedents for such challenges. Thirty years ago, the best-seller “In Search of Excellence” [7] disparaged quantitative analysis, as many of the top-performing managers interviewed claimed that good business judgment was paramount, perhaps sufficient. Russ Ackoff’s gloomy prediction, that O.R. would be relegated to a highly technical support role, seemed to be coming true. And then along came Tom Davenport’s “Competing on Analytics” [8], reporting that many of the senior managers with “good business judgment” were, in fact, relying heavily on quantitatively expert subordinates. The most powerful criterion for choosing methods had triumphed again: whatever works better in practice tends to prevail. It just takes some time.

And it takes effective responses. The analogy to the Bolshevik Revolution is not frivolous; massive economic and military failures made the Russian Revolution possible, while other, more industrialized countries – the ones where Marx thought the revolutions would happen – found better responses to popular discontent. Most important is properly disseminating information about what works. As Dietrich put it, “We don’t do a good enough job of teaching people to question whether the answer is reasonable. Data science is better with O.R. O.R. is better when using data science.”

As Hubbard, Bartlett, Luker and others have so strongly and cogently asserted, the profession can do a much better job of documenting what doesn’t work. “We can’t let the charlatans get away with claiming their fluff is as good as our analysis,” Bartlett put it, bluntly. This might mean that some OR/MS analysts have to get much more proficient at making the profession’s case via news media, including social media.

INFORMS leadership is aware of the challenge. Tasha Inniss, recently hired as director of education and industry outreach at the Institute, is herself an INFORMS member and applied mathematician. “Analytics can be thought of as an umbrella term to describe all quantitative decision-making,” she says. “Some people use the term ‘analytics’ to refer just to metrics. So then they can characterize analytics as part of data science. But analytics in the broadest sense includes many different areas such as data science, machine learning – possibly even optimization, although some people bristle at that idea. Thus, any area, field or discipline that contributes to quantitative decision-making is a subfield of analytics. As a professional society, we have to put a stake in the ground and say how we’re defining these terms, and what’s within the profession.”

INFORMS centers its efforts on the ANSI-accredited Certified Analytics Professional (CAP) exam, which INFORMS has been working diligently to position as “the premier global professional certification for analytics professionals.” Since the exam is intended to cover the entire analytics process, INFORMS encourages hiring organizations to use it to vet candidates’ skills when building data science teams. All this, however, leaves open the question of how to convince the rest of the world that OR/MS/analytics proficiency generally translates into better decisions. It also leaves open the related, continuing question, debated within the profession for 60 years or more, of exactly what skills and accomplishments should be considered part of the profession’s defining proficiency.

Another possible course of action is to offer continuing education for consumers of analytics services, teaching managers how to assess and recognize good analytics. This, however, would require much more effort and learning than relying on the profession to certify who the capable people are – without anyone having to get into the question of which analyses were actually good. But tolerating the growth of a managerial class that can’t evaluate the quality of analyses invites the market to teach them the hard way, with likely bad consequences for society as a whole. How best to employ analytics and assess how well it is working might be a good area for both rigorous research and less formal studies.

Yet another way of looking at the situation: None of the great pioneering OR/MS analysts had degrees in OR/MS, and over-specialization has repeatedly proven damaging. So, if OR/MS becomes increasingly focused on an established set of techniques, to the exclusion of other points of view, the future of the profession appears to be in jeopardy. But if OR/MS can stay true to its origins as a deliberately interdisciplinary endeavor, incorporating new insights and building a uniquely productive way of looking at the world, then its future can be bright. ORMS

Doug Samuelson (samuelsondoug@yahoo.com), a longtime member of INFORMS, is president and chief scientist of InfoLogix, Inc., a small R&D and consulting company in Annandale, Va. He is the author of the ORacle column in OR/MS Today.

References

  1. Galit Shmueli and Peter Bruce, 2016, “Data Mining for Business Analytics: Concepts, Techniques and Applications with XLMiner,” Wiley. There is also a 2017 version keyed to R.
  2. Randy Bartlett, 2013, “A Practitioner’s Guide to Business Analytics,” McGraw-Hill.
  3. William Luker, Sr., and Bill Luker, Jr., 2010, “Signal from Noise, Information from Data,” Xlibris.
  4. Bill Luker, Jr., 2018, “Data Science Critique v. 4.3,” unpublished draft report shared with author.
  5. Douglas W. Hubbard and Douglas A. Samuelson, 2009, “Modeling Without Measurement: How the Decision Analysis Culture’s Lack of Empiricism Harms its Effectiveness,” OR/MS Today, October.
  6. Doug Samuelson, 2014, “The Sharding Parable,” OR/MS Today, April.
  7. Thomas Peters and Robert Waterman, Jr., 1982, “In Search of Excellence: Lessons from America’s Best-Run Companies,” Harper & Row.
  8. Thomas Davenport and Jeanne Harris, 2007, “Competing on Analytics: The New Science of Winning,” Harvard Business School Press.
