Sports analytics taxonomy, V1.0

Classification techniques provide an important first step in the serious study of the fast-growing field of sports analytics.

By Gary Cokins, Walt DeGrange, Stephen Chambal and Russell Walker

Sports Analytics

The application of analytics to sports has been going on for decades, long before computers came along. Major League Baseball (MLB) fans, for example, have been “keeping a scorecard” (with pencil and paper) at games for more than a hundred years, dutifully recording balls, strikes, outs, runs, hits, errors and other data. Today, in the era of advanced analytics and big data, MLB teams can employ multiple high-speed cameras and, in some cases, sensors not only to capture every minute detail of the game (speed and location of a pitch, direction and distance of a hit), but also to correlate every action and player movement with associated conditions such as the game situation (two outs, runner in scoring position, bottom of the ninth, one run behind, playing a night game on the road against a left-handed pitcher). The same is true for basketball, soccer and other professional sports teams and leagues.

Why Collect and Analyze Sports Data?

The explosive growth of advanced analytics in the professional sports world is mind-boggling, prompting some to ask: What is the use of such minutiae? It is a valid question, and as answers begin to surface, it becomes clear that portions of the collected data support better decision-making. But that introduces another question: “What types of decisions?” For an MLB manager, the decision might be as simple as whether a batter should bunt or hit away given the game situation. For an MLB general manager, the decisions might be, based on data-driven player evaluations, how much the team should offer a player during salary negotiations, whether the team should consider trading that player to another team, and what player the team should try to get in return.

The opportunities to apply analytics in sports extend beyond player evaluations and in-game coaching decisions and strategy. Examples include management of the sports franchise and venue, such as how much food to order for vendor concessions at the stadium, and how best to price tickets and corporate sponsorships. Additionally, analytics has given rise to a large investment in player biometrics (in order to monitor players’ health, including the severity of concussions or heart rate during extreme heat). As a result, sports teams deploy analytics at many levels (for players and during the execution of a specific play) and in many realms (from player performance to revenue management).

How can anyone make sense of all the possible applications of analytics in sports? How many categories of sports analytics are there? Which type of statistical method is appropriate for each category? What level of a particular type of statistical method is practiced in sports today? Where are the opportunities to apply analytics in sports going forward?

Figure 1: Version 1.0 of the sports taxonomy. Source: Ben Grannan

Sports Analytics Taxonomy

How do we answer these types of questions? The answer is to begin by creating a sports analytics taxonomy. A taxonomy is a classification scheme; it is typically hierarchical with a tree-branches-leaves structure. Examples are found in biology (plant and animal species), chemistry (basic elements and compounds), astronomy (types of galaxies and stars) and business organizations (types of industries and services). The taxonomy allows for a useful organization of the field.

In this article we propose a version 1.0 of a sports analytics taxonomy. Its purpose is not only to define the classifications of uses of analytics in sports, but also to identify applicable statistical methods for each classification, thereby promoting best-in-class research and application of analytics in sports. With an associated coding scheme, articles and other digital content can be cross-referenced to organize a “body of knowledge” of sports analytics to aid researchers. Going further, the taxonomy can be used to assess the current stage of maturity of various analytical methods in practice. One purpose of this assessment is to identify opportunities to apply analytics further in sports or to refine existing applications.

Figure 1 depicts the authors’ version 1.0 of the sports taxonomy. Rather than trying to describe every detailed branch and leaf of the taxonomy, this article is intended to describe the reasoning for the major “tree branches” of the taxonomy and then provide a few examples of some of the “tree branches.” The eight major tree branches depicted can be broadly grouped into three super-branches as follows:

  1. The first three branches are team, individual and league sports. What these three branches have in common is “sports,” implying competition.
  2. The fourth branch is recreational with a personal focus that is oriented toward an individual’s health and performance.
  3. The last four branches involve the quest to conquer uncertainty. They are fantasy sports, sports betting, games of chance and professional online gaming.

We considered alternative branches such as differentiating professional sports from amateur sports. Upon further examination, it became apparent that there would be substantial similarities between the two, creating redundant “branches and leaves.” On the other hand, separate team and individual sports branches reduce redundancies since each branch involves different types of decision-making. Consider this with regard to super-branch No. 1:

  • Team sports: Examples are familiar, including soccer, baseball and hockey. Some of the team-related minor branches are winning strategies; recruiting and scouting players; business operations, including stadium management and ticket pricing; and player evaluation for salary negotiations. Note that each of these mostly applies regardless of the players’ age level (youth, high school, college or professional).
  • Individual sports: Examples are golf and tennis. Some of the individual-related minor branches are body conditioning, biometric physio monitoring and behavioral modeling.
  • League sports management: The third branch of the “sports” grouping (super-branch No. 1) involves the coordination of teams (and in some cases individuals) and has business management aspects such as scheduling, revenues, TV licensing and brand licensing.

There will always be some overlap between the minor branches of the first two branches, for example strategy to win and body conditioning.
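The tree structure and the “associated coding scheme” mentioned earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the super-branch labels, the dotted codes (such as 1.1.2) and the function name assign_codes are hypothetical and are not part of the published taxonomy; only the branch and minor-branch names come from the article.

```python
# Illustrative sketch only. The super-branch labels and the dotted coding
# scheme are assumptions; branch and minor-branch names come from the article.
TAXONOMY = {
    "Sports": {
        "Team sports": [
            "Winning strategies",
            "Recruiting and scouting players",
            "Business operations",
            "Player evaluation",
        ],
        "Individual sports": [
            "Body conditioning",
            "Biometric physio monitoring",
            "Behavioral modeling",
        ],
        "League sports management": [
            "Scheduling",
            "Revenues",
            "TV licensing",
            "Brand licensing",
        ],
    },
    "Recreational": {
        "Recreational": [],  # minor branches not enumerated in the article
    },
    "Gaming and betting": {
        "Fantasy sports": [],
        "Sports betting": [],
        "Games of chance": [],
        "Professional online gaming": [],
    },
}


def assign_codes(taxonomy):
    """Return {code: label}, using dotted codes such as '1.2.3'."""
    codes = {}
    for i, (super_branch, branches) in enumerate(taxonomy.items(), start=1):
        codes[str(i)] = super_branch
        for j, (branch, leaves) in enumerate(branches.items(), start=1):
            codes[f"{i}.{j}"] = branch
            for k, leaf in enumerate(leaves, start=1):
                codes[f"{i}.{j}.{k}"] = leaf
    return codes


CODES = assign_codes(TAXONOMY)

# Hypothetical use: tag an article with taxonomy codes for cross-referencing.
article_tags = {"Ticket pricing with demand forecasts": ["1.1.3"]}
for title, tags in article_tags.items():
    print(title, "->", [CODES[t] for t in tags])
```

A flat code-to-label lookup like this is one simple way to cross-reference articles against the tree; a real body of knowledge would likely need deeper levels and stable identifiers.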

High-Tech Revolution

Sensor technology and strategically placed high-speed cameras – briefly mentioned at the beginning of this article – have clearly stimulated greater interest in the field of sports analytics for one obvious reason: they provide more data!

One example, again from MLB, is Statcast, a technology that uses multiple cameras and radars positioned at different viewing angles in baseball stadiums. Data is collected at 30 frames per second for every player on the field, both defensive and “at bat,” as well as for the baseball. With this technology, everything that happens on the playing field is captured, so nearly anything can be measured: the acceleration of players running the bases or reacting to an outfield fly ball; the velocity, arc and accuracy of a shortstop’s throw; the efficiency of an outfielder’s path to a fly ball.
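To make these measurements concrete, here is a minimal Python sketch of how speed, acceleration and route efficiency might be derived from position samples taken at 30 frames per second. It is not Statcast's actual method: the function names, the (x, y) coordinate format and the definition of route efficiency as straight-line distance divided by distance actually run are all assumptions made for illustration.

```python
import math

FRAME_RATE_HZ = 30  # sampling rate quoted in the article


def path_length(points):
    """Total distance covered along a sequence of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def route_efficiency(points):
    """Straight-line distance divided by distance actually run.

    1.0 means a perfectly direct route (assumed definition).
    """
    travelled = path_length(points)
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def speeds(points, rate_hz=FRAME_RATE_HZ):
    """Frame-to-frame speed in distance units per second."""
    return [math.dist(a, b) * rate_hz for a, b in zip(points, points[1:])]


def accelerations(points, rate_hz=FRAME_RATE_HZ):
    """Change in frame-to-frame speed, in distance units per second squared."""
    v = speeds(points, rate_hz)
    return [(v2 - v1) * rate_hz for v1, v2 in zip(v, v[1:])]


# Made-up positions for an outfielder who runs along two sides of a
# right triangle instead of taking the direct diagonal.
path = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
print(round(route_efficiency(path), 2))  # 5 / 7, about 0.71
```

Raw frame-to-frame differences like these are noisy; in practice some smoothing would probably be applied before reporting acceleration.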

Sensor technology and strategically placed high-speed cameras capture enormous amounts of information, whether it is professional basketball (above), baseball or other sports.

With this type of data, fan arguments about “who is the best player at a position?” will be bolstered with facts. But a team’s management might also be able to better protect its assets. For example, if a pitcher’s delivery starts deviating from its normal pattern, the data could trigger an alert that the pitcher may have strained a leg muscle and that further throwing could lead to an injury requiring season-ending surgery.
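One simple way such an alert might work is to compare each new measurement with the pitcher's own baseline and flag large deviations. The sketch below uses a z-score test; the choice of feature (release height in feet), the sample values, the threshold of 3 standard deviations and the function name delivery_alerts are all hypothetical, and no team's actual method is implied.

```python
from statistics import mean, stdev


def delivery_alerts(baseline, recent, threshold=3.0):
    """Flag recent measurements that deviate strongly from the baseline.

    baseline: past values of one delivery feature (at least two values).
    recent:   new values to check.
    Returns a list of booleans, True where |z-score| exceeds the threshold.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # No variation in the baseline: any different value is a deviation.
        return [x != mu for x in recent]
    return [abs(x - mu) / sigma > threshold for x in recent]


# Made-up release heights (feet) for illustration only.
normal_release = [6.0, 6.1, 5.9, 6.0, 6.1, 5.9]
todays_release = [6.05, 5.5]
print(delivery_alerts(normal_release, todays_release))  # [False, True]
```

A single-feature threshold is deliberately crude; a flagged pitch here would be a prompt for a trainer to take a closer look, not a diagnosis.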

What will teams and leagues do with all that data? That is both the fun and the conundrum of sports analytics. The breadth of uses of sensor technologies in sports is unimaginable.  

Sensors and high-tech cameras focused on the playing field bring up a controversial topic worth mentioning. Thanks to instant replay and sensors, television viewers already see precisely whether a tennis serve is in or out, whether a baseball pitch is a ball or a strike, or whether a football receiver got both of his feet down in bounds on a catch near the sideline. The same technology that is collecting data can also be used for “rules enforcement.” Will such technology eliminate the need for umpires, referees and other game officials? Wouldn’t we miss a baseball umpire raising his thumb and yelling “yourrrrrree out!” when a base runner slides into home plate just as the catcher applies the tag?

Before closing, the recreational and gaming/betting super-branches of the taxonomy tree of sports analytics deserve some mention. The recreational super-branch applies analytics with little or no emphasis on winning or avoiding losing. Here, the purpose is to support personal health and nutrition. Think digitized treadmills, Fitbit and other wearable devices that not only monitor one’s food intake and calorie burn, but also provide periodic analysis. The gaming and betting super-branch applies analytics to fantasy league player drafting, betting point spreads, cheating detection and much more.

Where Do We Go from Here?

The breadth of applications of sports analytics is apparent and possibly overwhelming, but that is a good thing. It means the opportunities for researchers and analysts are many. What this article is intended to convey is the need to organize the “body of knowledge” – developing an initial structured taxonomy – for sports analytics to enable more efficient research and application of the approaching tsunami of sports data.

Gary Cokins (gcokins@garycokins.com) retired from SAS in 2013. He is founder of Analytics Based Performance Management LLC (www.garycokins.com).

Walt DeGrange (wdegrange@canallc.com) is a principal operations research analyst at CANA Advisors (www.canallc.com) and the chairperson for the INFORMS SpORts Section.

Stephen Chambal (stephen.chambal@theperducogroup.com) is the CEO of The Perduco Group (www.theperducogroup.com), a high-end data analytics company working in defense, healthcare and sports industries.

Russell Walker (russell-walker@kellogg.northwestern.edu) is clinical associate professor of managerial economics and decision sciences at the Kellogg School of Management of Northwestern University. He can be reached at http://www.russellwalkerphd.com/.

All four co-authors are members of the INFORMS Section on OR in Sports (SpORts).