The Sports Analytics Explosion

Opportunities for research: drilling deeper into team sports.

Analytical methods can provide a competitive edge to win more games, better manage a team’s business, perform at higher levels and prevent injuries. Image © Pariwat Intrawut | 123rf.com

By Gary Cokins and Dave Schrader

The June 2016 issue of OR/MS Today included an article, “Sports Analytics Taxonomy, V1.0” [1], that showed the taxonomy as a tree with branches and leaves that describe how sports information could be catalogued to make better data-driven decisions. The three major branches in that taxonomy were team sports, individual sports and league sports management. This article drills more deeply into team sports with examples and illustrates interesting research problems, along with analytical techniques.

Team sports analytics

Team sports analytics can be divided into front-office “business-side” analytics, back-office “team operations” analytics, and health and safety analytics. A landmark paper by Thomas Davenport described many examples in these three areas [2]. Drilling into these areas exposes commonality with business analytic techniques.

1. The business side of sports includes topics such as ticket pricing, merchandising/brand licensing, venue management, sponsorship return on investment, and TV and radio contracts. For professional sports, it also includes (with inputs from scouts, coaches and owners) financial analytics for player acquisition and trades, and salary cap optimizations.

Sports business shares many analytical techniques with customer relationship management (CRM) techniques used by marketing and sales, as well as statistical approaches by the finance organization. Substitute “fan” for “customer” and you’ll find sports organizations equally interested in monitoring fan sentiment (what are fans saying about the team, players, coaches and opponents). The same techniques used for “revenue optimization” through dynamic pricing of airline seats apply to the pricing of seats for individual games. Mobile phone plan pricing with various tiered options is akin to season ticket pricing.

2. The operations side of sports depends heavily on new big data techniques, since so much of game planning and competitive analysis depends on video or sensor processing. Internet of Things (IoT) techniques for fast streaming, filtering and geospatial analysis are now being adapted for sports. The most interesting problems involve “coverage” or “space creation and destruction” as players move around a field, court or rink. Current statistics are too offense-centric, giving credit to the player who made the score but with little or no credit to his/her teammate who created open space; this will change.

3. The health and safety aspects of sports rely heavily on techniques used in the healthcare industry, an area of vibrant research. Predictive models for injury prevention, sleep and rest, and nutrition all apply. Athletes can be modeled like machines with useful lifetimes and failure modalities, so techniques such as hazard functions and time-to-failure models for locomotives or airliners can predict useful player career timelines.
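
The time-to-failure idea can be sketched with a Kaplan-Meier survival estimate, a standard reliability technique. This is only an illustration: the career records are invented, and `kaplan_meier` is a hypothetical helper rather than anything taken from the research cited in this article.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each time t at which at least one career ended.

    durations: seasons played by each athlete.
    observed:  True if the career actually ended at that point,
               False if the athlete was still active when data collection
               stopped (a "censored" record).
    """
    events = Counter(t for t, ended in zip(durations, observed) if ended)
    leaving = Counter(durations)      # everyone exits the risk set at their own t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Probability of surviving past t, given survival up to t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical career lengths (in seasons) for eight athletes.
seasons = [2, 3, 3, 5, 6, 6, 8, 10]
ended = [True, True, False, True, True, True, False, True]

curve = kaplan_meier(seasons, ended)
for t, s in curve:
    print(f"after season {t}: estimated share still playing = {s:.3f}")
```

The resulting step curve plays the same role for a roster that a reliability curve plays for a fleet of locomotives: it estimates the share of careers expected to last beyond a given season while still using the records of athletes who have not yet retired.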

Nine examples of research projects in these three areas of analytics appear below. Many come from research by members of the Teradata University Network (TUN) [3] doing “Moneyball on Campus” projects that involve athletic departments with data and problems, and business school students and faculty who can analyze their data to develop insights. Other examples come from best practice work by analysts who present their research results at the annual MIT Sloan School Sports Analytics Conference (SSAC) that takes place in February/March each year in Boston [4].

Substitute “fan” for “customer” and you’ll find sports organizations equally interested in monitoring fan sentiment. Image © Oleksii Sidorov | 123rf.com

Information Management

All projects need to collect and analyze data. Sports data may be easily obtainable (e.g., ticket sales where customers identify themselves with credit cards, names and addresses) or difficult (e.g., real-time medical information about players). Which data is collected depends on which questions need to be answered, as well as which data is available. Historically, analytics have been siloed efforts in the three areas mentioned above with little or no thought to building an integrated system for data analysis. Often, the data is in files or Excel rather than database environments that offer rich statistics and analytics packages, complete with visualization technologies.

Nine examples of sports analytics:

1. Sports metadata. Researchers can benefit by having access to standardized and detailed descriptions of all entities, attributes and relationships among objects for each sport. This is the metadata problem.

For example, team sports entities to model include teams that have seasons with games against opponents. Teams consist of players who try to score goals with a ball or puck using various play techniques. There may be offensive and defensive plays. Each player on the field, court or rink may have a position, a history of game performance (including goals or penalties), a training program and perhaps a history of injuries.

It really doesn’t matter if one is talking football, soccer, hockey, lacrosse or basketball – these sports all have the same kinds of concepts, entities and relationships. For example, substitute “puck” for “ball” in the case of hockey and many of the same concepts apply.

Students and faculty at Loyola University in Chicago have started to build metadata models that can be shared. Students built a basketball model using a publicly available and free tool called ERDPlus (www.erdplus.com). Their first results are shown in Figure 1.

Figure 1: College basketball: entity relationship diagram for a single game.

We can leverage a model built for one sport like basketball for others, like soccer or rugby. As students build additional modules within one sport, we can carve out common areas, like “fans,” and leverage those across all the sports models. For example, fans may buy tickets, have favorite teams, send tweets about the teams or players, and reside in particular geographic locations. Computer science practitioners will recognize this as building a type hierarchy with type specializations.
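
A type hierarchy with specializations might look like the minimal sketch below. It is not the Loyola students' model: every class and field name here is an assumption chosen for illustration, with a sport-neutral `Game` base type, two specializations and a shared `Fan` type.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List, Tuple


@dataclass
class Player:
    name: str
    position: str
    injuries: List[str] = field(default_factory=list)


@dataclass
class Team:
    name: str
    players: List[Player] = field(default_factory=list)


@dataclass
class Fan:
    """Common module that every sport model can reuse."""
    name: str
    city: str
    favorite_team: str


@dataclass
class Game:
    """Sport-neutral base type: two teams and a running score."""
    home: Team
    away: Team
    score: Dict[str, int] = field(default_factory=dict)

    # Class-level defaults that each specialization may override.
    scoring_object: ClassVar[str] = "ball"
    valid_points: ClassVar[Tuple[int, ...]] = (1,)

    def record_score(self, team: Team, points: int = 1) -> None:
        if points not in self.valid_points:
            raise ValueError(f"{points} is not a valid score in {type(self).__name__}")
        self.score[team.name] = self.score.get(team.name, 0) + points


@dataclass
class BasketballGame(Game):
    valid_points: ClassVar[Tuple[int, ...]] = (1, 2, 3)


@dataclass
class HockeyGame(Game):
    scoring_object: ClassVar[str] = "puck"


# Hypothetical teams, used only to exercise the model.
home, away = Team("Home U"), Team("Away State")
game = BasketballGame(home, away)
game.record_score(home, 3)
game.record_score(away, 2)
print(game.score, HockeyGame.scoring_object)
```

Only the attributes that differ between sports live in the specializations; everything else, including `Fan`, is defined once and shared.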

2. Sales and marketing management: ticketing. Research and analysis can help sports business managers understand the demographics of who buys tickets to games so they can target different types of customers with tailored advertising and offers. Dynamic ticket pricing can increase revenues and fill arenas.

Two TUN projects are underway at the University of North Carolina at Greensboro and at Wright State University. The objective is for business school students and faculty to analyze season and single game ticket data to understand who attends games and how many people renew season tickets. The feedback can help athletic departments know where their most loyal customers live and what their lifetime buying patterns are so they can refine future marketing campaigns.

For pro team sports, Major League Baseball led the way on dynamic ticket pricing. An initial San Francisco Giants study [5] found that 10 of the 29 variables in its predictive model were significant. Examples include the opposing team’s win-loss record, day of week, time of game, starting pitchers and the number of opposing-team all-stars playing. The Sports Institute at the University in Cologne, Germany, studied ticket pricing for the FC Bayern Munich soccer team [5]. Its results indicate that the club could easily double prices for eight categories of tickets without reducing attendance!

3. Athlete recruiting. The movie “Moneyball” starring Brad Pitt popularized the idea of using analytics to do better baseball talent recruiting. All pro teams across all sports now use analytics to create deeper insights on the best players to draft or trade.

For colleges, recruiting high school athletes is like the sales funnel lead problem for businesses. How much time and effort should go into recruiting any particular athlete? How can one track the progression from a raw lead to signing with the team? Which athletes in the pipeline are at risk of being picked off by the competition? The goal is to optimize staff time/effort and maximize the selection of the best possible recruits.

Athlete recruiting optimization applies across multiple sports, but it is an especially acute problem for college football given the numbers of leads (as opposed to college basketball or soccer). A TUN video based on Bryant University shows new factors for recruiting football wide receivers at the collegiate level [6].

Now let’s turn to some analytics problems in the second of the three areas of analytics – back-office operations.

4. Space coverage. The problem of creating space (offense) or reducing space (defense) is fundamental to many team sports. Animations and statistics about space over time can be very helpful. These visualizations involve mapping each player’s location data (using RFID or GPS or from video annotations) to points on a Cartesian coordinate plane. A landmark paper in this space by Kirk Goldsberry illustrates key ideas [7]. Derived measures like the distance between players or clusters of players can be calculated. “Cones of coverage” can be computed given windows of time and ball locations, as well as advanced defense statistics never before available.
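
One derived measure of this kind, the distance from each attacker to the nearest defender, can be sketched as follows. The coordinates and player IDs are invented, and `nearest_defender` and `open_space` are hypothetical helper names; real tracking feeds would supply many frames per second rather than a single snapshot.

```python
from math import dist


def nearest_defender(attacker_pos, defenders):
    """Return (defender_id, distance) for the defender closest to attacker_pos."""
    return min(
        ((d_id, dist(attacker_pos, d_pos)) for d_id, d_pos in defenders.items()),
        key=lambda pair: pair[1],
    )


def open_space(attackers, defenders):
    """Map each attacker to the distance of the closest defender."""
    return {a_id: nearest_defender(pos, defenders)[1] for a_id, pos in attackers.items()}


# One hypothetical frame of (x, y) positions, in meters.
attackers = {"A1": (10.0, 5.0), "A2": (20.0, 12.0)}
defenders = {"D1": (13.0, 9.0), "D2": (20.0, 15.0)}

space = open_space(attackers, defenders)
most_open = max(space, key=space.get)
print(space, "most open:", most_open)
```

Repeating the calculation frame by frame would show how space opens and closes over time, which is the raw material for crediting players who create room for a teammate.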

This field is advancing at a fast clip. Second Spectrum [8] leads the way for basketball. Disney Research in Pittsburgh analyzed three English Premier League soccer teams’ offensive and defensive space coverage [9].

5. Improved predictive models for play tactics. Defensive coaches in (American) football often watch films of their upcoming opponent’s prior games to detect play tendencies. The offense obviously knows what play they will execute. The defense tries to determine what the offense’s next play will be.

Many possible variables exist for building a next-play model. What yard line is the ball on and which hash mark? What is the down number and yard distance to a first down? What is the score differential at the time of this play? How much time is left? What is the offensive formation? What is the defensive formation? What was the outcome of each previous play? The sequence of plays can also be relevant.

In a TUN-sponsored project at Oklahoma State University, students analyzed a year’s worth of University of Dubuque football play data. The students used 45 fields for each play to construct a cascaded decision model that predicts whether the next play will be a run or a pass (see Figure 2). They achieved 75 percent accuracy (after only one week of modeling!).

Figure 2: Decision tree to predict if the next football play will be a pass or run.
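
To make the run-or-pass idea concrete, here is a deliberately simplified stand-in for a decision tree: a tendency table that predicts the most frequent play type for each combination of down and distance. It is not the students' cascaded model, it uses two fields rather than 45, and the play records are invented.

```python
from collections import Counter, defaultdict


def distance_bucket(yards_to_go):
    """Coarse split on yards needed for a first down (cut points are assumptions)."""
    if yards_to_go <= 3:
        return "short"
    if yards_to_go <= 7:
        return "medium"
    return "long"


def fit_tendencies(plays):
    """Count run/pass calls for each (down, distance bucket) situation."""
    table = defaultdict(Counter)
    for down, to_go, play_type in plays:
        table[(down, distance_bucket(to_go))][play_type] += 1
    return table


def predict(table, down, to_go, default="run"):
    """Predict the most common call for the situation, or a default if unseen."""
    counts = table.get((down, distance_bucket(to_go)))
    if not counts:
        return default
    return counts.most_common(1)[0][0]


def accuracy(table, plays):
    hits = sum(predict(table, down, to_go) == actual for down, to_go, actual in plays)
    return hits / len(plays)


# Hypothetical opponent history: (down, yards to go, play called).
history = [
    (1, 10, "run"), (1, 10, "run"), (1, 10, "pass"),
    (2, 8, "pass"), (2, 2, "run"),
    (3, 9, "pass"), (3, 9, "pass"),
    (3, 1, "run"), (3, 2, "run"), (3, 2, "pass"),
]

table = fit_tendencies(history)
print("3rd and 12:", predict(table, 3, 12))
print("accuracy on the history itself:", accuracy(table, history))
```

A real model would be scored on games held out from training; measuring accuracy on the same plays used to build the table, as this toy example does, overstates how well it would predict an upcoming game.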

The students also built heat maps of passing zones to help the coach see where his defense is weak in pass coverage. Finally, they used Teradata’s Aster tool to build interactive Sankey diagrams for “explosive plays” (the offense gains 16+ yards passing or 12+ yards rushing). As shown in Figure 3, these diagrams help the coach see which of his defenses gave up those yardage numbers and which offenses they did not defend well.

Figure 3: Sankey diagram shows what combinations of football offensive and defensive formations are more/less effective for explosive, high-yardage plays.
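
The link weights behind a diagram like this can be sketched as a simple count of explosive plays per pairing of offensive and defensive formation. The thresholds follow the definition given above (16+ yards passing, 12+ yards rushing); the play log and the helper names are assumptions, and this is not the Aster workflow the students used.

```python
from collections import Counter

PASS_THRESHOLD = 16  # yards gained on a pass
RUSH_THRESHOLD = 12  # yards gained on a run


def is_explosive(play_type, yards):
    threshold = PASS_THRESHOLD if play_type == "pass" else RUSH_THRESHOLD
    return yards >= threshold


def explosive_links(plays):
    """Count explosive plays for each (offensive formation, defensive formation) pair."""
    links = Counter()
    for offense, defense, play_type, yards in plays:
        if is_explosive(play_type, yards):
            links[(offense, defense)] += 1
    return links


# Hypothetical play log: (offense formation, defense formation, play type, yards).
play_log = [
    ("Shotgun", "Nickel", "pass", 22),
    ("Shotgun", "Nickel", "pass", 9),
    ("Shotgun", "4-3", "pass", 18),
    ("I-Form", "4-3", "run", 14),
    ("I-Form", "4-3", "run", 3),
    ("I-Form", "Nickel", "run", 12),
    ("Shotgun", "4-3", "pass", 31),
]

links = explosive_links(play_log)
for (offense, defense), count in links.most_common():
    print(f"{offense} -> {defense}: {count} explosive play(s)")
```

Each counted pair corresponds to one band in the diagram, with the count setting the band's width.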

6. Predicting player injuries. The National Collegiate Athletic Association (NCAA) maintains an injury surveillance database that it shares with qualified researchers. It contains information about 12,800 athlete injuries per year for the 380,000 monitored athletes [10]. The latest injury summary by sport shows that the incidence of concussion is highest not in football, but in wrestling. Women collegiate athletes experience ACL tears at a rate five times that of men [11] (Figure 4).

Figure 4: High-propensity injury locations for women’s soccer players.

Rather than react to injuries after they occur, promising research aims to prevent injuries through predictive models. With wearable biosensors, athletes can be monitored and compared to baseline metrics – both those of other players at the same skill position and their own histories. Injury profiles for each sport, like women’s soccer, can be built and become the focus for better training to avoid injuries.

For example, if the arm motion of a baseball pitcher begins to deviate from its baseline, it could imply an injury for which the pitcher’s arm motion is compensating. This could lead to more serious injury and surgery that might be avoided.

Research is underway at Auburn University and the University of Tennessee-Chattanooga to use accelerometers in mobile phones to collect data from a variety of tests, and then build predictive models for injuries across a variety of sports and skill positions. For example, a 10 percent difference in an athlete’s ability to balance on one leg for 20 seconds vs. the other might indicate a core strength problem that could be remediated through selected training techniques [12].
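
A screening rule of this kind can be sketched in a few lines. The 10 percent threshold comes from the example above; everything else is an assumption, including the balance scores, the choice to measure the difference relative to the stronger leg and the helper names. It is not the scoring used in the app described in [12].

```python
def asymmetry_pct(left, right):
    """Percent difference between two legs, relative to the stronger side."""
    stronger = max(left, right)
    if stronger == 0:
        return 0.0
    return abs(left - right) * 100.0 / stronger


def flag_athletes(scores, threshold_pct=10.0):
    """Return the athletes whose left/right difference meets the threshold."""
    return [
        name
        for name, (left, right) in scores.items()
        if asymmetry_pct(left, right) >= threshold_pct
    ]


# Hypothetical single-leg balance scores (left, right) from a 20-second test.
balance_scores = {
    "athlete_a": (17.0, 20.0),
    "athlete_b": (19.5, 20.0),
    "athlete_c": (20.0, 14.0),
}

flagged = flag_athletes(balance_scores)
print("refer for follow-up:", flagged)
```

A flag is a prompt for a trainer to take a closer look, not a diagnosis; a predictive model would combine many such test results with injury histories.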

7. Player reaction time. With sensor technologies that monitor player movement, researchers can track reaction times. Sample stimuli include ball movements and trajectories, opposing player movements and even the movements of teammates. Not all players have the same reaction times, so this is another factor for recruiting.

Research by Jocelyn Faubert at the University of Montreal [13] has been performed for hockey, rugby and soccer. A “bouncing balls test” showed significant differences in the ability of professional, college and non-athletes to track multiple moving objects. This research could be extended to football linebackers who must read and then react to offensive formations and the movements of players before the snap. For soccer, researchers could test the ability to “see” and react to where the ball lands compared to the relative positioning of teammates and the opposing team’s defenders.

These examples drive more general questions: What are relevant baseline measures of reaction times that could be defined for each sport? For each skill position? What drills or coaching techniques would be useful to improve player reaction times?

8. Predicting ball trajectories. Diving deeper, can we use technology to study which players are the best at predicting trajectory information such as a ball’s speed, angle of trajectory, and the location where the object will land? Examples include a batted baseball, a thrown football, a soccer kick, a hockey slap shot or a basketball rebound. Interesting research for baseball by Peter Fadde of Southern Illinois University [14] uses visual occlusion systems to see whether it is the reaction time or the ability to “read pitcher body cues” that drives the ability of a batter to discern whether the next pitch is a fast ball, curve, slider, etc., and when to swing.
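
To judge how well a player predicts where a ball will land, researchers need a reference answer. The sketch below computes one under a strong simplifying assumption: an idealized projectile with no air resistance or spin, which real batted, kicked or thrown balls do not satisfy. The function name and the sample values are illustrative only.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def landing_point(speed, angle_deg, height=0.0):
    """Return (horizontal distance, flight time) for a drag-free projectile.

    speed:     launch speed in m/s
    angle_deg: launch angle above horizontal, in degrees
    height:    launch height above the landing surface, in meters
    """
    theta = math.radians(angle_deg)
    vx = speed * math.cos(theta)
    vy = speed * math.sin(theta)
    # Positive root of: height + vy*t - 0.5*G*t^2 = 0
    flight_time = (vy + math.sqrt(vy * vy + 2.0 * G * height)) / G
    return vx * flight_time, flight_time


# Hypothetical launch: 30 m/s at 35 degrees from 1 m above the ground.
distance, seconds = landing_point(30.0, 35.0, height=1.0)
print(f"lands about {distance:.1f} m away after {seconds:.2f} s")
```

Comparing a player's estimate against a reference like this, or against a measured landing spot from tracking data, would turn "reads the ball well" into a number that can be compared across players.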

9. The value of sleep. Other factors such as sleep and nutrition also contribute to performance. An amazing Stanford study [15] shows that if varsity basketball players sleep at least 10 hours a night, their sprint times drop from 16.2 to 15.5 seconds, and both free-throw and 3-point shooting accuracy improve by 9 percent.

The dawning of sports science

Art, craft or science? Sports team managers and coaches have traditionally viewed their role as a craft, but team and player management is becoming more scientific. Coaches, trainers and even medical staff will still have notebooks filled with their ideas of what will work or not in various play or training/recovery situations, but insights from analytics will help them test and validate their ideas. Just as with business, the ROI of investments in analytics will become more widely known as teams succeed with data-driven approaches.

So where do we go from here? Analytical methods can provide a competitive edge to win more games, better manage a team’s business, perform at higher levels and prevent injuries. This article provided examples of how researchers, coaches, athletic directors, trainers and players can work together to advance the field of sports analytics, but there’s still much to be done.

Thanks to the many enthusiastic faculty and students who have joined the Teradata University Network to start doing interesting research to move this exciting field forward. Readers interested in sports analytics reading lists or publicly available sports data sets to investigate should contact the authors.

Gary Cokins (gcokins@garycokins.com) retired from SAS in 2013. He is founder of Analytics-Based Performance Management LLC (www.garycokins.com) and is in the Baseball Hall of Fame for having developed the oldest computer baseball game (1969).

Dave Schrader (drdaveschrader@gmail.com) is on the Board of Advisors for the Teradata University Network. He retired from Teradata in 2014. In 2016, he gave 47 talks at 25 universities to more than 2,700 students, faculty and coaches about sports analytics.

Notes & References

  1. Gary Cokins, Walt DeGrange, Stephen Chambal and Russell Walker, 2016, “Sports Analytics Taxonomy, V1.0,” OR/MS Today, June 2016. Available online at: http://viewer.zmags.com/publication/085442e2#/085442e2/42?platform=hootsuite
  2. Tom Davenport, 2014, “Analytics in Sports – The New Science of Winning.” Appears as a SAS-sponsored white paper online at http://www.sas.com/content/dam/SAS/en_us/doc/whitepaper2/iia-analytics-in-sports-106993.pdf
  3. Teradata University Network, a free website for faculty and students to access homework assignments, reading lists, curriculum suggestions, videos, case studies and research suggestions. See www.teradatauniversitynetwork.com.
  4. MIT Sloan Sports Analytics Conference. See http://www.sloansportsconference.com/
  5. Christoph Kemper and Christoph Breuer, 2016, “How Efficient is Dynamic Pricing for Sports Events? Designing a Dynamic Pricing Model for Bayern Munich,” International Journal of Sports Finance, Vol. 11, No. 1. Available online at: http://fitpublishing.com/articles/how-efficient-dynamic-pricing-sport-events-designing-dynamic-pricing
  6. Seven-minute video appears at http://www.teradatauniversitynetwork.com/About-Us/Whats-New/BSI--Sports-Analytics---Precision-Football/
  7. Kirk Goldsberry, 2014, “Databall,” Grantland. Appears at http://grantland.com/features/expected-value-possession-nba-analytics/
  8. Rajiv Maheswaran, March 2015, TEDx Talk in Vancouver, B.C. Appears at http://www.secondspectrum.com/videos/ along with other videos of this technology.
  9. Iavor Bojinov and Luke Bornn, 2016, “The Pressing Game: Optimal Defensive Disruption in Soccer,” research report from MIT SSAC16 appears at http://www.sloansportsconference.com/?page_id=462
  10. NCAA Sports Injuries Website at http://www.ncaa.org/health-and-safety/medical-conditions/sports-injuries. Detailed facts sheets for selected sports are at http://www.datalyscenter.org/fact-sheets/
  11. http://www.livestrong.com/article/513231-frequency-of-injury-among-college-athletes/
  12. Ross Gruetzemacher, Ashish Gupta and Gary Wilkerson, 2016, “Sports Injury Prevention Screen (SIPS): Design and Architecture of an Internet of Things (IoT) Based Analytics Health App.” Available online at: http://aisel.aisnet.org/confirm2016/18/
  13. Jocelyn Faubert, 2013, “Professional athletes have extraordinary skills for rapidly learning complex and neutral dynamic visual scenes,” Scientific Reports 3, Article 1154, 31 January 2013. An animation of the test appears at 7:30 in a TEDx Montreal talk at https://vimeo.com/86217755
  14. Peter Fadde, Southern Illinois University. See his website at: http://peterfadde.com/projectspitchbaseball.html for several articles and videos on his research.
  15. Cheri Mah, Kenneth Mah, Eric Kezirian and William Dement, 2011, “The Effects of Sleep Extension on the Athletic Performance of Collegiate Basketball Players,” Sleep, Vol. 34, No. 7, pp. 943-950. Abstract available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3119836/