Sunday, October 4, 2015

Game Tree: Point Progression


"Game Tree" is a depiction of Point Progression for a selection of games within a tennis match or across a series of tennis matches; it is a Sankey Diagram and possesses the "Markov property", meaning that the set of future "states" that are possible are constrained by the current "state", the point score at any moment in a game. Here is a nice interactive explanation.
"Markov Chains" have often been applied to tennis games.  Google "Markov Tennis" and you'll find a large number of articles on statistics which use Tennis to explore probability.  A few of the results use Data Flow Diagrams to depict Point Progression: Wolfram Alpha has an attractive visualization (see above) which was reproduced in an article on Predictive Modeling; and NC State University produced a YouTube video, as part of an online course titled "Introduction to Finite Math", with a whiteboard explanation of state transitions in tennis games.  When visualizations are provided they are usually arranged like the GameFish in TennisVisuals.com, with either horizontal or vertical orientation:

As far as I can tell, the Game Tree design created by Damien Saunder and David Webb at GameSetMap.com is the first time a Sankey Diagram (or Harness Flow Map) was applied to Point Progression in Tennis. The primary innovation was to apply the idea of "quantitative flow lines" to the possible point paths through the tree, so that the width of each line represents how often games passed through each possible "state" of the score. The real power of the design, though, comes from its interactive nature.  SVG (Scalable Vector Graphics) is used to:
  1. animate exploration of the data when it is filtered by selecting individual games, groups of games, or constraining the games to only service games for a chosen player
  2. provide contextual information when "hovering" over specific elements
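The flow-line widths described above boil down to counting transitions between score states. Here is a rough sketch, assuming each game is recorded as a string of point winners ("S" for server, "R" for receiver). That encoding and the names score_label and flow_counts are hypothetical; this is not the data format or code used by GameSetMap or TennisVisuals.

```python
from collections import Counter

POINT_LABELS = ["0", "15", "30", "40"]


def score_label(server, receiver):
    """Turn points won by each player into a tennis score label."""
    if server >= 3 and receiver >= 3:
        if server == receiver:
            return "Deuce"
        if abs(server - receiver) == 1:
            return "Ad-In" if server > receiver else "Ad-Out"
        return "Game (server)" if server > receiver else "Game (receiver)"
    if server >= 4:
        return "Game (server)"
    if receiver >= 4:
        return "Game (receiver)"
    return f"{POINT_LABELS[server]}-{POINT_LABELS[receiver]}"


def flow_counts(games):
    """Count how many times play moved from one score state to the next.

    Each count is the value a Sankey link between the two states would
    be drawn with, so a bigger count means a wider flow line.
    """
    flows = Counter()
    for points in games:
        server = receiver = 0
        for winner in points:
            before = score_label(server, receiver)
            if winner == "S":
                server += 1
            else:
                receiver += 1
            flows[(before, score_label(server, receiver))] += 1
    return flows


if __name__ == "__main__":
    # Three made-up games: a love hold, a hold to 15, and a two-Deuce hold.
    sample = ["SSSS", "RSSSS", "SSSRRRSRSS"]
    for (src, dst), n in sorted(flow_counts(sample).items()):
        print(f"{src} -> {dst}: {n}")
```

Filtering by player or by selected games then just means passing a different list of games to flow_counts before redrawing.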
The original implementation of Game Tree, presented as a celebration of Nadal's 2013 comeback, used match data downloaded in XML format from the William Hill Sports betting website.


In the TennisVisuals version of Game Tree, data is retrieved in JSON format from the Mongo database which underpins TennisVisuals.com.  That data, in turn, is presently sourced from Jeff Sackmann's Match Charting Project (many other data sources will come online soon).

The inspiration for the Game Tree design seems to have been the same frustration that drove the development of the Points-to-Set chart: the final score of a tennis match reveals very little about how close a match actually may have been. Even a match with a 6-0, 6-0 score may have been "hotly contested".  Traditional stats miss the story every time. Percentage of Points Won for a 6-0, 6-0 match, for instance, provides only a very crude view of match intensity: it ranges from 100% for complete dominance by one player down to 62.5% for a match in which every game reached Deuce once and only once (the winner takes 5 of the 8 points in each such game). It relates nothing of the drama and is of very little use for constructive analysis.
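The two extremes quoted above are simple arithmetic. Here is a quick check in Python, assuming a 6-0, 6-0 match of 12 games in which every game follows the same pattern (the function name is just for illustration):

```python
def points_won_share(games, winner_points, loser_points):
    """Share of all points won by the match winner, when every game
    ends with the same points tally (an assumption for this example)."""
    won = games * winner_points
    total = games * (winner_points + loser_points)
    return won / total


if __name__ == "__main__":
    # Complete dominance: twelve love games, 4 points to 0.
    print(points_won_share(12, 4, 0))
    # Every game reaches Deuce exactly once: 3-3, then two more points
    # for the winner, so 5 points to 3 in each game.
    print(points_won_share(12, 5, 3))
```

The same 6-0, 6-0 scoreline covers both cases, which is exactly why the figure says so little on its own.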

In the following Game Tree visualization of Nadal's service games in a match against Wawrinka at the 2013 Madrid Masters, it is easy to see that Nadal won the first point of his service games 77.8% of the time.  When he did lose the first point of a service game in that match, he won the second point 100% of the time.


With Game Tree it is possible to see how often Deuce was reached during a match; the thickness of flow lines even indicates how often game scores ricocheted between Deuce and Advantage.  In the match with Wawrinka, for games both served and received, Nadal lost only one game that reached Deuce:


In the Saunder/Webb implementation of Game Tree, the "Nodes" of the tree are color-coded to indicate momentum.  Dark nodes represent positive momentum while Red nodes represent negative momentum.  In the TennisVisuals version of Game Tree these representations still hold true, but momentum is always viewed from the perspective of the primary player; when filtering for the opponent's service game, the tree is not "flipped", as occurs in the Saunder/Webb version of the Nadal-Djokovic Roland Garros 2014 final.

The relative importance of each point in tennis games, sets, and matches has been analyzed extensively, most famously by Carl Morris in his article "The most important points in tennis", which was published in Optimal Strategies in Sports in 1977.   It is probably impossible to publish analysis of points in tennis without referencing Morris...  here is one of many studies, notable for its visualization of relative point importance within the context of a set:
In 2014, Professors Franc Klaassen and Jan Magnus provided ample coverage of the topic in their book "Analyzing Wimbledon".  Most recently Jeff Sackmann wrote a series of blog posts ("How Important is the First Point of Each Game?", "The Pivotal Point of 15-30") drawing on a theoretical model which he has published and utilizing his extensive match database.
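As I understand Morris's definition, the importance of a point is the difference between the probability of winning the game if the point is won and the probability of winning the game if the point is lost. Here is a small sketch of that definition, assuming (as a simplification) that the server wins each point independently with a fixed probability p. The function names are mine, and this is not Sackmann's model.

```python
def server_game_win(p, a, b):
    """Chance the server wins the game from a-b (points won so far),
    assuming a fixed, independent point-win probability p."""
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)  # closed form for winning from Deuce
    if a >= 3 and b >= 3:
        if a == b:
            return deuce
        if abs(a - b) == 1:
            return p + q * deuce if a > b else p * deuce
        return 1.0 if a > b else 0.0
    if a >= 4:
        return 1.0
    if b >= 4:
        return 0.0
    return p * server_game_win(p, a + 1, b) + q * server_game_win(p, a, b + 1)


def point_importance(p, a, b):
    """P(server wins game | wins this point) minus
    P(server wins game | loses this point), at the score a-b."""
    return server_game_win(p, a + 1, b) - server_game_win(p, a, b + 1)


if __name__ == "__main__":
    p = 0.6  # illustrative value only, not taken from any match data
    for a, b, label in [(0, 0, "0-0"), (3, 0, "40-0"), (2, 3, "30-40")]:
        print(label, round(point_importance(p, a, b), 4))
```

Benchmark figures of this kind are what I have in mind when I talk about viewing a match's Game Tree in context.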

My plan is to integrate the insights garnered from such analyses into the TennisVisuals version of Game Tree so that results for each match can be viewed in the context of benchmark figures. I'd like to auto-generate summary reports to go along with Game Tree visualizations, similar to what Saunder has done for the Nadal-Djokovic 2014 Roland Garros Final.  I also plan to divide each point "flow line" into errors and winners and highlight "clutch" performance.

Shortly after releasing the initial version of Game Tree, Saunder published a follow-up entitled "Where are you most likely to win a point on Nadal's serve?" In this article he introduced a "Proportional Symbol Game Tree" which shows the percent chance at every possible "state" of the score that an opponent had of winning the point.  It's enticing to think about using a similar Game Tree to visualize a player's service game performance in one match relative to their average over the course of the past year...  perhaps overlapped Proportional Symbols of reduced opacity...



