After-math of the IPL

Your betweenness isn’t good enough.

Yes, we all know the Kolkata Knight Riders (KKR) won the Indian Premier League 2014, but who among all the teams’ many players really did well? And were they rewarded for it? The former is a decidedly subjective question that depends on what you consider goodness in a cricketer – especially one playing in a new format of the game, Twenty20, that places different emphases on skills than Tests and ODIs, the other formats, do. And a satisfactory answer to the latter question depends on how you answer the former. How do you take this forward?

Satyam Mukherjee, a postdoctoral fellow at the Kellogg School of Management, Northwestern University, has an answer. He has extended network analysis, a tool conventionally used to analyze social media interactions, to cricket. On a social network like Facebook, people are treated as nodes and the connections between the nodes denote certain things about how the nodes are interacting. On a cricket ground, each team becomes one network in Mukherjee’s notebook, and the competition between them is a competition to be the better network.

Cricket is played by two teams at a time with 11 players per team. At all points during the game, teamwork is paramount although to varying extents. Even if one player fails to perform, the game could be lost by the team that player belongs to. Conversely, if one player plays too well, then the burden on the rest of the team is lighter.

In this scenario, network analysis provides a useful way to look not at players’ skills but how they’ve deployed them in situations that required their deployment.

In the second qualifier in IPL 2014, the Chennai Super Kings (CSK) trounced the Mumbai Indians (MI). Batting first, MI were restricted to 173 on a surface on which defending 190+ would’ve been easier, thanks to economical and incisive bowling from R. Ashwin and Mohit Sharma respectively.

Nonetheless, the sub-par score did require CSK to score at a stiff 8.7 runs-per-over (rpo) to win – and Suresh Raina became the man to do this, taking them to 176 in 18.4 overs at a rate of 9.4 rpo. Even though he had mowed down a sub-par score on a batting-friendly ground, he was awarded the ‘Man of the Match’ title, not Ashwin or Sharma. Why was that?
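The two run rates above are simple arithmetic, with one catch: overs notation is not decimal, so 18.4 means 18 overs and 4 balls. A small sketch of the calculation (the helper names are mine, purely for illustration):

```python
def overs_to_balls(overs: str) -> int:
    """Convert cricket overs notation ('18.4' = 18 overs + 4 balls) to balls."""
    whole, _, balls = overs.partition(".")
    return int(whole) * 6 + int(balls or 0)


def run_rate(runs: int, overs: str) -> float:
    """Runs per over, given runs scored and overs in cricket notation."""
    return runs * 6 / overs_to_balls(overs)


# CSK needed 174 to overhaul MI's 173 in their 20 overs.
required = run_rate(174, "20")
# They actually made 176 in 18.4 overs.
achieved = run_rate(176, "18.4")

print(f"required: {required:.1f} rpo, achieved: {achieved:.1f} rpo")
```

Treating 18.4 as a decimal would put the achieved rate slightly too high, which is why the conversion to balls comes first.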

By addressing each game as a meeting of two networks that can interact only in specific ways, network analysis throws up four metrics that represent the quality of the interactions. They are:

  1. PageRank – A proportional measure that describes the importance of a player
  2. In-strength – The sum of the fractions of runs a player has scored in partnership with other players
  3. Betweenness – Denotes the number of partnerships a batsman was involved in
  4. Closeness – Another proportional measure that describes the ability of a player to adapt to different batting positions (higher, middle, lower, etc.)
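To make the two simpler metrics concrete, here is a minimal sketch that computes in-strength and a partnership count from a list of partnerships. The scorecard is invented, and in-strength is read exactly as in the list above: each batsman’s share of the runs in every partnership he was part of, summed. Mukherjee’s own implementation may well differ in its details.

```python
from collections import defaultdict

# Hypothetical partnerships: (batsman A, runs by A, batsman B, runs by B).
# Extras are ignored for simplicity.
partnerships = [
    ("Opener1", 20, "Opener2", 10),
    ("Opener1", 15, "Number3", 30),
    ("Number3", 25, "Number4", 25),
]

in_strength = defaultdict(float)      # sum of a batsman's run fractions
partnership_count = defaultdict(int)  # number of partnerships he figured in

for a, runs_a, b, runs_b in partnerships:
    partnership_count[a] += 1
    partnership_count[b] += 1
    total = runs_a + runs_b
    if total > 0:  # a scoreless stand has no run fractions to add
        in_strength[a] += runs_a / total
        in_strength[b] += runs_b / total

for name in sorted(partnership_count, key=lambda n: in_strength[n], reverse=True):
    print(f"{name}: in-strength {in_strength[name]:.2f}, "
          f"partnerships {partnership_count[name]}")
```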

It’s reasonable that whichever player has made the most significant contribution to the values of those metrics deserves the ‘Man of the Match’ title. In the MI v. CSK game, Raina outperforms any other player from CSK, the winning team (from which the MoM is usually chosen).

Favoring batsmen, just like the game

Some things are immediately clear. One, PageRank and closeness are like global variables, with scores that can be carried and calculated across games. (And while their definition seems arbitrary, their values do abide by well-defined formulas.)

Two, all four metrics are relevant for batsmen and not bowlers or fielders. This is odd because it is the bowling side (same as the fielding side) that requires all the teamwork in the game. Does this mean Mukherjee’s approach is invalid? Not entirely, because it is still useful in assessing how well batsmen have performed against certain oppositions and from certain batting positions.

For example, in the first qualifier in IPL 2014, KKR put in an all-round great performance to strangle KXIP and proceed to the final. Ryan ten Doeschate (KKR) hit two sixes in the death overs to revive a sagging run rate and anchored two partnerships on the way. He takes the highest PageRank and betweenness in the game. Piyush Chawla scored the majority of the runs in the two partnerships he was involved in and has the highest in-strength. Finally, Robin Uthappa finished with the highest closeness, having played as both an upper-middle-order and an opening batsman in the tournament.

So much stands to reason, but this is where things get interesting because the ‘Man of the Match’ was Umesh Yadav.

How did this happen? A common woe among IPL teams is the performance of the ominously named death bowler, i.e. the player who bowls during the last four overs of a Twenty20 game. Over seven editions of the IPL, these so-called death overs have become notorious for the pace at which teams accrue runs in them. A death bowler, therefore, has to be good enough to stem the flow even if an in-form batsman is at the crease.

During the KKR v. KXIP game, Yadav bowled four overs for 13 runs (3.25 rpo) and took three wickets – Sehwag’s, Maxwell’s and Bailey’s. Moreover, one was a death over in which he conceded the princely sum of 1 run. These are match-winning feats in a Twenty20 game, and Yadav more than deserved to become the ‘Man of the Match’.

[Image: Mukherjee’s network visualization of the KXIP and KKR batting partnerships]

For the final game, Mukherjee drew up a network visualization (above) of the batting partnerships of KXIP and KKR. The nodes are colored according to their betweenness centrality. The size of each node is proportional to its PageRank. The colors of the connections are according to the colors of the source nodes. “For example, if we see the connection between Uthappa and Gambhir, Gambhir has a larger share of the runs they scored,” he explained.

Bowling performances notwithstanding: In IPL 2014, “for a majority of the matches, the Man of the Match compares well with the top three performers as per their centrality measures,” Mukherjee said. He said he hopes that such tools would work their way into extant decision-making procedures as a way to eliminate vested interests, biases and “close calls” as well as to help recruit new players. In the future, Mukherjee plans to work something in to gauge bowlers and fielders, too.

Earlier, he had similarly analyzed the 2013 Ashes series held in and won by England. Then, the ‘Man of the Match’ awards agreed with his analysis of the games: Joe Root, Michael Clarke and Shane Watson, each of whom had higher in-strength and betweenness centrality than other players. He published his methods and results in Advances in Complex Systems in November 2013 (pre-print).

The numbers game

Cricket is a complex game – about as difficult to get the full hang of as English is for a 21-year-old learning it from scratch. It takes a while, and a lot of practice. Even then, many people (I know) still have difficulty getting all the rules right. As a result, there are a lot of numbers that emerge after each game – so many runs scored in different directions, so many balls bowled at such-and-such speeds, at such-and-such lengths to so many batsmen, so many partnerships each lasting so many balls, so many successful and not-so-successful fielding positions, etc. In short, cricket is a statistics-heavy game, perhaps heavier than baseball itself. So if baseball has sabermetrics, what does cricket have?

Nothing official in place, for starters. Cricketing sabermetrics isn’t new, but it isn’t prevalent either – the reasons are too many to be dealt with here, but not the least of them is that cricket is also more complex than baseball. Building a statistical framework to encompass all of its nuances is difficult. So, a simplified version of cricketing sabermetrics – one making a lot of assumptions – assessing only the batsmen’s performance during Ashes 2013 caught my interest. Satyam Mukherjee, a post-doctoral fellow at the Kellogg School of Management, has used complex network analysis to figure out why Clarke, Trott and Bell were the better players during the tournament, and he backs it up with the mathematics.

His work also raises a lot of questions on the relevance of such mechanisms in modern sport. Read my piece on this work for The Hindu.

Photo: Ashes2013.net

After-math of the Ashes

In the recently concluded Ashes test series, England retained the urn by beating Australia 3-0 in five games. England always looked the more confident team, reinforced as well as evinced by the confidence each player had in every other. They batted well, they bowled well, they fielded well.

Australian players, on the other hand, looked out of place. Often, great performances by a batsman or a bowler didn’t translate into the rest of the team moving with that spirit, betraying high – if not unreasonable – dependence on some players, who were expected to bear the burden.

Now, a post-doctoral fellow from the Kellogg School of Management, Northwestern University, has put these conclusions to the test. Satyam Mukherjee has used complex network analysis to determine how England and Australia differed in their strategies during the Ashes matches, to gauge the “quality” of wins and how much of a role each player played in it.

Satyam thinks that, to the best of his knowledge, “this work is the first of its kind in cricket”, and hopes it will motivate analysts to look at behind-the-scenes statistics of players whose best skills may not always be brought to the fore.

Specifically, Satyam uses concepts like the PageRank algorithm (which Google uses to determine the ‘influentiality’ of websites), betweenness and network centrality, and treats each team as a network of players who have to perform specific roles.

Math & matters of the heart

“Two football players are linked if one player passes the ball to another, a pitcher and batter is connected if they face each other, or Nadal and Federer get connected if they play against each other,” says Satyam, explaining how networks are built. “But in cricket, no such studies exist although there is no dearth of statistics.”

However, a network analysis of a game of cricket is much less straightforward, as the success of the game doesn’t depend solely on the ball being passed around or on batsmen like Tendulkar and Lara facing off against each other. Instead, they face different bowlers, which means their performances can’t be compared directly, either.


What a self-organised social network looks like (nodes of the same colour are of the same group). Image: Wikimedia Commons

So he used publicly available data from Cricinfo to compute the network performance of players and how well they’d performed different roles. “The network based approach gives us the hidden properties of the performance of players,” Satyam explains, adding that the advantage is that “it doesn’t suffer from any biases which exist in traditional schemes.”

In his network analysis, each player is thought of as a node (as shown above) in a network, with the lines connecting them being the runs scored by them together. This way, as the game progresses through different partnerships, nodes are added and connected, with the distance between nodes denoting the number of runs.
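A rough sketch of that construction, using a plain dictionary as an undirected weighted graph. The batting order and the runs are invented, the helper name is mine, and the sketch simply assumes that whoever is dismissed is replaced by the next name in the order.

```python
# Hypothetical innings: the batting order, plus, for each partnership, the runs
# it added and which of the two batsmen at the crease was dismissed.
batting_order = ["A", "B", "C", "D"]
stands = [(30, "B"), (45, "A"), (50, "C")]

graph = {}  # graph[x][y] = runs that x and y added together


def add_link(x, y, runs):
    """Record a partnership in both directions, accumulating repeat stands."""
    for u, v in ((x, y), (y, x)):
        graph.setdefault(u, {})
        graph[u][v] = graph[u].get(v, 0) + runs


at_crease = batting_order[:2]
next_in = 2
for runs, dismissed in stands:
    add_link(at_crease[0], at_crease[1], runs)
    # Bring in the next batsman, if there is one left in this toy order.
    if next_in < len(batting_order):
        at_crease[at_crease.index(dismissed)] = batting_order[next_in]
        next_in += 1

for player, partners in graph.items():
    print(player, partners)
```

Each new batsman enters the dictionary only when his first partnership is recorded, which mirrors the way nodes are added as the innings unfolds.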

Then, Satyam uses his tools to reveal how the team, studied as a network of people trying to accomplish a common goal with different skills between them, strategised and where it fell short.

According to his calculations, for example, Gautam Gambhir was the most successful player in terms of centrality scores during the 2011 ICC World Cup final for India. This means that he was involved in the most batting partnerships during the game (betweenness centrality). However, the man-of-the-match award went to skipper M.S. Dhoni.

“So there is a human bias coming into play,” exclaims Satyam. However, this doesn’t come across as a call to replace the more “spiritual” aspects of the game with a mathematical framework. Instead, Satyam is advocating the use of such analytical methods to decrease the chances of missing out on important statistics that come into play during drafting, team-selection, etc.

Teams as competing networks

These and other network analysis concepts have been around for quite a while. They have been applied to sports for the last decade or so, quite famously to football using the Girvan-Newman algorithm and others. The parameters they use to “evaluate” teams are simple.

PageRank, a relatively newer measure developed by and named for Google co-founder Larry Page, measures the “quality” of outcomes (i.e. wins or losses).

In the context of a match, PageRank scores give a measure of the quality of wins. If a weak team wins against a relatively stronger team, it gains points. However, if the weak team loses to a strong team, it isn’t penalized that much. Each outcome’s PageRank is dependent on the performance of every player.

In the context of players’ performance, “it gives the importance of the player in the batting line up,” explains Satyam. In other words, it provides us with an idea of the importance of runs scored — such as Graeme Swann’s 34 in England’s first innings of the final test.

It is calculated as:

… where,

p_i = PageRank score

w_ij = weight of a link

s_j-out = out-strength of node j (the total weight of its outgoing links)

i = whichever team it is

q = control parameter = 0.15 (default)

N = total number of players in the network

δ = a correcting term
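The expression these symbols belong to is not reproduced above. A form consistent with them is the weighted PageRank recursion below; treat it as a reconstruction from the listed definitions, not a quotation from the paper. Two assumptions are mine: w_ij is read as the weight of the link pointing from j to i, and δ(x) is taken to be 1 when x = 0 and 0 otherwise, so that the last term redistributes the score of any node with no outgoing links.

```latex
p_i = (1 - q) \sum_{j} p_j \, \frac{w_{ij}}{s_j^{\mathrm{out}}}
      + \frac{q}{N}
      + \frac{1 - q}{N} \sum_{j} p_j \, \delta\!\left(s_j^{\mathrm{out}}\right)
```

Read this way, the first term passes each node’s score along its outgoing links in proportion to their weights, and the second gives every node the same small baseline q/N.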

In-strength is the sum of the fractions of runs a player has scored in partnership with other players.

Closeness measures the connectedness of a player in the team. The ‘closer’ he is, the more open he will be to his place in the playing order being changed. For example, the ‘closest’ batsmen will be comfortable opening the batting, playing in the middle order, or holding up the lower order. This can be decided based on the match situation, pitch conditions, availability of other players, etc. Thus, having ‘close’ players increases the adaptability of the team.
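One common way to put a number on this kind of connectedness is closeness centrality: the number of other nodes a player can reach, divided by the sum of his shortest-path distances to them. The sketch below assumes an unweighted, undirected partnership graph with invented names; the variant used in the paper (for example, one that takes link weights into account) may differ.

```python
from collections import deque

# Hypothetical partnership graph: who batted with whom at least once.
graph = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"C"},
}


def closeness(graph, source):
    """Reachable nodes divided by the sum of hop distances to them (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


scores = {player: closeness(graph, player) for player in graph}
for player, score in sorted(scores.items()):
    print(f"{player}: {score:.2f}")
```

In this toy graph the two batsmen in the middle of the chain, A and C, come out ‘closer’ than B and D at its ends.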

The results

Satyam put together a network of players in each of the five matches, and computed these scores for all of them in terms of their batting performances.
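As a rough idea of what computing a PageRank score over such a network involves, here is a minimal power-iteration sketch. The partnerships are invented, and the weighting is an assumption on my part: a link from one batsman to his partner is weighted by the fraction of the stand’s runs that the partner scored. The control parameter is set to 0.15, the default mentioned earlier.

```python
def pagerank(weights, q=0.15, iterations=100):
    """weights[j][i] = weight of the link from j to i. Returns {node: score}."""
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    n = len(nodes)
    out_strength = {j: sum(weights.get(j, {}).values()) for j in nodes}
    p = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes with no outgoing links is shared out equally.
        dangling = sum(p[j] for j in nodes if out_strength[j] == 0)
        new_p = {i: q / n + (1 - q) * dangling / n for i in nodes}
        for j, targets in weights.items():
            if out_strength[j] == 0:
                continue
            for i, w in targets.items():
                new_p[i] += (1 - q) * p[j] * w / out_strength[j]
        p = new_p
    return p


# Hypothetical partnerships: (batsman A, runs by A, batsman B, runs by B).
partnerships = [
    ("Opener1", 20, "Opener2", 10),
    ("Opener1", 15, "Number3", 30),
    ("Number3", 25, "Number4", 25),
]

weights = {}
for a, runs_a, b, runs_b in partnerships:
    total = runs_a + runs_b
    if total == 0:
        continue  # a scoreless stand adds no weight in either direction
    weights.setdefault(a, {})
    weights.setdefault(b, {})
    weights[a][b] = weights[a].get(b, 0.0) + runs_b / total
    weights[b][a] = weights[b].get(a, 0.0) + runs_a / total

scores = pagerank(weights)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

The scores sum to 1, so each one can be read as a batsman’s share of the overall ‘importance’ in the line-up.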

He found that, in the first and second games, both of which England won, Ian Bell and Joe Root emerged as the best batsmen, respectively. Bell, especially, had the highest PageRank, in-strength, betweenness and closeness among all batsmen. In the second match, Root had the highest in-strength, betweenness and closeness, but Usman Khawaja and Michael Clarke beat him to the top on PageRank.

In the third test, Australia dominated the game. However, the domination arose through Clarke, the man of the match, while the rest of the players put up a less-than-dominating performance. In fact, this pattern was visible in Australia throughout the tournament. By contrast, England’s batsmen’s betweennesses were more evenly distributed. Everyone seems to have contributed, not just the top order.

For example, one batsman who regularly features in the top five players in terms of PageRank is Tim Bresnan, an all-rounder. His ability to build partnerships even when most specialist batsmen had departed was crucial to England staying on top – such as in the fourth test at Riverside Ground. Looking at the overall scores: Bell – most betweenness centrality; Matt Prior – most closeness; Jonathan Trott – highest in-strength; Graeme Swann – highest PageRank.


The batsmen’s performance network as it transpired during the Ashes 2013. Notice how almost every player on the English side was capable of holding up partnerships while, for Australia, noticeable ‘hubs’ exist in the guise of Haddin, Hughes and Rogers. Image: Satyam Mukherjee

For Australia, on the other hand, Haddin, Hughes and Rogers have high betweenness centrality, which quickly drops off when other pairs are considered.

Accordingly, batsmen who received man-of-the-match awards during the series were Joe Root, Michael Clarke, and Shane Watson. This is an instance of simple mathematical concepts having encapsulated our practical considerations well enough to have reached almost the same conclusions (even though a lot of assumptions were made in the process). However, cricket is merely a newcomer to this arena.

An informer

In 2003, Michael Lewis published a book titled Moneyball: The Art of Winning an Unfair Game. It brought to a wider audience the field of study called sabermetrics, which uses in-game statistics in baseball to separate objective judgments – “Who contributed the most…” – from subjective ones – “Was that a great…” – so that teams are aware of what their strongest and weakest resources are.

In 2011, this book was adapted into a successful movie, starring Brad Pitt. Although I’d heard of the book at the time, I hadn’t read it, and the movie helped me confront for the first time how managers unfamiliar with sabermetrics’ pros might react to the idea. In the movie, many of them quit. (However, most of them were old, too, and just couldn’t cope with the power to pick or drop players leaving their hands and falling into those of some “new-fangled, cold-and-calculated” sabermetrician.)

This is why what network analysis can bring to the fore in sports has to be understood well before it is dismissed. In its simplest form, it makes correcting for regional biases in selections easy and helps spot ‘hidden’ talent in the domestic circuit. At its most nuanced, it could factor in bowlers and fielders, not just batsmen, and also include an “athletic index” for each batsman to denote how agile he is between the wickets, to see who has been the best performer (a suggestion included in Satyam Mukherjee’s paper).

Of course, for the game to stay competitive and entertaining, both subjective and objective methods are important. Even with cricket, I can’t imagine the BCCI resorting completely to sabermetrics’ version of cricket to choose the national cricket team – how would they be able to account for the reassuring presence of Captain Cool? Instead, they could use such tools to better inform their decision-making.

(For those interested, a more detailed presentation of Satyam’s methods is available in this paper.)

(This blog post first appeared at The Copernican on September 8, 2013.)