### Saturday, May 31, 2008

## Rating IPL bowling

I have been saying in comments around the blogosphere that economy rate is probably much more important than bowling average in T20. I decided to work out just how much wickets are worth.

Once I'd got some numbers out, I realised that the method I'd used was quite close to Duckworth-Lewis, and I could probably have just adopted the old DL tables for these purposes. Hopefully I'll get around to comparing those tables with my results some time. In the meantime, I figured that IPL innings might be different from the last 20 overs of ODI innings, and that you all probably wanted a nine-colour scatterplot.

Each dot represents a wicket in the first innings of the league stage of the IPL. (I ignored second innings, since they don't always last 20 overs.) I've fitted a straight line for each number of wickets down, forcing each line through the origin (you can't score any runs with zero balls left). You'll note that the points near the origin tend to lie above the best-fit lines; that's because of late-over slogging. That would matter if I wanted to adjust targets for a rain-rule method, but here I'm only interested in the gaps between the best-fit lines, to see what the wickets are worth.
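The post doesn't say exactly how the lines were fitted. Assuming ordinary least squares with no intercept, and assuming each point pairs the balls remaining at the fall of a wicket with the runs the team added from then on, each slope has a simple closed form. A minimal Python sketch (the function name and the data points are hypothetical, for illustration only):

```python
# Least-squares slope for a line forced through the origin: y = b * x.
# Minimising sum((y - b*x)^2) over b gives b = sum(x*y) / sum(x*x).

def slope_through_origin(xs, ys):
    """Return the least-squares slope of y = b*x (no intercept term)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero")
    return sxy / sxx


# Hypothetical points for one wicket group:
# (balls remaining at the fall of the wicket, runs added afterwards).
balls_left = [100, 80, 60, 40]
runs_after = [130, 110, 80, 55]
print(round(slope_through_origin(balls_left, runs_after), 3))  # → 1.333
```

Fitting one such slope per number of wickets down gives a table of "runs per ball remaining" like the one below.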

We see that wickets aren't particularly important. A wicket on the first ball of the innings reduces the final score, on average, by about two and a half runs. This agrees with common sense: with only twenty overs to bat, you can keep batting aggressively despite the fall of a few wickets.

The slopes of the regression lines (to more significant figures than are really justified...) are:

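| Wickets down | Slope (runs per ball remaining) |
|---|---|
| 0 (extrapolated from wickets 1 to 6) | 1.378 |
| 1 | 1.357 |
| 2 | 1.329 |
| 3 | 1.298 |
| 4 | 1.271 |
| 5 | 1.249 |
| 6 | 1.233 |
| 7 | 1.027 |
| 8 | 0.459 |
| 9 | 0.172 |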
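As a check on the "about two and a half runs" figure for a first-ball wicket, the slopes listed above can be stored as a lookup table. This Python sketch assumes the wicket is treated as falling with all 120 balls still to come (ignoring the ball used up), so the cost is simply 120 times the gap between the 0-down and 1-down slopes. `SLOPES` and `expected_runs` are hypothetical names.

```python
# Slopes from the table above: expected further runs per ball remaining,
# keyed by the number of wickets down.
SLOPES = {0: 1.378, 1: 1.357, 2: 1.329, 3: 1.298, 4: 1.271,
          5: 1.249, 6: 1.233, 7: 1.027, 8: 0.459, 9: 0.172}


def expected_runs(wickets_down, balls_left):
    """Expected further runs under the fitted straight-line model."""
    return SLOPES[wickets_down] * balls_left


# Cost of losing a wicket at the very start: compare 0 down and 1 down
# with all 120 balls still to come.
cost = expected_runs(0, 120) - expected_runs(1, 120)
print(round(cost, 2))  # → 2.52
```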

Now we can use this to start evaluating the impact of bowlers. Suppose a bowler takes the fifth wicket on the last ball of the tenth over. With four wickets down and 60 balls left, the batting team should score another 1.271 × 60 = 76.26 runs. With five wickets down, they should score 1.249 × 60 = 74.94 runs. The difference of 1.3 runs gets credited to the bowler. Do this for all of a bowler's wickets, subtract the total credit from his runs conceded, and you get an effective economy rate.
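The crediting step can be sketched in a few lines of Python. `wicket_credit` is a hypothetical helper; it only covers wickets one to nine, because the listed slopes stop at nine down and the post doesn't say how the tenth wicket is handled.

```python
# Slopes (runs per ball remaining) by wickets down, from the table above.
SLOPES = {0: 1.378, 1: 1.357, 2: 1.329, 3: 1.298, 4: 1.271,
          5: 1.249, 6: 1.233, 7: 1.027, 8: 0.459, 9: 0.172}


def wicket_credit(wickets_before, balls_left):
    """Runs credited for taking wicket number wickets_before + 1 with
    balls_left balls remaining: the drop in expected further runs."""
    if not 0 <= wickets_before <= 8:
        # The tenth wicket is not covered by the listed slopes.
        raise ValueError("wickets_before must be between 0 and 8")
    return (SLOPES[wickets_before] - SLOPES[wickets_before + 1]) * balls_left


# Fifth wicket on the last ball of the tenth over: 4 down -> 5 down, 60 balls left.
print(round(wicket_credit(4, 60), 2))  # → 1.32
```

Summing these credits over a bowler's wickets gives the total that is subtracted from his runs conceded.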

There are a few points worth noting:

- There's no consideration of how high-scoring the pitch/ground is.

- The quality of the batsman dismissed is ignored.

- The same crediting applies in both first and second innings.

- If a team collapses quickly (say, six wickets down after 10 or 12 overs), the bowler who takes the next wicket gets a lot of credit, because with overs still left to bat, the gap between having recognised batsmen to come and being into the tail is large. This isn't really fair on the bowlers who took the early wickets, but it doesn't seem to cause too many problems when comparing bowlers who bowl regularly.

The overall economy rate for bowlers during the IPL was about 1.36 runs per ball. Comparing a bowler's effective economy rate with this average gives a measure of how many runs he was worth. In the table below, I've called this the value-24: roughly, the number of runs better than average the bowler is over 24 balls (one four-over spell). I'm not good at coming up with names for things. I wanted to do this because I'm hoping to do something similar for batsmen (that is, get a runs-per-game value for them), so that we can put batsmen and bowlers on the same scale.
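One reading of these definitions, which reproduces the table's effective economy rates and its value-24 column to within about 0.1 runs, is sketched below in Python. The function names are hypothetical, and the table's cred column records the summed wicket credit as a negative number of runs, so it is added to runs conceded. Because the post only gives the league rate as "about 1.36", plugging in that rounded value gives a value-24 slightly above the table's figure.

```python
def effective_economy(balls, runs, credit):
    """Runs per over after adding the (negative) total wicket credit."""
    return (runs + credit) / balls * 6


def value_24(balls, runs, credit, league_rate=1.36):
    """Runs better than an average bowler over 24 balls.

    league_rate is the league-wide economy in runs per ball (about 1.36).
    """
    return 24 * (league_rate - (runs + credit) / balls)


# Sohail Tanvir's line from the table that follows:
# 211 balls, 210 runs, wicket credit -50.34.
print(round(effective_economy(211, 210, -50.34), 2))  # → 4.54
print(round(value_24(211, 210, -50.34), 2))           # → 14.48
```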

The top bowlers, qual. 144 balls (i.e., six four-over spells):

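| Bowler | Balls | Runs | Wkts | Wicket credit (runs) | Avg | Econ (runs/over) | Eff. econ (runs/over) | Value-24 |
|---|---|---|---|---|---|---|---|---|
| Sohail Tanvir | 211 | 210 | 21 | -50.34 | 10.0 | 5.97 | 4.54 | 14.40 |
| GD McGrath | 300 | 319 | 12 | -22.89 | 26.6 | 6.38 | 5.92 | 8.87 |
| SM Pollock | 276 | 301 | 11 | -25.34 | 27.4 | 6.54 | 5.99 | 8.59 |
| IK Pathan | 294 | 326 | 14 | -31.11 | 23.3 | 6.65 | 6.02 | 8.49 |
| MF Maharoof | 192 | 215 | 12 | -21.09 | 17.9 | 6.72 | 6.06 | 8.32 |
| AB Dinda | 234 | 260 | 9 | -20.40 | 28.9 | 6.67 | 6.14 | 7.99 |
| DW Steyn | 228 | 252 | 10 | -8.49 | 25.2 | 6.63 | 6.41 | 6.93 |
| A Nehra | 269 | 348 | 12 | -50.57 | 29.0 | 7.76 | 6.63 | 6.02 |
| AB Agarkar | 156 | 207 | 8 | -33.11 | 25.9 | 7.96 | 6.69 | 5.81 |
| M Muralitharan | 300 | 346 | 8 | -8.83 | 43.3 | 6.92 | 6.74 | 5.59 |
| DJ Bravo | 170 | 232 | 11 | -37.64 | 21.1 | 8.19 | 6.86 | 5.12 |
| SR Watson | 283 | 344 | 13 | -19.57 | 26.5 | 7.29 | 6.88 | 5.05 |
| SK Warne | 264 | 349 | 17 | -42.63 | 20.5 | 7.93 | 6.96 | 4.71 |
| M Ntini | 162 | 198 | 5 | -8.34 | 39.6 | 7.33 | 7.02 | 4.46 |
| Shahid Afridi | 180 | 225 | 9 | -13.55 | 25.0 | 7.50 | 7.05 | 4.37 |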

Sohail Tanvir has, of course, been the stand-out bowler of the IPL. McGrath has been much talked about, but fellow metronome Shaun Pollock not so much.

Tanvir's also been lucky, of course. He's almost certainly not that good. I'll pick up this theme a little bit in the next post.

Lastly, some guys over at Rediff (e.g., here) have been doing what look to be good statistical analyses of the IPL. Unfortunately, they seem to sweep all the calculations under the carpet; if anyone happens to know how they calculate the MVP index, please share with us. (**Edit**: Here's a description of it.)
Comments:


David, good stuff. I can't work out why I've never noticed your blog before. I'm not sure you are quite capturing the value of wickets though.

What you are really doing is measuring the difference at each point in the innings between being x wickets down and x+1 wickets down.

You only start to see a measurable difference when a side loses its 6th+ wicket. This means the value of a top-order wicket in the late overs is generally zero, and in the early overs often zero but sometimes substantial (because they eventually lost more). You note this in your text, but it matters for what follows.

I'd say four things (two picky, two substantial):

1) You should really use a quadratic to fit the lines better, though the difference is probably marginal (hard to say, the difference between bowlers is already small).

2) You are (sometimes) double counting in the bowling metric: if a bowler takes a wicket at the start of their spell, they have already received the benefit of a reduced run rate in their existing economy rate.

3) When it comes to the bowler comparison, I'm not sure you are right that the way the 6th+ wicket is credited isn't systematically understating the value of early wickets and overvaluing the contributions of some bowlers (spinners) over others (openers).

The real value of an early wicket is that it raises the probability that eventually the opposition will be 6+ down. Instead of giving the 6th+ wicket taker the full value of that difference, an improved metric would give the bowler the change in probability that a team will move from x to x+1 with a wicket (100%), and the change in probability of moving from x to x+2...10, which will vary by the number of overs remaining. To avoid double counting, the value of the change in later wickets is then reduced by the same probability. Essentially, you are redistributing the run-rate difference to other wickets.

4) You are comparing the wrong thing in the late overs. Early on, losing a wicket makes a probabilistic change in the final total. In the late overs, there is little measurable difference between losing a 4th or 5th wicket, because either way a good batsman comes in to slog. But there may be a measurable difference between losing any wickets in the late overs and losing none at all: the dot ball, the new unset batsman, the change in momentum.

To capture it, you would need to add on the defensive value of getting a wicket after x overs without a wicket loss, versus no wicket at all at that point.

Thanks for that, Russ. Always good to see substantive comments here.

1) This is marginal.

2) Yeah, but it's usually less than a fifth of not many runs, so I didn't bother correcting for it. I also only used summary scorecards, so I didn't know how many balls the bowler had left to bowl. I'll probably download the ball-by-ball stuff at some point.

3) This one's been bothering me. I'll see if I can implement what you suggest (or something like it) this weekend.

I had justified what I did to myself by saying that if a team was five or six wickets down after ten overs, then the batsmen would be batting carefully, so the bowler would have to do more to take a wicket. But this is not too satisfying a justification.

4) I'm not big on "momentum" in general. I agree that batsmen get their eye in, but I'm not sure how much effect there is when the next batsman comes in and starts slogging. The bowler already gets the credit of the dot ball.

I'll have a look at the ball-by-ball data to see what I can get out of it though.

Hi,

This is my first time on your blog. Neat stuff. Very happy to find someone interested in the statistics of IPL. Just noticed that you are also a grad student :)

During the IPL, I came up with some visualizations of different IPL statistics. They can be found here: http://www.cs.ucsb.edu/~acharya/IPL

I did these to both explore the visualization APIs and visualize the IPL statistics.

Regarding your "value-24", I tried defining something similar (I called it Impact factor), but never got around to actually exploring/fine-tuning it.

