Sunday, March 22, 2009
New cricket stats analysis blog
While I'm being lazy and not doing any real updates (I am getting closer to writing all the readme files etc. for my database), you can get your stats fix from this blog, which has been around for a few months and is generally very good. It is likely that some of the new ideas will need to be improved over time, but they are a very valuable first step, especially on declaration strategies (Elliot Tonkes has done some academic work on this, but I don't know if it's been published) and wins above replacement for ODI's.
Do check it out.
Sunday, March 15, 2009
Batted ball speed
Hey look, I have this thing called a blog, maybe I should update it.
I should update my profile – I'm no longer a student (sort of, my Masters thesis is under assessment), having got out of my PhD and started a 9-5 job, which is in geostatistics, if anyone is wondering.
I've been very slack with cricket stats lately, with work and chess taking up more of my time. Anyway, this is a long and interesting discussion about a baseball hitting a baseball bat. The very counter-intuitive result is that the batted ball speed doesn't depend on the grip, as long as the ball hits the "meat" of the bat. The same is true for cricket (this page gives a few physicsy aspects of cricket). All that matters for batted ball speed is the speed of the bat at impact.
On a different topic, I have been thinking about putting my cricket database online. It wouldn't be pretty, and any professional coders out there may be horrified at my code, but there are enough of you out there with good ideas that I think it would be worthwhile. In the long term, I would like there to be a cricket equivalent of Retrosheet, which now has over 50 years' worth of play-by-play data for Major League Baseball.
I'm sort of thinking out loud on this at the moment. My database uses CricketArchive player and match ID's, so if they changed their numbering system (which they did a few years ago), that'd screw things up. I might exchange emails with them and see what comes out of it.
Feel free to share your thoughts on this.
Sunday, January 04, 2009
Form
Russ in a post here suggests a way of measuring form (go and read that post, it's full of goodies). The broad question is, if there are two batsmen with the same true talent, and one has better recent form, will that batsman tend to average more in his next innings?
To start with, Russ weights more recent innings more heavily than earlier innings. Specifically, the k-th innings for a batsman with N innings in his career is weighted $0.95^{N-k}$. (This is, I believe, similar to what the ICC rankings do.)
Now, one problem with assessing "form" statistically is that a batsman will usually play a series against one team, followed by a series against another, etc. Since one of those two opposition teams can have a much stronger bowling attack than the other, what may appear to be good form and bad form may simply be a result of playing against weak bowlers and then strong bowlers. So, for everything I do in this post, I'll adjust the batting averages by the quality of the attack, as explained here.
So, when I talk about a regular average, I really mean an adjusted average. When I talk about a weighted average, I'll mean Russ's weighting by how recent the innings was (each innings also being adjusted for the quality of the attack).
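As a rough sketch of the recency weighting described above (not Russ's or my actual code), the weighted average could be computed like this. The function names are made up for illustration, the scores are assumed to be already adjusted for the quality of the attack, and treating not-outs as "runs but no dismissal" is an assumption borrowed from the ordinary batting average.

```python
def recency_weights(n_innings, decay=0.95):
    """Weight for the k-th innings (k = 1..N) is decay ** (N - k)."""
    return [decay ** (n_innings - k) for k in range(1, n_innings + 1)]


def weighted_average(scores, dismissed, decay=0.95):
    """Recency-weighted batting average: weighted runs / weighted dismissals.

    Assumption: a not-out adds its runs to the numerator but nothing to the
    denominator, as in an ordinary batting average.
    """
    weights = recency_weights(len(scores), decay)
    runs = sum(w * r for w, r in zip(weights, scores))
    outs = sum(w for w, out in zip(weights, dismissed) if out)
    if outs == 0:
        raise ValueError("no dismissals yet, average undefined")
    return runs / outs
```

Setting `decay=1.0` weights every innings equally, which gives back the regular average.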
Before continuing about form, I'll just look at the weighted average as a predictive tool. For all batsmen with at least 50 innings, I calculated career-to-date averages and weighted averages, as well as a 10-innings moving average. Then, from the 11th innings of each batsman's career, I calculated the absolute difference between his next innings and each of those three measures. (If the innings was a not-out, I used the not-out score as the absolute difference for each measure.) Then I averaged these errors. I did the same for all batsmen, and then found the "average average" error. The regular average was the best predictor, about 1% better than the weighted average, and 4% better than the moving average. The weighted average becomes more accurate if the 0.95 in the formula is increased towards 1, but it is always worse than the regular average.
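The comparison of the three predictors can be sketched roughly as follows for a single batsman. This is a simplified illustration, not the code behind the figures above: it treats every innings as a dismissal (so each "average" is a plain mean), and the function and parameter names are hypothetical.

```python
def prediction_errors(scores, decay=0.95, window=10, start=10):
    """Mean absolute error of three predictors of the next innings.

    Simplification (an assumption): every innings counts as a dismissal,
    so not-outs are not handled the way the post describes.
    """
    errors = {"career": [], "weighted": [], "moving": []}
    for n in range(start, len(scores)):
        past, actual = scores[:n], scores[n]
        weights = [decay ** (n - k) for k in range(1, n + 1)]
        recent = past[-window:]
        predictions = {
            "career": sum(past) / n,
            "weighted": sum(w * r for w, r in zip(weights, past)) / sum(weights),
            "moving": sum(recent) / len(recent),
        }
        for name, predicted in predictions.items():
            errors[name].append(abs(actual - predicted))
    # One mean error per predictor for this batsman.
    return {name: sum(errs) / len(errs) for name, errs in errors.items()}
```

Running this for each qualifying batsman and then averaging the three results across batsmen gives the "average average" error.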
So, as a measure of true talent of a batsman, I'll use the regular average rather than the weighted average.
Now to the question of defining form. Russ does this by taking a weighted log average. Defining $R_i$ as the runs scored in the i-th innings, and $w_i$ as the weight of that innings, this weighted log average is:
$$\exp\left( \frac{\sum_i w_i \log R_i}{\sum_i w_i} \right)$$
I've actually modified this a bit. If the i-th innings is a not-out, I didn't include it in the sum in the denominator. I hope this isn't too great a crime against statistics.
The measure of form is then the ratio of the weighted log average to the weighted average. Now, if scores are distributed exponentially, then this ratio is about 0.56 (well, it is with equally weighted innings at least). If a batsman makes the same score every innings (and gets out!), the ratio is 1. If a batsman recently has one big score and a bunch of little scores, the ratio is down towards 0.3. So, good form is a high ratio, bad form is a low ratio.
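For anyone wondering where the 0.56 comes from, here is a short derivation, assuming a continuous exponential distribution with mean $\mu$ and equal weights ($\gamma$ is the Euler-Mascheroni constant):

```latex
\begin{align*}
E[\log R] &= \int_0^\infty \log(r)\,\frac{1}{\mu}e^{-r/\mu}\,dr = \log\mu - \gamma,
  \qquad \gamma \approx 0.5772 \\
\frac{\exp\left(E[\log R]\right)}{E[R]} &= \frac{\mu e^{-\gamma}}{\mu}
  = e^{-\gamma} \approx 0.561
\end{align*}
```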
Because I exclude not-outs in the denominator, it's possible to get ratios greater than 1. I'm not really sure how to interpret these, but let's carry on anyway.
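A minimal sketch of the ratio, including the not-out modification, might look like this. The names are hypothetical. The `floor` parameter is purely my assumption for coping with ducks (the log of zero is undefined); nothing above says how they are actually handled.

```python
import math


def form_ratio(scores, dismissed, decay=0.95, floor=1.0):
    """Ratio of the weighted log average to the weighted average.

    Not-outs contribute to both numerators but not to the denominator,
    which is why the ratio can exceed 1. `floor` is an assumed guard
    against log(0) for ducks.
    """
    n = len(scores)
    weights = [decay ** (n - k) for k in range(1, n + 1)]
    out_weight = sum(w for w, out in zip(weights, dismissed) if out)
    if out_weight == 0:
        raise ValueError("no dismissals yet, ratio undefined")
    log_sum = sum(w * math.log(max(r, floor)) for w, r in zip(weights, scores))
    weighted_log_average = math.exp(log_sum / out_weight)
    weighted_average = sum(w * r for w, r in zip(weights, scores)) / out_weight
    return weighted_log_average / weighted_average
```

With identical scores and no not-outs this returns 1, one big score among small ones pulls it well below 0.56, and a not-out among identical scores pushes it above 1.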
Russ's hypotheses are (I hope I've got this right):
a) If there are two batsmen with a similar average, one with a typical ratio and one with a low ratio, then the one with the low ratio will tend to average more in his next innings. The logic here is that the batsman with the low ratio is capable of larger scores, whereas the other batsman is just not so good.
b) Given two batsmen with the same average, one with a high ratio will tend to do better in his next innings than one with a typical ratio.
Both of these are correct, somewhat to my surprise. I went through all batsmen, and for each innings (after the tenth in their career), calculated the career-to-date average, and the ratio-to-date, and binned them as in the table below. I then calculated the overall average for each bin.
Ratios are down the left-hand side, averages across the top. The figures are the low end of the bin. So, e.g., the '5' means that the bin is for averages 5 to 9.99, the '10' is for averages 10-14.99, etc. Only bins with at least 50 innings are shown; bold is used for at least 100 innings.
r/a 5 10 15 20 25 30 35 40 45 50 55
0.35 23.4 28.4 28.7 40.8 41.3
0.40 12.6 18.8 21.4 31.7 30.5 34.9 40.3 40.7 55.3
0.45 7.4 12.1 15.9 20.9 26.7 30.1 34.0 46.9 39.9 50.2
0.50 9.6 11.1 17.6 22.2 26.4 31.7 36.3 34.9 43.5 46.0
0.55 8.4 11.9 16.5 21.0 26.1 31.6 35.2 40.1 42.6 53.3
0.60 7.6 12.2 18.3 24.4 27.9 33.5 38.2 42.7 46.6 58.0 40.4
0.65 7.8 12.3 18.1 24.7 28.0 33.4 39.2 43.3 48.7 48.1 46.6
0.70 8.2 12.4 19.3 25.7 27.5 34.4 40.4 46.1 44.0 56.7
0.75 9.3 15.3 17.4 23.1 29.3 38.9 41.0 47.4 51.8 50.1
0.80 9.8 12.0 17.7 24.4 30.5 35.1 48.2 54.2 51.8 58.9
0.85 15.0 22.1 25.8 38.8 40.5 44.6 51.3 61.9 46.8
0.90 15.6 26.7 42.1 42.6 59.2 54.1
0.95 46.5
1.00 16.6 35.9 32.5 44.7 53.9 68.5 73.2 62.9
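The binning behind a table like this can be sketched as below. It is an illustration under assumptions: the record layout and names are hypothetical, the "overall average" of a bin is taken to be runs divided by dismissals, and ratio bins are keyed in hundredths (35 means 0.35 to 0.399) to avoid floating-point keys.

```python
from collections import defaultdict


def bin_next_innings(records, min_innings=50):
    """records: iterable of (career_average, form_ratio, next_score, was_out).

    Returns {(ratio_bin_hundredths, average_bin): next-innings average}
    for bins holding at least `min_innings` innings.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # runs, dismissals, innings
    for average, ratio, score, was_out in records:
        avg_bin = int(average // 5) * 5          # low end of a 5-run bin
        ratio_bin = int(ratio * 20 + 1e-9) * 5   # low end of a 0.05 bin
        cell = totals[(ratio_bin, avg_bin)]
        cell[0] += score
        cell[1] += 1 if was_out else 0
        cell[2] += 1
    return {
        key: runs / outs
        for key, (runs, outs, innings) in totals.items()
        if innings >= min_innings and outs > 0
    }
```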
When the ratio is very low, the batsman does indeed tend to average much more in his next innings. (Since I've used regular averages to define the true talent, the top row may be full of players early in their career. I'm not sure.) Going down each column, the minimum is usually somewhere around 0.5 to 0.6, which seems to correspond to the 0.56 that you'd expect from the exponential distribution.
Really good recent form seems to give a 20% boost and sometimes more. This is a lot more than I had expected.
(My thinking on this issue seems to have been confused — in my last post I said that Johnson was good because he kept getting starts, which is consistent with this analysis.)
Saturday, December 20, 2008
Johnson's batting
I've only seen Johnson bat once (at the Gabba against NZ), and he looked very good. Looking through his Test scores to date, he's only been dismissed in single figures in a quarter of his innings. If you assume an exponential distribution of scores, that is consistent with an average of about 35. Currently he's averaging low 20's.
So, there are two main possibilities: he's been lucky to make so many starts, or he's been unlucky to get out for 20-odd so often. Batting at number 10 probably isn't helping him.
I'd like to see him moved up to number 8. At worst, he'll be about as good as Lee. At best, I think he could be a very good bowling all-rounder.
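The "about 35" figure can be checked with a couple of lines, assuming a continuous exponential distribution of scores and taking "single figures" to mean a score below 10. The function names are just for illustration, and the 22 in the usage note is a stand-in for "low 20's".

```python
import math


def implied_average(threshold, share_below):
    """Mean of an exponential distribution with P(score < threshold) = share_below."""
    return -threshold / math.log(1.0 - share_below)


def share_below(threshold, average):
    """P(score < threshold) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-threshold / average)
```

`implied_average(10, 0.25)` comes out a little under 35, while `share_below(10, 22)` says a batsman averaging 22 would be expected to fall in single figures more than a third of the time.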
Sorry for the lack of posting lately. I'm trying to finish off my Masters thesis (transferred down from a PhD), I will start a proper job in February, and I'm spending much of my spare time playing and studying chess.
Saturday, November 15, 2008
Can Dan Cullen bowl a doosra?
I ask because I saw him bowl balls spinning from leg to off during the warmups last night (the warmups were the only time I was in line with the bowler). Some of them I picked as leg breaks, but others I didn't, making me wonder if he can bowl a doosra now. It's equally possible that I wasn't paying close enough attention to his hand.
Other notes from last night's game:
- What would be a good name for a team which includes Michael Dighton, Dan Marsh, and Ryan Harris? I know, All Stars.
- Brendan Drew was bowling nice little outswingers in the warmups.
- When Xavier Doherty was fielding at long off a few metres from me, I resisted the temptation to tell him that he's got the worst first-class average of any bowler ever. It's only just false. With the qualification of 5000 balls, the only people with worse averages are a guy who only played for Oxford University, Sachin Tendulkar, and Stuart Saunders, who at least averaged 25 with the bat for Tassie. But Doherty's pretty good at limited-overs cricket. Weird.
- When Magoffin (I like Magoffin; he was the only guy who stuck in my memory from the Queensland Academy side I saw play India in 2003/4) was bowling to Shaun Marsh, Gilchrist set a 7-2 field. It was a disgustingly negative tactic. Captains have a responsibility to make sure that the game is entertaining for the spectators. It's because of people like Gilchrist that people are turning away from Twenty20 cricket.
Tuesday, October 21, 2008
Keepers
I've got what I think is an interesting post up at It Figures. It ranks wicket-keepers by the rate at which they let byes through, adjusted by country. It doesn't work for keepers who kept up to the stumps to fast bowling a lot (i.e., keepers from the olden days), but I'm happy with how well it works for modern keepers. Here is the full list (qualification: 20 Tests) for those who want to just see the results.
I must admit I hadn't heard of some of those guys near the top. Kirmani rang a bell, but Tamhane was new to me. I was happy to see in his Cricinfo profile that Wally Grout compared Tamhane to Don Tallon. It's unfortunate that Tallon (and Grout, and the rest) kept up to the stumps so often. It makes cross-era keeper comparisons difficult. (You can't just go by the prevailing rate of byes in world cricket at the time, otherwise modern keeper-batsmen would come up as the equal of Knott, Taylor, etc.).
Ideally, we'd be able to take this one step further and get a well-founded measure of a keeper-batsman. But the main difference between a good keeper and a bad one is the number of dismissals effected, and it would be close to impossible to get accurate estimates on, e.g., how many dismissals Knott would have had if he'd kept for Pakistan in the 1990's.
Wednesday, October 08, 2008
Free hits
We're on the eve of what should be another fantastic battle of Test cricket, but this evening I finally got around to a simple study of the effect of the free hit rule in the IPL. The IPL's the only ball-by-ball database I have, otherwise I'd see if the results are the same in ODI's as well.
Anyway, I went through the tournament and found the run rate on the ball immediately following a no-ball (there might be some non-front-foot no-balls in there, but whatever...). The result was 140 runs off 82 balls, a run rate of 10.2 per over.
To work out how many runs you'd expect to have been scored if it wasn't a free hit, I used the overall run rates by over (actually I did a bit of smoothing first, fitting quadratic trends to the first six overs and remaining 14 overs). The graph looks like this:

So, e.g., if there was a free hit in the 10th over, you'd assume that if it was a normal ball, 7.0/6 = 1.17 runs would have been scored off that ball. Doing this for each of the free hits, you end up with an expected score of 111 runs coming off those free hits, a run rate of 8.1 per over. So the batsmen are scoring more than usual off the free hits, but not by a lot. An extra third of a run per free hit, on average.
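The arithmetic above can be laid out as a short sketch. The helper names are hypothetical, and the only smoothed run rate used in the usage note is the 7.0 for the 10th over quoted above; the other overs' values aren't given in this post.

```python
def rate_per_over(runs, balls):
    """Convert runs off a number of balls into a run rate per six-ball over."""
    return 6.0 * runs / balls


def expected_runs(free_hit_overs, smoothed_rate):
    """Sum the normal-ball expectation (over's run rate / 6) over all free hits.

    free_hit_overs: the over number of each free hit.
    smoothed_rate: mapping from over number to smoothed run rate per over.
    """
    return sum(smoothed_rate[over] / 6.0 for over in free_hit_overs)


actual_rate = rate_per_over(140, 82)      # runs actually scored off free hits
expected_rate = rate_per_over(111, 82)    # runs expected off normal balls
extra_per_free_hit = (140 - 111) / 82     # roughly a third of a run
```

For a single free hit in the 10th over, `expected_runs([10], {10: 7.0})` gives the 1.17 runs quoted above.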
It is a nice check that the average run rate in the 20th over (when you're batting like they're all free hits anyway) was 10.1.
Presumably free hits in ODI's also go at about 10/over (or 1.7 per ball), so proportionally speaking, the punishment is greater in the 50-over form of the game.
That graph above is pretty interesting. There's a bit of noise, but the end of the fielding restrictions is very clear. It's interesting that the acceleration is gradual.
The batsmen are circumspect (relatively speaking) in the first over, and the effect is certainly much larger than can be explained by the first over simply being bowled by the best bowler (no-one had an economy rate as low as 5.2 in the IPL). Perhaps it is worth opening with your "fifth" bowler. I don't know.
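The smoothing mentioned earlier (separate quadratic trends for the first six overs and the remaining 14) could be done along these lines. This is a plain least-squares sketch with hypothetical function names, not necessarily how the trends in the graph were fitted, and it needs real per-over run rates as input.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns (a, b, c)."""
    # Power sums for the 3x3 normal equations.
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]

    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coeffs = []
    for col in range(3):
        # Cramer's rule: swap one column for the right-hand side.
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = t[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


def smooth_run_rates(rates, split=6):
    """rates[i] is the run rate in over i+1; each segment is fitted separately."""
    smoothed = []
    for lo, hi in ((0, split), (split, len(rates))):
        overs = list(range(lo + 1, hi + 1))
        a, b, c = fit_quadratic(overs, rates[lo:hi])
        smoothed.extend(a + b * x + c * x * x for x in overs)
    return smoothed
```

Fitting the two segments separately is what lets the smoothed curve keep the jump at the end of the fielding restrictions instead of blurring it.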