Sunday, May 25, 2008

Batting well with a batsman

That's right people, a new post! Now that I'm back at uni, I have less time for cricket analysis, so I'll be aiming to get about one post per week, maybe two if I find something simple and interesting on Statsguru.

A long time ago at Well Pitched, there was a discussion on great batsmen and how they supposedly "lift" their teammates when batting with them. I was sceptical about this being a real effect. Analysing it properly will take at least a couple of posts, and this is the first one.

Getting data on partnerships from summary scorecards always carries with it the problem of retired hurts. It's not just a question of definition (if an opener retires hurt before the fall of the first wicket, do you have two first-wicket partnerships or a three-way partnership?). The problem is that retired hurts are not always recorded on scorecards if the batsman in question returned to the crease. (Certainly in my lazy database, this is never recorded.) So it sometimes happens that you look at the fall of wickets (FOWs) to work out which partnerships happened and how many runs each was worth, subtract the batsmen's scores, and find that a batsman contributed negative runs during some passage of play.

So before I started gathering partnership data, I did my best to get rid of innings where there was a retired hurt. Innings were deleted if:

- a batsman finished retired hurt;

- a batsman was dismissed out of turn (e.g. the number three was the first wicket to fall);

- reconstructing the FOWs from the minutes batted by each batsman (where possible) disagreed with the actual FOWs;

- any partnerships required negative runs from one batsman to make sense.
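A rough sketch of how the second and fourth checks might be coded, assuming a summary scorecard has already been reduced to a batting order, each batsman's score, the fall of wickets, and the innings total. The function names and the data layout are illustrative assumptions, not the structure of my actual database, and the minutes-based check is left out.

```python
def reconstruct_partnerships(order, fow, total):
    """Rebuild partnerships assuming nobody retired hurt.

    order: names of the batsmen who batted, in batting order (assumed input).
    fow:   list of (team_score_at_fall, batsman_out) in order of dismissal.
    total: final team score.
    Returns a list of (batsman_a, batsman_b, runs_added).
    Raises ValueError if a dismissed batsman could not have been at the crease.
    """
    at_crease = [order[0], order[1]]
    next_in = 2
    prev_score = 0
    partnerships = []
    for score, out in fow:
        if out not in at_crease:
            raise ValueError(f"{out} fell but should not have been at the crease")
        partnerships.append((at_crease[0], at_crease[1], score - prev_score))
        prev_score = score
        at_crease.remove(out)
        if next_in < len(order):
            at_crease.append(order[next_in])
            next_in += 1
    if len(at_crease) == 2:
        # Unbroken stand at the end of the innings.
        partnerships.append((at_crease[0], at_crease[1], total - prev_score))
    return partnerships


def partner_runs(partnerships, scores):
    """Runs scored by partners (plus extras) while each batsman was in."""
    in_stands = {}
    for a, b, runs in partnerships:
        in_stands[a] = in_stands.get(a, 0) + runs
        in_stands[b] = in_stands.get(b, 0) + runs
    return {name: in_stands[name] - scores[name] for name in in_stands}


def innings_is_suspect(order, scores, fow, total):
    """True if the scorecard only makes sense with a retired hurt (or an error)."""
    try:
        partnerships = reconstruct_partnerships(order, fow, total)
    except ValueError:
        return True  # someone was dismissed out of turn
    return any(runs < 0 for runs in partner_runs(partnerships, scores).values())
```

An innings in which an opener's score exceeds the runs added in the only stand he supposedly took part in would come back as suspect, as would one where the number three is the first man out.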

Point three is an interesting one, because carefully tracing through the minutes batted can reveal both retired hurts and errors in the minutes as given. One curious error is in this Test, in which Manoj Prabhakar apparently batted for 304 minutes, while the rest of the batsmen combined for only 274.

Anyway, the above procedure isn't perfect — it won't pick up all retired hurts, especially if the minutes aren't recorded, and there are probably some innings where the anomalous minutes are just scorer/Cricinfo/CricketArchive errors and not actually showing retired hurts. But it seems to do a reasonable job, and about 430 innings were removed.

Now to the analysis proper. For each batsman and each innings, I took the runs in his partnerships and subtracted off his own score, so that we're left with the runs scored by his partners and extras while he was at the crease. Then you count how many times he saw his partners get out, and you have the average of his partners (plus extras) when he was at the crease.

To get an expected average, I added up the averages of all his partners, and divided by the total number of partnerships.

Then I divided the actual partner-average by the expected partner-average, which gives a measure of how well people bat with him, relative to their careers.
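Here is a minimal sketch of that calculation for one batsman, assuming each of his innings has been boiled down to four things: the total runs added in his partnerships, his own score, the number of partners he saw dismissed, and the career averages of the partners he batted with. The `InningsRecord` layout and the function name are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InningsRecord:
    partnership_runs: int             # runs added in all stands he took part in
    own_score: int                    # his own runs in the innings
    partners_dismissed: int           # partners he saw get out
    partner_career_avgs: List[float]  # one career average per partnership


def partner_ratio(innings):
    """Return (actual, expected, ratio) partner averages for one batsman."""
    partner_runs = sum(i.partnership_runs - i.own_score for i in innings)
    dismissals = sum(i.partners_dismissed for i in innings)
    partnerships = sum(len(i.partner_career_avgs) for i in innings)
    if dismissals == 0 or partnerships == 0:
        raise ValueError("need at least one partnership and one partner dismissal")
    # Actual: partner runs (incl. extras) per partner dismissal.
    actual = partner_runs / dismissals
    # Expected: sum of partners' career averages over number of partnerships.
    expected = sum(a for i in innings for a in i.partner_career_avgs) / partnerships
    return actual, expected, actual / expected


career = [
    InningsRecord(100, 40, 1, [30.0, 50.0]),
    InningsRecord(80, 20, 1, [40.0, 20.0]),
]
actual, expected, ratio = partner_ratio(career)
print(round(actual, 1), round(expected, 1), round(ratio, 2))
```

Note that the two averages use different denominators: partner dismissals for the actual figure, partnerships for the expected one.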

When you do this, you find that players with short careers have much more variation than players with longer careers. Graph (qual. 20 innings, batsman's average at least 30):

Look at those ugly decimal points.

(The average ratio across all these batsmen is about 1.1.)

Now, what I think I should do at this point is to work out how reliable the statistic is (i.e., how much of it is skill, and how much just luck), and then regress each player to the mean appropriately. (I'm learning from the baseballers, who do this sort of thing a lot.) But working out how reliable this stat is will require some thought (you're welcome to do the thinking for me). One problem is that part of what it measures might be called flat-track-bully-ness. If a batsman does disproportionately well on flat tracks, then it might be the case that he is part of many big partnerships which bloat his partner average.

But I will ignore this for now, and instead find z-scores. I sorted the batsmen by innings batted, found the moving standard deviation of the next 30 ratios, and then fitted a curve to it. It goes a bit like 1.3/sqrt(no. inns), for those interested. Then for each batsman, you use this as the standard deviation, and find how many standard deviations from the overall mean his ratio is.

(In terms of the reliability, the question is: Does being a standard deviation above the mean after 20 innings mean that you'll probably be a standard deviation above the mean after 200 innings?)
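The z-score step can be sketched as below. The constants 1.1 and 1.3 are the rounded figures quoted in the text, so results will not match the table exactly. The curve fit shown is an ordinary least-squares fit of sd = k/sqrt(innings), and the moving window uses the sample standard deviation; both are assumptions, since I haven't said how the curve was actually fitted.

```python
import math
import statistics


def moving_sd(ratios, window=30):
    """Standard deviation of each run of `window` consecutive ratios.

    `ratios` is assumed to be sorted by the batsmen's innings batted.
    """
    return [
        statistics.stdev(ratios[i:i + window])
        for i in range(len(ratios) - window + 1)
    ]


def fit_k(innings, sds):
    """Least-squares k for the model sd = k / sqrt(innings)."""
    xs = [1.0 / math.sqrt(n) for n in innings]
    return sum(x * y for x, y in zip(xs, sds)) / sum(x * x for x in xs)


def z_score(ratio, innings, mean_ratio=1.1, k=1.3):
    """Standard deviations between a batsman's ratio and the overall mean."""
    sd = k / math.sqrt(innings)
    return (ratio - mean_ratio) / sd


print(round(z_score(1.46, 183), 2))
```

With the rounded constants, a ratio of 1.46 over 183 innings comes out at roughly 3.7 standard deviations above the mean.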

In the table below are the batsman's average, innings batted (having excised team innings probably involving retired hurts), runs by partners (incl. extras), total number of partnerships, expected partner average, actual partner average, ratio, z-score. Note that the partnership average is not just the partner runs divided by the number of partnerships — it's the partner runs divided by the number of times the batsman saw partners dismissed.

name             avg   inns  p-runs  pships  exp   act   ratio  z
RT Ponting       58.6  183   9622    308     43.7  63.7  1.46   3.94
RL Dias          36.7  33    1260    54      28.9  54.8  1.89   3.67
DS Lehmann       45.0  42    1733    62      43.8  78.8  1.80   3.66
DJ Bravo         33.0  44    1601    70      33.7  59.3  1.76   3.52
RWT Key          31.0  25    895     36      39.1  74.6  1.91   3.24
RT Robinson      36.4  45    1971    72      37.0  61.6  1.67   3.06
HH Dippenaar     30.1  60    2237    88      43.0  67.8  1.58   2.96
ME Trescothick   43.8  136   5890    234     38.8  54.5  1.41   2.88
Shoaib Mohammad  44.3  65    3673    123     37.1  56.5  1.52   2.73
G Pullar         43.9  44    1898    67      43.7  70.3  1.61   2.70
CL Cairns        33.5  97    2815    158     30.0  42.7  1.42   2.55
FA Iredale       36.7  22    930     42      25.1  44.3  1.76   2.48
V Sehwag         53.8  82    2794    128     39.7  57.0  1.44   2.44
Javed Miandad    52.6  172   8715    335     35.8  47.6  1.33   2.42
MLC Foster       30.5  23    624     26      45.2  78.0  1.72   2.39
GC Smith         49.5  107   4905    191     40.0  55.1  1.38   2.31
Habibul Bashar   30.9  96    2775    189     21.3  29.5  1.38   2.22
M Prabhakar      32.7  57    2064    89      35.8  51.6  1.44   2.06
CG Greenidge     44.7  175   7559    301     41.8  54.0  1.29   2.00
AH Jones         44.3  71    3560    144     31.9  44.5  1.39   1.97

Make of that what you will....

The bottom-end, those who apparently make their partners bat badly:

name             avg   inns  p-runs  pships  exp   act   ratio  z
Saeed Anwar      45.5  84    3287    194     34.3  29.3  0.86   -1.92
DJ Cullinan      44.2  111   3965    217     38.3  33.9  0.89   -1.95
WR Hammond       58.5  129   6160    284     40.1  36.0  0.90   -1.98
RA McLean        30.3  66    1049    105     31.0  25.0  0.81   -2.02
FE Woolley       36.1  92    2404    162     37.0  31.2  0.84   -2.10
SM Pollock       32.3  151   3704    249     30.3  27.2  0.90   -2.16
JT Tyldesley     30.8  54    1655    129     29.0  21.8  0.75   -2.16
AG Chipperfield  32.5  20    431     52      24.4  12.3  0.50   -2.19
MA Noble         30.3  70    2277    162     29.7  23.0  0.77   -2.30
WJ Cronje        36.4  105   3764    219     36.7  30.6  0.83   -2.32

Well if Hansie Cronje coming last on this statistic isn't the most appropriate thing I've ever put on this blog, then I don't know what is! Good to see his worshipper Shaun Pollock also down there.

Two names mentioned in the Well Pitched discussion were Steve Waugh and Inzamam-ul-Haq. They are at z = -1.22 and z = -0.40 respectively.

When I next attack this problem, I will also check to see if there are any patterns with batting position, and also look at batting with the tail.

Steve Waugh and Inzamam were mentioned in the context of batting well with the tail. Either they protected the tail or made the tail bat better than they usually do.

Again that was a very qualitative assessment more than a statistical one like yours.

McGrath held on to take Waugh to a 100 umpteen times. Same with Inzi being the last man out on most occasions.

Your next part will be interesting.
I'm not on my home computer at the moment, so I'll just mention the McGrath centuries - Steve Waugh reached 100 three times when McGrath was at the crease. McGrath, in those innings, scored 2 runs from 8 balls, 21 not out from 28, and 3 from 11. Only in the middle one did he do better than his average (both in terms of runs and balls faced).
Good informative site.

Would you be interested in exchanging links with my blog?

The address is:

I've added you to the links list, Isaac.