Saturday, May 10, 2008

John Buchanan and The Guardian article

Hello to those of you who've come here from Andy Bull's piece in The Guardian. I hope you find something interesting here.

There are a couple of ideas in that article that I think are worthy of more detailed discussion.

What John Buchanan says is interesting, but it seems to me that he's taking a purely coaching perspective. He says:

1) Ignore existing cricket statistics - these are just the 'outcome numbers' of a process of getting there.

If I were a coach, I would probably agree with this. Buchanan goes on to give the example of strike rate. It would be no good a coach saying to a player, "Hey, you're averaging 35 at a strike rate of 70. I want you to average 40 at a strike rate of 80." You need to break batting down into its parts and make improvements at that level.

That's where the ball-by-ball analysis comes in — what Buchanan calls 'process numbers'. (Buchanan is very big on processes, I gather. I've seen him talk about them elsewhere.) You look at the dot balls, try to improve shot selection on them, etc. You hope that you'll end up scoring more runs at a higher rate.

That's what the coach does. From a selection perspective, the outcome numbers are still going to be important. No-one cares what your percentage of dot balls is if you average 25, and no batsman will hold down a spot in the national side with such a low outcome number. Cricket games are won by the team that scores the most runs, and we shouldn't lose sight of that. All the 'processes' work is no good if it doesn't improve averages (or strike rates, in limited overs cricket).

Now, there are times when process numbers might be useful in selection — if a batsman has bad process numbers, then perhaps with coaching he might improve a lot more than a batsman who's already largely optimised his game. I don't know. Without seeing the figures involved and knowing what improvements are usually made, it's hard to say how useful such an approach would be.

Now onto one of the questions Bull posed at the end of the column:

Could we see teams selected through statistical proof rather than the current woolly combination of gut instinct, vague notions about character and compromised measures such as batting averages?

I would be very surprised if, in the foreseeable future, detailed statistics turned out to be better at team selection than human experts with regular stats. In terms of working out when to drop players, they might be. (I said here that selectors are probably best off with their gut on dropping players. Perhaps with detailed process stats you could do better, I don't know.)

But when it comes to finding the best players in domestic cricket, I doubt if a computer would do better than Duncan Fletcher, for example (if you haven't read Andrew Strauss's thoughts on Fletcher, I recommend doing so). Fletcher famously picked Michael Vaughan for the 1999/2000 tour of South Africa on 'temperament'. Vaughan's record in county cricket was not great: his first-class averages in the previous two seasons were 34 and 41. His average for Yorkshire is still well under 40. But despite that, in England colours he turned himself into a good batsman, doing better against Test sides than against county sides.

Now, it's possible that with sufficient process numbers from his county games, you would be able to tell him apart from the rest of the county hacks averaging high 30's. But I'd be surprised if it were so.

Obviously you'll want to be paying attention to stats when picking national sides — you won't consider batsmen averaging under 30, and you'll certainly be looking at those averaging 60 — but since the quality of the players is significantly lower in domestic cricket, you'll want humans watching them, gauging their technique and judging if they'll hold up against 90mph pace bowling or top-class spinners.

They don't always get it right, of course, but I think that they do better than a computer (or a person looking only at numbers) would do.

Tuesday, May 06, 2008

Australia batting first in ODI's

There's an interesting comment by Nesta on my rambly post about batting-first strategies. Essentially, Nesta reckons that Australia have come close to perfecting the art of batting first in 50-over cricket.

Since there's much more scope for variation in batting-first strategies than batting-second strategies (in the latter, everyone knows how many runs they need), you might conjecture that this will show up in the results. And it looks like it does.

I considered ODI's between the top eight sides in the 2000's. I split them into day games and day-night games, because the two are markedly different (day games strongly favour the team batting second; day-night games favour the team batting first).

In day games, Australia has won 73% of matches when batting first (ignoring no-results). Second is Sri Lanka at 49%, a whopping 24 percentage points behind! Australia has won 78% of matches batting second, with South Africa second at 71%, only seven percentage points behind.

In day-night games, batting first: Aus 76%, South Africa 63%; batting second: Aus 62%, South Africa and Pakistan 55%. Once again, a bigger difference in batting first results.

So it does look like Australia have an advantage over their rivals when it comes to batting first, above and beyond their general cricket superiority.

Now for some tables. For each team, I give the number of matches (actually this column includes no-results because I was lazy when doing the copy-paste), the win fraction batting first, the win fraction batting second, and the ratio. First up, day games:

team mats 1st 2nd ratio
Pakistan 41 0,40 0,37 0,92
Australia 42 0,73 0,78 1,07
Sri Lanka 53 0,49 0,61 1,25
India 47 0,39 0,58 1,49
West Indies 47 0,29 0,50 1,73
South Africa 39 0,38 0,71 1,85
New Zealand 39 0,29 0,62 2,14
England 38 0,19 0,56 2,94

Only Pakistan does better batting first in day games, but that is probably noise, given where Pakistan is on the next table. Australia is second, with only a small improvement when chasing.

Day-nighters:

team mats 1st 2nd ratio
Sri Lanka 59 0,58 0,34 0,59
Australia 70 0,76 0,62 0,82
England 42 0,37 0,31 0,85
South Africa 43 0,63 0,55 0,88
India 50 0,42 0,39 0,92
Pakistan 55 0,58 0,55 0,93
West Indies 22 0,25 0,25 1,00
New Zealand 41 0,43 0,43 1,01

Australia once again second — it's interesting to see Sri Lanka in the top three in both tables as well. Only New Zealand have a better record chasing in day-nighters.

It's worth pointing out that this could do with a more detailed analysis — Australian grounds may be more bat-first-friendly in day-nighters than others, which would explain Australia's high position in the second table.

Sunday, May 04, 2008

Luck

I thought I'd simulate a double-round-robin tournament with eight teams, to model the IPL. So Teams A to H each play 14 games. Here is the final ladder, ordered by number of wins:

C: 10
F: 10
B: 9
G: 7
H: 7
D: 6
A: 4
E: 3

Team E'll be looking for a new coach — only three wins out of fourteen.... Anyway, as the title of this post will suggest, the result of each match was decided by a (virtual) coin toss. The point here is that if all teams are perfectly evenly matched and results come down to the luck of the day, you'll still end up with teams at the top of the ladder having much better records, over 14 matches, than the teams at the bottom.
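For anyone who wants to repeat the experiment, here is a minimal sketch of the coin-toss season in Python. The function name `simulate_season` and the seed value are my own choices for illustration; the ladder it prints will differ from the one above, since that came from a different set of tosses.

```python
import random
from itertools import combinations


def simulate_season(teams="ABCDEFGH", seed=None):
    """Play a double round-robin in which every match is a fair coin toss."""
    rng = random.Random(seed)
    wins = {team: 0 for team in teams}
    for first, second in combinations(teams, 2):
        for _ in range(2):  # each pair of teams meets twice
            winner = first if rng.random() < 0.5 else second
            wins[winner] += 1
    return wins


if __name__ == "__main__":
    ladder = simulate_season(seed=2008)  # arbitrary seed, for repeatability only
    for team, won in sorted(ladder.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {won}")
```

Run it with a few different seeds and you'll rarely see a flat ladder, even though every team is identical by construction.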

Now of course there is skill involved in cricket, and some teams in the IPL are better than others. But can we tell which team is the best just from the results? Probably not from just one season (unless they put in a really dominant performance: lots of wins, by big margins). And more importantly, it'll be impossible to say how good each team actually is. To explain this point, I'm going to borrow the notation from American sports (since that's how I think of it in my head; much of what I write here can be found somewhere in the archives of this blog and this blog). A .500 team ("five hundred") is a team that wins 50% of its matches. A .600 team wins 60%, and so on.

To work out if a team is really a .600 team (say), you'd need an infinite number of matches to prove it. Of course, we could get by with a large number; just how large depends on how much luck is involved in each game. The problem with T20 is that we don't know how much luck there is. So we're going to be fumbling around in the dark somewhat. Once we've had a few seasons (to get enough data), we'll be able to look at the win-loss records of the teams and see how much greater the variance is than that expected by chance.

I worked out some numbers for ODI's and Tests here; T20 will have more luck involved than fifty-over cricket, but the IPL complicates things as the foreigners are dominant, and there's only four of them per side. If the long-term variance in win percentage is the same in the IPL as it is for ODI's (a big if), then you'll need each team to play about 17 or more games before the skill will demonstrably be playing a part in the results.
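A rough sketch of the sort of arithmetic behind a figure like that, under assumptions of my own: luck is treated as a binomial coin with p near 0.5, and the genuine spread in team strength is summarised by a standard deviation `talent_sd`. That input is hypothetical (it is not a number taken from the ODI analysis), and the helper names are just for illustration.

```python
import math


def luck_sd(games, p=0.5):
    """Standard deviation of win fraction from chance alone over `games` matches."""
    return math.sqrt(p * (1 - p) / games)


def break_even_games(talent_sd, p=0.5):
    """Number of games at which the spread from luck shrinks to the spread from skill."""
    return p * (1 - p) / talent_sd ** 2


if __name__ == "__main__":
    # Spread in win fraction you'd expect in a 14-game season of pure coin tosses.
    print(round(luck_sd(14), 3))
    # Hypothetical talent spread of 0.12 in win fraction.
    print(round(break_even_games(0.12), 1))
```

With a talent spread of about 0.12, the break-even point lands at roughly 17 games; a smaller spread in genuine ability pushes it out further.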

One season of IPL isn't going to be enough. In the coin-toss example above, every team was a .500 team. Only G and H ended with .500 records. Teams above them were lucky, teams below (especially E) were unlucky.

If we look at the IPL table today, Rajasthan are at .833. Are they genuinely an .833 team? They could be. Or they could be a .900 team that happened to lose one of their first six matches, or a .500 team that's had a bit of luck.
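To put rough numbers on that, here is a small binomial sketch. It assumes each match is an independent toss with a fixed win probability, which is a simplification, and the function names are my own.

```python
from math import comb


def prob_wins(k, n, p):
    """Probability of exactly k wins in n independent matches with win chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more wins in n matches."""
    return sum(prob_wins(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    # A genuinely .500 team starting with five or more wins from six.
    print(f"{prob_at_least(5, 6, 0.5):.3f}")
    # A genuinely .900 team dropping exactly one of its first six.
    print(f"{prob_wins(5, 6, 0.9):.3f}")
```

A .500 team gets to five or six wins out of six about one time in nine (7/64), so with eight teams in the competition it would be no surprise to see one of them sitting at .833 on luck alone.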

Let's not forget, Zimbabwe beat Australia not long ago in a T20 game. We should expect bad teams to win matches. And sometimes, mediocre teams will string together a few wins on the trot. Conversely, good teams will lose some. Does anyone really believe that Deccan (Gilchrist, Afridi, et al.) is a .167 team?

One way of seeing how much skill is involved will be to compare the first half of the tournament with the second, and see what correlation there is. Unfortunately, the coming and going of lots of big stars will make this really muddy, but I'll still do it at the end of the tournament.

So my message is, don't read too much into individual results. Don't say that the team on top of the ladder is the best simply because they're coming first — they might be the best team, but they might just be lucky. Go and read this excellent piece by Lawrence Booth at Cricinfo.

Saturday, May 03, 2008

The IPL so far

Each team in the IPL has now played five matches. I thought I'd have a look at the points table. Really the only reason I'm doing this is because I end up looking prescient, and if I let it go too long the results might start turning against me.

Near the end of this post, I came up with some half-baked ratings on how clever each team's bidding was. Contrary to just about everyone else, Jaipur (ie, Rajasthan) came out best. I didn't even really believe it myself, so I don't want you to go back and read the paragraph afterwards in that post. Just pay attention to the numbers.

Q gave his auction ratings here, while Arjwiz gave his here.

    Actual     Me         Q          Arjwiz
1.  Delhi      Rajasthan  =Delhi     Bangalore
2.  Chennai    Chennai    =Kolkata   =Delhi
3.  Rajasthan  Delhi      Deccan     =Kolkata
4.  Punjab     Deccan     =Chennai   =Deccan
5.  Kolkata    Bangalore  =Punjab    Chennai
6.  Deccan     Kolkata    Mumbai     Punjab
7.  Mumbai     Punjab     Bangalore  Mumbai
8.  Bangalore  Mumbai     Rajasthan  Rajasthan

I've got the top three! Albeit in the wrong order, because of net run rate. In terms of Spearman's rho, the rank correlation (-1: perfectly wrong, 1: perfectly right), I'm at 0.62, Q's at 0.34, and Arjwiz is -0.27.
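The three correlations can be checked with a short script. This is a sketch that computes the correlation between the rank positions, giving tied teams (the ones marked with "=") the average of the places they share. Reading the "=" entries as ties in that way is my interpretation of the table, and the helper names are made up for the example.

```python
def average_ranks(order):
    """Map team -> rank from a list of groups; tied teams share the mean rank."""
    ranks, position = {}, 1
    for group in order:
        mean_rank = position + (len(group) - 1) / 2
        for team in group:
            ranks[team] = mean_rank
        position += len(group)
    return ranks


def rank_correlation(a, b):
    """Correlation between two sets of ranks (Spearman's rho)."""
    teams = list(a)
    n = len(teams)
    mean_a = sum(a[t] for t in teams) / n
    mean_b = sum(b[t] for t in teams) / n
    cov = sum((a[t] - mean_a) * (b[t] - mean_b) for t in teams)
    var_a = sum((a[t] - mean_a) ** 2 for t in teams)
    var_b = sum((b[t] - mean_b) ** 2 for t in teams)
    return cov / (var_a * var_b) ** 0.5


actual = average_ranks([["Delhi"], ["Chennai"], ["Rajasthan"], ["Punjab"],
                        ["Kolkata"], ["Deccan"], ["Mumbai"], ["Bangalore"]])
me = average_ranks([["Rajasthan"], ["Chennai"], ["Delhi"], ["Deccan"],
                    ["Bangalore"], ["Kolkata"], ["Punjab"], ["Mumbai"]])
q = average_ranks([["Delhi", "Kolkata"], ["Deccan"], ["Chennai", "Punjab"],
                   ["Mumbai"], ["Bangalore"], ["Rajasthan"]])
arjwiz = average_ranks([["Bangalore"], ["Delhi", "Kolkata", "Deccan"],
                        ["Chennai"], ["Punjab"], ["Mumbai"], ["Rajasthan"]])

for label, ranks in (("Me", me), ("Q", q), ("Arjwiz", arjwiz)):
    print(f"{label}: {rank_correlation(actual, ranks):.2f}")
# prints 0.62, 0.34 and -0.27 respectively
```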

Thursday, May 01, 2008

Maximising runs or wins

In a post at 99.94, I took the comments thread off on a long tangent that was only just related to the original post.

It got me thinking about batting strategies (at a conceptual level) in limited-overs cricket. Batting second, it's simple: choose the strategy to maximise your chance of reaching the target. Every team does this instinctively — chasing 350, they go for broke, and often end up losing by a lot.

Batting first, I'm not sure what the optimal strategy is. Instinctively, I at first thought that you should choose the strategy to maximise the expected number of runs that you score. But scoring runs isn't actually the end goal — it's winning the game. And increasing the average number of runs you score won't always improve your win/loss ratio.

To take an extreme example, suppose you're a really bad team like Bangladesh, up against a team like Australia. Whenever Bangladesh bats first, they choose the run-maximising strategy. The results might be a bell curve centred around 180. So a lot of scores around 170-190, a few past 200, a few below 160, etc.

Now Australia has no problem chasing any of those. Australia's only going to have problems when the target's up over 250. So while the Bangladeshi averages will be best-served by going with the run-maximising strategy, they may end up losing every game.

On the other hand, if they play more aggressively, then sometimes their batsmen will have a bit of luck and they'll end up with a big score. In their long series of matches with Australia, they'll have loads of heavy defeats, after making scores like 120 and 150 and so on, but every now and then, they'll make 250 and have a chance at winning. So their averages will suffer, but their win/loss ratio will improve.
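Here's a toy version of that argument in Python. It assumes first-innings totals are roughly normally distributed, and apart from the centre of 180 and the 250 target, every number in it is invented for illustration: the spreads of 25 and 55 runs and the aggressive average of 165 are hypothetical.

```python
from statistics import NormalDist


def chance_of_reaching(target, mean, sd):
    """P(total >= target) if first-innings totals are roughly normal."""
    return 1 - NormalDist(mean, sd).cdf(target)


# Run-maximising approach: higher average, tightly bunched totals (hypothetical spread).
cautious = chance_of_reaching(250, mean=180, sd=25)
# Aggressive approach: lower average, far more variable totals (hypothetical numbers).
aggressive = chance_of_reaching(250, mean=165, sd=55)

print(f"cautious:   {cautious:.1%} of innings reach 250")
print(f"aggressive: {aggressive:.1%} of innings reach 250")
```

The aggressive side averages fewer runs but gets past 250 far more often, which is the whole point: against a much stronger opponent, variance is your friend.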

It'd be a public relations disaster, of course — all those thrashings.

If you've got two more evenly-matched sides, choosing the win-maximising strategy when batting first becomes problematic. Maybe you've studied the opposition's batting and concluded that you're best-off aiming for 270+. But maybe the pitch is not so good, and you don't know how to adjust that 270 score. You'll probably go back to a run-maximising strategy.

Nevertheless, I think with a very careful analysis, there's scope for improving win/loss ratios. I think it's most applicable in T20, because it's so short. If you bat first and lose early wickets, what do you do? Go for broke (hoping for 140 but probably getting 90), perhaps, rather than slowly batting out the overs (and getting 120)? It'll probably need a few years of IPL before we have enough data to say.

On an unrelated topic, the latest post chez Z-Score has a teaser question: What is the highest Test partnership for a pair who only batted once together in Tests? The hints are that they aren't Australian, and that the partnership is higher than 320. For those who don't want to search for it themselves, feed this into ROT13:

yrauhggbanaqznhevpryrlynaqjuraratynaqznqrbireavaruhaqerq.

Sunday, April 27, 2008

WG Grace

WG Grace had a very long career — he played a long time after his peak. That's why, when looking at his career averages (unadjusted first-class averages from CricketArchive: bat 39,45; bowl 18,14), you don't see why he's such a huge figure in the history of the game. His aggregates are huge, sure, but it looks like he was a great who played for a long time, rather than a rival to Bradman as the greatest ever.

To see where this latter perception comes from, I plotted his cumulative adjusted averages (weighting innings according to the quality of the attack, relative to an overall average of 24,5) against time. I considered only first-class matches in England (since I know that that part of my database works — I didn't want to spend half a week debugging Australian matches that I haven't tested yet).

I plotted all matches for each season at the same x-value, which is why the curve is funny.

I don't know why Excel made a funny loop at around 1866.

To give a feel for the batting scale: Bradman is at 98,9; Headley 65,8; Ranji 60,5; Merchant 56,3 (those are the top four); Mike Hussey 52,2; Barry Richards 45,3.

Bowling scale: Murali 13,7; Lindwall 14,1 (top two); Darren Gough 20,9; Eddie Hemmings 25,6.

All references to averages below are adjusted ones.

Grace's (adjusted) batting average peaked at the end of the 1873 season at 92,8; at this time he had scored over 10000 first-class runs. Batting doesn't get much more Bradmanesque. By 1880, he had 19560 runs at 74,5. This also marks the start of his decline as a bowler. By the end of 1880, he had 1335 wickets at 21,9. If Grace had stopped playing then, his ratio of batting average to bowling average (3,4) would have been well clear of second place (Keith Miller at 2,8).

Even in 1886, though (more than 20 years after the start of his first-class career), his batting average was higher than Headley's.

The decline in batting average is very marked — it almost falls to 50 by the end of his career. The rise in his bowling average is much more gentle, because as he got older he bowled less, not bowling more than 4000 balls in a season after 1888.

Saturday, April 26, 2008

Bowler workloads

Over at CFF, there was a debate over spinners' averages. One poster said that spinners bowl disproportionately many overs on flat pitches, while lazy pacemen rotate at the other end. This has the effect of bloating out spinners' averages unfairly. Another poster responded by saying that spinners also bowl disproportionately many overs on raging turners, which would help their averages.

Which factor is the dominant one? To answer this, I considered every innings, and scaled each bowler's figures so that he effectively bowled a quarter of the overs. So, for instance, if a team batted for 100 overs, and one bowler took 1/80 from 20 overs, that would be scaled up to 1,25/100 from 25 overs. If another bowler took 1/60 from 30, it would become 0,83/50. So each bowler's average in any given innings won't change, but we'll see any effects of bowling or not bowling in tough or easy conditions.

Now, sometimes a bowler might, say, bowl one over in an innings and take a wicket in it, or bowl one over and get hit for 15 runs. Obviously it's not realistic that he would have taken 25 wickets or gone for 375 runs from 25 overs, so if the number of balls bowled by a bowler was less than 60, I didn't do any adjustment. It's a bit arbitrary where you put the cut-off, but pushing it back to 30 balls doesn't change the overall trends.
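In code, the scaling for a single bowler in a single innings looks something like the sketch below. It is an illustration rather than the code behind the analysis: the function name and argument layout are made up, and it works in balls on the assumption that the innings length is available in balls (six-ball overs in the worked example).

```python
def scale_to_quarter(wickets, runs, balls, innings_balls, min_balls=60):
    """Rescale one bowler's innings figures as if he bowled a quarter of the balls.

    Spells shorter than `min_balls` are returned unchanged, so a one-over
    spell doesn't get blown up into 25 wickets or 375 runs.
    """
    if balls < min_balls:
        return wickets, runs, balls
    factor = (innings_balls / 4) / balls
    return wickets * factor, runs * factor, balls * factor


if __name__ == "__main__":
    # 1/80 from 20 overs in a 100-over innings: scaled up to 25 overs.
    print(scale_to_quarter(1, 80, 120, 600))
    # 1/60 from 30 overs in the same innings: scaled down to 25 overs.
    print(scale_to_quarter(1, 60, 180, 600))
    # A single over is below the cut-off and left alone.
    print(scale_to_quarter(1, 15, 6, 600))
```

Summing the scaled wickets and runs over a career, rather than the raw ones, gives the scaled averages. Wickets and runs are multiplied by the same factor, so the average within each innings stays the same; only the weight each innings carries in the career figures changes.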

One very stunning result comes out of this analysis. Every major wicket-taker (at least 100 Test wickets), except for Vanburn Holder, has his average increase. I'm still wondering a little bit if it's a bug in my code, but since I can't find one, and it passes various sanity checks, I'm reasonably confident that these results are true.

Assuming that I haven't made some silly mistake, there's a simple explanation for this phenomenon: captains can tell which bowlers are being effective on a given day and which ones aren't, and they make the effective bowlers bowl more overs. There could also be a bit of luck involved. Say a bowler is a bit unlucky and goes for fifteen wicketless overs. If he'd been given another ten, he might have picked up a wicket or two. But since he hadn't taken any, he didn't get to bowl again.

Let's have a look at the top and bottom of the table, ordered by the difference in weighted (by the average of the batsmen dismissed) average. Qualification: 100 wickets. Columns of the table are: wickets, regular average, weighted average, scaled regular average, scaled weighted average, difference between scaled regular average and regular average, difference between scaled weighted average and weighted average.

                          avg      scaled avg      diff
name              wkts   reg   wtd   reg   wtd   reg   wtd
VA Holder          109  33,3  37,0  33,1  36,8  -0,2  -0,2
DA Allen           122  31,0  30,1  32,4  30,6   1,4   0,4
PM Pollock         116  24,2  26,5  24,9  27,0   0,7   0,5
M Dillon           131  33,6  31,4  34,3  32,1   0,7   0,6
AN Connolly        102  29,2  27,1  29,3  27,9   0,0   0,7
CEH Croft          125  23,3  23,3  24,3  24,1   1,0   0,8
NAT Adcock         104  21,1  23,5  22,1  24,3   1,0   0,9
M Muralitharan     724  21,8  24,6  22,5  25,5   0,6   0,9
MW Tate            155  26,2  26,3  26,9  27,3   0,8   1,0
WJ O'Reilly        144  22,6  22,8  23,1  23,8   0,5   1,0
---
C Blythe           100  18,6  23,8  22,5  28,9   3,9   5,1
H Trumble          141  21,8  26,5  25,8  31,6   4,0   5,1
Mohammad Rafique   100  40,8  40,6  45,4  45,8   4,7   5,2
N Boje             100  42,7  38,9  48,4  44,0   5,8   5,2
Intikhab Alam      125  36,0  37,4  41,8  42,9   5,8   5,5
W Rhodes           127  27,0  32,7  31,4  38,3   4,4   5,6
J Briggs           118  17,8  32,4  21,9  38,3   4,1   5,9
AF Giles           143  40,6  37,7  46,9  43,7   6,3   6,0
TE Bailey          132  29,2  30,6  35,6  37,1   6,4   6,4
AL Valentine       139  30,3  33,2  35,9  39,6   5,6   6,4

Those near the top of the table are the ones who bowl in the tough conditions or don't bowl so often in favourable ones; those near the bottom don't bowl so much in the tough conditions but do when things are going well.

The results aren't what I would have expected. Murali's position near the top is easy to explain — he does a huge amount of bowling for Sri Lanka come what may. Generally, though, it's pacemen at the top and spinners at the bottom.

The keen-eyed amongst you will note that, with the exception of Murali, none of the bowlers listed above took more than 155 wickets. There is much more variation for bowlers with lower numbers of wickets:

I should probably bin these and find z-scores.

I've shown a quadratic fit because it looks better than a linear one. The general trend is clearly downward — bowlers who take lots of wickets tend to get hidden less from flat pitches than bowlers who don't.

Now of course, considering only bowlers with large career wicket hauls will pick out good bowlers, but what about great bowlers from the olden days who didn't play so many Tests? If you take the ratio of scaled weighted average to weighted average, and plot against the weighted average, you get only a very very slight positive correlation (y = 0,0008x + 1,07; R-squared = 0,01).

(If you plot the difference, rather than the ratio, you get a strong positive correlation. I think it's more accurate to work with ratios here.)

So, my conclusions so far are:

- Most of the variation is to do with small samples.
- But better bowlers do have slightly smaller differences (or ratios) when you scale their workloads to one quarter of each innings' overs.

Murali definitely belongs near the top of the table — such a small difference after taking over 700 wickets is clearly a genuine (and easily explained) trait of his bowling, and not statistical noise. This looks to me like a good way of trying to answer the question, "How would Warne have done if he had Murali's workload?" I don't think it's reasonable to do this for most pairs of bowlers, but if they have a large number of wickets (as with Warne and Murali), then the difference between a pair of players is likely to be genuine. And in this case, Murali comes out easily the better.

                wtd avg   sc wtd avg
M Muralidaran     24,6       25,5
SK Warne          27,9       30,7

Now to the question that started this all — spinners v pacemen. For pacemen, the average ratio of scaled weighted average to weighted average is 1,08. For spinners it's 1,11. So it looks like spinners getting to bowl on raging turners is a bigger factor for their averages than having to shoulder the workload on flat tracks. Doing the diff v wkts plot as above for spinners and quicks separately clearly shows the quicks (on average) having smaller differences across all lengths of career.

Lastly, if you only scale downwards (i.e., if they bowled more than a quarter of the overs, then scale back to a quarter, else do nothing), then the ratios become 1,05 for quicks and 1,08 for spinners.
