Pappus' plane - cricket stats: John Buchanan and The Guardian article

Saturday, May 10, 2008

John Buchanan and The Guardian article

Hello to those of you who've come here from Andy Bull's piece in The Guardian. I hope you find something interesting here.

There are a couple of ideas in that article that I think are worthy of more detailed discussion.

What John Buchanan says is interesting, but it seems to me that he's taking a purely coaching perspective. He says:

1) Ignore existing cricket statistics - these are just the 'outcome numbers' of a process of getting there.

If I were a coach, I would probably agree with this. Buchanan goes on to give the example of strike rate. It would do no good for a coach to say to a player, "Hey, you're averaging 35 at a strike rate of 70. I want you to average 40 at a strike rate of 80." You need to break batting down into its parts and make improvements at that level.
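Both of those 'outcome numbers' are simple ratios. Here's a minimal Python sketch using the standard definitions (average = runs per dismissal, strike rate = runs per 100 balls faced); the function names and the sample figures are made up for illustration:

```python
def batting_average(runs: int, dismissals: int) -> float:
    """Runs scored per dismissal; infinite if the batsman has never been out."""
    if dismissals == 0:
        return float("inf")
    return runs / dismissals


def strike_rate(runs: int, balls_faced: int) -> float:
    """Runs scored per 100 balls faced."""
    if balls_faced == 0:
        return 0.0
    return 100 * runs / balls_faced


# Hypothetical record matching the example: 350 runs, 10 dismissals, 500 balls.
print(batting_average(350, 10))  # 35.0
print(strike_rate(350, 500))     # 70.0
```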

That's where the ball-by-ball analysis comes in — what Buchanan calls 'process numbers'. (Buchanan is very big on processes, I gather. I've seen him talk about them elsewhere.) You look at the dot balls, try to improve shot selection on them, etc. You hope that you'll end up scoring more runs at a higher rate.
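I don't know exactly which process numbers Buchanan uses, but as a rough illustration, here is a sketch that turns a ball-by-ball record into a few simple ones. It assumes a hypothetical format in which each ball faced is recorded as the runs scored off the bat (0 for a dot ball), and it ignores extras:

```python
from collections import Counter


def process_summary(balls: list[int]) -> dict:
    """Summarise a batsman's ball-by-ball record (hypothetical format:
    runs off the bat for each ball faced, in order, 0 for a dot ball)."""
    n = len(balls)
    counts = Counter(balls)
    return {
        "balls": n,
        # Percentage of balls faced that produced no run.
        "dot_ball_pct": 100 * counts[0] / n if n else 0.0,
        # Percentage of balls faced that went for four or six.
        "boundary_pct": 100 * (counts[4] + counts[6]) / n if n else 0.0,
        "runs_per_ball": sum(balls) / n if n else 0.0,
    }


innings = [0, 0, 1, 4, 0, 2, 0, 1, 0, 6]  # made-up ten-ball sequence
print(process_summary(innings))
```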

That's what the coach does. From a selection perspective, the outcome numbers are still going to be important. No-one cares what your percentage of dot balls is if you average 25, and no batsman will hold down a spot in the national side with such a low outcome number. Cricket games are won by the team that scores the most runs, and we shouldn't lose sight of that. All the 'processes' work is no good if it doesn't improve averages (or strike rates, in limited overs cricket).

Now, there are times when process numbers might be useful in selection — if a batsman has bad process numbers, then perhaps with coaching he might improve a lot more than a batsman who's already largely optimised his game. I don't know. Without seeing the figures involved and knowing what improvements are usually made, it's hard to say how useful such an approach would be.

Now onto one of the questions Bull posed at the end of the column:

Could we see teams selected through statistical proof rather than the current woolly combination of gut instinct, vague notions about character and compromised measures such as batting averages?

I would be very surprised if, in the foreseeable future, detailed statistics proved better at team selection than human experts using regular stats. When it comes to working out when to drop players, they might be. (I've said before that selectors are probably best off going with their gut when dropping players. Perhaps with detailed process stats you could do better; I don't know.)

But when it comes to finding the best players in domestic cricket, I doubt if a computer would do better than Duncan Fletcher, for example (if you haven't read Andrew Strauss's thoughts on Fletcher, I recommend doing so). Fletcher famously picked Michael Vaughan for the 1998/9 tour of South Africa on 'temperament'. His record in county cricket was not great — his first-class averages in the previous two seasons were 34 and 41. His average for Yorkshire is still well under 40. But despite that, in England colours he turned himself into a good batsman, doing better against Test sides than against county sides.

Now, it's possible that with sufficient process numbers from his county games, you would be able to tell him apart from the rest of the county hacks averaging in the high 30s. But I'd be surprised if it were so.

Obviously you'll want to be paying attention to stats when picking national sides — you won't consider batsmen averaging under 30, and you'll certainly be looking at those averaging 60 — but since the quality of the players is significantly lower in domestic cricket, you'll want humans watching them, gauging their technique and judging if they'll hold up against 90mph pace bowling or top-class spinners.

They don't always get it right, of course, but I think that they do better than a computer (or a person looking only at numbers) would do.

# posted by David Barry : 09:57

Comments:

There is one point here, David. Consider a placement scenario in college. The first screening happens on the basis of one's marks, and the shortlisted candidates are then called for interview. Coaches face the same problem. When the talent pool is huge, you have to rely on statistics to pick the initial shortlist. You can then observe how good they are at their game before roping them into the scheme of things.
Numbers are indispensable.
But then, they can mislead us, as many of our decisions do.
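The first, numbers-only stage of that two-stage process could be sketched like this. The 30-run average cut-off echoes the post above; the minimum-innings requirement, the `Batsman` record and the player figures are all hypothetical, and the second stage (actually watching them play) is left to humans:

```python
from dataclasses import dataclass


@dataclass
class Batsman:
    name: str
    runs: int
    dismissals: int
    innings: int


def shortlist(players: list[Batsman], min_average: float = 30.0,
              min_innings: int = 20) -> list[Batsman]:
    """Keep batsmen with enough innings and a high enough average,
    sorted best average first. Never-dismissed players are skipped."""
    eligible = [
        p for p in players
        if p.innings >= min_innings
        and p.dismissals > 0
        and p.runs / p.dismissals >= min_average
    ]
    return sorted(eligible, key=lambda p: p.runs / p.dismissals, reverse=True)


pool = [
    Batsman("A", 1500, 40, 45),  # average 37.5
    Batsman("B", 900, 36, 40),   # average 25.0, below the cut-off
    Batsman("C", 600, 10, 12),   # average 60.0, but too few innings
    Batsman("D", 2100, 50, 55),  # average 42.0
]
print([p.name for p in shortlist(pool)])  # ['D', 'A']
```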

# posted by Anonymous : 10 May 2008, 10:42:00 am

David, I'm not sure what Buchanan's argument is about the relationship between process numbers and outcome numbers, but I think the point is meant to be that the two have to be related to one another. It's the kind of thing Bill James tried to do: find an algorithm that churns through the process numbers and outputs the outcome numbers.

The idea for Buchanan seems to be that if you can improve the former, the latter will automatically improve too. Of course, there are all kinds of dependent/independent variable problems with this approach. It's like observing about batsmen that they're no more likely to be dismissed in the first twenty balls they face than any other twenty ball stretch, therefore they shouldn't bother starting their innings slowly.

# posted by Anonymous : 10 May 2008, 7:22:00 pm

Yeah... I just thought that the way the article was presented, it was suggesting that we should all ('should' is too strong a word perhaps) do away with existing statistics. And I think that that's wrong.

"It's like observing about batsmen that they're no more likely to be dismissed in the first twenty balls they face than any other twenty ball stretch"
You could probably find another example of what you're talking about, but this one isn't true. Batsmen are actually more likely to get out in their first twenty balls than in any other twenty-ball stretch, and this despite their batting slowly at the start of their innings.
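One way to check a claim like this against ball-by-ball data is to pool all innings and count dismissals per ball faced within each twenty-ball stretch. A minimal sketch, assuming each innings is recorded simply as (balls faced, dismissed or not) and that the dismissal falls on the last ball faced; the function name and the toy data are hypothetical and say nothing about real batsmen:

```python
from collections import defaultdict


def dismissal_rate_by_stretch(innings: list[tuple[int, bool]],
                              stretch: int = 20) -> dict[int, float]:
    """Dismissals per ball faced in each stretch of `stretch` balls.

    Key 0 covers balls 1-20, key 1 balls 21-40, and so on. Not-out innings
    still contribute the balls they faced, just no dismissal.
    """
    balls_in = defaultdict(int)  # balls faced within each stretch
    outs_in = defaultdict(int)   # dismissals falling within each stretch
    for faced, dismissed in innings:
        for ball in range(1, faced + 1):
            balls_in[(ball - 1) // stretch] += 1
        if dismissed and faced > 0:
            outs_in[(faced - 1) // stretch] += 1
    return {k: outs_in[k] / balls_in[k] for k in sorted(balls_in)}


toy = [(10, True), (20, True), (40, False)]
print(dismissal_rate_by_stretch(toy))  # {0: 0.04, 1: 0.0}
```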

# posted by David Barry : 10 May 2008, 7:42:00 pm

Agreed. It was a bad example.

Also, I think there's plenty of room for improving outcome stats as well as creating process stats. It's not a case of one or the other, of course.

# posted by Anonymous : 10 May 2008, 10:12:00 pm

This has in all likelihood already been covered, but where is the line between process and outcome in batting?
Billy Beane's baseball analysis (indebted to Bill James, etc.) raised On-Base Percentage (OBP) to supreme importance, since in theory a high OBP would lead to success more often than a high RBI, and so on. In this case, the outcome is scoring a run without conceding an out.

In cricket, things are more complicated (of course), as time and/or overs are limited. Does this mean that there is no cricketing equivalent of OBP as a 'winning statistic'? A high average doesn't necessarily mean a player will make more runs, and a high strike rate takes no account of how often a batsman gets out.

For bowlers, I presume a low average to be a good indicator for Test/4-day bowling, and a low economy rate to be the best for Limited-Overs bowling - am I near the mark, or miles away?

# posted by Anonymous : 11 May 2008, 3:05:00 pm

Hey spunout. I was thinking about talking about what Beane did, but I thought it'd make the blog entry too long.

Batting statistics in cricket and baseball are fundamentally different. In cricket, the only way you can score a run is to hit the ball and run (or let the ball reach the boundary). In baseball, you can score either by hitting a home run (which is like scoring in the cricket sense) or by getting on base. The latter has no equivalent in cricket, since you can't stop running half-way up the pitch.

So if you want to treat runs scored as the outcome stat, then whichever of OBP, BA, SLG, OPS, etc. you use, you're looking at a process stat. (I guess RBI would be an outcome stat.)
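For anyone unfamiliar with the baseball abbreviations, here is a sketch of the standard formulas for BA, OBP, SLG and OPS. The `BattingLine` class is just an illustrative container, and the season line at the bottom is invented:

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int       # at-bats
    h: int        # hits (including doubles, triples and home runs)
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # times hit by pitch
    sf: int       # sacrifice flies

    def ba(self) -> float:
        """Batting average: hits per at-bat."""
        return self.h / self.ab

    def obp(self) -> float:
        """On-base percentage: times on base per plate appearance (as defined)."""
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        """Slugging percentage: total bases per at-bat."""
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab

    def ops(self) -> float:
        """On-base plus slugging."""
        return self.obp() + self.slg()


line = BattingLine(ab=500, h=150, doubles=30, triples=2, hr=20, bb=60, hbp=5, sf=5)
print(f"BA {line.ba():.3f}  OBP {line.obp():.3f}  SLG {line.slg():.3f}  OPS {line.ops():.3f}")
# BA 0.300  OBP 0.377  SLG 0.488  OPS 0.865
```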

In concentrating on OBP rather than BA, Beane was using a more useful process stat than the other Major League general managers.

(Pitching is a bit different. ERA and win/loss are outcome stats, but there are better ways to measure pitchers.)

When it comes to batting in cricket, the batting average is really useful for judging players and predicting how they'll do in the future. For a lot of the analyses I do, I tweak it by adjusting for the strength of opposition (and era), but fundamentally the average is a really good number to use when evaluating a batsman. This is because, as I mentioned earlier, in cricket the only way to contribute runs is to score them yourself - you can't half-score a run by getting on base as in baseball.

A batsman with a high average is a good batsman, and will probably score more runs than a batsman with a low average. A bowler with a low average is a good bowler.

(Obviously we're assuming at least a moderate number of innings/wickets against quality opposition.)

For limited overs cricket, you're going to consider strike rate and economy rate as well. I don't know of anyone who's conclusively shown the best way to combine average with strike/economy rate - multiplying them together seems a reasonable thing to do, but there's probably a better way. In 20-over cricket, I think you'd want to put more emphasis on the economy rate. In 50-over cricket, wickets are still useful - I'd like a bowler who takes 3/50 from 10, rather than 0/35 from 10.
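As a rough illustration of "multiplying them together" for bowlers, here is a sketch of bowling average (runs per wicket) times economy rate (runs per over). It's only the simple product mentioned above, not a proven best method; the function names are hypothetical, and figures are taken in balls rather than overs to avoid the 9.3-overs notation:

```python
import math


def bowling_average(runs: int, wickets: int) -> float:
    """Runs conceded per wicket; infinite if no wickets were taken."""
    return runs / wickets if wickets else math.inf


def economy_rate(runs: int, balls: int) -> float:
    """Runs conceded per six-ball over."""
    return 6 * runs / balls


def combined_index(runs: int, wickets: int, balls: int) -> float:
    """Average multiplied by economy rate: lower is better."""
    return bowling_average(runs, wickets) * economy_rate(runs, balls)


# The two 10-over spells from the comment above.
for label, runs, wickets in [("3/50", 50, 3), ("0/35", 35, 0)]:
    print(label, f"{combined_index(runs, wickets, 60):.1f}")
# 3/50 83.3
# 0/35 inf
```

Because the product blows up for a wicketless spell, it only really makes sense on career (or at least multi-match) figures rather than single spells.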

# posted by David Barry : 11 May 2008, 3:55:00 pm
