## Wickets broken down by ball in the over

Quick one today; I've been busy for reasons that will become clear in a couple of days.

Here's the breakdown of wickets by ball in the over, in Tests since 1998 or so.

1: 2448
2: 2443
3: 2537
4: 2464
5: 2639
6: 2413

Ball five is about 3.3 standard deviations above the mean, which is interesting and significant at p=0.003. (Usually 3.3 standard deviations would correspond to p=0.0005, but there are six tests going on, which increases the likelihood that one of them will turn out significant. So I multiplied that 0.0005 by 6, which I hope is the correct thing to do.) I can't think of any obvious reason why the fifth ball in the over is relatively wicket-prone, so I'm leaning towards it just being a blip. Perhaps those stalemates in which the top-order batsman bats with the tail-ender and holds the strike for the first four balls? I don't know.
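As a rough check on those figures, here is a minimal Python sketch. It assumes every ball of the over was bowled equally often (the raw counts are used without any adjustment), that under the null hypothesis a wicket is equally likely to fall on any of the six balls, and that a one-sided normal approximation is good enough. The variable names are illustrative.

```python
import math

# Test wickets by ball of the over, from the list above.
test_wickets = [2448, 2443, 2537, 2464, 2639, 2413]

total = sum(test_wickets)
p_ball = 1 / 6  # assumption: every ball of the over is equally wicket-prone

# Binomial mean and standard deviation for the wicket count on any one ball.
expected = total * p_ball
sd = math.sqrt(total * p_ball * (1 - p_ball))

z_ball5 = (test_wickets[4] - expected) / sd

# One-sided upper-tail probability of a standard normal.
p_single = 0.5 * math.erfc(z_ball5 / math.sqrt(2))

# Multiply by six because six balls were examined (capped at 1).
p_adjusted = min(1.0, 6 * p_single)

print(f"z = {z_ball5:.2f}, single p = {p_single:.5f}, adjusted p = {p_adjusted:.4f}")
```

Under these assumptions the z-score comes out at roughly 3.3 and the adjusted p-value at about 0.003, in line with the numbers quoted.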

Now for the IPL:

1: 122
2: 131
3: 104
4: 104
5: 104
6: 124

The numbers are pretty small, but it's something to think about for when I gather more T20 data. Perhaps batsmen take a couple of balls to get their eye in against new bowlers. In Test cricket, these bowler changes happen less frequently, and also the batsmen are more watchful. In T20, they might be slogging from ball one. Just a thought, nothing concrete.
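To put "pretty small" on a scale, one option is a chi-square goodness-of-fit test against equal counts on every ball. The sketch below is not part of the original analysis; it assumes equal numbers of each ball were bowled and compares the statistic with the standard 5% critical value for five degrees of freedom (about 11.07). The function name is hypothetical.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

test_wickets = [2448, 2443, 2537, 2464, 2639, 2413]
ipl_wickets = [122, 131, 104, 104, 104, 124]

# Upper 5% critical value of chi-square with 5 degrees of freedom.
CRITICAL_5DF_05 = 11.07

for name, counts in [("Tests", test_wickets), ("IPL", ipl_wickets)]:
    stat = chi_square_uniform(counts)
    verdict = "exceeds" if stat > CRITICAL_5DF_05 else "is below"
    print(f"{name}: chi-square = {stat:.2f}, which {verdict} {CRITICAL_5DF_05}")
```

On these counts the IPL statistic falls short of the threshold while the Test statistic clears it, which fits the view that the IPL pattern needs more data before it means anything.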

The p-value for multiple tests would be a binomial distribution, looking for 1 or more instances of significance. Or zero failures = 0.9995^6 = 0.9970. So, yes, 0.003.
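A short sketch of that calculation, comparing the exact "one or more" probability with the multiply-by-six shortcut. It assumes the six tests are independent, which is only approximately true here because the six counts share a fixed total; the function name is hypothetical.

```python
def familywise_p(p_single, n_tests):
    """Chance of at least one 'significant' result among n independent tests."""
    return 1 - (1 - p_single) ** n_tests

p_single, n_tests = 0.0005, 6
exact = familywise_p(p_single, n_tests)
shortcut = n_tests * p_single
print(f"exact = {exact:.6f}, multiply-by-six = {shortcut:.6f}")

# The shortcut drifts away from the exact value as the single-test p grows.
for p in (0.01, 0.05, 0.2):
    print(p, round(familywise_p(p, n_tests), 4), round(n_tests * p, 4))
```

For a single-test p-value this small the two agree to the quoted precision; by p = 0.2 the shortcut gives a "probability" above 1, so it only works when the single-test value is tiny.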

In lower-grade cricket you'd put it down to impatience on the part of the batsman: four consecutive dot balls induce a poor stroke. But if that were the case, it should also show up on balls 4 and maybe 6. Can you get a rundown of the percentage of each ball containing a scoring stroke? Alternatively, can you break this down by wicket, to see if it changes by batsman type?

There should be a marginal decrease in wickets from ball 1 to ball 6, because an innings can end mid-over, so slightly more first balls are bowled than sixth balls (tail-ender protection aside). But that wouldn't be much of an effect... maybe 0.5%?

The fifth ball is when bowlers most often put in a variation like a slower ball or something. I don't know if this practice is prevalent enough to show up statistically, but it's definitely something a lot of bowlers do ... at least at club level.

Thanks for the little stats lesson again Russ. The "multiply by number of tests" thing seems to be a trick that works with numbers close to 1.

I don't have ball-by-ball data for Tests (sort of - I have downloaded some, but it's raw Cricinfo HTML commentary, not ready to be computer-read).

There's a 0.8% difference between the number of sixth balls and the number of first balls in the dataset I worked with. I corrected for this.

I broke everything down by which wicket it was, and the fifth ball is more wicket-prone than average on all wickets except the fourth. There's barely any difference between the first three wickets and the last three wickets.

I didn't look at which batsmen got out on which ball, but it doesn't look like it'd affect anything too much.

It's weird.

Dave, thanks for that. It is an intriguing anomaly. Edladd might be on to something, but like my theories it would need some finer data resolution which we don't have.

Parsing the Cricinfo commentaries is fairly straightforward, but a bit labor-intensive. If I get a chance sometime I'll write a Perl parser and forward it on to you.

However, I ran a quick test on the first two Tests of the England-South Africa series, and the result was equally intriguing. In both Tests the 5th ball had slightly fewer dot balls and significantly more runs than the others. It also had substantially more wickets, but the wicket data is so sparse that the difference is not significant. (Figures are across both Tests; the columns are balls bowled, wickets, percentage of dot balls, and run rate.)

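| Ball | Balls | Wickets | Dot balls | Run rate |
|------|------:|--------:|----------:|---------:|
| 1 | 753 | 5 | 74.50% | 2.87 |
| 2 | 758 | 8 | 74.67% | 2.99 |
| 3 | 754 | 6 | 76.66% | 2.77 |
| 4 | 761 | 4 | 76.48% | 2.89 |
| 5 | 745 | 17 | 73.15% | 3.63 |
| 6 | 745 | 11 | 75.44% | 3.06 |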

Across two innings you'd expect the run-rate to even out, but the fifth-ball rate is 2 standard deviations away from the mean. (Actually, having only six balls in the over works against us here, because the fifth ball itself lifts both the mean and the std. dev.; if we take the mean and standard deviation of the other five balls, it is over 6 std. dev. from their mean!)
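The two comparisons can be reproduced from the run-rate column alone. This is a small sketch using sample standard deviations (an assumption; the comment does not say which form was used), with illustrative names.

```python
from statistics import mean, stdev

# Run rates by ball, first two England-South Africa Tests (figures above).
run_rates = [2.87, 2.99, 2.77, 2.89, 3.63, 3.06]

def z_score(value, sample):
    """Distance of value from the sample mean, in sample standard deviations."""
    return (value - mean(sample)) / stdev(sample)

ball5 = run_rates[4]
others = run_rates[:4] + run_rates[5:]

z_all = z_score(ball5, run_rates)  # ball five included in the mean and sd
z_others = z_score(ball5, others)  # ball five measured against the other five

print(f"all six balls: z = {z_all:.2f}; other five only: z = {z_others:.2f}")
```

Leaving the outlier out of its own baseline is what moves the result from about 2 to over 6, though with only five reference points that second figure should be read loosely.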

And, you can see the same (smaller) effect again in the recent Aust-WI series:

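| Ball | Balls | Wickets | Dot balls | Run rate |
|------|------:|--------:|----------:|---------:|
| 1 | 1127 | 20 | 71.52% | 3.28 |
| 2 | 1139 | 10 | 73.75% | 3.22 |
| 3 | 1128 | 25 | 73.94% | 3.21 |
| 4 | 1141 | 10 | 73.71% | 3.01 |
| 5 | 1130 | 20 | 73.01% | 3.56 |
| 6 | 1130 | 18 | 74.25% | 3.27 |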

This time there aren't significantly more wickets. But, note that the percentage of scoring strokes is similar yet the runs are higher.
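One way to make that last point concrete is to estimate the average runs per scoring ball from the two columns. The sketch below assumes the run rate is runs per six balls and that every non-dot ball counts as a scoring stroke; neither is stated in the figures, and the names are hypothetical.

```python
# (dot-ball percentage, run rate) by ball, from the Aust-WI figures above.
aus_wi = {
    1: (71.52, 3.28),
    2: (73.75, 3.22),
    3: (73.94, 3.21),
    4: (73.71, 3.01),
    5: (73.01, 3.56),
    6: (74.25, 3.27),
}

def runs_per_scoring_ball(dot_pct, run_rate):
    runs_per_ball = run_rate / 6       # assumption: run rate is per six balls
    scoring_share = 1 - dot_pct / 100  # assumption: non-dot ball = scoring stroke
    return runs_per_ball / scoring_share

yield_by_ball = {ball: runs_per_scoring_ball(*row) for ball, row in aus_wi.items()}

for ball, value in yield_by_ball.items():
    print(f"ball {ball}: {value:.2f} runs per scoring ball")
```

On these assumptions ball five has the highest yield per scoring ball, so the extra runs would be coming from bigger shots rather than more of them.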

Something is happening on the fifth ball. You are right, it is weird.

No need to write a parser for me, Russ, I'll do it myself tonight or tomorrow. I've been putting off doing ball-by-ball stuff for a while, but I am very intrigued by this ball five thing now.

"The "multiply by number of tests" thing seems to be a trick that works with numbers close to 1."

It's a classic approximation of a binomial expansion.

(1-x)^n = 1 - nx + [n(n-1)/2]x^2 - ... + (-x)^n

Now if x << 1, we take all terms in x^2 and higher powers to be negligible:

(1-x)^n ~= 1 - nx.

That was indeed what I was getting at, Electric Dragon. :)

Right, so looking at most Tests between 2001 and 2007 (which is what I have ball-by-ball for), there's nothing special about the runs or boundaries scored off ball 5. The run rates go 3.10, 3.07, 3.04, 3.05, 3.00, 2.93. Ball five wickets for this dataset are still more than 2.3 standard deviations above where they should be.