Thursday, February 07, 2008
Increasing or decreasing scores
Imagine a scorecard in which each batsman's score is less than the previous one. So, for example, the first opener scores 131, the second 82, the number three makes 75, the number four makes 56, and so on. What would you guess is the longest such sequence of declining scores in a Test innings? What would you guess is the longest increasing sequence?
These are, I admit, not the most important questions in cricket, but I found them fun. I would have thought that, while a long decreasing sequence would be rare, once you hit number seven and the tail it should be common enough to get down to number eight or nine. The problem is that once a batsman makes a duck, the sequence has to stop, so it'd be pretty unlikely to get all the way to number eleven.
An increasing sequence should be rarer, but perhaps when a captain reversed his batting order on a sticky wicket, you could get a sequence up to seven or so. Certainly I thought the longest increasing sequence would be shorter than the longest decreasing sequence.
As it happens, the longest increasing sequence goes to number six, but so does the longest decreasing sequence!
Decreasing sequences: Eng v Aus, 1905, Eng v SA, 1955, Aus v Eng, 1990/1, Zim v NZ, 1992/3.
Increasing sequences: Aus v Eng, 1884/5, Aus v Eng, 1932/3, Eng v WI, 1957.
If you allow a score to be equal to the previous score, the longest non-decreasing sequences still only go down to number six, but there are two non-increasing sequences to number 7: SA v Aus, 1949/50, Eng v Aus, 1977.
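For anyone who wants to run the same check on a scorecard of their own, here is a minimal Python sketch. The function name `monotone_run_length` and its `decreasing` and `strict` flags are hypothetical, made up for illustration; this is not the code used to search the database. It simply counts how far down the order the scores keep falling (or rising), starting from the first opener.

```python
from typing import Sequence


def monotone_run_length(scores: Sequence[int],
                        decreasing: bool = True,
                        strict: bool = True) -> int:
    """Count how many batsmen, starting from number one, keep the run going.

    With strict=False a score equal to the previous one is allowed, which
    gives the non-increasing / non-decreasing variants.
    """
    if not scores:
        return 0
    length = 1
    for prev, cur in zip(scores, scores[1:]):
        if decreasing:
            ok = cur < prev or (not strict and cur == prev)
        else:
            ok = cur > prev or (not strict and cur == prev)
        if not ok:
            break
        length += 1
    return length


# The made-up scorecard from the top of the post, plus a fifth score
# (hypothetical) that breaks the run.
print(monotone_run_length([131, 82, 75, 56, 60]))  # 4
```

With `strict=True` a duck always ends a decreasing run, since nobody can score less than nought; with `strict=False` a string of ducks keeps it alive.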
What about first-class matches? I only have a database of matches played in England (about 28000 of them, more than half of all first-class matches ever played), but the longest sequences (not allowing equal scores) in this dataset are of length eight:
Decreasing: Lancs v Middlesex, 1913, Lancs v Glamorgan, 1928, Wales v Minor Counties, 1930, Yorks v Lancs, 1952, Kent v Surrey, 1988.
Increasing: Lancs v Kent, 1988.
The Kent v Surrey innings (Surrey's first) is actually non-increasing right the way down to number eleven (it finishes with three ducks and a nought not out). There are two other cases of non-increasing sequences from one to eleven: Yorks v Lancs, 1888 (finishes with five ducks and a nought not out), Gloucs v Leics, 1929.
The longest non-decreasing sequence is still eight.
Is there a way we could have guessed how rare these sequences are? Any maths-phobes may wish to stop reading now. There's some calculus below.
A naïve approach would be to say that, since each score is either higher or lower than the previous one, it's just like flipping a coin. (Not exactly: it could be equal, but it'd be pretty close.) What's the probability of getting seven heads in a row? One in 128. But there have been over 1800 Test matches, so sequences of length seven should have occurred dozens of times!
The error in the above reasoning is at the start: it's not like flipping a coin. The first step is (the second opener could be higher or lower than the first, with a close to 50% chance each way), but after that, it's no longer 50-50. Suppose the first opener makes 40. It might be 50-50 as to whether the second opener scores less than 40. He might make 30, say. But then the probability that the number three will score less than 30 is less than 50%. If he makes 20, the chance that the number four will make less than this is even smaller than the previous probability, and so on.
A better way would be to assume that individual innings follow an exponential distribution, which says that scores of 0 are more common than scores of 1, which are more common than scores of 2, etc. (This isn't the real distribution: in reality it's quite skewed towards zero. But it's a reasonable approximation for these fun purposes. Also, runs are scored in discrete units (1, 2, 3, etc.), but the exponential distribution allows for any positive real number of runs, such as sqrt(2) or 4.9.) Assume further (to make the maths easier) that each batsman has the same average.
The probability that a batsman with average 1/k makes a score less than x is given by:
$$\int_0^x k\,e^{-ks}\,ds \qquad (*)$$
That's the integral from 0 to x of k*exp(-k*s) with respect to s.
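That integral evaluates to 1 - exp(-k*x), which is easy to play with. The sketch below is only an illustration: the function name `prob_score_below` and the batting average of 30 are assumptions chosen for the example, not figures from any dataset. It shows how, under the exponential model, the chance of scoring below a given target shrinks as the target falls.

```python
import math


def prob_score_below(x: float, average: float) -> float:
    """P(score < x) for an exponential distribution with the given mean.

    This is the integral from 0 to x of k*exp(-k*s) ds with k = 1/average,
    which evaluates to 1 - exp(-k*x).
    """
    k = 1.0 / average
    return 1.0 - math.exp(-k * x)


# Hypothetical average of 30, chosen purely for illustration.
for target in (40, 30, 20):
    print(target, round(prob_score_below(target, average=30.0), 3))
```

Each successive target is harder to get under than the last, which is why a long decreasing run is much rarer than a run of coin flips would suggest.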
Now let the first n batsmen's scores be called s1, s2, ..., s(n-1), sn. We want the probability P that the sequence goes s1 > s2 > ... > s(n-1) > sn. To start, we use (*) on the last link in the chain of inequalities:
$$\int_0^{s_{n-1}} k\,e^{-k s_n}\,ds_n$$
Once we have that, we carry that on to the next inequality, and so on, until we have all of them. The last integral is from 0 to infinity, since the first score s1 can be anything:
$$P = \int_0^\infty ds_1\,k\,e^{-k s_1} \int_0^{s_1} ds_2\,k\,e^{-k s_2} \cdots \int_0^{s_{n-1}} ds_n\,k\,e^{-k s_n}$$
Now we evaluate these integrals! Trust me when I say that, when you expand it all out, most of the terms cancel, and you're left with a term that comes from integrating the exponentials which multiply together, so that the probability is 1/n!.
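For anyone who would rather not take that on trust, here is one way to see where the 1/n! comes from. It is a sketch using a standard substitution, not necessarily the route you would follow when expanding everything out by hand. The variables u_i are introduced just for this step: put u_i = 1 - exp(-k*s_i), so that du_i = k*exp(-k*s_i) ds_i, and u_i runs from 0 to 1 as s_i runs from 0 to infinity. The exponentials are absorbed into the du_i and each upper limit s_(i-1) becomes u_(i-1).

```latex
\begin{align*}
P &= \int_0^1 du_1 \int_0^{u_1} du_2 \cdots \int_0^{u_{n-1}} du_n \\
  &= \int_0^1 du_1 \int_0^{u_1} du_2 \cdots \int_0^{u_{n-2}} u_{n-1}\, du_{n-1} \\
  &= \int_0^1 du_1 \int_0^{u_1} du_2 \cdots \int_0^{u_{n-3}} \frac{u_{n-2}^{2}}{2!}\, du_{n-2} \\
  &\;\;\vdots \\
  &= \int_0^1 \frac{u_1^{\,n-1}}{(n-1)!}\, du_1 = \frac{1}{n!}
\end{align*}
```

Each integration raises the power by one and divides by the new power, so the factorial builds up one factor at a time. Note that k has dropped out entirely: the answer doesn't depend on the batsmen's (shared) average.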
So, a decreasing sequence of length six should happen about once every 720 innings, a sequence of length seven about once every 5040 innings, and a sequence of length 8 about once every 40320 innings.
Essentially the same argument gives the same probabilities for increasing sequences.
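If the calculus isn't convincing, the 1/n! figure can also be checked by brute force. The following is a minimal simulation sketch, not part of the original analysis: the function name `estimate_decreasing_prob`, the average of 30, the seed and the number of trials are all arbitrary assumptions. It draws n independent exponential "scores" many times over and counts how often they come out strictly decreasing.

```python
import math
import random


def estimate_decreasing_prob(n: int, trials: int = 100_000,
                             average: float = 30.0, seed: int = 1) -> float:
    """Estimate P(s1 > s2 > ... > sn) for independent exponential scores."""
    rng = random.Random(seed)
    rate = 1.0 / average  # this is k in the text
    hits = 0
    for _ in range(trials):
        scores = [rng.expovariate(rate) for _ in range(n)]
        # Strictly decreasing: every score beats the one that follows it.
        if all(a > b for a, b in zip(scores, scores[1:])):
            hits += 1
    return hits / trials


# Compare the simulated frequency with 1/n! for a few short sequences.
for n in (2, 3, 4):
    print(n, estimate_decreasing_prob(n), 1 / math.factorial(n))
```

Flipping the comparison to `a < b` gives the increasing case. For n as large as seven or eight the event is so rare that far more trials would be needed to get a stable estimate.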
It's not perfect (and we wouldn't expect it to be so, given that the real distribution isn't exponential, and batsmen's averages aren't all equal), but it gets the right order of magnitude at least. It gives us a good idea of why we haven't seen a strictly increasing or decreasing sequence of length seven in Test cricket yet, though we should get one eventually, perhaps in the next ten or twenty years.