<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-22713811</id><updated>2012-01-08T07:22:08.311+01:00</updated><title type='text'>Pappus' plane - cricket stats</title><subtitle type='html'>My writings on and analyses of cricket statistics.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>99</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-22713811.post-9041504742812254626</id><published>2012-01-08T06:52:00.002+01:00</published><updated>2012-01-08T07:22:08.337+01:00</updated><title type='text'>Note on the probability of century</title><content type='html'>This is mostly for myself so that I have something to refer to in future.  
In the &lt;a href="http://pappubahry.blogspot.com/2011/12/duck-to-century-ratios.html"&gt;previous post&lt;/a&gt; I skipped some algebra and gave a formula for the probability of a century, given a hazard function which is 3/(avg+3) on nought and 1/(avg+3) for all scores greater than nought.&lt;br /&gt;&lt;br /&gt;The algebra I originally posted was slightly wrong (I have corrected it in the second edit), and it could also have been simplified.  The probability of a score of at least 100 is [avg/(avg+3)] * [(avg+2)/(avg+3)]&lt;sup&gt;99&lt;/sup&gt;: the first factor is the chance of getting off the mark, and the second is the chance of then surviving each of the scores from 1 to 99.  It is a little bit annoying having such a high exponent.  Rewrite (avg+2)/(avg+3) as [1-1/(avg+3)].  Since (1-x)&lt;sup&gt;n&lt;/sup&gt; is approximately exp(-nx) when x is small (which follows from the limit definition of the exponential function), raising that to the power of 99 gives approximately exp[-99/(avg+3)].  &lt;br /&gt;&lt;br /&gt;So, the probability of a century is approximately [avg/(avg+3)] * exp[-99/(avg+3)].&lt;br /&gt;&lt;br /&gt;Empirically, the simple exp(-100/avg) is a better predictor of the fraction of a batsman's innings that are centuries.  
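&lt;br /&gt;&lt;br /&gt;As a quick check on how good the exponential approximation is, here is a minimal Python sketch. It is an illustration only, not code from the original analysis, and the function names are hypothetical. Assuming the two-level hazard model above, it evaluates the exact product, the approximation [avg/(avg+3)] * exp[-99/(avg+3)], and the simple exp(-100/avg) for a few averages.&lt;pre&gt;
```python
import math


def p_century_exact(avg: float) -> float:
    """P(score >= 100) with H(0) = 3/(avg+3) and H(n) = 1/(avg+3) for n > 0."""
    n = avg + 3.0
    # Get off the mark, then survive each of the scores 1, 2, ..., 99.
    return (avg / n) * ((n - 1.0) / n) ** 99


def p_century_approx(avg: float) -> float:
    """Same model, with (1 - 1/n)**99 replaced by exp(-99/n)."""
    n = avg + 3.0
    return (avg / n) * math.exp(-99.0 / n)


def p_century_simple(avg: float) -> float:
    """Constant-hazard estimate exp(-100/avg)."""
    return math.exp(-100.0 / avg)


# Print the three estimates side by side for a few batting averages.
print("  avg   exact  approx  simple")
for avg in (20.0, 30.0, 40.0, 50.0):
    print(f"{avg:5.1f}  {p_century_exact(avg):.4f}  "
          f"{p_century_approx(avg):.4f}  {p_century_simple(avg):.4f}")
```
&lt;/pre&gt;Because ln(1-x) is slightly less than -x, the approximation always comes out a little higher than the exact product. The relative gap widens as the average falls.&lt;br /&gt;&lt;br /&gt;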
If we exclude not-outs below 100, then it is much of a muchness between exp(-100/avg), the formula above, and exp(-99/avg).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-9041504742812254626?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/9041504742812254626/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=9041504742812254626' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/9041504742812254626'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/9041504742812254626'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2012/01/note-on-probability-of-century.html' title='Note on the probability of century'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2913865673130513939</id><published>2011-12-26T06:38:00.004+01:00</published><updated>2012-01-08T06:15:19.477+01:00</updated><title type='text'>Duck to century ratios</title><content type='html'>&lt;a href="https://twitter.com/#!/RicFinlay/status/151157970586181632"&gt;Ric Finlay on Twitter&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;SMarsh Oz's 1319th Test duck. 747 Oz centuries. Oz 1.77 ducks per century Ind 2.01 ducks per century. 
Bangladesh 9 ducks/100!&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;As a bit of pointless fun, let's pretend that all representatives of each country are equally good batsmen (a terrible assumption, obviously, but perhaps the tail-enders' ducks and top-order centuries will roughly cancel out in the right ratios) and apply the hazard function mentioned at the end of &lt;a href="http://pappubahry.blogspot.com/2010/12/are-some-batsmen-nervous-starters.html"&gt;this post&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This hazard function is the simplest extension of the constant-hazard model, the latter saying, effectively, that a batsman always has his eye in.  In the "new" model, the batsman is three times more likely to get out on nought than to get out when on some other score, but gets his eye in as soon as he's off the mark.  When studying the proportion of ducks, this is an important change to make, because the constant-hazard model will under-estimate the number of ducks.&lt;br /&gt;&lt;br /&gt;The algebra is boring and I will skip it.  The probability of a duck is 3/(avg + 3), and the probability of a score less than 100 is 3/(avg + 3) + {1 - [(avg+2)/(avg+3)]&lt;sup&gt;98&lt;/sup&gt;} * avg/(avg + 3).  
Subtract the latter expression from 1 to get the probability of scoring a century.&lt;br /&gt;&lt;br /&gt;Throw it into an Excel formula, and you find that a duck:century ratio of 9 corresponds to an average of about 21.1, a ratio of 2.01 corresponds to an average of about 30.1, and a ratio of 1.77 corresponds to an average of about 31.1.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;filter=advanced;groupby=team;orderby=batting_average;template=results;type=batting"&gt;The actual averages&lt;/a&gt; are 20.6, 31.5, and 32.3 respectively.&lt;br /&gt;&lt;br /&gt;(&lt;b&gt;Edit&lt;/b&gt;: Fixed the exponent in the probability for a score less than 100.)&lt;br /&gt;&lt;br /&gt;(&lt;b&gt;Edit 2, 8/1/2012&lt;/b&gt;: The earlier edit fixed the typo in my transcription from my handwritten work to the blog, but the handwritten work was wrong.  The exponent should be 99, not 98.  The averages corresponding to the three duck:century ratios should be about 21.3, 30.4, and 31.4 respectively.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2913865673130513939?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2913865673130513939/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2913865673130513939' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2913865673130513939'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2913865673130513939'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2011/12/duck-to-century-ratios.html' title='Duck to century ratios'/><author><name>David 
Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-6258360494114040823</id><published>2011-02-08T07:09:00.002+01:00</published><updated>2011-02-08T07:14:57.685+01:00</updated><title type='text'>The World Cup group stage</title><content type='html'>Russ has done some ODI ratings and &lt;a href="http://deggles.csoft.net/post.php?postid=1544"&gt;looked ahead&lt;/a&gt; to the group stage of the World Cup.  His main point is that it is more likely that a side from outside the "top eight" will make the quarters.  (I need to use scare quotes, because Bangladesh are actually ahead of the West Indies in the ICC ODI rankings now, though not in Russ's rankings.)&lt;br /&gt;&lt;br /&gt;But there's more to the group stage than seeing if Bangladesh will beat the Windies (I'm writing off Ireland &amp;ndash; they were easily beaten by Bangladesh in the one series they've played in Asia).  In the quarter-finals it'll be A1 v B4, A2 v B3, etc., so finishing top of the group will probably mean an easy quarter-final.  I'm not sure if this'll be enough to keep us interested, but it is a real effect.&lt;br /&gt;&lt;br /&gt;Using Russ's ratings (so don't blame me for India being easily below Australia and South Africa) and assuming that the "top eight" sides make the quarters in rating order, I ran 100,000 simulations of the knockout stage.  
These are the probabilities of overall victory that I get:&lt;br /&gt;&lt;pre&gt;Aus 0.28     SA  0.28&lt;br /&gt;SL  0.13     Ind 0.18&lt;br /&gt;Pak 0.04     Eng 0.08&lt;br /&gt;NZ  0.01     WI  0.01&lt;/pre&gt;&lt;br /&gt;Now suppose that Australia, through bad luck, loses its games to SL, Pak, and NZ, and so finishes fourth in group A, thus playing South Africa in the quarters:&lt;br /&gt;&lt;pre&gt;SL  0.22     SA  0.20&lt;br /&gt;Pak 0.05     Ind 0.21&lt;br /&gt;NZ  0.01     Eng 0.11&lt;br /&gt;Aus 0.19     WI  0.01&lt;/pre&gt;&lt;br /&gt;Sri Lanka went from having less than half of Australia's chance of overall success to having a better chance than Australia.  Obviously South Africa takes quite a hit in this scenario as well.&lt;br /&gt;&lt;br /&gt;What if we only swap India and South Africa?&lt;br /&gt;&lt;pre&gt;Aus 0.28     Ind 0.21&lt;br /&gt;SL  0.13     SA  0.25&lt;br /&gt;Pak 0.04     Eng 0.08&lt;br /&gt;NZ  0.01     WI  0.01&lt;/pre&gt;&lt;br /&gt;Only a small change.  &lt;br /&gt;&lt;br /&gt;These ideas should motivate the teams to play hard in the group stage &amp;ndash; there won't be any really dead rubbers against the major sides (except maybe at the very end).  
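&lt;br /&gt;&lt;br /&gt;For anyone wanting to try this kind of bracket simulation, here is a minimal Monte Carlo sketch in Python. It is not the code behind the numbers above, and several parts of it are assumptions. The values in RATINGS are made-up placeholders, not Russ's ratings. The logistic function that turns a rating gap into a win probability is assumed. The semi-final pairing is also assumed: the winner of A1 v B4 plays the winner of A3 v B2, and the winner of A2 v B3 plays the winner of A4 v B1.&lt;pre&gt;
```python
import math
import random

# Hypothetical placeholder ratings (NOT Russ's ratings): higher is stronger.
RATINGS = {"Aus": 1.0, "SA": 1.0, "Ind": 0.8, "SL": 0.7,
           "Eng": 0.6, "Pak": 0.4, "NZ": 0.2, "WI": 0.1}


def p_beats(a: str, b: str) -> float:
    """Assumed logistic link from rating difference to P(a beats b)."""
    return 1.0 / (1.0 + math.exp(RATINGS[b] - RATINGS[a]))


def play(a: str, b: str, rng: random.Random) -> str:
    """Return the winner of a single knockout game."""
    return b if rng.random() >= p_beats(a, b) else a


def knockout(group_a: list[str], group_b: list[str], rng: random.Random) -> str:
    """Simulate quarters, semis and final; groups are in finishing order."""
    qf = [play(group_a[0], group_b[3], rng),   # A1 v B4
          play(group_a[1], group_b[2], rng),   # A2 v B3
          play(group_a[2], group_b[1], rng),   # A3 v B2
          play(group_a[3], group_b[0], rng)]   # A4 v B1
    # Assumed semi-final pairing.
    sf1 = play(qf[0], qf[2], rng)
    sf2 = play(qf[1], qf[3], rng)
    return play(sf1, sf2, rng)


def title_odds(group_a, group_b, n_sims=100_000, seed=1):
    """Fraction of simulated tournaments won by each team."""
    rng = random.Random(seed)
    wins = dict.fromkeys(group_a + group_b, 0)
    for _ in range(n_sims):
        wins[knockout(group_a, group_b, rng)] += 1
    return {team: count / n_sims for team, count in wins.items()}


base = title_odds(["Aus", "SL", "Pak", "NZ"], ["SA", "Ind", "Eng", "WI"])
aus_fourth = title_odds(["SL", "Pak", "NZ", "Aus"], ["SA", "Ind", "Eng", "WI"])
for team in base:
    print(f"{team:4s} {base[team]:.3f} {aus_fourth[team]:.3f}")
```
&lt;/pre&gt;Each group list is in finishing order, so moving "Aus" to the end of the group A list gives the scenario above in which Australia finishes fourth. With placeholder ratings, the printed numbers will not match the tables above.&lt;br /&gt;&lt;br /&gt;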
Of course the winning team will still have to win three straight games to take the Cup, so my suspicion is that from a fan's perspective, this won't add much excitement to the group matches.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-6258360494114040823?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/6258360494114040823/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=6258360494114040823' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6258360494114040823'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6258360494114040823'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2011/02/world-cup-group-stage.html' title='The World Cup group stage'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-5050205686910772565</id><published>2011-01-21T13:21:00.002+01:00</published><updated>2011-01-21T14:08:52.101+01:00</updated><title type='text'>Conversions from 50 to 100</title><content type='html'>&lt;a href="http://cricketingview.blogspot.com/2011/01/samit-patel-and-englands-discriminatory.html"&gt;In comments&lt;/a&gt; at Kartikeya's blog there was a little aside about Samit Patel's conversion rate &amp;ndash; he has only 10 first-class centuries, despite reaching fifty 34 times.  
Kartikeya said that such a low conversion rate was typical of players who bat at 6 or 7, and gave the example of VVS Laxman.&lt;br /&gt;&lt;br /&gt;(From the few scorecards I've checked, Patel seems to often bat at 4 for Notts.)&lt;br /&gt;&lt;br /&gt;The breakdown of Laxman's record is indeed &lt;a href="http://stats.espncricinfo.com/ci/engine/player/30750.html?class=1;template=results;type=batting"&gt;stark&lt;/a&gt;: batting at 6, he averages 51 and has made 5 centuries having reached fifty 25 times; batting at 3, he averages 47 and has made 4 centuries having reached fifty 10 times.&lt;br /&gt;&lt;br /&gt;The obvious question is, is this typical?  This seems like as good an excuse as any to use the aside mentioned at the bottom of &lt;a href="http://pappubahry.blogspot.com/2010/12/are-some-batsmen-nervous-starters.html"&gt;this post&lt;/a&gt;.  In that basic model, batsmen effectively bat like they average 2 runs more per innings once they get off the mark.  So, their conversion rate from 50 to 100 should be, on average, exp(-50/(avg+2)).  &lt;br /&gt;&lt;br /&gt;Here is a scatterplot of actual conversion rate against expected conversion rate, for all batsmen who've scored 2000 runs batting at positions 1-4:&lt;br /&gt;&lt;br /&gt;&lt;img alt="So it looks like I haven't given up blogging just yet...." title="So it looks like I haven't given up blogging just yet...." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/conversionrate1-4.png"&gt;&lt;br /&gt;&lt;br /&gt;The red line is y=x.  There are 79 batsmen above the line and 75 below, so the model seems pretty decent.  &lt;br /&gt;&lt;br /&gt;Now here is the same scatterplot for batsmen at positions 6-7:&lt;br /&gt;&lt;br /&gt;&lt;img alt="I still have one more spreadsheet waiting to turn into a blog post as well." title="I still have one more spreadsheet waiting to turn into a blog post as well." 
src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/conversionrate6-7.png"&gt;&lt;br /&gt;&lt;br /&gt;Only 6 of the 27 batsmen are above expectation, presumably because they're left stranded or have to start hitting out with 9 wickets down.  Laxman is the point (0.406, 0.179): an expected conversion rate of 0.406 against an actual rate of 0.179.  He's on the bottom edge of the scatter, so his very low conversion rate is somewhat unusual, even for lower-middle-order batsmen.  The regression line forced through the origin is y = 0.82x.&lt;br /&gt;&lt;br /&gt;Returning to the "purer" sample of top-order batsmen, we can ask whether conversion from 50 to 100 is a skill.  Using the same method as in the post I linked to earlier, we can treat each innings in which a batsman reaches fifty as a trial that is converted into a century with probability p = exp(-50/(avg+2)), so that the number of centuries is a binomial random variable.  If a batsman has reached fifty N times, then we can calculate z = (actual number of hundreds - Np) / sqrt[Np(1-p)].  If "scoring a century, having reached fifty" is a skill separate from the batting average, then we'd expect the standard deviation of z's to be greater than 1.&lt;br /&gt;&lt;br /&gt;As it happens, the standard deviation is 0.93.  Perhaps that'd increase a little bit if you treated not-out innings between 50 and 99 properly.  
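&lt;br /&gt;&lt;br /&gt;The z calculation can be sketched in a few lines of Python. This is a minimal illustration under the model above, not the spreadsheet behind the 0.93 figure. The function names are hypothetical, and the use of the sample standard deviation is an assumption. As a usage example it takes Laxman's record at number 6 quoted earlier (average 51, fifty reached 25 times, 5 centuries). That record comes out roughly 1.9 standard deviations below expectation, though the model here is meant for top-order batsmen, so this only illustrates the arithmetic.&lt;pre&gt;
```python
import math
import statistics


def expected_conversion(avg: float) -> float:
    """Model probability of going on to 100 after reaching 50: exp(-50/(avg+2))."""
    return math.exp(-50.0 / (avg + 2.0))


def conversion_z(avg: float, fifties: int, hundreds: int) -> float:
    """Binomial z-score: (hundreds - N p) / sqrt(N p (1 - p)).

    `fifties` is N, the number of innings in which the batsman reached 50
    (including those he went on to convert into hundreds).
    """
    p = expected_conversion(avg)
    mean = fifties * p
    sd = math.sqrt(fifties * p * (1.0 - p))
    return (hundreds - mean) / sd


def sd_of_z(records: list[tuple[float, int, int]]) -> float:
    """Sample standard deviation of z over (avg, fifties, hundreds) records."""
    return statistics.stdev([conversion_z(a, n, h) for a, n, h in records])


# Laxman batting at 6, using the figures quoted in the post.
print(round(conversion_z(51.0, 25, 5), 2))
```
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;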
But it looks to me like most of the variation in conversion rates between top-order players is down to differences in their general batting ability (as measured by their average) and random luck.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-5050205686910772565?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/5050205686910772565/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=5050205686910772565' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5050205686910772565'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5050205686910772565'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2011/01/conversions-from-50-to-100.html' title='Conversions from 50 to 100'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_conversionrate1-4.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-7737329058914627146</id><published>2010-12-29T13:58:00.002+01:00</published><updated>2010-12-29T14:04:29.521+01:00</updated><title type='text'>Update on no-ball rates</title><content type='html'>It's been a few years since I wrote &lt;a href="http://pappubahry.blogspot.com/2008/01/no-no-balls.html"&gt;this&lt;/a&gt;, 
noting the sharp decline in the no-ball rate in ODIs since the start of Twenty20.  The decline doesn't appear to have been a blip: we now have three more years of data to look at, and the evidence is pretty solid:&lt;br /&gt;&lt;br /&gt;&lt;img alt="This time I haven't got funny French dates on the x-axis." title="This time I haven't got funny French dates on the x-axis." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/nbrateodi.png"&gt;&lt;br /&gt;&lt;br /&gt;The drop-off is also quite visible, though not as dramatic, in Tests:&lt;br /&gt;&lt;br /&gt;&lt;img alt="No prizes for guessing when the front-foot law was introduced." title="No prizes for guessing when the front-foot law was introduced." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/nbratetest.png"&gt;&lt;br /&gt;&lt;br /&gt;In less than a week, my almost month-long holiday from work will be over, so the recent burst of activity on this blog will probably end.  I'll have a couple more blog posts in the next few days, and then I'll probably go back to just being a commenter around the place.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-7737329058914627146?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/7737329058914627146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=7737329058914627146' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7737329058914627146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7737329058914627146'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2010/12/update-on-no-ball-rates.html' title='Update on no-ball 
rates'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_nbrateodi.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3735493885282050814</id><published>2010-12-15T15:27:00.004+01:00</published><updated>2010-12-15T16:33:49.880+01:00</updated><title type='text'>18th century statistics</title><content type='html'>The &lt;a href="http://acscricket.com/"&gt;ACS&lt;/a&gt; seems to have settled on a list of 'great' matches from the 18th century, starting from 1772, considered equivalent to first-class for statistical purposes.  CricketArchive now lists these matches as first-class, starting with &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/4.html"&gt;Hampshire v England&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If any of the few remaining readers that I have were here three years ago, they might remember a series of posts on 19th-century first-class cricket in England.  The biggest challenge was estimating bowling averages for the very early scorecards, which don't credit catches to the bowler, and don't record the number of runs conceded by the bowler.  The method is described in &lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;this post&lt;/a&gt;, with results (with funny +/- values!) 
&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Over the last few days I've dug up my old code and run the estimations on the "new" first-class matches, and added these numbers to the 19th-century estimations where applicable.&lt;br /&gt;&lt;br /&gt;Things to remember:&lt;br /&gt;&lt;br /&gt;- The error in the bowling average "should" be about 8%, in the sense that about 68% of the true averages should be within 8% of the estimated figure.  But in the testing runs I did a few years ago, very high wicket-takers seemed unusually likely to get anomalous estimates of their average (out to around 15%).  &lt;br /&gt;&lt;br /&gt;- The exercise is even more speculative for these very early matches.  The training data is from 1854-1863, and I'm here using it to find stats in games played more than 80 years before that.&lt;br /&gt;&lt;br /&gt;- Nevertheless, they can't really be too far wrong.&lt;br /&gt;&lt;br /&gt;- The estimates of wickets, unlike bowling averages, can be atrocious.  They are all almost certainly under-estimates, by seemingly random amounts.&lt;br /&gt;&lt;br /&gt;- I only consider matches where the dismissals are all known, and it was eleven-a-side.  (So the number of matches played is slightly lower than what you'll see on CricketArchive.)&lt;br /&gt;&lt;br /&gt;Here are the leading wicket-takers (at least 150 estimated wickets) of players who began their career in the 18th century, ordered by estimated bowling average (their batting stats are also there):&lt;br /&gt;&lt;pre&gt;                                                              batting          est. bowling&lt;br /&gt;name                start   end     mats    inns    n.o.    
runs    avg    wkts   runs      avg&lt;br /&gt;D Harris            1789    1798      72     124      42     467     5.7   481.7  4985.1    10.3&lt;br /&gt;John Wells          1789    1815     138     256      19    2927    12.4   551.3  6134.9    11.1&lt;br /&gt;T Boxall            1790    1803      81     147      28     822     6.9   465.7  5251.4    11.3&lt;br /&gt;T Walker            1789    1810     166     315      19    5757    19.4   486.0  5488.7    11.3&lt;br /&gt;T Lord              1790    1815      55     100      15     831     9.8   217.0  2495.4    11.5&lt;br /&gt;Lord F Beauclerk    1791    1825     124     229      19    5259    25.0   577.5  7047.1    12.2&lt;br /&gt;R Purchase          1773    1803     109     199      18    1820    10.1   373.8  4572.2    12.2&lt;br /&gt;E Stevens           1773    1789      71     126      37     694     7.8   496.0  6112.4    12.3&lt;br /&gt;W Beldham           1789    1821     178     328      18    6709    21.6   376.0  4704.5    12.5&lt;br /&gt;J Hammond           1790    1816     114     206      13    3741    19.4   244.3  3098.8    12.7&lt;br /&gt;R Clifford          1789    1792      68     131       7    1484    12.0   343.5  4396.1    12.8&lt;br /&gt;T Brett             1773    1778      24      43      11     231     7.2   154.4  2180.9    14.1&lt;br /&gt;W Fennex            1790    1816      82     155      14    1881    13.3   220.6  3239.0    14.7&lt;br /&gt;W Bullen            1789    1797     108     207      43    1697    10.3   289.5  4271.4    14.8&lt;br /&gt;R Nyren             1774    1786      41      76      11     824    12.7   162.2  2459.1    15.2&lt;/pre&gt;&lt;br /&gt;My knowledge of 18th-century cricket is pretty close to zero, so I had to look most of these players up.  
The &lt;i&gt;Who's Who of Cricketers&lt;/i&gt; says of David Harris: &lt;i&gt;The greatest bowler of the Hambledon Club, it is impossible to gauge his real success owing to lack of bowling analyses.&lt;/i&gt;  Well now we have a bit of a gauge.&lt;br /&gt;&lt;br /&gt;When I did the estimates for the 19th century, there were players (such as Alfred Mynn) who played in many matches with their bowling analyses recorded, and in many without.  These players serve as a useful check on the method &amp;ndash; if their estimated average was similar to their actual average in matches with known figures, then that adds to the confidence we can put in the estimates.&lt;br /&gt;&lt;br /&gt;For the 18th century, that's not possible.  We instead have to add another link in the chain &amp;ndash; take players who played in both centuries, whose 19th-century estimates we're pretty confident about because of the Mynn-like players from later on.  For whatever that's worth, we can look at Wells, Walker, Beauclerk, Beldham, and Hammond, and make the comparison.&lt;br /&gt;&lt;pre&gt;player    18th C   19th C&lt;br /&gt;Wells     10.9     11.4&lt;br /&gt;Walker    11.2     12.0&lt;br /&gt;Beauclerk 11.3     12.6&lt;br /&gt;Beldham   12.5     12.4&lt;br /&gt;Hammond   12.9     12.1&lt;/pre&gt;&lt;br /&gt;I don't think I'd want to draw too many conclusions from such a small sample, but it at least passes a sanity check.&lt;br /&gt;&lt;br /&gt;In case anyone's interested in batting stats, here they are (qual. 2000 runs):&lt;br /&gt;&lt;pre&gt;                                                              batting           est. bowling&lt;br /&gt;name                start   end     mats    inns    n.o.    
runs    avg    wkts   runs      avg&lt;br /&gt;Lord F Beauclerk    1791    1825     124     229      19    5259    25.0   577.5  7047.1    12.2&lt;br /&gt;R Robinson          1792    1819     102     196      15    3992    22.1    46.6  1142.7    24.5&lt;br /&gt;W Beldham           1789    1821     178     328      18    6709    21.6   376.0  4704.5    12.5&lt;br /&gt;T Walker            1789    1810     166     315      19    5757    19.4   486.0  5488.7    11.3&lt;br /&gt;J Hammond           1790    1816     114     206      13    3741    19.4   244.3  3098.8    12.7&lt;br /&gt;J Aylward           1773    1797      99     194       6    3611    19.2     5.0   103.8    20.8&lt;br /&gt;H Walker            1789    1802      93     171       4    2518    15.1     0.0     0.0     0.0&lt;br /&gt;J Small sen         1773    1798     100     189       8    2724    15.0     7.1   158.1    22.4&lt;br /&gt;J Ring              1789    1796      83     164      10    2088    13.6     2.5    48.3    19.3&lt;br /&gt;J Small jun         1789    1810     134     252      13    3216    13.5     0.0     0.0     0.0&lt;br /&gt;A Freemantle        1789    1810     125     235      28    2674    12.9     0.0     0.0     0.0&lt;br /&gt;John Wells          1789    1815     138     256      19    2927    12.4   551.3  6134.9    11.1&lt;br /&gt;Earl of Winchilsea  1789    1804     124     235      10    2048     9.1     6.5   112.5    17.2&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3735493885282050814?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3735493885282050814/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3735493885282050814' title='1 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3735493885282050814'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3735493885282050814'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2010/12/18th-century-statistics.html' title='18th century statistics'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2939807593304853930</id><published>2010-12-03T10:51:00.002+01:00</published><updated>2010-12-03T12:02:31.991+01:00</updated><title type='text'>Are some batsmen nervous starters?</title><content type='html'>Probably.  But the ability to get off the mark seems to be determined by how good a batsman is overall.  There is of course variation between batsmen in the percentage of ducks they make, but no more than would be expected by random chance.&lt;br /&gt;&lt;br /&gt;The starting point is to work out what the relationship is between a batsman's average and the percentage of innings that are ducks.  (Ideally I would exclude scores of nought not-out from this analysis, but I did everything with Statsguru because it's easier.  This won't make much of a difference.)&lt;br /&gt;&lt;br /&gt;I took all batsmen with at least 20 Test innings against top-eight sides and put them into 'buckets' &amp;ndash; the first bucket had batsmen who averaged less than 10, the second 10-19.99, the third 20-29.99, etc., up to 50-59.99.&lt;br /&gt;&lt;br /&gt;Then for each bucket, I sum up the number of ducks and divide by the number of innings to get the percentage of ducks.  
I also find the overall average of all the batsmen in the bucket.&lt;br /&gt;&lt;br /&gt;Now, as discussed in &lt;a href="http://pappubahry.blogspot.com/2008/04/getting-your-eye-in.html"&gt;this old post&lt;/a&gt;, the probability H(x) of getting out on a particular score x is related to an 'effective average' µ(x) by µ(x) = 1/H(x) - 1.&lt;br /&gt;&lt;br /&gt;Since we will be plotting against the overall average, it makes sense to use the effective average on nought rather than the percentage of ducks.  The result is a lovely linear plot:&lt;br /&gt;&lt;br /&gt;&lt;img alt="Using buckets makes the plot so clean." title="Using buckets makes the plot so clean." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/effavgonnoughtvavg.png"&gt;&lt;br /&gt;&lt;br /&gt;Note that the problem of nought not-out innings is particularly acute for the first data point, which is full of people who batted at number 11.  These innings make it look like the batsmen were better at getting off the mark than they really were, thus improving their apparent effective average.  The regression line has been forced through the origin, both because logically it should do so, and so that the problem of the nought not-outs is reduced.&lt;br /&gt;&lt;br /&gt;By a wonderful quirk, the effective average on zero is (on average) one third of the overall average.  This makes the algebra relatively easy: with µ(0) = avg/3, inverting µ(0) = 1/H(0) - 1 gives H(0) = 1/(avg/3 + 1), so a batsman's expected fraction of ducks is 3/(3 + avg).&lt;br /&gt;&lt;br /&gt;What I then did was, for each individual batsman, calculate the number of binomial standard deviations his actual number of ducks was from his expected number of ducks.&lt;br /&gt;&lt;br /&gt;As an example, consider Shane Warne.  Average 17.65, so expected duck fraction 3/(3 + 17.65) = 0.145.  He played 194 innings against top-eight sides, which gives an expected number of ducks of 28.18.  Warne actually made 34 ducks.  
A standard deviation for a binomial random variable is sqrt[N*p*(1-p)] = sqrt(194*0.145*0.855) = 4.9.  Warne's number of ducks is therefore (34 - 28.18) / 4.9 = 1.2 standard deviations above expected.&lt;br /&gt;&lt;br /&gt;If getting off the mark is a particular skill that some players are better at than others, independent of their overall batting abilities, then the standard deviation of the standard deviations should be greater than 1.  If the only two factors going into the number of ducks are the overall batting average and random luck, then the sd of sd's should be 1.&lt;br /&gt;&lt;br /&gt;The sd of sd's for all the batsmen who average more than 10 is 0.98, pretty close to 1.&lt;br /&gt;&lt;br /&gt;(The breakdown by bucket goes like this.  0-9.99: 1.16 (but remember the problem of nought not-outs).  10-19.99: 0.82.  20-29.99: 1.01.  30-39.99: 0.98.  40-49.99: 1.04.  50-59.99: 1.07.)&lt;br /&gt;&lt;br /&gt;By contrast, if you assume that there is no distribution of skill whatsoever in getting off the mark, and just assume that everyone (from Chris Martin to Sachin Tendulkar) gets off zero with equal probability (0.0917 in this sample), then the sd of sd's is 1.34, much greater than 1.&lt;br /&gt;&lt;br /&gt;So my conclusion is that if someone seems to make an unusually large number of ducks, then he's almost certainly just unlucky.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Mathematical aside: Usually when I need to model the distribution of a batsman's scores, I use the geometric or exponential distribution.  
One level more advanced than this would be to have the hazard function take on a particular value at zero, and then a constant for scores greater than or equal to 1.&lt;br /&gt;&lt;br /&gt;Using the above result, such a hazard function is this:&lt;br /&gt;&lt;br /&gt;H(0) = 3/(avg + 3), H(n) = 1/(avg + 3) for n &amp;gt; 0.&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2939807593304853930?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2939807593304853930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2939807593304853930' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2939807593304853930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2939807593304853930'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2010/12/are-some-batsmen-nervous-starters.html' title='Are some batsmen nervous starters?'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_effavgonnoughtvavg.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-6045662724633134768</id><published>2010-07-31T04:42:00.002+02:00</published><updated>2010-07-31T04:46:11.716+02:00</updated><title type='text'>My 
database</title><content type='html'>&lt;a href="http://sites.google.com/site/cricketdatabase/"&gt;Tests&lt;/a&gt;, &lt;a href="http://sites.google.com/site/cricketdatabase/first-class"&gt;first-class matches&lt;/a&gt;.  Have at it.&lt;br /&gt;&lt;br /&gt;Post any questions you have in comments, or by email.  I have no idea how easy it is for someone else to make sense of what I've done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-6045662724633134768?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/6045662724633134768/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=6045662724633134768' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6045662724633134768'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6045662724633134768'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2010/07/my-database.html' title='My database'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-8572595080205393569</id><published>2010-07-18T11:44:00.002+02:00</published><updated>2010-07-18T12:12:35.687+02:00</updated><title type='text'>The co-efficient of variation</title><content type='html'>Gabriel Rogers &lt;a href="http://blogs.cricinfo.com/itfigures/archives/2010/06/achieving_the_right_consistenc.php"&gt;debuted at 
It Figures&lt;/a&gt; with a post on batsmen's consistency.  The main tool he used was the co-efficient of variation &amp;ndash; the standard deviation divided by the mean.  In general I think this is OK, but there is a problem with including players with short careers in the analysis.&lt;br /&gt;&lt;br /&gt;The problem is that shorter careers might tend to have lower CV's.  (I haven't checked this empirically.)  To illustrate this I'll play with exponential random variables.  The distribution of a batsman's scores is reasonably close to an exponential distribution, so the results below should roughly apply to real batsmen.&lt;br /&gt;&lt;br /&gt;I generated 10000 "careers" of 2 innings, 10000 careers of 3 innings, 10000 careers of 4 innings, and so on.  For each career length, I calculated the average CV.  This is a graph of the results:&lt;br /&gt;&lt;br /&gt;&lt;img alt="I wonder if I'm even on anyone's feed readers anymore." title="I wonder if I'm even on anyone's feed readers anymore." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/cvsamplesize.png"&gt;&lt;br /&gt;&lt;br /&gt;(I've used the "N-1" version of the standard deviation here.)&lt;br /&gt;&lt;br /&gt;The theoretical CV for an exponential distribution is 1 (the standard deviation equals the mean; for real cricketers the typical CV is about 1.05, because the distribution is skewed by lots of ducks and low scores, and occasional very big scores), and you can see that for moderately long careers this is approximately true &amp;ndash; the average CV for a 50-innings career is about 0.98.  But for short careers the CV's are noticeably less than 1.  For a two-innings career, I think the expectation of the CV is 1/sqrt(2).  
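&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of the simulation described above. It is not the code behind the graph. The function names (sample_cv, mean_cv), the seed and the mean score of 1 are illustrative assumptions; the CV does not depend on the mean, so any mean gives the same picture. Like the graph, it uses the N-1 standard deviation.

The two-innings figure can also be checked by hand. With two independent exponential scores X and Y, the N-1 standard deviation is |X-Y|/sqrt(2) and the mean is (X+Y)/2, so the CV is sqrt(2)|X-Y|/(X+Y). The fraction X/(X+Y) is uniformly distributed on (0, 1), which makes |X-Y|/(X+Y) uniform on (0, 1) with mean 1/2. The expected CV is therefore sqrt(2)/2 = 1/sqrt(2).

```python
import math
import random


def sample_cv(scores):
    """Coefficient of variation of one career, using the N-1 standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return math.sqrt(variance) / mean


def mean_cv(n_innings, n_careers=10000, mean_score=1.0, seed=0):
    """Average CV over n_careers simulated careers of n_innings each (n_innings >= 2).

    Every innings is an independent exponential score with the given mean
    (an assumption standing in for real batting records).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_careers):
        scores = [rng.expovariate(1.0 / mean_score) for _ in range(n_innings)]
        total += sample_cv(scores)
    return total / n_careers


if __name__ == "__main__":
    # Average CV for a few career lengths, 10000 simulated careers each.
    for n in (2, 3, 5, 10, 20, 50):
        print(n, round(mean_cv(n), 3))
    # Reference value for two-innings careers.
    print("1/sqrt(2) =", round(1 / math.sqrt(2), 3))
```

The two-innings value should sit close to 1/sqrt(2), about 0.707, and the averages should climb towards 1 as careers get longer.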
&lt;br /&gt;&lt;br /&gt;My guess is that, if this effect carries over to real cricketers, then the trend shown in Figure 1 of the linked blog post is actually stronger than it looks &amp;ndash; batsmen with shorter careers tend to be worse and have lower averages, so there'll be disproportionately many dots in the lower-left part of the scatterplot.&lt;br /&gt;&lt;br /&gt;Of course I could check this myself, but I am pretty lazy with stats these days, as evidenced by the very long break in posting here!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-8572595080205393569?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/8572595080205393569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=8572595080205393569' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8572595080205393569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8572595080205393569'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2010/07/co-efficient-of-variation.html' title='The co-efficient of variation'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_cvsamplesize.png' height='72' 
width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-7461465504954171323</id><published>2009-03-22T00:51:00.002+01:00</published><updated>2009-03-22T00:58:31.862+01:00</updated><title type='text'>New cricket stats analysis blog</title><content type='html'>While I'm being lazy and not doing any real updates (I am getting closer to writing all the readme files etc. for my database), you can get your stats fix from &lt;a href="http://cricketanalysis.com/"&gt;this blog&lt;/a&gt;, which has been around for a few months and is generally very good.  It is likely that some of the new ideas will need to be improved over time, but they are a very valuable first step, especially on declaration strategies (Elliot Tonkes has done some academic work on this, but I don't know if it's been published) and wins above replacement for ODI's.&lt;br /&gt;&lt;br /&gt;Do check it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-7461465504954171323?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/7461465504954171323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=7461465504954171323' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7461465504954171323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7461465504954171323'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2009/03/new-cricket-stats-analysis-blog.html' title='New cricket stats analysis blog'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1414671042952643646</id><published>2009-03-15T11:33:00.002+01:00</published><updated>2009-03-15T11:48:37.032+01:00</updated><title type='text'>Batted ball speed</title><content type='html'>Hey look, I have this thing called a blog, maybe I should update it.&lt;br /&gt;&lt;br /&gt;I should update my profile &amp;ndash; I'm no longer a student (sort of, my Masters thesis is under assessment), having got out of my PhD and started a 9-5 job, which is in geostatistics, if anyone is wondering.&lt;br /&gt;&lt;br /&gt;I've been very slack with cricket stats lately, with work and chess taking up more of my time.  Anyway, &lt;a href="http://www.insidethebook.com/ee/index.php/site/comments/everything_you_wanted_to_know_about_the_science_of_the_bat_ball_collision/"&gt;this&lt;/a&gt; is a long and interesting discussion about a baseball hitting a baseball bat.  The very counter-intuitive result is that the batted ball speed doesn't depend on the grip, as long as the ball hits the "meat" of the bat.  The same is true for cricket (&lt;a href="http://www.physics.usyd.edu.au/~cross/cricket.html"&gt;this page&lt;/a&gt; gives a few physicsy aspects of cricket).  All that matters for batted ball speed is the speed of the bat at impact.&lt;br /&gt;&lt;br /&gt;On a different topic, I have been thinking about putting my cricket database online.  It wouldn't be pretty, and any professional coders out there may be horrified at my code, but there are enough of you out there with good ideas that I think it would be worthwhile.  
In the long term, I would like there to be a cricket equivalent of &lt;a href="http://www.retrosheet.org/"&gt;Retrosheet&lt;/a&gt;, which now has over 50 years' worth of play-by-play data for Major League Baseball.&lt;br /&gt;&lt;br /&gt;I'm sort of thinking out loud on this at the moment.  My database uses CricketArchive player and match ID's, so if they changed their numbering system (which they did a few years ago), that'd screw things up.  I might exchange emails with them and see what comes out of it.  &lt;br /&gt;&lt;br /&gt;Feel free to share your thoughts on this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1414671042952643646?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1414671042952643646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1414671042952643646' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1414671042952643646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1414671042952643646'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2009/03/batted-ball-speed.html' title='Batted ball speed'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2375121540561772486</id><published>2009-01-04T02:38:00.003+01:00</published><updated>2009-01-04T02:41:40.452+01:00</updated><title type='text'>Form</title><content type='html'>Russ in a post &lt;a href="http://deggles.csoft.net/post.php?postid=1366"&gt;here&lt;/a&gt; suggests a way of measuring form (go and read that post; it's full of goodies).  The broad question is, if there are two batsmen with the same true talent, and one has better recent form, will that batsman tend to average more in his next innings?&lt;br /&gt;&lt;br /&gt;To start with, Russ weights more recent innings more heavily than earlier innings.  Specifically, the k-th innings for a batsman with N innings in his career is weighted 0.95&lt;sup&gt;N-k&lt;/sup&gt;.  (This is, I believe, similar to what the ICC rankings do.)&lt;br /&gt;&lt;br /&gt;Now, one problem with assessing "form" statistically is that a batsman will usually play a series against one team, followed by a series against another, etc.  Since one of those two opposition teams can have a much stronger bowling attack than the other, what may appear to be good form and bad form may simply be a result of playing against weak bowlers and then strong bowlers.  So, for everything I do in this post, I'll adjust the batting averages by the quality of the attack, as explained &lt;a href="http://pappubahry.blogspot.com/2007/12/modified-batting-averages.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So, when I talk about a regular average, I really mean an adjusted average.  
When I talk about a &lt;i&gt;weighted&lt;/i&gt; average, I'll mean Russ's weighting by how recent the innings was (each innings also being adjusted for the quality of the attack).&lt;br /&gt;&lt;br /&gt;Before continuing about form, I'll just look at the weighted average as a predictive tool.  For all batsmen with at least 50 innings, I calculated career-to-date averages and weighted averages, as well as a 10-innings moving average.  Then, from the 11th innings of each batsman's career, I calculated the absolute difference between his next innings and each of those three measures.  (If the innings was a not-out, I used the not-out score as the absolute difference for each measure.)  Then I averaged these errors.  I did the same for all batsmen, and then found the "average average" error.  The regular average was the best predictor, about 1% better than the weighted average, and 4% better than the moving average.  The weighted average becomes more accurate if the 0.95 in the formula is increased towards 1, but it is always worse than the regular average.&lt;br /&gt;&lt;br /&gt;So, as a measure of true talent of a batsman, I'll use the regular average rather than the weighted average.&lt;br /&gt;&lt;br /&gt;Now to the question of defining form.  Russ does this by taking a weighted log average.  Defining R&lt;sub&gt;i&lt;/sub&gt; as the runs scored in the i-th innings, and w&lt;sub&gt;i&lt;/sub&gt; as the weight of that innings, this weighted log average is:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;     / SUM w&lt;sub&gt;i&lt;/sub&gt; log(R&lt;sub&gt;i&lt;/sub&gt;) \&lt;br /&gt;exp |  -------------  |&lt;br /&gt;     \     SUM w&lt;sub&gt;i&lt;/sub&gt;    / &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I've actually modified this a bit.  If the i-th innings is a not-out, I didn't include it in the sum in the denominator.  
I hope this isn't too great a crime against statistics.&lt;br /&gt;&lt;br /&gt;The measure of form is then the ratio of the weighted log average to the weighted average.  Now, if scores are distributed exponentially, then this ratio is about 0.56 (well, it is with equally weighted innings at least).  If a batsman makes the same score every innings (and gets out!), the ratio is 1.  If a batsman recently has one big score and a bunch of little scores, the ratio is down towards 0.3.  So, good form is a high ratio, bad form is a low ratio.&lt;br /&gt;&lt;br /&gt;Because I exclude not-outs in the denominator, it's possible to get ratios greater than 1.  I'm not really sure how to interpret these, but let's carry on anyway.&lt;br /&gt;&lt;br /&gt;Russ's hypotheses are (I hope I've got this right):&lt;br /&gt;&lt;br /&gt;a) If there are two batsmen with a similar average, one with a typical ratio and one with a low ratio, then the one with the low ratio will tend to average more in his next innings.  The logic here is that the batsman with the low ratio is capable of larger scores, whereas the other batsman is just not so good.&lt;br /&gt;&lt;br /&gt;b) Given two batsmen with the same average, one with a high ratio will tend to do better in his next innings than one with a typical ratio.&lt;br /&gt;&lt;br /&gt;Both of these are correct, somewhat to my surprise.  I went through all batsmen, and for each innings (after the tenth in their career), calculated the career-to-date average, and the ratio-to-date, and binned them as in the table below.  I then calculated the overall average for each bin.&lt;br /&gt;&lt;br /&gt;Ratios are down the left-hand side, averages across the top.  The figures are the low end of the bin.  So, e.g., the '5' means that the bin is for averages 5 to 9.99, the '10' is for averages 10-14.99, etc.  
Only bins with at least 50 innings are shown; bold is used for at least 100 innings.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;r/a   5     10    15    20    25    30    35    40    45    50    55&lt;br /&gt;0.35              &lt;b&gt;23.4  28.4  28.7  40.8&lt;/b&gt;  41.3  &lt;br /&gt;0.40  &lt;b&gt;12.6  18.8  21.4  31.7  30.5  34.9  40.3  40.7&lt;/b&gt;  55.3  &lt;br /&gt;0.45  7.4   &lt;b&gt;12.1  15.9  20.9  26.7  30.1  34.0  46.9  39.9  50.2&lt;/b&gt;&lt;br /&gt;0.50  &lt;b&gt;9.6   11.1  17.6  22.2  26.4  31.7  36.3  34.9  43.5  46.0&lt;/b&gt;  &lt;br /&gt;0.55  &lt;b&gt;8.4   11.9  16.5  21.0  26.1  31.6  35.2  40.1  42.6  53.3&lt;/b&gt;  &lt;br /&gt;0.60  &lt;b&gt;7.6   12.2  18.3  24.4  27.9  33.5  38.2  42.7  46.6&lt;/b&gt;  58.0  40.4&lt;br /&gt;0.65  &lt;b&gt;7.8   12.3  18.1  24.7  28.0  33.4  39.2  43.3  48.7  48.1&lt;/b&gt;  46.6&lt;br /&gt;0.70  &lt;b&gt;8.2   12.4  19.3  25.7  27.5  34.4  40.4  46.1  44.0&lt;/b&gt;  56.7  &lt;br /&gt;0.75  &lt;b&gt;9.3   15.3  17.4  23.1  29.3  38.9  41.0  47.4  51.8  50.1&lt;/b&gt;  &lt;br /&gt;0.80  9.8   &lt;b&gt;12.0  17.7&lt;/b&gt;  24.4  &lt;b&gt;30.5  35.1  48.2  54.2  51.8&lt;/b&gt;  58.9  &lt;br /&gt;0.85        15.0  22.1  25.8  38.8  40.5  44.6  &lt;b&gt;51.3&lt;/b&gt;  61.9  46.8  &lt;br /&gt;0.90        15.6  26.7        42.1        42.6  59.2  54.1  &lt;br /&gt;0.95                                            46.5&lt;br /&gt;1.00        16.6        35.9  32.5  44.7  53.9  &lt;b&gt;68.5&lt;/b&gt;  73.2  &lt;b&gt;62.9&lt;/b&gt;&lt;/pre&gt;&lt;br /&gt;When the ratio is very low, the batsman does indeed tend to average much more in his next innings.  (Since I've used regular averages to define the true talent, the top row may be full of players early in their career.  I'm not sure.)  Going down each column, the minimum is usually somewhere around 0.5 to 0.6, which seems to correspond to the 0.56 that you'd expect from the exponential distribution.  
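&lt;br /&gt;&lt;br /&gt;The form ratio can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Russ's code or mine. The assumptions are:
- The names weights and form_ratio are hypothetical.
- The weighted average is taken to be the usual runs-per-dismissal figure, with each innings weighted by 0.95&lt;sup&gt;N-k&lt;/sup&gt;. Not-outs add runs to the numerator but no weight to the denominator.
- The weighted log average follows the formula above, with not-outs likewise left out of the denominator.
- All scores are positive, because log(0) is undefined and this post does not say how ducks are handled.

The check at the bottom reproduces two statements from the text:
- Identical scores with every innings a dismissal give a ratio of 1.
- Equally weighted exponential scores give about 0.56. That figure is exp(-gamma), about 0.561, where gamma (about 0.5772) is the Euler-Mascheroni constant. This follows because the mean of the log of an exponential variable is the log of its mean minus gamma.

```python
import math
import random


def weights(n_innings, decay=0.95):
    """Weight decay ** (N - k) for the k-th of N innings (k = 1 is the earliest)."""
    return [decay ** (n_innings - k) for k in range(1, n_innings + 1)]


def form_ratio(scores, not_outs, decay=0.95):
    """Weighted log average divided by weighted average (hypothetical helper).

    scores   -- runs in each innings, oldest first; all assumed positive
    not_outs -- True where the innings was a not-out
    Not-outs count in both numerators but are left out of both denominators.
    """
    w = weights(len(scores), decay)
    dismissal_weight = sum(wi for wi, is_not_out in zip(w, not_outs) if not is_not_out)
    weighted_avg = sum(wi * r for wi, r in zip(w, scores)) / dismissal_weight
    log_sum = sum(wi * math.log(r) for wi, r in zip(w, scores))
    weighted_log_avg = math.exp(log_sum / dismissal_weight)
    return weighted_log_avg / weighted_avg


if __name__ == "__main__":
    # Same score every innings, always dismissed: ratio is 1 up to rounding.
    print(round(form_ratio([30.0] * 20, [False] * 20), 6))

    # Equally weighted (decay = 1) exponential scores averaging 40:
    # the ratio should come out near exp(-gamma).
    rng = random.Random(1)
    scores = [rng.expovariate(1 / 40) for _ in range(100000)]
    print(round(form_ratio(scores, [False] * len(scores), decay=1.0), 3))
    print(round(math.exp(-0.5772156649), 3))
```

Because not-outs add runs without adding weight to the denominators, a batsman with several recent not-outs can get a ratio above 1, as noted in the text.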
&lt;br /&gt;&lt;br /&gt;Really good recent form seems to give a 20% boost and sometimes more.  This is a lot more than I had expected.&lt;br /&gt;&lt;br /&gt;(My thinking on this issue seems to have been confused &amp;mdash; in my last post I said that Johnson was good because he kept getting starts, which is consistent with this analysis.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2375121540561772486?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2375121540561772486/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2375121540561772486' title='14 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2375121540561772486'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2375121540561772486'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2009/01/form.html' title='Form'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>14</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-6967198251417669790</id><published>2008-12-20T04:32:00.002+01:00</published><updated>2008-12-20T04:37:47.965+01:00</updated><title type='text'>Johnson's batting</title><content type='html'>I've only seen Johnson bat once (at the Gabba against NZ), and he looked very good.  Looking through his Test scores to date, he's only been dismissed in single figures in a quarter of his innings.  
If you assume an exponential distribution of scores, that is consistent with an average of about 35 (a quarter of scores below 10 means exp(-10/avg) = 0.75, so avg = 10/ln(4/3), or about 35).  Currently he's averaging in the low 20's.&lt;br /&gt;&lt;br /&gt;So, there are two main possibilities &amp;mdash; he's been lucky to make so many starts, or he's been unlucky to get out for 20-odd so often.  Batting at number 10 probably isn't helping him.&lt;br /&gt;&lt;br /&gt;I'd like to see him moved up to number 8.  At worst, he'll be about as good as Lee.  At best, I think he could be a very good bowling all-rounder.&lt;br /&gt;&lt;br /&gt;Sorry for the lack of posting lately.  I'm trying to finish off my Masters thesis (transferred down from a PhD), I will start a proper job in February, and I'm spending much of my spare time playing and studying chess.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-6967198251417669790?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/6967198251417669790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=6967198251417669790' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6967198251417669790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6967198251417669790'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/12/johnsons-batting.html' title='Johnson&apos;s batting'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-7517885146541251682</id><published>2008-11-15T00:08:00.002+01:00</published><updated>2008-11-15T00:21:07.522+01:00</updated><title type='text'>Can Dan Cullen bowl a doosra?</title><content type='html'>I ask because I saw him bowl balls spinning from leg to off during the warmups last night (the warmups were the only time I was in line with the bowler).  Some of them I picked as leg breaks, but others I didn't, making me wonder if he can bowl a doosra now.  It's equally possible that I wasn't paying close enough attention to his hand.&lt;br /&gt;&lt;br /&gt;Other notes from last night's game:&lt;br /&gt;&lt;br /&gt;- What would be a good name for a team which includes Michael Dighton, Dan Marsh, and Ryan Harris?  I know, &lt;i&gt;All Stars&lt;/i&gt;.  &lt;br /&gt;&lt;br /&gt;- Brendan Drew was bowling nice little outswingers in the warmups.&lt;br /&gt;&lt;br /&gt;- When Xavier Doherty was fielding at long off a few metres from me, I resisted the temptation to tell him that he's got the worst first-class average of any bowler ever.  It's only just &lt;a href="http://stats.cricinfo.com/ci/content/records/283308.html"&gt;false&lt;/a&gt;.  With the qualification of 5000 balls, the only people with worse averages are a guy who only played for Oxford University, Sachin Tendulkar, and Stuart Saunders, who at least averaged 25 with the bat for Tassie.  But Doherty's pretty good at limited-overs cricket.  Weird.&lt;br /&gt;&lt;br /&gt;- When Magoffin (I like Magoffin &amp;mdash; he was the only guy who stuck in my memory from the Queensland Academy side I saw play India in 2003/4) was bowling to Shaun Marsh, Gilchrist set a 7-2 field.  It was a disgustingly negative tactic.  Captains have a responsibility to make sure that the game is entertaining for the spectators.  
It's because of people like Gilchrist that people are turning away from Twenty20 cricket.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-7517885146541251682?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/7517885146541251682/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=7517885146541251682' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7517885146541251682'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7517885146541251682'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/11/can-dan-cullen-bowl-doosra.html' title='Can Dan Cullen bowl a doosra?'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-6622126925794416241</id><published>2008-10-21T02:16:00.002+02:00</published><updated>2008-10-21T02:23:25.320+02:00</updated><title type='text'>Keepers</title><content type='html'>I've got what I think is an &lt;a href="http://blogs.cricinfo.com/itfigures/archives/2008/10/analysing_wicketkeepers_by_bye.php"&gt;interesting post&lt;/a&gt; up at It Figures.  It ranks wicket-keepers by the rate at which they let byes through, adjusted by country.  
It doesn't work for keepers who kept up to the stumps to fast bowling a lot (i.e., keepers from the olden days), but I'm happy with how well it works for modern keepers.  &lt;a href="http://content-aus.cricinfo.com/ci/content/story/374726.html"&gt;Here&lt;/a&gt; is the full list (qualification: 20 Tests) for those who want to just see the results.&lt;br /&gt;&lt;br /&gt;I must admit I hadn't heard of some of those guys near the top.  Kirmani rang a bell, but Tamhane was new to me.  I was happy to see in his Cricinfo profile that Wally Grout compared Tamhane to Don Tallon.  It's unfortunate that Tallon (and Grout, and the rest) kept up to the stumps so often.  It makes cross-era keeper comparisons difficult.  (You can't just go by the prevailing rate of byes in world cricket at the time, otherwise modern keeper-batsmen would come up as the equal of Knott, Taylor, etc.).&lt;br /&gt;&lt;br /&gt;Ideally, we'd be able to take this one step further and get a well-founded measure of a keeper-batsman.  
But the main difference between a good keeper and a bad one is the number of dismissals effected, and it would be close to impossible to get accurate estimates on, e.g., how many dismissals Knott would have had if he'd kept for Pakistan in the 1990's.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-6622126925794416241?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/6622126925794416241/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=6622126925794416241' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6622126925794416241'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6622126925794416241'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/10/keepers.html' title='Keepers'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-5491435886532333954</id><published>2008-10-08T12:26:00.000+02:00</published><updated>2008-10-08T12:47:49.794+02:00</updated><title type='text'>Free hits</title><content type='html'>We're on the eve of what should be another fantastic battle of Test cricket, but this evening I finally got around to a simple study of the effect of the free hit rule in the IPL.  
The IPL's the only ball-by-ball database I have; otherwise I'd check whether the results are the same in ODI's as well.&lt;br /&gt;&lt;br /&gt;Anyway, I went through the tournament and found the run rate on the ball immediately following a no-ball (there might be some non-front-foot no-balls in there, but whatever...).  The result was 140 runs off 82 balls, a run rate of 10.2 per over.&lt;br /&gt;&lt;br /&gt;To work out how many runs you'd expect to have been scored if it hadn't been a free hit, I used the overall run rates by over (actually I did a bit of smoothing first, fitting quadratic trends to the first six overs and to the remaining 14 overs).  The graph looks like this:&lt;br /&gt;&lt;br /&gt;&lt;img alt="Feel free to suggest a caption." title="Feel free to suggest a caption." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/iplrrbyover.png"&gt;&lt;br /&gt;&lt;br /&gt;So, e.g., if there was a free hit in the 10th over, you'd assume that if it had been a normal ball, 7.0/6 = 1.17 runs would have been scored off it.  Doing this for each of the free hits, you end up with an expected score of 111 runs off those balls, a run rate of 8.1 per over.  So the batsmen are scoring more than usual off the free hits, but not by a lot: an extra third of a run per free hit, on average.&lt;br /&gt;&lt;br /&gt;It is a nice check that the average run rate in the 20th over (when you're batting like they're all free hits anyway) was 10.1.&lt;br /&gt;&lt;br /&gt;Presumably free hits in ODI's also go at about 10/over (or 1.7 per ball), so proportionally speaking, the punishment is greater in the 50-over form of the game.&lt;br /&gt;&lt;br /&gt;That graph above is pretty interesting.  There's a bit of noise, but the end of the fielding restrictions is very clear.  
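&lt;br /&gt;&lt;br /&gt;For anyone wanting to repeat the free-hit comparison, here is a minimal Python sketch of the arithmetic. It assumes you already have the (smoothed) run rate for each over and a list of the overs in which the free hits were bowled; the function names are made up for illustration, and the quadratic smoothing step itself is not included.

```python
def per_over(runs, balls):
    """Convert runs off a number of balls into a run rate per (six-ball) over."""
    return 6.0 * runs / balls


def expected_runs(free_hit_overs, rate_by_over):
    """Runs expected off the free-hit balls had they been ordinary deliveries.

    free_hit_overs: the over (1-20) in which each free hit was bowled.
    rate_by_over: dict mapping over number to the (smoothed) run rate in
        that over; an ordinary ball is taken to be worth rate / 6 runs.
    """
    return sum(rate_by_over[over] / 6.0 for over in free_hit_overs)


def extra_runs_per_free_hit(actual_runs, free_hit_overs, rate_by_over):
    """Average excess of actual over expected runs, per free hit."""
    expected = expected_runs(free_hit_overs, rate_by_over)
    return (actual_runs - expected) / len(free_hit_overs)
```

With the totals quoted above, per_over(140, 82) gives about 10.2 and per_over(111, 82) about 8.1, and (140 - 111) / 82 is about 0.35 extra runs per free hit.&lt;br /&gt;&lt;br /&gt;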
It's interesting that the acceleration is gradual.&lt;br /&gt;&lt;br /&gt;The batsmen are circumspect (relatively speaking) in the first over, and the effect is certainly much larger than could be explained by the first over simply being bowled by the best bowler (no-one had an economy rate as low as 5.2 in the IPL).  Perhaps it is worth opening with your "fifth" bowler.  I don't know.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-5491435886532333954?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/5491435886532333954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=5491435886532333954' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5491435886532333954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5491435886532333954'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/10/free-hits.html' title='Free hits'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_iplrrbyover.png' height='72' width='72'/><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-5604765770063624578</id><published>2008-09-28T11:10:00.000+02:00</published><updated>2008-09-28T11:50:17.893+02:00</updated><title type='text'>Queensland v Kolkata</title><content 
type='html'>In a post some time back, &lt;a href="http://nestaquin.wordpress.com/"&gt;Nestaquin&lt;/a&gt; told me that the Kolkata Knight Riders would be playing some games in Brisbane, and that I should go and report on what I saw.  Today they played the first of three double-headers against Queensland, and this gives me an easy way to end the neglect of this blog.  The games were played at Allan Border Field, a lovely ground that doesn't host enough matches to warrant over-corporatisation &amp;mdash; there's still a white picket fence, a hill, and trees.&lt;br /&gt;&lt;br /&gt;The first Kolkata player I saw was Ajit Agarkar (at least I'm 90% sure it was him &amp;mdash; I only saw him for a second or two), who was going for a jog as I was walking to the ground.  Unfortunately he didn't play, as the coach John Buchanan had brought his squad here to give the young guys some experience.&lt;br /&gt;&lt;br /&gt;Boy do they need it.  They batted first in the morning game.  I had a pen and notepad, and decided to keep a tally of balls where the batsman was beaten (i.e., played and missed, miscued, or was hit on the pad).  I don't know of a good word to describe this, so I'll borrow the French football term &lt;i&gt;occasion&lt;/i&gt; (which in football means a scoring opportunity).  In 120 balls plus a few wides, there were 58 of these &lt;i&gt;occasions&lt;/i&gt;, roughly one every other ball.  The batsmen, mostly below first-class standard, were completely unable to cope with the pace and bounce of the quick bowlers.  Then Chris Simpson came on to bowl some offies, and they weren't all that good against him either.&lt;br /&gt;&lt;br /&gt;Sharma (not sure of his first name) gave up 12 &lt;i&gt;occasions&lt;/i&gt; on his way to 5 runs, and Patel gave up 10 on his way to 3 runs.&lt;br /&gt;&lt;br /&gt;They crawled their way to 8/79 from their 20 overs.  
The only three batsmen to make double figures were Michael Buchanan (John's son; he's been a fringe player for Queensland) with 16, their other opener Prashantha (I think that was his name &amp;mdash; I didn't see a match programme, haven't seen a scorecard, and had to go by the ground announcer), who was steady-ish but slow and scored 12 before getting out in the 8th over, and Adam Hollioake.  I didn't even know he was still playing, but Cricinfo tells me that while he retired from first-class cricket in 2004, he kept playing T20's till 2007.  Anyway, he looked pretty overweight, but he also looked a class above most of his teammates.&lt;br /&gt;&lt;br /&gt;There really wasn't much positive to take out of the batting.  Even with only four fielders inside the circle, they were incapable of nudging singles.  Symonds bowled two consecutive maidens with the field set deep!  &lt;br /&gt;&lt;br /&gt;I couldn't be bothered taking notes after that.&lt;br /&gt;&lt;br /&gt;Their bowlers in the morning game were no better than the batsmen.  Too short, too full, too much width, ....  They might have been a bit unlucky, in the sense that every time the batsmen swung at the ball it found the middle of the bat, but it was still carnage.  The Queensland openers whacked 80 runs in about 7 overs, and then they decided to keep playing, with a revised target of 180.  They made it with about four overs to spare, with seven wickets in hand.  Symonds failed, scoring just 2, skying a catch to long-on with his first real swing of the bat.&lt;br /&gt;&lt;br /&gt;I stuck around for the afternoon game, and I'm glad I did.  Queensland batted first (shuffling their batting order), and this time the KKR bowlers stuck to a much better length.  Short balls got up to the head rather than the lower chest.  And they had more luck, as batsmen started miscuing.  Chaudry (I think) took three wickets.  Queensland kept batting aggressively but lost wickets.  Siddarth Kaul was one of the bowlers.  
Iqbal Abdullah bowled some left-arm orthodox darters and looked OK.  I can't remember the others.  They brought on a legspinner who struggled to find his length for a while, which meant that most of his spell was bad &amp;mdash; when you only get four overs, there's not much time to get your rhythm.  Queensland got to 8/96 after 12 overs and were all out for 129.&lt;br /&gt;&lt;br /&gt;Michael Buchanan led the chase with 49 from 31.  The batsmen at the other end were not so good, and when Buchanan was dismissed they were 4/80ish.  They looked ready to collapse, but the lower order kept hanging on, Hollioake again making double figures.  I wasn't taking notes or tallying, but my impression was that they were much more solid, making contact with the ball and middling it a lot more often.&lt;br /&gt;&lt;br /&gt;They needed 29 off the last two overs, and Daniel Doran bowled the nineteenth (I think it was the nineteenth...).  Doran must have good figures in grade cricket, because he keeps getting picked for Queensland despite averaging over 70 with the ball in the last two years.  Anyway, one of the KKR tailenders hit him for 16 runs in three balls (all straight hits I think), and they then needed 11 off the last over with one wicket in hand.&lt;br /&gt;&lt;br /&gt;Grant Sullivan bowled it &amp;mdash; he had looked sharp but lacked control, and Buchanan had hit him around earlier.  His first ball was short and gave too much width, and Kaul managed to get some bat on it and got a boundary to third man.  7 off 5.  Dot ball.  Third ball was a good length, Kaul hits it over long-off for 6, and momentarily thinks that he's won the match, but then realises that the scores are tied.  Field comes in.  Dot ball.  Dot ball.  
Last ball, defended to Andrew Symonds, they run, Symonds throws and misses, Kolkata win by one wicket.&lt;br /&gt;&lt;br /&gt;The KKR guys were all happy with a win.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-5604765770063624578?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/5604765770063624578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=5604765770063624578' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5604765770063624578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5604765770063624578'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/09/queensland-v-kolkata.html' title='Queensland v Kolkata'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2467538748230483070</id><published>2008-09-07T11:14:00.001+02:00</published><updated>2008-09-07T11:14:44.717+02:00</updated><title type='text'>Simulating one-day cricket and batting orders</title><content type='html'>I spent some of today reading through &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;_udi=B6VC5-4DW8VR6-4&amp;_user=331728&amp;_rdoc=1&amp;_fmt=&amp;_orig=search&amp;_sort=d&amp;view=c&amp;_version=1&amp;_urlVersion=0&amp;_userid=331728&amp;md5=0237a750feaba9f9d4b9332c685c3543"&gt;this 
paper&lt;/a&gt; (you'll probably need a university subscription to read that) by Swartz et al.  It's called 'Optimal batting orders in one-day cricket', and it is useful because it gives a way of simulating one-day innings.&lt;br /&gt;&lt;br /&gt;(The paper itself looks at the Indian batting order in the 2003 World Cup.  The best they came up with went Dravid, Tendulkar, Ganguly, Sehwag, Mongia, Y Singh, Khan, Kaif, H Singh, Agarkar, Srinath.  Their second-best lineup swapped Dravid and Ganguly, and sent Kaif to 7.  They reckon it would have done better by about 6 runs, on average, than their actual lineup for the World Cup final.  Not many matches are won or lost by fewer than six runs, but I suppose you want to squeeze out every run you can.  It's interesting that the simulations reckoned that Kaif was best left to come in and slog at the death.  A full run of their simulations takes a long time &amp;mdash; there are a &lt;i&gt;lot&lt;/i&gt; of batting lineups to go through, even when you do clever tricks and make the search much smaller.  But they say that it would be much quicker if you had only a limited number of options, such as during a match when you've lost a couple of wickets.  Using a computer to find the optimal batting order based on the situation of the game meshes well with &lt;a href="http://www.thewisdencricketer.com/blog/?p=157"&gt;Rob Smyth's belief&lt;/a&gt; that batting orders in one-day cricket should be fluid.)&lt;br /&gt;&lt;br /&gt;The way they do it is to work out 'baseline' characteristics for each batsman in the team.  That is, they get the probability that a batsman will play a dot ball, score a single, a 2, a 3, a 4, a 6, or get out.  
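&lt;br /&gt;&lt;br /&gt;To make the idea concrete, here is a rough Python sketch of how one ball could be drawn once those per-outcome numbers are in hand. It is not the authors' code. It uses the loglinear form of the model described below (exponentiate mu + alpha*w/9 + beta*b/299 + theta*R/100 for each outcome, then normalise so the probabilities sum to one), and it assumes the parameter values and the Duckworth-Lewis resources-used figure R(w, b) are supplied by the caller; the function names are hypothetical and none of the numbers come from the paper.

```python
import math
import random

# Outcome categories k, in the order the post lists them.
OUTCOMES = ("dot", "1", "2", "3", "4", "6", "out")


def outcome_probs(mu, alpha, beta, theta, w, b, resources_used):
    """Per-ball outcome probabilities for one batsman under the loglinear model.

    mu: this batsman's baseline parameters, keyed by outcome.
    alpha, beta, theta: situation parameters shared by all batsmen.
    w: wickets fallen (0-9); b: balls bowled (0-299);
    resources_used: Duckworth-Lewis percentage of resources used, R(w, b),
        which must be looked up elsewhere (not included here).
    """
    q = {
        k: math.exp(mu[k] + alpha[k] * w / 9 + beta[k] * b / 299
                    + theta[k] * resources_used / 100)
        for k in OUTCOMES
    }
    total = sum(q.values())
    return {k: q[k] / total for k in OUTCOMES}  # normalise: p = q / sum(q)


def simulate_ball(probs, rng=random):
    """Draw one ball's outcome from the categorical distribution probs."""
    weights = [probs[k] for k in OUTCOMES]
    return rng.choices(OUTCOMES, weights=weights)[0]
```

With every parameter set to zero, each of the seven outcomes gets probability 1/7; a simulated innings is just repeated calls to simulate_ball, updating w, b and R after each ball.&lt;br /&gt;&lt;br /&gt;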
But they don't just take their overall career numbers; they take into account the match situation when they batted.&lt;br /&gt;&lt;br /&gt;So, given the number of wickets fallen w, balls bowled b, and the Duckworth-Lewis percentage of resources used R(w,b), what they actually did was fit parameters to a loglinear model that looks like this (the subscript k denotes what happened on the ball, so k = 0 is a dot, etc.; the subscript j refers to the jth batsman):&lt;br /&gt;&lt;br /&gt;log(q&lt;sub&gt;jwbk&lt;/sub&gt;) = &amp;mu;&lt;sub&gt;jk&lt;/sub&gt; + &amp;alpha;&lt;sub&gt;k&lt;/sub&gt;*w/9 + &amp;beta;&lt;sub&gt;k&lt;/sub&gt;*b/299 + &amp;theta;&lt;sub&gt;k&lt;/sub&gt;*R(w,b)/100.&lt;br /&gt;&lt;br /&gt;The &amp;mu;&lt;sub&gt;jk&lt;/sub&gt;'s give the baseline probabilities for each type of ball-result (dot, single, etc.) for each batsman at the start of the innings.  (Well, not directly probabilities &amp;mdash; the probabilities p&lt;sub&gt;jwbk&lt;/sub&gt; are given by p&lt;sub&gt;jwbk&lt;/sub&gt; = q&lt;sub&gt;jwbk&lt;/sub&gt; / &amp;Sigma;&lt;sub&gt;k&lt;/sub&gt; q&lt;sub&gt;jwbk&lt;/sub&gt;.)  &lt;br /&gt;&lt;br /&gt;The other parameters (&amp;alpha;, &amp;beta;, &amp;theta;) describe how the probabilities change as the game situation changes.  It is assumed that all batsmen change in the same way.&lt;br /&gt;&lt;br /&gt;So that's all well and good.  You throw the parameters into the computer, generate a bunch of random numbers and you end up with 50 overs of simulated cricket.  The results are pretty close to real cricket scores, at least at the team level.  I tried running the same algorithm and one of the openers scored a double-century on the second run, so it's probably not perfect, but on average it seems to do a good job.  (I'm getting a slightly higher result for the team average &amp;mdash; 253 against 250 &amp;mdash; than the authors of the study did.  
Some minor bug in my code somewhere, I guess.)&lt;br /&gt;&lt;br /&gt;Anyway, I think that this paper could be useful for me.  What I want to do is see how to properly assess batting average and strike rate.  So what I hope to do is get the relevant parameters for batsmen as a whole in the 2000's.  Unfortunately, I don't have ball-by-ball ODI data, so I'm going to have to estimate it somehow.  I've asked S Rajesh for the overall numbers (i.e., total dot balls, singles, 2's, etc.), and hopefully with some fiddling I'll get the weird loglinear parameters to match them.&lt;br /&gt;&lt;br /&gt;The &amp;alpha;, &amp;beta;, and &amp;theta; parameters I'll leave unchanged.  Hopefully India's batsmen from 1998 to 2003 (the period that the study looked at) are representative of how batsmen generally change over the course of an innings.&lt;br /&gt;&lt;br /&gt;Then, once I've got a good simulator of an average batting lineup against average bowling, I'll be able to vary the parameters of one of the batsmen, tweaking average and strike rate (indirectly &amp;mdash; I'll be tweaking the probability of dismissal on each ball, and the probability of each type of scoring shot).  Then I can see what effect this has on the average team score.  So it'll be like &lt;a href="http://pappubahry.blogspot.com/2008/08/averages-and-strike-rates-in-odis.html"&gt;the post below&lt;/a&gt;, only accurate.  
Hopefully.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2467538748230483070?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2467538748230483070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2467538748230483070' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2467538748230483070'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2467538748230483070'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/09/simulating-one-day-cricket-and-batting.html' title='Simulating one-day cricket and batting orders'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4601346254045513266</id><published>2008-08-25T11:49:00.001+02:00</published><updated>2008-08-25T11:53:33.375+02:00</updated><title type='text'>Averages and strike rates in ODI's</title><content type='html'>Rating batsmen in one-day cricket is difficult because both average and strike rate are important, and it's not clear how they should be weighted.&lt;br /&gt;&lt;br /&gt;I can't see a theoretical solution (multiplying the two measures might be good enough, but it seems arbitrary), and I think that the actual answer will come from simulations.  In this post, I show some results from some woefully inaccurate simulations.  
But hopefully, even though the total scores were lower than they should be, the equivalences between various averages and strike rates will be reasonably accurate.&lt;br /&gt;&lt;br /&gt;Here's what I did.  I took the overall average and strike rate since 2000 for each batting position (I think using the top eight teams).  I ran a largish number (20000) of simulations to get the "average" total score, and it turned out to be 208 (I told you it was inaccurate...).&lt;br /&gt;&lt;br /&gt;Then, I replaced one of the openers with a batsman with an average of 1, strike rate 50, then average 2, strike rate 50, average 3, strike rate 50, and so on, doing 20000 simulations each time to get the average total score.  I did this until I had a grid of average total scores for strike rates from 50 to 130, and averages from 1 to 60.  Then I made contour plots, with each curve joining the average/strike-rate combinations that give the same average team total.&lt;br /&gt;&lt;br /&gt;The simulations assumed a constant run rate and exponentially distributed scores.  Not realistic, but it was straightforward to do and avoided doing ball-by-ball simulations.&lt;br /&gt;&lt;br /&gt;There's a bit of noise in the results.  Here's the contour plot for openers:&lt;br /&gt;&lt;br /&gt;&lt;img title="I couldn't be bothered" alt="I couldn't be bothered" src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/avgsrcurvesopener.png"&gt;&lt;br /&gt;&lt;br /&gt;The number 3:&lt;br /&gt;&lt;br /&gt;&lt;img title="getting the sizes" alt="getting the sizes" src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/avgsrcurvesno3.png"&gt;&lt;br /&gt;&lt;br /&gt;The number 4:&lt;br /&gt;&lt;br /&gt;&lt;img title="the same each time." alt="the same each time." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/avgsrcurvesno4.png"&gt;&lt;br /&gt;&lt;br /&gt;Because I was exceedingly lazy, the contours in the separate plots may not correspond to the same total team scores.  
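&lt;br /&gt;&lt;br /&gt;A crude version of this kind of simulation fits in a few lines of Python. This is only a sketch under simplifying assumptions, not the code behind the plots: each batsman's score is exponential with mean equal to his average, he scores at a constant strike rate, batsmen use up the 300 balls one after another (so partnerships are ignored), and the innings ends at the tenth dismissal or when the balls run out. The function names are hypothetical.

```python
import random


def simulate_total(lineup, balls=300, wickets=10, rng=random):
    """One simulated team total under a deliberately crude model.

    lineup: (average, strike_rate) pairs in batting order.
    Assumptions: each batsman's score is exponential with mean equal to his
    average and is made at a constant strike rate, so it uses
    score * 100 / strike_rate balls; batsmen use up the ball budget one after
    another (partnerships are ignored); the innings ends after `wickets`
    dismissals or when the balls run out.
    """
    total = 0.0
    balls_left = float(balls)
    for average, strike_rate in lineup[:wickets]:
        score = rng.expovariate(1.0 / average)
        balls_used = score * 100.0 / strike_rate
        if balls_used >= balls_left:
            # Balls run out first: credit only the runs made in the balls left.
            return total + balls_left * strike_rate / 100.0
        total += score
        balls_left -= balls_used
    return total


def mean_total(lineup, n_sims=20000, seed=1):
    """Average total over n_sims simulated innings (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(simulate_total(lineup, rng=rng) for _ in range(n_sims)) / n_sims
```

To trace one point on a contour, you would replace lineup[0] with a hypothetical (average, strike rate) pair and compare mean_total against the baseline lineup's value.&lt;br /&gt;&lt;br /&gt;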
But you will agree that the pictures are colourful.&lt;br /&gt;&lt;br /&gt;Now for some numbers.  In each of the following little tables, the rows are equivalent.  So, an opener with an average of 50 and a strike rate of 73 is worth the same as an opener with an average of 25 and a strike rate of 101.  According to the simulations, at least.&lt;br /&gt;&lt;br /&gt;Making an average score of 210, as an opener:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;avg  sr&lt;br /&gt;50   73&lt;br /&gt;45   74&lt;br /&gt;40   76&lt;br /&gt;35   80&lt;br /&gt;30   87&lt;br /&gt;25   101&lt;/pre&gt;&lt;br /&gt;210, number 3:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;avg  sr&lt;br /&gt;50   69&lt;br /&gt;45   71&lt;br /&gt;40   73&lt;br /&gt;35   76&lt;br /&gt;30   82&lt;br /&gt;25   96&lt;/pre&gt;&lt;br /&gt;210, number 4:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;avg  sr&lt;br /&gt;50   72&lt;br /&gt;45   75&lt;br /&gt;40   77&lt;br /&gt;35   82&lt;br /&gt;30   90&lt;br /&gt;25   110&lt;/pre&gt;&lt;br /&gt;I didn't get past number 4.  
I'll do the rest tomorrow.&lt;br /&gt;&lt;br /&gt;I imagine that the curves would change with more accurate simulations, but this is at least a start.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4601346254045513266?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4601346254045513266/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4601346254045513266' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4601346254045513266'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4601346254045513266'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/08/averages-and-strike-rates-in-odis.html' title='Averages and strike rates in ODI&apos;s'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_avgsrcurvesopener.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4749942874993548803</id><published>2008-08-24T06:10:00.002+02:00</published><updated>2008-08-25T01:36:35.759+02:00</updated><title type='text'>Non-boundary strike rates</title><content type='html'>Near the end of the discussion &lt;a href="http://blogs.cricinfo.com/itfigures/archives/2008/08/the_odi_bowling_average_is.php"&gt;here&lt;/a&gt;, 
there's a comment from me about the changing nature of the way runs are scored in ODI cricket.  Most of the change, of course, is coming from boundaries, which are much more common today.  But it is interesting that there have been no real changes in the rate of non-boundary scoring since 1990.&lt;br /&gt;&lt;br /&gt;Here's a graph showing the yearly overall "non-boundary strike rate", that is, the runs that are actually run, divided by the number of balls not hit to the boundary (times 100).  Top eight sides only.  (Some boundary data is missing, especially before 1990, so the actual non-boundary strike rates for these years are lower than those in the graph.)&lt;br /&gt;&lt;br /&gt;&lt;img alt="Nice little up-down pattern from 1973 to 1983" title="Nice little up-down pattern from 1973 to 1983" src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/runsr.png"&gt;&lt;br /&gt;&lt;br /&gt;(I've called it "run sr" in the graph, run as in running.)&lt;br /&gt;&lt;br /&gt;I would have thought that batsmen today are more adept at "milking" bowlers, but the non-boundary strike rate has never got above 3 runs per over for any long period.&lt;br /&gt;&lt;br /&gt;If you suppose that there are three types of balls:&lt;br /&gt;&lt;br /&gt;- good balls that can't be scored off&lt;br /&gt;- OK balls that can be worked around&lt;br /&gt;- bad balls that can be hit to the boundary&lt;br /&gt;&lt;br /&gt;then &lt;s&gt;it seems that batsmen these days are able to hit the OK balls to the boundary more often, but can't do much about the good balls&lt;/s&gt;.  (&lt;b&gt;Edit&lt;/b&gt;: No, wait, that's not right.  They're getting better at milking the good balls at the same rate as they're getting better at hitting the OK balls for four.  Roughly.  
The constant non-boundary strike rate combined with an increased frequency of boundaries means that the percentage of dot balls is getting lower.)&lt;br /&gt;&lt;br /&gt;At a team level (now including all teams, since 2000; I've forgotten what I did with Kenya, but I probably discarded them because they play against minnows too often):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;team          sr    run sr&lt;br /&gt;Australia     83.6  50.6&lt;br /&gt;Sri Lanka     75.5  47.4&lt;br /&gt;South Africa  78.3  47.0&lt;br /&gt;Pakistan      77.3  46.7&lt;br /&gt;India         78.8  45.7&lt;br /&gt;England       74.4  45.4&lt;br /&gt;New Zealand   75.2  44.7&lt;br /&gt;West Indies   74.2  43.9&lt;br /&gt;Zimbabwe      67.0  43.3&lt;br /&gt;Bangladesh    63.4  38.6&lt;/pre&gt;&lt;br /&gt;Bangladesh are really bad at working the ball around for singles, etc.  There's a clear gap between Australia and Sri Lanka, then a gradual progression down through to Zimbabwe.  Then there's a huge dropoff to Bangladesh.&lt;br /&gt;&lt;br /&gt;India are a bit of an anomaly, with more of their high overall strike rate coming from boundaries than is the case for the other teams.  It's probably not just down to their grounds &amp;mdash; opposition teams in India have the highest non-boundary strike rate of away teams anywhere.  
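&lt;br /&gt;&lt;br /&gt;The definition of the non-boundary ("run") strike rate is simple enough to write down directly. A minimal Python sketch follows; it assumes every four and six was a boundary worth exactly 4 or 6 runs, which ignores all-run fours and overthrows, and the function names are illustrative only.

```python
def strike_rate(runs, balls):
    """Ordinary strike rate: runs per 100 balls."""
    return 100.0 * runs / balls


def run_strike_rate(runs, balls, fours, sixes):
    """Non-boundary ("run") strike rate: runs actually run per 100 balls
    that were not hit to the boundary.

    Assumes every four and six was a boundary worth exactly 4 or 6 runs.
    """
    boundary_runs = 4 * fours + 6 * sixes
    non_boundary_balls = balls - fours - sixes
    return 100.0 * (runs - boundary_runs) / non_boundary_balls
```

For example, a hypothetical 100 runs off 120 balls including 10 fours and 2 sixes gives a strike rate of about 83.3 and a run strike rate of 48 / 108 * 100, about 44.4.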
&lt;br /&gt;&lt;br /&gt;Individuals from the top eight sides since 2000, with an average of at least 30 and at least 1000 runs (diff is sr minus run sr).&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Player          mats  inns  runs  avg   sr    run sr  diff&lt;br /&gt;JN Rhodes       71    66    1994  41.5  86.1  59.7    26.3&lt;br /&gt;DS Lehmann      44    39    1219  42.0  78.8  56.0    22.8&lt;br /&gt;MEK Hussey      77    60    2079  54.7  85.6  55.3    30.4&lt;br /&gt;L Klusener      91    73    1592  33.9  87.6  54.9    32.7&lt;br /&gt;A Symonds       157   133   4300  40.2  94.1  54.1    40.1&lt;br /&gt;MJ Clarke       117   105   3486  42.5  81.0  53.9    27.2&lt;br /&gt;MS Dhoni        101   91    3064  44.4  89.0  53.6    35.5&lt;br /&gt;SR Waugh        46    38    1134  40.5  79.9  52.6    27.2&lt;br /&gt;PD Collingwood  124   114   2956  31.1  74.6  51.5    23.1&lt;br /&gt;RP Arnold       136   123   2984  32.8  72.1  51.5    20.6&lt;/pre&gt;&lt;br /&gt;Symonds and Dhoni are best known for their hitting, but they're good at the less flashy stuff as well.&lt;br /&gt;&lt;br /&gt;Unhonourable mention goes to Chris Gayle: overall strike rate of 80.3, non-boundary strike rate of 37.3, the lowest of all the players who made the qualification.  
Lazy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4749942874993548803?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4749942874993548803/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4749942874993548803' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4749942874993548803'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4749942874993548803'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/08/non-boundary-strike-rates.html' title='Non-boundary strike rates'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_runsr.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3236230762734506830</id><published>2008-08-13T11:56:00.002+02:00</published><updated>2008-08-13T11:57:31.574+02:00</updated><title type='text'>Bowlers as they get more experienced</title><content type='html'>There was a comment from Gary Naylor &lt;a href="http://www.thewisdencricketer.com/blog/?p=127#comments"&gt;here&lt;/a&gt; saying that Monty Panesar should improve on his average of 32 as his career goes on, since he'll learn more about how to bowl.&lt;br /&gt;&lt;br /&gt;I'm not convinced.  
I took all spinners with 149 or more wickets since WWII, and split their wickets according to which Test of their career each was taken in, so I could find the overall average in debut Tests, second Tests, third Tests, etc.  &lt;br /&gt;&lt;br /&gt;To get rid of some noise I actually took a five-Test moving average, so the first data point in the graph below is the overall average in the spinners' first to fifth Tests, the next the average in their second to sixth Tests, etc.&lt;br /&gt;&lt;br /&gt;Also, I weighted each wicket by the average of the batsman dismissed.&lt;br /&gt;&lt;br /&gt;&lt;img alt="The down-then-up shape is more dramatic if you do a ten-Test moving average." title="The down-then-up shape is more dramatic if you do a ten-Test moving average." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/experiencespinners.png"&gt;&lt;br /&gt;&lt;br /&gt;Note that there's a bit of a selection effect going on.  I'm only looking at spinners who were good enough to play enough Tests to take 149+ wickets.  Towards the right-hand end of the graph this is also a factor &amp;mdash; if you imagine it continuing further out, you'd eventually just be plotting Warne, Murali, and Kumble.  There are, for those interested, 24 bowlers going into Tests 1-36, then 23 in Test 37, 22 in Tests 38-43, 20 in Test 44, 19 in Tests 45 and 46, and 18 in Tests 47-49.&lt;br /&gt;&lt;br /&gt;I don't know how much I want to read into the graph, though I'm happy to say that spinners improve after their first ten Tests or so.  After that there may or may not be a trend &amp;mdash; batsmen working them out?  
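&lt;br /&gt;&lt;br /&gt;The pooled moving average described above can be sketched in Python as follows. Runs and (weighted) wickets are summed across all bowlers within each window of Tests before dividing, which is one reading of "overall average". The exact weighting scheme isn't spelled out, so the wicket_weight helper, which scales each wicket by the dismissed batsman's average relative to a reference of 30, is a hypothetical choice, as are the function names.

```python
from collections import defaultdict


def wicket_weight(batsman_average, reference_average=30.0):
    """Hypothetical weighting: a wicket of a batsman averaging
    reference_average counts as 1, better batsmen count for more."""
    return batsman_average / reference_average


def pooled_moving_average(records, window=5):
    """Bowling average by career-Test number, pooled over a moving window.

    records: (test_number, runs_conceded, wicket_weights) tuples, one per
        bowler per Test; test_number is 1 for a debut, and wicket_weights
        holds one weight per wicket (1.0 each for an unweighted average).
    Returns {start: average}, where the window covers Tests start to
    start + window - 1 and runs and weighted wickets are summed across
    all bowlers before dividing.
    """
    runs = defaultdict(float)
    wkts = defaultdict(float)
    for test_number, runs_conceded, weights in records:
        runs[test_number] += runs_conceded
        wkts[test_number] += sum(weights)
    result = {}
    last = max(runs)
    for start in range(1, last - window + 2):
        tests = range(start, start + window)
        pooled_wickets = sum(wkts[t] for t in tests)
        if pooled_wickets > 0:
            result[start] = sum(runs[t] for t in tests) / pooled_wickets
    return result
```

Pooling before dividing means a Test in which many wickets fell carries more weight than one with a single wicket, which is what you want from an "overall" average rather than an average of averages.&lt;br /&gt;&lt;br /&gt;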
Certainly there's no strong evidence that Panesar will improve significantly (he's played 33 Tests), though of course it's possible.&lt;br /&gt;&lt;br /&gt;Here's the corresponding graph for pacemen:&lt;br /&gt;&lt;br /&gt;&lt;img alt="R^2 of 0.38 with a linear fit" title="R^2 of 0.38 with a linear fit" src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/experiencepacemen.png"&gt;&lt;br /&gt;&lt;br /&gt;There are 46 bowlers going into all of those data points.  There's a downward trend &amp;mdash; pacemen tend to get better with experience, at least for a few dozen Tests.&lt;br /&gt;&lt;br /&gt;One thing to try in future is age rather than Test experience.  The people who've done this sort of analysis in baseball say that age is a better thing to use than Major League experience.  But of course baseball is not cricket, so I'm not sure what will come out of it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3236230762734506830?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3236230762734506830/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3236230762734506830' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3236230762734506830'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3236230762734506830'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/08/bowlers-as-they-get-more-experienced.html' title='Bowlers as they get more experienced'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_experiencespinners.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3427475797901507191</id><published>2008-08-06T13:46:00.000+02:00</published><updated>2008-08-06T13:47:49.459+02:00</updated><title type='text'>The modern lack of tour games</title><content type='html'>People often complain (myself included!) about the lack of tour matches these days.&lt;br /&gt;&lt;br /&gt;Here's the proportion of Tests won by the home side, broken down by the Test's position in the series (excl. Bangladesh and Zimbabwe):&lt;br /&gt;&lt;br /&gt;1st: 0.40 +/- 0.02&lt;br /&gt;2nd: 0.38 +/- 0.02&lt;br /&gt;3rd: 0.38 +/- 0.02&lt;br /&gt;4th: 0.39 +/- 0.03&lt;br /&gt;5th: 0.39 +/- 0.04&lt;br /&gt;6th: 0.35 +/- 0.12&lt;br /&gt;&lt;br /&gt;That's over all Test history.  There might be an extra slight advantage for the home side in the first Test of a series, but it's within the error bars.  But perhaps that small advantage is due to a bigger advantage that's come about recently?  Here are the numbers since 2000:&lt;br /&gt;&lt;br /&gt;1st: 0.42 +/- 0.05&lt;br /&gt;2nd: 0.46 +/- 0.05&lt;br /&gt;3rd: 0.50 +/- 0.06&lt;br /&gt;4th: 0.55 +/- 0.11&lt;br /&gt;5th: 0.60 +/- 0.15&lt;br /&gt;&lt;br /&gt;That it's such a neat little increasing sequence is probably luck.  That the home side is winning more Tests is not surprising, since we're in a very result-heavy era.  But the main point to take home is that there's no evidence that touring teams get better as they get more used to the foreign conditions or whatever.  
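&lt;br /&gt;&lt;br /&gt;The post doesn't say how the +/- figures were calculated; a natural assumption is the binomial standard error sqrt(p(1-p)/n) for a home-win proportion p over n Tests.  The Python sketch below shows that calculation with made-up counts (240 wins in 600 Tests, not the real data), then uses the published post-2000 numbers to ask how far apart the 1st-Test and 5th-Test figures really are, treating them as independent.  The function names are illustrative.&lt;pre&gt;
```python
import math


def win_proportion_with_error(home_wins, tests):
    """Home-win proportion and its binomial standard error.

    Treats every Test as an independent trial with the same home-win
    probability, which is only an approximation.
    """
    p = home_wins / tests
    return p, math.sqrt(p * (1 - p) / tests)


def separation_in_errors(p1, err1, p2, err2):
    """How many combined standard errors separate two independent proportions."""
    return (p2 - p1) / math.sqrt(err1 ** 2 + err2 ** 2)


# Made-up counts, chosen only to show the calculation.
p, err = win_proportion_with_error(home_wins=240, tests=600)
print(f"{p:.2f} +/- {err:.2f}")  # prints 0.40 +/- 0.02

# Published post-2000 figures for the 1st and 5th Tests of a series.
gap = separation_in_errors(0.42, 0.05, 0.60, 0.15)
print(round(gap, 2))  # prints 1.14
```
&lt;/pre&gt;A gap of about 1.1 combined standard errors arises easily by chance, which fits with calling the neat increasing sequence luck.&lt;br /&gt;&lt;br /&gt;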
Perhaps those stingy boards are right not to schedule extra tour matches.&lt;br /&gt;&lt;br /&gt;This is a result that surprised me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3427475797901507191?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3427475797901507191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3427475797901507191' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3427475797901507191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3427475797901507191'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/08/modern-lack-of-tour-games.html' title='The modern lack of tour games'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-5741908913922234973</id><published>2008-07-27T13:09:00.000+02:00</published><updated>2008-07-27T13:17:26.591+02:00</updated><title type='text'>It Figures</title><content type='html'>I have joined Cricinfo's stats blog &lt;a href="http://blogs.cricinfo.com/itfigures/"&gt;It Figures&lt;/a&gt;.   My &lt;a href="http://blogs.cricinfo.com/itfigures/archives/2008/07/how_much_do_wickets_matter_in.php"&gt;first post&lt;/a&gt; there is a more sophisticated version of the IPL bowling analysis that I did earlier.  
&lt;br /&gt;&lt;br /&gt;I should be posting to It Figures once a week, which is the rate I've been posting here.  I'm not sure what will happen to this Blogspot blog.  Perhaps only analyses that are too technical or too minor for Cricinfo.  &lt;i&gt;Boring and complicated &amp;mdash; that'll bring in the visitors&lt;/i&gt;.  We'll see how it goes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-5741908913922234973?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/5741908913922234973/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=5741908913922234973' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5741908913922234973'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5741908913922234973'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/07/it-figures.html' title='It Figures'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-7120428123722108461</id><published>2008-07-27T12:30:00.000+02:00</published><updated>2008-07-27T12:35:18.818+02:00</updated><title type='text'>Bowleds, LBW's, and a little quiz</title><content type='html'>Whenever people study umpiring bias, they almost always look at LBW's.  
There's not a lot else you can do from looking at scorecards &amp;mdash; other dismissal types are much more clear-cut.&lt;br /&gt;&lt;br /&gt;A paper by Trevor Ringrose in 2006 ('Neutral umpires and leg before wicket decisions in test cricket', J. R. Stat. Soc. A &lt;b&gt;169&lt;/b&gt;, 903) considered LBW rates by country and the presence of neutral umpires, and found that neutral umpires made no difference to the home-side bias in LBW decisions that some sides enjoy.&lt;br /&gt;&lt;br /&gt;That paper's too technical for me to be bothered wading through this evening, so instead I'll talk about what Charles Davis did in &lt;i&gt;The Best of the Best&lt;/i&gt;.  For each team X, he calculated the difference between X's LBW percentage (that is, number of LBW's divided by number of wickets) and their opponents' LBW percentage, first for X's home Tests, then for X's away Tests.  The difference between those two values (home minus away) is the home-side bias in LBW decisions for team X.&lt;br /&gt;&lt;br /&gt;Of the major sides, Pakistan apparently gets favoured the most by home umpiring, clearly ahead of Australia.  But Davis points out that in addition to having lots of LBW's go their way, Pakistani bowlers (I'm sure we know which ones) in the 1990's also got a very high number of bowled wickets.  Correcting for this, the apparent umpiring bias in Pakistan becomes comparable to that in other countries.  It just seems worse because they hit the pads so often.&lt;br /&gt;&lt;br /&gt;Following this line of thinking, I took all bowlers with at least 100 Test wickets since World War II and plotted their LBW to caught ratio (i.e., number of LBW's divided by number of wickets caught) against their bowled to caught ratio.  
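&lt;br /&gt;&lt;br /&gt;Davis's measure is simple enough to sketch in Python.  The sketch reads "X's LBW percentage" as the share of wickets taken by X's bowlers that were LBW, and "their opponents'" as the same share for the opposition's bowlers; that reading, the dictionary keys and the counts in the example are assumptions for illustration.  The last helper gives the two ratios used in the scatterplot (LBWs per caught dismissal, and bowleds per caught dismissal).&lt;pre&gt;
```python
def lbw_percentage(lbws, wickets):
    """LBWs as a percentage of all wickets taken."""
    return 100.0 * lbws / wickets


def lbw_home_bias(home, away):
    """Davis-style home bias in LBW decisions for one team, in percentage points.

    `home` and `away` hold counts from that team's home and away Tests:
    LBWs and wickets taken by the team's own bowlers ("for") and by the
    opposition's bowlers ("against").
    """
    def differential(tests):
        return (lbw_percentage(tests["lbw_for"], tests["wkts_for"])
                - lbw_percentage(tests["lbw_against"], tests["wkts_against"]))

    return differential(home) - differential(away)


def dismissal_ratios(bowled, lbw, caught):
    """(LBW to caught, bowled to caught) ratios for one bowler."""
    return lbw / caught, bowled / caught


# Hypothetical counts, for illustration only.
home = {"lbw_for": 60, "wkts_for": 400, "lbw_against": 40, "wkts_against": 400}
away = {"lbw_for": 45, "wkts_for": 400, "lbw_against": 50, "wkts_against": 400}
print(lbw_home_bias(home, away))  # (15 - 10) - (11.25 - 12.5) = 6.25

lbw_ratio, bowled_ratio = dismissal_ratios(bowled=70, lbw=60, caught=150)
print(round(lbw_ratio, 2), round(bowled_ratio, 2))  # prints 0.4 0.47
```
&lt;/pre&gt;Under this reading, a positive result means the team's bowlers gain more LBWs relative to their opponents at home than they do away.&lt;br /&gt;&lt;br /&gt;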
There are no fancy regressions to the mean or anything; these are raw numbers.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/bowledlbwct.png" alt="Sonny Ramadhin - the only bowler with more wickets bowled than caught" title="Sonny Ramadhin - the only bowler with more wickets bowled than caught"&gt;&lt;br /&gt;&lt;br /&gt;There's a bit of a trend there, but plenty of scatter.  Ian Johnson (109 wickets) has one of the highest bowled to caught ratios (just over 0.7) but an LBW to caught ratio of less than 0.2.&lt;br /&gt;&lt;br /&gt;The two W's are easy to spot &amp;mdash; they're the pair close together with LBW to caught ratios above 0.6 and bowled to caught ratios above 0.5.  So they did indeed get plenty of bowleds and plenty of LBW's.  But there are several bowlers with lots of LBW's and not many bowleds.&lt;br /&gt;&lt;br /&gt;Now for that little quiz.  Waqar has the highest LBW to caught ratio at 0.68.  Wasim is third at 0.62.  Who's second?  
He's the other data point quite high on that scatterplot, with a bowled to caught of 0.29.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-7120428123722108461?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/7120428123722108461/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=7120428123722108461' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7120428123722108461'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7120428123722108461'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/07/bowleds-lbws-and-little-quiz.html' title='Bowleds, LBW&apos;s, and a little quiz'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_bowledlbwct.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-8807622209713517223</id><published>2008-07-20T12:57:00.000+02:00</published><updated>2008-07-20T13:14:22.800+02:00</updated><title type='text'>Wickets broken down by ball in the over</title><content type='html'>Quick one today, I've been busy for reasons that will become clear in a couple of days.&lt;br /&gt;&lt;br /&gt;Here's the breakdown of wickets by ball in the over, in Tests since 1998 or so.&lt;br /&gt;&lt;br 
/&gt;1: 2448&lt;br /&gt;2: 2443&lt;br /&gt;3: 2537&lt;br /&gt;4: 2464&lt;br /&gt;5: 2639&lt;br /&gt;6: 2413&lt;br /&gt;&lt;br /&gt;Ball five is about 3.3 standard deviations above the mean, which is interesting and significant at p=0.003.  (On its own, 3.3 standard deviations would correspond to a one-sided p of about 0.0005, but there are six tests going on, one per ball, which increases the likelihood that one of them will turn out significant by chance.  So I applied a Bonferroni correction, multiplying that 0.0005 by 6, which I hope is the correct thing to do.)  I can't think of any obvious reason why the fifth ball in the over should be relatively wicket-prone, so I'm leaning towards it just being a blip.  Perhaps it's those stalemates in which the top-order batsman bats with the tail-ender and holds the strike for the first four balls?  I don't know.&lt;br /&gt;&lt;br /&gt;Now for the IPL:&lt;br /&gt;&lt;br /&gt;1: 122&lt;br /&gt;2: 131&lt;br /&gt;3: 104&lt;br /&gt;4: 104&lt;br /&gt;5: 104&lt;br /&gt;6: 124&lt;br /&gt;&lt;br /&gt;The numbers are pretty small, but it's something to think about for when I gather more T20 data.  Perhaps batsmen take a couple of balls to get their eye in against new bowlers.  In Test cricket, bowling changes happen less frequently, and batsmen are also more watchful.  In T20, they might be slogging from ball one.  
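&lt;br /&gt;&lt;br /&gt;The significance calculation for ball five can be checked with a short Python sketch.  It assumes that, with no ball-in-over effect, each wicket is equally likely to fall on any of the six balls, so one ball's count is binomial with n equal to the total number of wickets and p = 1/6.  It takes the one-sided normal tail and multiplies by six, as in the text.  Only the counts come from this post; the function name is illustrative.&lt;pre&gt;
```python
import math


def ball_excess(counts):
    """For each ball of the over, return (ball, z, one-sided p, corrected p).

    Null hypothesis: every wicket is equally likely to fall on any ball, so a
    single ball's count is binomial with n = total wickets and p = 1/len(counts).
    The corrected p multiplies by the number of balls tested (Bonferroni).
    """
    total = sum(counts)
    k = len(counts)
    expected = total / k
    sd = math.sqrt(total * (1 / k) * (1 - 1 / k))
    results = []
    for ball, count in enumerate(counts, start=1):
        z = (count - expected) / sd
        p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper normal tail
        results.append((ball, z, p_one_sided, min(1.0, k * p_one_sided)))
    return results


tests = [2448, 2443, 2537, 2464, 2639, 2413]  # Tests since about 1998
ipl = [122, 131, 104, 104, 104, 124]           # IPL

for label, counts in (("Tests", tests), ("IPL", ipl)):
    for ball, z, p, p_corrected in ball_excess(counts):
        print(f"{label} ball {ball}: z = {z:+.2f}, "
              f"p = {p:.4f}, corrected p = {p_corrected:.4f}")
```
&lt;/pre&gt;With the Test counts, ball five comes out at a z of about 3.26, a one-sided p of about 0.0006 and a corrected p of about 0.003.  None of the IPL balls comes close (the largest z is about 1.65, for ball two).&lt;br /&gt;&lt;br /&gt;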
Just a thought, nothing concrete.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-8807622209713517223?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/8807622209713517223/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=8807622209713517223' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8807622209713517223'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8807622209713517223'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/07/wickets-broken-down-by-ball-in-over.html' title='Wickets broken down by ball in the over'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-9014084971461613430</id><published>2008-07-12T07:44:00.004+02:00</published><updated>2008-07-12T12:02:34.567+02:00</updated><title type='text'>Michael Vaughan looks funny when he gets bowled, but that is all.</title><content type='html'>(&lt;b&gt;Edit&lt;/b&gt;: I just fixed a problem with the regression to the mean.  
No major changes to the batsmen that made the extremes of the tables, but the regressed estimates of bowled proportions are now much more accurate.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Edit&lt;/b&gt;: I should say that the methods used in this post have been either inspired by or directly copied from the baseballers, particularly from the authors of &lt;a href="http://www.amazon.com/dp/1597971294?tag=tangotiger-20&amp;camp=14573&amp;creative=327641&amp;linkCode=as1&amp;creativeASIN=1597971294&amp;adid=0AQ9MZVA1SXCMMHBPXNN&amp;"&gt;&lt;i&gt;The Book&lt;/i&gt;&lt;/a&gt; - see their blog &lt;a href="http://www.insidethebook.com/ee/"&gt;here&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Some people in comments &lt;a href="http://blogs.guardian.co.uk/sport/2008/07/10/pietersen_perfect_as_the_test.html"&gt;here&lt;/a&gt; (starting with MacMillings) have been discussing Michael Vaughan getting out bowled a lot.  I was asked to have a look at it.&lt;br /&gt;&lt;br /&gt;Charles Davis, in &lt;i&gt;The Best of the Best&lt;/i&gt;, produced a graph similar to this one:&lt;br /&gt;&lt;br /&gt;&lt;img title="Yellow triangles with black borders - never before seen on Pappus' plane" src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/wkttypesyear.png" alt="Yellow triangles with black borders - never before seen on Pappus' plane"&gt;&lt;br /&gt;&lt;br /&gt;That's a plot showing the proportion of various dismissal types over time, looking only at batsmen from 1 to 6 in the order.  Though I've started that graph at the end of World War II, the decline in bowled dismissals has been going on since the start of Test cricket.  Why that should be so is a bit of a mystery.  
The slack's been taken up by catches and (sometimes) LBW's, the latter being influenced somewhat by changes in the Laws.&lt;br /&gt;&lt;br /&gt;The decline isn't entirely accounted for by keepers standing back and taking more catches &amp;mdash; though more and more wickets are coming from catches to the keeper, adding them to the bowleds still gives a clear decreasing trend.&lt;br /&gt;&lt;br /&gt;So, rather than asking where Michael Vaughan stands in relation to batsmen from history in terms of getting bowled, we'll consider only batsmen since 1990.  The trend in bowleds from 1990 to the present is close enough to flat.&lt;br /&gt;&lt;br /&gt;The next thing to think about is whether differences in bowled proportions between batsmen are an inherent characteristic of their batting styles, or simply due to random chance.&lt;br /&gt;&lt;br /&gt;I took all batsmen with at least 50 dismissals since 1990 and an adjusted average of at least 35.  Across all these wickets, about 15% were bowled.  Now, any wicket is either bowled or something else.  If this is random, then the number of times a batsman is bowled will follow a binomial distribution, so his proportion of bowleds has mean 0.15 and standard deviation sqrt(0.15*(1-0.15)/outs).  Here and below, 'outs' is the number of times a batsman is dismissed.&lt;br /&gt;&lt;br /&gt;Plugging those numbers in to get z-scores for each of the batsmen in the dataset (59 of them), we find 6 with a z-score more than 2 standard deviations from the mean (from random chance alone, you'd expect about 3), and 27 more than 1 standard deviation from the mean (you'd expect about 19).  The standard deviation of the z-scores is about 1.2 instead of 1.&lt;br /&gt;&lt;br /&gt;Now, the observed variance comes from two terms &amp;mdash; random luck, and the inherent 'true' differences between the players.  Since luck is independent of the actual differences, we have that var(observed) = var(true) + var(luck).  
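&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of the three calculations in this part of the post: a batsman's binomial z-score, the split var(observed) = var(true) + var(luck), and the regression to the mean used for the table further down.  The regression step assumes the linked formula is the standard precision-weighted average of the player's observed proportion and the population mean, each weighted by 1/variance; that is an assumption, although with a mean of 0.15 and a true spread of 0.025 it gives about 0.180 for Gibbs against 0.179 in the table.  The function names are illustrative.&lt;pre&gt;
```python
import math

MEAN_BOWLED = 0.15  # share of all dismissals in the dataset that were bowled


def bowled_z_score(bowleds, outs, mean=MEAN_BOWLED):
    """Binomial z-score of one batsman's bowled proportion against the mean."""
    observed = bowleds / outs
    sd_luck = math.sqrt(mean * (1 - mean) / outs)
    return (observed - mean) / sd_luck


def true_spread(sd_observed, sd_luck):
    """Spread of 'true' proportions from var(observed) = var(true) + var(luck)."""
    return math.sqrt(max(0.0, sd_observed ** 2 - sd_luck ** 2))


def regress_to_mean(bowleds, outs, mean=MEAN_BOWLED, sd_true=0.025):
    """Assumed form: inverse-variance weighted blend of player and population.

    Needs at least one bowled and one non-bowled dismissal, otherwise the
    player's observed variance is zero.
    """
    observed = bowleds / outs
    weight_player = outs / (observed * (1 - observed))  # 1 / var(observed)
    weight_population = 1 / sd_true ** 2
    return ((weight_player * observed + weight_population * mean)
            / (weight_player + weight_population))


print(f"{bowled_z_score(35, 147):.2f}")      # HH Gibbs: about 2.99
sd_luck = math.sqrt(0.15 * 0.85 / 120)       # about 0.033, as in the text
print(f"{true_spread(0.04, sd_luck):.3f}")   # 0.023 with these rounded inputs
print(f"{regress_to_mean(35, 147):.3f}")     # HH Gibbs: 0.180 (table: 0.179)
```
&lt;/pre&gt;With the rounded inputs quoted just below (0.04 and 0.033) the variance split gives about 0.023 rather than 0.025; the published figure presumably comes from unrounded values.&lt;br /&gt;&lt;br /&gt;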
The observed variance is about 0.04&lt;sup&gt;2&lt;/sup&gt;; the variance due to luck is roughly 0.15*0.85/120 = 0.033&lt;sup&gt;2&lt;/sup&gt; (the denominator 120 being the average number of outs across the batsmen in the dataset).  var(true) is the difference, and so the standard deviation of the inherent differences is sqrt(0.04&lt;sup&gt;2&lt;/sup&gt; - 0.033&lt;sup&gt;2&lt;/sup&gt;) = 0.025. &lt;br /&gt;&lt;br /&gt;So, there are genuine differences between batsmen in terms of how often they get out bowled, and it's sensible to start comparing them.  But before doing so, we should regress each player's observed bowled proportion to the mean.  We have an estimate of the player's bowled proportion as p +/- sqrt(p*(1-p)/outs), and the player comes from a population whose true bowled proportions are distributed roughly as 0.15 +/- 0.025.  The estimate of the batsman's 'true' bowled proportion is then calculated using the same formula as given &lt;a href="http://pappubahry.blogspot.com/2008/06/followup-on-accuracy-of-averages.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;First, does a high proportion of bowled dismissals make a bad batsman?&lt;br /&gt;&lt;br /&gt;&lt;img title="This graph is boring." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/bowledpropavg.png" alt="This graph is boring."&gt;&lt;br /&gt;&lt;br /&gt;There's no trend at all amongst good batsmen.  Tail-enders (not shown on the graph) do get out bowled more often, though.&lt;br /&gt;&lt;br /&gt;Now for the batsmen who get out bowled the most and the least since 1990.  The 'b' column is the number of bowled dismissals.  
The last two columns are the observed proportion of bowleds and that figure regressed to the mean.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                                        bowled prop&lt;br /&gt;name            outs  b   avg   adj avg obs   reg&lt;br /&gt;HH Gibbs        147   35  42.0  36.9    0.238 0.179&lt;br /&gt;JH Kallis       168   37  57.0  49.8    0.220 0.176&lt;br /&gt;VVS Laxman      132   30  43.8  39.7    0.227 0.175&lt;br /&gt;RS Dravid       182   35  55.4  47.9    0.192 0.168&lt;br /&gt;AJ Stewart      214   40  39.5  39.7    0.187 0.167&lt;br /&gt;AR Border       62    15  43.3  39.9    0.242 0.166&lt;br /&gt;RA Smith        83    18  42.6  42.3    0.217 0.166&lt;br /&gt;SR Waugh        170   31  53.2  47.9    0.182 0.164&lt;br /&gt;SR Tendulkar    207   37  55.9  48.6    0.179 0.164&lt;br /&gt;ME Trescothick  133   25  43.8  41.1    0.188 0.164&lt;br /&gt;---&lt;br /&gt;Saeed Anwar     89    9   45.5  41.8    0.101 0.132&lt;br /&gt;ML Hayden       152   17  53.0  45.7    0.112 0.132&lt;br /&gt;RR Sarwan       121   13  40.4  36.6    0.107 0.132&lt;br /&gt;S Chanderpaul   163   18  49.1  45.9    0.110 0.131&lt;br /&gt;KC Sangakkara   111   11  55.2  46.6    0.099 0.129&lt;br /&gt;Younis Khan     98    9   49.1  45.5    0.092 0.126&lt;br /&gt;JC Adams        73    6   41.3  38.7    0.082 0.126&lt;br /&gt;CD McMillan     81    6   38.5  35.4    0.074 0.119&lt;br /&gt;PA de Silva     119   10  45.3  39.6    0.084 0.119&lt;br /&gt;CL Hooper       133   11  38.5  37.2    0.083 0.116&lt;/pre&gt;&lt;br /&gt;I wouldn't have picked Border to be near the top.  Though he was on the decline in his last few years (which is all the above table considers), his high bowled proportion was a feature throughout his career.&lt;br /&gt;&lt;br /&gt;Where's Michael Vaughan?  At an observed proportion of 0.157 (now 0.164 after his latest dismissal), regressed to 0.153.  
Just above the mean, nothing special or unusual at all.&lt;br /&gt;&lt;br /&gt;His technique does &lt;a href="http://republiquecricket.com/2008/03/11/all-aboard-the-fail-boat/"&gt;lend itself to jokes&lt;/a&gt; though.&lt;br /&gt;&lt;br /&gt;Lastly, there was some talk about whether or not bowleds are more common at lower scores.  Here are the dismissal proportions by score for top-six batsmen since 1990:&lt;br /&gt;&lt;br /&gt;&lt;img title="I love those distorted x's." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/wkttypesscore.png" alt="I love those distorted x's."&gt;&lt;br /&gt;&lt;br /&gt;The regression lines, from top to bottom, are caught by non-keepers, caught by the keeper, LBW, and bowled.&lt;br /&gt;&lt;br /&gt;Bowleds in fact stay pretty steady.  Catches at the wicket and LBW's decline, and catches to non-keepers become steadily more prevalent as the innings goes on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-9014084971461613430?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/9014084971461613430/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=9014084971461613430' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/9014084971461613430'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/9014084971461613430'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/07/michael-vaughan-looks-funny-when-he.html' title='Michael Vaughan looks funny when he gets bowled, but that is all.'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_wkttypesyear.png' height='72' width='72'/><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4996231189562970914</id><published>2008-07-10T13:27:00.001+02:00</published><updated>2008-07-10T13:31:00.782+02:00</updated><title type='text'>Bradman Day</title><content type='html'>As some of you may be aware, Don Bradman would be turning 100 this year if he were still alive.  He was born on 27 August 1908, and the simple thing to do, if we wanted to have a day to celebrate and remember Bradman, would be to do so on 27 August 2008.  &lt;br /&gt;&lt;br /&gt;But Andrew Samson &lt;a href="http://batandballbrimborion.blogspot.com/2007/10/bradman-celebration.html"&gt;suggested&lt;/a&gt; last year a much more appropriate date: 6 August 2008.  Why the sixth?  
Because Bradman would have been 99.94 years old on that day.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4996231189562970914?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4996231189562970914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4996231189562970914' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4996231189562970914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4996231189562970914'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/07/bradman-day.html' title='Bradman Day'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-8492890988055147042</id><published>2008-07-06T08:06:00.000+02:00</published><updated>2008-07-06T08:07:11.322+02:00</updated><title type='text'>Rugby and the ELV's</title><content type='html'>As the heading indicates, this post is not about cricket.&lt;br /&gt;&lt;br /&gt;Last night's rugby Test between Australia and France was won convincingly by Australia (40-10) despite the French having much more possession (I haven't seen a figure since mid-match, but it was somewhere around 65%).  This got me wondering about the relation between possession, territory, and winning in rugby.  
I downloaded the last two seasons' worth of data for the Super 14 from &lt;a href="http://rugbystats.com.au/index.html"&gt;Rugby Stats&lt;/a&gt; to see what it said.  The Rugby Stats site gives all sorts of data (unfortunately not going back further than the last couple of years), but for this post I've just taken the home team's possession and territory for each game, along with the fraction of points scored by the home team.  So, e.g., if the home team won 20-10, they had 0.667 of the points scored.&lt;br /&gt;&lt;br /&gt;I'll start with the 2007 season, which of course was played under the traditional rugby laws.  Here's some of what gretl had to say:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Model 1: OLS estimates using the 94 observations 1-94&lt;br /&gt;Dependent variable: h_score_percent&lt;br /&gt;&lt;br /&gt;      VARIABLE       COEFFICIENT        STDERROR      T STAT   P-VALUE&lt;br /&gt;&lt;br /&gt;  const                -0.379896         0.355244     -1.069   0.28772&lt;br /&gt;  h_poss                1.88078          0.688795      2.731   0.00759 ***&lt;br /&gt;  h_terr               -0.0305528        0.182781     -0.167   0.86762&lt;br /&gt;&lt;br /&gt;  Mean of dependent variable = 0.558623&lt;br /&gt;  Standard deviation of dep. var. = 0.194866&lt;br /&gt;  Sum of squared residuals = 3.26391&lt;br /&gt;  Standard error of residuals = 0.189386&lt;br /&gt;  Unadjusted R-squared = 0.0757645&lt;/pre&gt;&lt;br /&gt;So, having the ball helps &amp;mdash; for each extra percentage point of possession, the home side got almost two percentage points' worth of the final score.  On average, about 44 points were scored (in total) each game, and 2% of 44 is 0.88 points.  Of course, when the home side gets a bigger slice of the points, the away side must lose the same amount, so it's really about a 2-point swing.  (If you work with raw scores and not fractions of total points, you get a similar result.)  
There's a lot of scatter in the data &amp;mdash; the R-squared is only 0.076.&lt;br /&gt;&lt;br /&gt;So, all other things equal, if the score is 27-17 with equal possession, it'd be (on average) 28-16 with 51-49 possession.&lt;br /&gt;&lt;br /&gt;Territory, on the other hand, doesn't make a significant difference (p = 0.87).&lt;br /&gt;&lt;br /&gt;Now let's look at 2008, played under the ELV's (the Experimental Law Variations).&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Model 1: OLS estimates using the 94 observations 1-94&lt;br /&gt;Dependent variable: h_score_percent&lt;br /&gt;&lt;br /&gt;      VARIABLE       COEFFICIENT        STDERROR      T STAT   P-VALUE&lt;br /&gt;&lt;br /&gt;  const                -1.70310          0.333894     -5.101  &lt;0.00001 ***&lt;br /&gt;  h_poss                4.26355          0.656012      6.499  &lt;0.00001 ***&lt;br /&gt;  h_terr                0.228343         0.111618      2.046   0.04367 **&lt;br /&gt;&lt;br /&gt;  Mean of dependent variable = 0.539166&lt;br /&gt;  Standard deviation of dep. var. = 0.172713&lt;br /&gt;  Sum of squared residuals = 1.84663&lt;br /&gt;  Standard error of residuals = 0.142452&lt;br /&gt;  Unadjusted R-squared = 0.334345&lt;/pre&gt;&lt;br /&gt;The ELV's appear to have made possession much more important &amp;mdash; you end up with a 4-point swing in score for each percentage point of possession, rather than 2 points.  Also, territory now seems to be mildly important and beneficial.  
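&lt;br /&gt;&lt;br /&gt;The step from a regression coefficient to points on the board can be sketched in Python.  It uses the h_poss coefficients from the two gretl outputs and the 2007 average of about 44 total points per game; the post doesn't give a 2008 average, so reusing 44 for 2008 is an assumption for illustration, as is the function name.&lt;pre&gt;
```python
def possession_swing(coefficient, extra_possession, total_points=44.0):
    """Home points gained, and the home-away swing, from extra possession.

    `coefficient` is the h_poss estimate (change in home share of points per
    unit change in home possession share); `extra_possession` is in
    percentage points; `total_points` is the average total score per game.
    """
    share_change = coefficient * extra_possession / 100.0
    home_gain = share_change * total_points
    return home_gain, 2 * home_gain  # the away side loses what the home side gains


for season, coef in (("2007", 1.88078), ("2008 (ELVs)", 4.26355)):
    gain, swing = possession_swing(coef, extra_possession=1)
    print(f"{season}: home +{gain:.2f} points, swing {swing:.2f} points")

# The 27-17 example: one extra percentage point of possession, 2007 model.
gain, _ = possession_swing(1.88078, 1)
print(f"{27 + gain:.1f}-{17 - gain:.1f}")  # prints 27.8-16.2, i.e. roughly 28-16
```
&lt;/pre&gt;With the unrounded 2007 coefficient the home side gains about 0.83 points per extra percentage point of possession, a swing of about 1.7 points (rounded to 2 above); the 2008 coefficient gives a swing of about 3.75 points.&lt;br /&gt;&lt;br /&gt;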
The R-squared is 0.33, so possession and territory are much better at predicting the final result under the ELV's than they are under the old laws.&lt;br /&gt;&lt;br /&gt;If any of you are rugby fans, feel free to make any requests for rugby analysis.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-8492890988055147042?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/8492890988055147042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=8492890988055147042' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8492890988055147042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8492890988055147042'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/07/rugby-and-elvs.html' title='Rugby and the ELV&apos;s'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4902203014673375506</id><published>2008-06-30T13:04:00.000+02:00</published><updated>2008-06-30T13:05:19.047+02:00</updated><title type='text'>Mini-orders</title><content type='html'>&lt;a href="http://eye-on-cricket.blogspot.com/"&gt;Samir&lt;/a&gt; &lt;a href="http://blogs.cricinfo.com/diffstrokes/"&gt;Chopra&lt;/a&gt; asked for a stats post on "mini-orders", and here it is.&lt;br /&gt;&lt;br /&gt;A mini-order is defined, for this post, as a block 
of three players at the same positions in the batting order. So, for instance, you could have Langer-Hayden-Ponting as a mini-order (with positions 1, 2, and 3).  Now, I could fill up pages with the various possibilities (123, 345, 456, 567, etc.), but that seems like it might be excessive.  So below I've listed the leading mini-orders by runs scored.  This is, of course, a list heavily biased towards recent teams.&lt;br /&gt;&lt;br /&gt;In the table below, the columns are the number of team innings in which the triple appeared; total runs made in those innings by the batsmen in that mini-order; their average in those innings; the number of runs made in partnerships between those three batsmen in those innings; and the average of those partnerships, adjusted for era and quality of the bowling (relative to an overall average of 31.5).  The regular average and partnership average are typically close to each other.  The partnership stats are not complete, since I ignore any team innings which look like they involved a retired hurt.&lt;br /&gt;&lt;br /&gt;Note that the order is strict &amp;mdash; Langer-Hayden-Ponting is considered separately from Hayden-Langer-Ponting.  The latter only happened twice, by my count.  If you ignore order, then Taylor-Slater-Boon would go into fifth place.  
Taylor and Slater alternated almost perfectly in which of the two faced the first ball.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;pos name1          name2            name3           i   runs  avg   p-runs  adj part avg&lt;br /&gt;123 JL Langer      ML Hayden        RT Ponting      94  15034 60.6  10352   53.8&lt;br /&gt;345 RS Dravid      SR Tendulkar     SC Ganguly      78  11319 55.8  5624    49.5&lt;br /&gt;123 CG Greenidge   DL Haynes        RB Richardson   84  9778  43.8  6611    41.5&lt;br /&gt;123 MS Atapattu    ST Jayasuriya    KC Sangakkara   58  7352  46.5  4807    35.2&lt;br /&gt;456 SR Tendulkar   SC Ganguly       VVS Laxman      54  6742  49.9  2806    53.0&lt;br /&gt;345 JL Langer      ME Waugh         SR Waugh        51  6405  47.1  2340    34.9&lt;br /&gt;123 ME Trescothick MP Vaughan       MA Butcher      49  5956  44.8  4132    41.8&lt;br /&gt;456 ME Waugh       SR Waugh         RT Ponting      43  5466  48.4  2343    54.2&lt;br /&gt;456 PA de Silva    A Ranatunga      HP Tillakaratne 43  4688  40.1  2033    38.2&lt;br /&gt;345 JH Kallis      DJ Cullinan      WJ Cronje       37  4412  43.7  2190    37.6&lt;br /&gt;345 RR Sarwan      BC Lara          S Chanderpaul   35  4281  42.8  1560    40.3&lt;br /&gt;123 SM Gavaskar    CPS Chauhan      DB Vengsarkar   35  4038  40.4  2555    41.3&lt;br /&gt;456 DJ Cullinan    WJ Cronje        JN Rhodes       33  3999  46.0  1678    39.8&lt;br /&gt;345 KC Sangakkara  DPMD Jayawardene TT Samaraweera  32  3992  45.4  1464    30.7&lt;br /&gt;345 HM Amla        JH Kallis        AG Prince       29  3951  52.0  2209    46.4&lt;br /&gt;123 CG Greenidge   DL Haynes        IVA Richards    32  3813  42.8  2990    47.2&lt;br /&gt;345 Younis Khan    Inzamam-ul-Haq   Yousuf Youhana  25  3772  54.7  1778    48.5&lt;br /&gt;123 L Hutton       C Washbrook      WJ Edrich       28  3729  49.7  2407    47.8&lt;br /&gt;345 AP Gurusinha   PA de Silva      A Ranatunga     33  3721  42.8  2081    50.1&lt;br /&gt;123 GR Marsh       MA Taylor  
      DC Boon         30  3675  43.2  2779    44.3&lt;br /&gt;345 Younis Khan    Yousuf Youhana   Inzamam-ul-Haq  21  3600  66.7  1766    77.0&lt;/pre&gt;&lt;br /&gt;The constancy of the Australian batting lineup in recent years is well-known, of course, so it's perhaps no surprise to see that the Langer-Hayden-Ponting trio has appeared in more innings in that order than any other.  Even allowing for the high scoring these days, they come out easily better than Greenidge-Haynes-Richardson.&lt;br /&gt;&lt;br /&gt;Leading mini-order at each position by adjusted average of the batsmen, qualification 10 innings:&lt;br /&gt;123: Woodfull-Ponsford-Bradman, 13 innings, avg 81.7, adj avg 75.4&lt;br /&gt;345: Bradman-Kippax-McCabe, 12 innings, avg 78.8, adj avg 71.7&lt;br /&gt;456: Hussey-Clarke-Symonds, 16 innings, avg 64.3, adj avg 57.8&lt;br /&gt;567: Clarke-Symonds-Gilchrist, 11 innings, avg 53.1, adj avg 48.6&lt;br /&gt;&lt;br /&gt;A very Australian affair.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4902203014673375506?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4902203014673375506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4902203014673375506' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4902203014673375506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4902203014673375506'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/06/mini-orders.html' title='Mini-orders'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-7238812939993483221</id><published>2008-06-29T11:09:00.000+02:00</published><updated>2008-06-29T11:41:10.096+02:00</updated><title type='text'>Followup on accuracy of averages</title><content type='html'>&lt;a href="http://deggles.csoft.net/"&gt;Russ&lt;/a&gt; pointed out a couple of issues with the previous post.  For those who missed the comments thread, here are the revised formulas for calculating uncertainties.  &lt;br /&gt;&lt;br /&gt;Batting: 0.9 * average / sqrt(# innings)&lt;br /&gt;Bowling: 0.9 * average / sqrt(# wickets)&lt;br /&gt;&lt;br /&gt;So, e.g., Mike Hussey becomes 68.4 +/- 9.5.  About 68% of 'true' averages will lie within the range given.  Double the margin to cover about 95%.  &lt;br /&gt;&lt;br /&gt;I haven't made much of an effort to work out the underlying distribution of Australian players that Hussey comes from.  To get a rough idea of what should happen, I found the mean and standard deviation of averages of Australian batsmen at batting positions 1 through 7, over the last ten years.  There's a bit of a problem with players who only played a couple of Tests and averaged (say) 5 &amp;mdash; clearly they could have averaged up around 20 or 30 if given more opportunities.&lt;br /&gt;&lt;br /&gt;Anyway, I bumped those guys up to 20, and the result was something like mean 42, standard deviation 12.  
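&lt;br /&gt;&lt;br /&gt;For anyone who wants to redo these sums, here is a minimal Python sketch of the revised uncertainty formula and of the precision-weighted (inverse-variance) shrinkage applied to Hussey's average below. The function names are my own. The population figures (mean 42, standard deviation 12) are the rough ones quoted above, not a fitted distribution.

```python
import math


def average_uncertainty(average, n, k=0.9):
    """Rough one-standard-deviation uncertainty on a career average.

    n is innings for batsmen or wickets for bowlers. k = 0.9 is the revised
    constant from this post; the original post used k = 1.7.
    """
    return k * average / math.sqrt(n)


def regress_to_mean(observed, observed_sd, pop_mean, pop_sd):
    """Combine a player's average with his population's mean, weighting
    each by the inverse of its variance. Returns (estimate, uncertainty)."""
    w_player = 1.0 / observed_sd ** 2
    w_pop = 1.0 / pop_sd ** 2
    estimate = (observed * w_player + pop_mean * w_pop) / (w_player + w_pop)
    uncertainty = 1.0 / math.sqrt(w_player + w_pop)
    return estimate, uncertainty


# Hussey: 68.4 +/- 9.5, drawn from Australian batsmen with mean about 42, sd about 12.
estimate, uncertainty = regress_to_mean(68.4, 9.5, 42.0, 12.0)
print(f"{estimate:.1f} +/- {uncertainty:.1f}")  # 58.2 +/- 7.4
```

The weighting means the less certain of the two numbers counts for less. A player with hundreds of innings has a small observed_sd and barely moves, while a short career gets pulled strongly towards the population mean.&lt;br /&gt;&lt;br /&gt;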
So, carrying on with the Hussey example, we crunch the numbers like this:&lt;br /&gt;&lt;br /&gt;regressed average = (68.4/9.5&lt;sup&gt;2&lt;/sup&gt; + 42 / 12&lt;sup&gt;2&lt;/sup&gt;) / (1/9.5&lt;sup&gt;2&lt;/sup&gt; + 1/12&lt;sup&gt;2&lt;/sup&gt;)&lt;br /&gt;&lt;br /&gt;uncertainty = 1 / sqrt(1/9.5&lt;sup&gt;2&lt;/sup&gt; + 1/12&lt;sup&gt;2&lt;/sup&gt;)&lt;br /&gt;&lt;br /&gt;to estimate Hussey's 'true' average as about 58 +/- 7.&lt;br /&gt;&lt;br /&gt;Let's just hope that he can score runs in India.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-7238812939993483221?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/7238812939993483221/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=7238812939993483221' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7238812939993483221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7238812939993483221'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/06/followup-on-accuracy-of-averages.html' title='Followup on accuracy of averages'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3567331187314343779</id><published>2008-06-22T08:26:00.003+02:00</published><updated>2008-06-29T11:42:23.213+02:00</updated><title type='text'>Accuracy of 
averages</title><content type='html'>Today I would like to relate some horrifying thoughts about averages.  I would like to be wrong, so if you think that there are mistakes in what I've done, do comment.  (&lt;b&gt;Update&lt;/b&gt;: See the comments thread, and &lt;a href="http://pappubahry.blogspot.com/2008/06/followup-on-accuracy-of-averages.html"&gt;followup&lt;/a&gt;.  The uncertainties I give below for batsmen are about twice as big as they should be.  For bowlers they are &lt;s&gt;about three times&lt;/s&gt; also about two times too big.)&lt;br /&gt;&lt;br /&gt;I began thinking about this while working my way through &lt;i&gt;The Book: Playing the Percentages in Baseball&lt;/i&gt; (the authors blog &lt;a href="http://www.insidethebook.com/ee/"&gt;here&lt;/a&gt;), trying to pick out the bits which can carry over to cricket, so that we don't have to re-invent wheels that the baseballers have already made for us.&lt;br /&gt;&lt;br /&gt;One key point that they make is that a player's raw statistics aren't the best estimates of his true talent &amp;mdash; you have to regress to the mean.  The less reliable the stat, the more you regress.  The less data you have, the more you regress.  (And vice versa.)  We know this intuitively in some cases &amp;mdash; much though I love him, no-one really thinks that Mike Hussey is an 80-average batsman, and indeed in the West Indies his average has dropped to below 70.&lt;br /&gt;&lt;br /&gt;But the question is, how many innings must a batsman play before we can be confident that his average accurately reflects his talent (so that we don't have to worry about regressing to the mean)?  The short answer appears to be something on the order of 10000 innings, if we want to nail the average down to within a run or so.  &lt;br /&gt;&lt;br /&gt;That's an appallingly large number of innings, and completely counter-intuitive to me.  Averages seem to stabilise for batsmen after a hundred innings or so.  
But that intuition we have is based on the wrong thing.  Career averages are stable because subsequent innings can't change the overall average much.  A better way of thinking is, what would happen if the player re-ran his career from the start (so same opponents, etc.) but with different luck?  Here, luck could be things like balls that beat the bat actually finding the edge (or vice versa), dropped catches, etc.&lt;br /&gt;&lt;br /&gt;At this point I still would have thought that over a couple of hundred innings, you'd get the same average, to within a run.  But the numbers are telling me different things.&lt;br /&gt;&lt;br /&gt;To take an artificial example, suppose that a batsman's scores are exponentially distributed with mean 50, and no not-outs.  I ran a few simulations of such a batsman over 300 innings, and here are the sample means that came out: 51.8, 54.4, 47.1, 48.4, 50.1.  &lt;br /&gt;&lt;br /&gt;That's quite a wide range, even for a longer career than any in Test history.  At 47.1, you're talking about a very good batsman.  At 54.4, he's an all-time great (perhaps not in today's batting-friendly world).  In practice, we would expect that it would be even worse than this, because batting scores are not exponentially distributed &amp;mdash; the standard deviation for real cricket scores tends to be higher than for exponential scores.&lt;br /&gt;&lt;br /&gt;So now let's look at some real cricket scores.  The way I'll do this is to take a player, and compare one half of his career to the other.  Now, you can't take the first half and second half of the career, because there might be a change in talent over that time (developing better technique, losing reflexes, etc.).  So instead, I split the innings into odds and evens (further splitting by first and second innings in matches &amp;mdash; I didn't do this perfectly, but it should be close enough).  
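&lt;br /&gt;&lt;br /&gt;The odds-versus-evens split and the artificial exponential batsman from the previous paragraph can be sketched in Python roughly as follows. This is a simplification under stated assumptions. Innings are (runs, dismissed) pairs in career order, and the average is runs divided by dismissals. Unlike the split described above, it does not separately alternate first and second innings within matches. The function names are hypothetical.

```python
import random


def batting_average(innings):
    """innings: list of (runs, dismissed) pairs. Average = runs / dismissals."""
    runs = sum(r for r, _ in innings)
    outs = sum(1 for _, dismissed in innings if dismissed)
    return runs / outs if outs else float("nan")


def odd_even_averages(innings):
    """Averages over the 1st, 3rd, 5th, ... and the 2nd, 4th, 6th, ... innings."""
    return batting_average(innings[0::2]), batting_average(innings[1::2])


def simulate_exponential_career(mean=50.0, n_innings=300, seed=None):
    """Scores drawn from an exponential distribution with the given mean.

    Every innings counts as a dismissal (no not-outs), as in the example above.
    """
    rng = random.Random(seed)
    return [(rng.expovariate(1.0 / mean), True) for _ in range(n_innings)]


for seed in range(5):
    career = simulate_exponential_career(seed=seed)
    odds, evens = odd_even_averages(career)
    print(f"career {batting_average(career):.1f}   odds {odds:.1f}   evens {evens:.1f}")
```

For an exponential distribution the standard deviation equals the mean. The standard error of a 300-innings average with mean 50 is therefore about 50 / sqrt(300), roughly 2.9 runs, and the spread of 47.1 to 54.4 quoted above is consistent with that. The simulated values depend on the random seed, so they will not reproduce those five numbers exactly.&lt;br /&gt;&lt;br /&gt;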
This way, any genuine slumps or good years will be split evenly into the two halves for comparison.&lt;br /&gt;&lt;br /&gt;Allan Border in his 'even' innings (132 of them) averaged 49.5, and averaged 51.6 in his 133 odd innings.  That's not too bad, I suppose.  The two are pretty close together.&lt;br /&gt;&lt;br /&gt;But what about Steve Waugh, who was almost as prolific in terms of innings?  Evens 55.9, odds 46.3.  Tendulkar: evens 52.6, odds 58.0.  Viv Richards: evens 66.1, odds 36.5.&lt;br /&gt;&lt;br /&gt;Those are some hefty differences (Richards' being one of the most striking).  Here is a plot of the odds average against the evens average for all batsmen who played 50 or more Tests and averaged at least 30.&lt;br /&gt;&lt;br /&gt;&lt;img title="O the scatter." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/evensoddsbat.png" alt="O the scatter."&gt;&lt;br /&gt;&lt;br /&gt;That R-squared value drops even further (to 0.18) if you remove Bradman.  If there were no luck at all involved, then R-squared would be 1, and the dots would make a nice little y = x line.  Cricket is a lot more luck-filled than that.&lt;br /&gt;&lt;br /&gt;We would like some kind of estimate of the uncertainty involved in batting averages.  As we see from the graph above, they'll be pretty big.  I'm not entirely sure if what I did was the best way of doing things, so if any stat-heads amongst you can suggest improvements, please do.&lt;br /&gt;&lt;br /&gt;I took the odd averages, guessed an error that went like k * (odd avg) / sqrt(number of odd innings), and fiddled with the constant k until roughly 68% of the even averages fell within that margin.  I got k = 1.7 or so.  (If anyone could tell me where the 1.7 comes from, I'd be grateful.  
The average coefficient of variation for batsmen is about 1.05, so by the Central Limit Theorem I would have expected k = 1.05.)&lt;br /&gt;&lt;br /&gt;So, we can use this to estimate the uncertainty over whole careers as 1.7 * avg / sqrt(innings).  &lt;br /&gt;&lt;br /&gt;Even for a career as long as Border's, that gives an uncertainty of about +/- 5.3 runs.  Mike Hussey comes out to 68.4 +/- 17.9.&lt;br /&gt;&lt;br /&gt;Now in Hussey's case, we'd lean much more towards the lower part of that estimated range &amp;mdash; he's not an 85-average batsman.  Why do we think that?  Because only one man in history has been that good, and no-one else has ever got close.  It's much more likely that Hussey is like everyone else than he's like Bradman.&lt;br /&gt;&lt;br /&gt;To make estimates of this sort more rigorous, we need to know the distribution of averages in the population of batsmen that Hussey belongs to.  This won't be the overall mean and standard deviation of averages across all Test batsmen, because clearly the talent pool in Australia is much stronger than in Bangladesh.  Probably what I'll do is use my adjusted averages and work by country (and possibly era &amp;mdash; the standard deviation of averages is on a slow historical decline).  But this will be for a later post.&lt;br /&gt;&lt;br /&gt;I'll finish by saying that the story is similar for bowlers.  Here is the evens-versus-odds graph for bowlers with at least 3 wickets per Test over 50 Tests:&lt;br /&gt;&lt;br /&gt;&lt;img title="Zaheer Khan is that outlier." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/evensoddsbowl.png" alt="Zaheer Khan is that outlier."&gt;&lt;br /&gt;&lt;br /&gt;I estimate the uncertainties to be about 1.7 * avg / sqrt(wickets).  
Warne (for instance) becomes 25.5 +/- 1.6.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3567331187314343779?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3567331187314343779/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3567331187314343779' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3567331187314343779'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3567331187314343779'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/06/accuracy-of-averages.html' title='Accuracy of averages'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_evensoddsbat.png' height='72' width='72'/><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-6544984387094107221</id><published>2008-06-14T06:00:00.000+02:00</published><updated>2008-06-14T06:05:21.663+02:00</updated><title type='text'>Clarke when the pressure's off</title><content type='html'>Homer &lt;a href="http://dopaisekatamasha.blogspot.com/2008/06/putting-uncles-theory-to-test.html"&gt;broke down&lt;/a&gt; Michael Clarke's innings to see what happened when he came in with the score less than 150, and when he came in with the score greater than or equal to 150.  
Clarke does much better when the going's easy.  But that's not proof that Clarke is special &amp;mdash; we would expect batsmen to do better when the bowlers have been struggling to take wickets.&lt;br /&gt;&lt;br /&gt;So I ran the numbers for all batsmen batting at 5 or 6.  I grouped the innings into those where the batsman came in at worse than 3/150 or 4/200 (these seem reasonably equivalent), and those where he came in at better.  Then I took the difference of the averages.  Then, to get some mileage out of &lt;a href="http://pappubahry.blogspot.com/2008/04/slumps-is-there-problem-or-is-he-just.html"&gt;this old monstrosity post&lt;/a&gt; of mine, I estimated the probability that the "going's easy" average would arise by chance, given the "going's not easy" average and the number of innings in each category.  To give an example, Michael Clarke below gets a p-value of 0,20 &amp;mdash; only about one in five batsmen would have such a rise in average.  An asterisk means that the difference was too large for my estimation algorithm, which gave a senseless result.&lt;br /&gt;&lt;br /&gt;(In &lt;i&gt;The Best of the Best&lt;/i&gt;, Charles Davis defines a 'pressure average', which takes into account the state of the match &amp;mdash; 4/50 in the second innings isn't a pressure situation if you've got a lead of 250 on the first innings.  I can't be bothered going into this much detail.)&lt;br /&gt;&lt;br /&gt;Note that many of the batsmen below spent much of their career higher up the order.  
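&lt;br /&gt;&lt;br /&gt;The easy/not-easy grouping can be sketched in Python as below. This is a hypothetical reconstruction, not the code behind the tables, and it rests on several assumptions. Each innings is recorded as (wickets down on arrival, score on arrival, runs, dismissed). Arriving at exactly 3/150 or 4/200 counts as easy, following the 'greater than or equal to 150' split above. Innings that start at any other number of wickets down are skipped. The p-value step from the older post is not reproduced.

```python
# Hypothetical thresholds: arriving at 3/150 or better, or 4/200 or better, is "easy".
EASY_THRESHOLDS = {3: 150, 4: 200}


def is_easy(wickets_down, score_on_arrival):
    """True for an easy arrival, False otherwise, None if the rule doesn't cover it."""
    threshold = EASY_THRESHOLDS.get(wickets_down)
    if threshold is None:
        return None
    return score_on_arrival >= threshold


def batting_average(innings):
    """innings: list of (runs, dismissed) pairs. Average = runs / dismissals."""
    runs = sum(r for r, _ in innings)
    outs = sum(1 for _, dismissed in innings if dismissed)
    return runs / outs if outs else float("nan")


def pressure_split(innings):
    """innings: list of (wickets_down, score_on_arrival, runs, dismissed).

    Returns (not-easy average, easy average, difference), with the
    difference taken as not-easy minus easy, as in the tables.
    """
    easy, not_easy = [], []
    for wickets_down, score, runs, dismissed in innings:
        flag = is_easy(wickets_down, score)
        if flag is None:
            continue  # arrival not covered by the 3/150 or 4/200 rule
        (easy if flag else not_easy).append((runs, dismissed))
    hard_avg = batting_average(not_easy)
    easy_avg = batting_average(easy)
    return hard_avg, easy_avg, hard_avg - easy_avg
```

A negative difference, as with Clarke's -36,9, means the batsman averages more when the going's easy.&lt;br /&gt;&lt;br /&gt;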
Also note that my stats are a couple of months out of date.&lt;br /&gt;&lt;br /&gt;Qualification of at least 10 easy innings and at least 10 not-easy innings:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;             worse than 3/150    better than 3/150&lt;br /&gt;name          inns  runs  avg     inns  runs  avg   diff    p&lt;br /&gt;SC Ganguly    77    2285  32,2    43    2069  54,4  -22,3   *&lt;br /&gt;MJ Clarke     24    854   37,1    17    1037  74,1  -36,9   0,20&lt;br /&gt;MV Boucher    12    342   28,5    11    599   66,6  -38,1   0,26&lt;br /&gt;DR Martyn     23    619   31,0    14    787   60,5  -29,6   0,35&lt;br /&gt;PH Parfitt    18    632   39,5    11    696   87,0  -47,5   0,37&lt;br /&gt;DB Vengsarkar 16    439   33,8    11    581   72,6  -38,9   0,37&lt;br /&gt;TE Bailey     25    653   29,7    13    543   60,3  -30,7   0,38&lt;br /&gt;DI Gower      35    1262  39,4    16    926   71,2  -31,8   0,45&lt;br /&gt;KR Miller     32    978   34,9    19    1000  55,6  -20,6   0,49&lt;br /&gt;RP Arnold     13    215   16,5    11    331   30,1  -13,6   0,50&lt;/pre&gt;&lt;br /&gt;Clarke really has been pretty bad (well, sort of &amp;mdash; 37,1 is below average).  
In terms of the raw difference, he's fifth worst (Les Ames is just off this table, difference of -37,3).&lt;br /&gt;&lt;br /&gt;And now those rare batsmen who do worse when the pressure's off:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;             worse than 3/150    better than 3/150&lt;br /&gt;name          inns  runs  avg     inns  runs  avg   diff    p&lt;br /&gt;A Flower      80    3761  57,9    10    310   31,0  26,9    *&lt;br /&gt;CH Lloyd      78    3700  52,1    30    987   35,3  16,9    *&lt;br /&gt;ND McKenzie   29    1056  40,6    16    438   27,4  13,2    0,30&lt;br /&gt;A Symonds     10    389   43,2    10    233   29,1  14,1    0,44&lt;br /&gt;SJ McCabe     19    830   48,8    12    397   33,1  15,7    0,46&lt;br /&gt;SE Gregory    37    1015  28,2    12    205   18,6  9,6     0,56&lt;br /&gt;IVA Richards  45    2051  51,3    22    852   40,6  10,7    0,77&lt;br /&gt;RT Ponting    33    1604  51,7    17    570   40,7  11,0    0,81&lt;br /&gt;KD Walters    60    2653  51,0    28    1113  42,8  8,2     0,87&lt;br /&gt;KF Barrington 21    878   43,9    14    409   37,2  6,7     0,90&lt;/pre&gt;&lt;br /&gt;When the p-value is higher than 0,5, it means that such a 'slump' would occur in the careers of more than one in two batsmen &amp;mdash; pretty unremarkable.  Clive Lloyd's record is probably the most remarkable of these, given the relatively large number of innings.&lt;br /&gt;&lt;br /&gt;In the set of 83 players, 52 have better averages in easy situations, and 31 in not-easy situations.&lt;br /&gt;&lt;br /&gt;Sorry for not posting last weekend.  The problem with devoting only one day a week to cricket stats is that if I don't get something working, then it doesn't get done for a while.  
I will try to return to IPL analysis next weekend.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-6544984387094107221?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/6544984387094107221/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=6544984387094107221' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6544984387094107221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6544984387094107221'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/06/clarke-when-pressures-off.html' title='Clarke when the pressure&apos;s off'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4146914303879313169</id><published>2008-05-31T08:02:00.002+02:00</published><updated>2008-05-31T12:26:10.166+02:00</updated><title type='text'>Rajasthan and Moneyball</title><content type='html'>Michael Atherton's &lt;a href="http://www.timesonline.co.uk/tol/sport/cricket/article4022575.ece"&gt;column&lt;/a&gt; in &lt;i&gt;The Times&lt;/i&gt; asks if Rajasthan are the Oakland A's of the IPL.  The Oakland A's are a low-budget team in Major League Baseball, who were nevertheless able to make the playoffs and compete well with much richer teams.  
They did this by exploiting inefficiencies in the player market and clever drafting &amp;mdash; batters who earned lots of bases on balls were undervalued by other teams, and other teams tended to draft players straight out of high school, a much riskier strategy than drafting players who had proven themselves at college level.  The story behind this is detailed in the excellent book &lt;i&gt;Moneyball&lt;/i&gt;.  The publication of the book seems to have made life more difficult for the A's &amp;mdash; many other clubs now employ the same sort of statisticians as the A's did, using the same ideas.&lt;br /&gt;&lt;br /&gt;Rajasthan spent the least money of all the IPL franchises at the player auctions, so in that sense they're similar to the A's.  But we shouldn't overstate how prescient they were, because they really weren't.  &lt;br /&gt;&lt;br /&gt;They flagrantly ignored &lt;i&gt;Moneyball&lt;/i&gt; principles early on.  Their big success-from-obscurity has been Swapnil Asnodkar.  Asnodkar, indeed, averages over 40 in List A cricket, and could easily be selected based on that stat.  (Quite how he manages to do so with such a loose technique is not something the stats can shed any light on.)  But they didn't pick him in the XI until their fifth game.  Earlier, they had picked (for instance) Taruwar Kohli.  You can't get much less &lt;i&gt;Moneyball&lt;/i&gt; than that &amp;mdash; he was picked on his under-19 performances, without having even played a first-class or List A match.  Selection based on under-19 results!  Under-19 cricket is of much lower quality than senior domestic cricket, and you're much safer picking players with proven senior records.  We don't need &lt;i&gt;Moneyball&lt;/i&gt; to tell us that.  &lt;br /&gt;&lt;br /&gt;Of course, there is the requirement to play four under-22s, but there should be some 21-year-olds playing first-class cricket to choose from.  
Warne also picked the legspinner Salunkhe, who hadn't played a first-class game.  Perhaps the experience of playing with Warne on the field was good for Salunkhe in the long term (I don't know), but he wasn't particularly effective and was soon dropped.&lt;br /&gt;&lt;br /&gt;Some of the principles of &lt;i&gt;Moneyball&lt;/i&gt; should carry over to the player trading in the IPL.  The difficulty will be in evaluating the players.  It is easy to see how they performed in this tournament, but teams shouldn't be working out their trades purely on this tournament.  Some players were lucky (Marsh and Tanvir), and some unlucky (&lt;s&gt;Dhoni&lt;/s&gt; (&lt;b&gt;Edit&lt;/b&gt;: Check stats before posting!), Tendulkar, Misbah).  If teams are silly and give excessive weight to IPL stats, then it may be possible for teams to pick up some bargains with the big name stars who under-performed.  The converse also applies.&lt;br /&gt;&lt;br /&gt;Working out the balance between batting and bowling will also be important.  The top bowlers seem to be worth between five and ten runs a game, relative to an average bowler.  How much is a top batsman worth?  More importantly, what about the middle-level players?  
I haven't answered this question, though the guys at Rediff might have.&lt;br /&gt;&lt;br /&gt;Things to think about.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4146914303879313169?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4146914303879313169/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4146914303879313169' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4146914303879313169'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4146914303879313169'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/rajasthan-and-moneyball.html' title='Rajasthan and &lt;i&gt;Moneyball&lt;/i&gt;'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1010113248792908768</id><published>2008-05-31T07:59:00.001+02:00</published><updated>2008-05-31T08:57:17.546+02:00</updated><title type='text'>Rating IPL bowling</title><content type='html'>I have been saying in comments around the blogosphere that economy rate is probably much more important than bowling average in T20.  I decided to work out just how much wickets are worth.  
&lt;br /&gt;&lt;br /&gt;Once I finished getting some numbers out, I realised that the method I'd used was quite close to Duckworth-Lewis, and I could probably have just adopted the old DL tables for these purposes.  Hopefully I'll get around to comparing them with my results at some point.  In the meantime, I figured that IPL innings might be different from the last 20 overs of ODI innings, and that you all probably wanted a nine-colour scatterplot.&lt;br /&gt;&lt;br /&gt;&lt;img title="I was very relieved to see the slope decrease at each wicket." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/iplresources.png" alt="I was very relieved to see the slope decrease at each wicket."&gt;&lt;br /&gt;&lt;br /&gt;Each dot represents a wicket in the first innings of the league stage of the IPL.  (I ignored second innings, since they don't always last 20 overs.)  I've fitted a straight line for each wicket, forcing each line through the origin (you can't score any runs with zero balls left).  You'll note that the points near the origin tend to be above the best-fit lines &amp;mdash; that's because of late-over slogging.  That would be important if I wanted to adjust targets for a rain-rule method, but here I'm only interested in the gaps between the best-fit lines, to see what the wickets are worth.&lt;br /&gt;&lt;br /&gt;We see that the wickets aren't particularly important.  A wicket on the first ball of the innings reduces the final score, on average, by about two and a half runs.  This agrees with common sense &amp;mdash; with only twenty overs to bat, you can keep batting aggressively despite the fall of a few wickets.&lt;br /&gt;&lt;br /&gt;The slopes of the regression lines (to more significant figures than are really justified...) 
are (runs per remaining ball, by number of wickets down):&lt;br /&gt;0 (extrapolated from wickets 1 to 6): 1,378&lt;br /&gt;1: 1,357&lt;br /&gt;2: 1,329&lt;br /&gt;3: 1,298&lt;br /&gt;4: 1,271&lt;br /&gt;5: 1,249&lt;br /&gt;6: 1,233&lt;br /&gt;7: 1,027&lt;br /&gt;8: 0,459&lt;br /&gt;9: 0,172&lt;br /&gt;&lt;br /&gt;Now, we can use this to start evaluating the impact of bowlers.  Suppose a bowler takes the fifth wicket on the last ball of the tenth over.  With four wickets down and 60 balls left, the batting team should score another 1,271*60 = 76,26 runs.  With five wickets down, they should score 1,249*60 = 74,94 runs.  The difference of 1,3 runs gets credited to the bowler.  Do this for all the bowler's wickets, and you can adjust his runs conceded and get an effective economy rate.&lt;br /&gt;&lt;br /&gt;There are a few points worth noting:&lt;br /&gt;- There's no consideration of how high-scoring the pitch/ground is.&lt;br /&gt;- The quality of the batsman dismissed is ignored.&lt;br /&gt;- The same crediting applies in both first and second innings.&lt;br /&gt;- If a team collapses quickly (say six wickets down by 10 or 12 overs), then the bowler who picks up the next wicket gets quite a lot of credit, since the gap between the recognised batsmen and the tail is large when there are still some overs left to bat.  This isn't really fair on the bowlers who took the early wickets, but it doesn't seem to cause too many problems when comparing bowlers who bowl regularly.&lt;br /&gt;&lt;br /&gt;The overall economy rate for bowlers during the IPL was about 1.36 runs per ball.  By taking the effective economy rate and comparing it to the average, you get a measure of how many runs the bowler was worth.  In the table below, I've called this the value-24: roughly, the number of runs the bowler saves relative to an average bowler over 24 balls.  I'm not good at coming up with names for things.  
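&lt;br /&gt;&lt;br /&gt;The crediting rule, and the effective economy rate and value-24 built on it, fit in a few lines of Python. This is a sketch rather than the code behind the table below. The slopes are the fitted values listed above. Credits are kept positive here, whereas the 'cred' column shows them as negative adjustments to runs conceded. The treatment of the tenth wicket (expected remaining runs drop to zero) is my own assumption, since the post doesn't cover it.

```python
# Fitted runs per remaining ball, indexed by wickets down (index 0 is extrapolated).
SLOPES = [1.378, 1.357, 1.329, 1.298, 1.271, 1.249, 1.233, 1.027, 0.459, 0.172]


def expected_remaining(wickets_down, balls_left):
    """Expected further runs; assumed zero once all ten wickets are down."""
    if wickets_down == 10:
        return 0.0
    return SLOPES[wickets_down] * balls_left


def wicket_credit(wicket_number, balls_left):
    """Runs credited to the bowler for taking wicket number 1-10."""
    return (expected_remaining(wicket_number - 1, balls_left)
            - expected_remaining(wicket_number, balls_left))


def effective_economy(balls, runs, total_credit):
    """Runs per over after deducting the bowler's total wicket credit."""
    return (runs - total_credit) / balls * 6


def value_24(balls, runs, total_credit, league_rate=1.36):
    """Runs better than an average bowler over 24 balls.

    league_rate is the overall IPL runs per ball (about 1.36).
    """
    return (league_rate - (runs - total_credit) / balls) * 24


# Fifth wicket with 60 balls left: 1.271*60 - 1.249*60.
print(round(wicket_credit(5, 60), 2))  # 1.32
# Sohail Tanvir: 211 balls, 210 runs, total credit 50.34.
print(round(effective_economy(211, 210, 50.34), 2))  # 4.54
```

With league_rate=1.36 this gives about 14.5 for Tanvir rather than the table's 14,40. A league rate of about 1.3566 runs per ball reproduces the table's value-24 figures (e.g., 14,40 for Tanvir and 8,87 for McGrath), so the table presumably used a less rounded overall rate. That is my inference; the post doesn't state it.&lt;br /&gt;&lt;br /&gt;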
I wanted to do this because I'm hoping to do something similar for batsmen (that is, get a run per game value for them), so that we can put batsmen and bowlers on the same scale.&lt;br /&gt;&lt;br /&gt;The top bowlers, qual. 144 balls (i.e., six four-over spells):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name           balls runs  wkts  cred    avg   econ  eff econ  value-24&lt;br /&gt;Sohail Tanvir  211   210   21    -50,34  10,0  5,97  4,54      14,40&lt;br /&gt;GD McGrath     300   319   12    -22,89  26,6  6,38  5,92      8,87&lt;br /&gt;SM Pollock     276   301   11    -25,34  27,4  6,54  5,99      8,59&lt;br /&gt;IK Pathan      294   326   14    -31,11  23,3  6,65  6,02      8,49&lt;br /&gt;MF Maharoof    192   215   12    -21,09  17,9  6,72  6,06      8,32&lt;br /&gt;AB Dinda       234   260   9     -20,40  28,9  6,67  6,14      7,99&lt;br /&gt;DW Steyn       228   252   10    -8,49   25,2  6,63  6,41      6,93&lt;br /&gt;A Nehra        269   348   12    -50,57  29,0  7,76  6,63      6,02&lt;br /&gt;AB Agarkar     156   207   8     -33,11  25,9  7,96  6,69      5,81&lt;br /&gt;M Muralitharan 300   346   8     -8,83   43,3  6,92  6,74      5,59&lt;br /&gt;DJ Bravo       170   232   11    -37,64  21,1  8,19  6,86      5,12&lt;br /&gt;SR Watson      283   344   13    -19,57  26,5  7,29  6,88      5,05&lt;br /&gt;SK Warne       264   349   17    -42,63  20,5  7,93  6,96      4,71&lt;br /&gt;M Ntini        162   198   5     -8,34   39,6  7,33  7,02      4,46&lt;br /&gt;Shahid Afridi  180   225   9     -13,55  25,0  7,50  7,05      4,37&lt;/pre&gt;&lt;br /&gt;Sohail Tanvir has, of course, been the stand-out bowler of the IPL.  McGrath has been much talked about, but fellow metronome Shaun Pollock not so much.&lt;br /&gt;&lt;br /&gt;Tanvir's also been lucky, of course.  He's almost certainly not that good.  
I'll pick up this theme a little bit in the next post.&lt;br /&gt;&lt;br /&gt;Lastly, some guys over at Rediff (e.g., &lt;a href="http://in.rediff.com/cricket/2008/may/05index.htm"&gt;here&lt;/a&gt;) have been doing what look to be good statistical analyses of the IPL.  Unfortunately, they seem to sweep all the calculations under the carpet; if anyone happens to know how they calculate the MVP index, please share with us. (&lt;b&gt;Edit&lt;/b&gt;: &lt;a href="http://www.rediff.com/cricket/2007/sep/21mvp.htm"&gt;Here&lt;/a&gt;'s a description of it.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1010113248792908768?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1010113248792908768/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1010113248792908768' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1010113248792908768'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1010113248792908768'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/rating-ipl-bowling.html' title='Rating IPL bowling'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_iplresources.png' height='72' 
width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-8884467473579389186</id><published>2008-05-31T07:53:00.000+02:00</published><updated>2008-05-31T07:54:15.227+02:00</updated><title type='text'>IPL results bits and pieces</title><content type='html'>The league stage of the IPL is over, and so it's time to start looking back at it.  This post looks at some overall results.&lt;br /&gt;&lt;br /&gt;Firstly, let's re-visit those &lt;a href="http://pappubahry.blogspot.com/2008/05/ipl-so-far.html"&gt;blog predictions&lt;/a&gt;, now with the final league standings:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Actual           Me          Q           Arjwiz&lt;br /&gt;1. Rajasthan     Rajasthan   =Delhi      Bangalore&lt;br /&gt;2. Punjab        Chennai     =Kolkata    =Delhi&lt;br /&gt;3. Chennai       Delhi       Deccan      =Kolkata&lt;br /&gt;4. Delhi         Deccan      =Chennai    =Deccan&lt;br /&gt;5. Mumbai        Bangalore   =Punjab     Chennai&lt;br /&gt;6. Kolkata       Kolkata     Mumbai      Punjab&lt;br /&gt;7. Bangalore     Punjab      Bangalore   Mumbai&lt;br /&gt;8. Deccan        Mumbai      Rajasthan   Rajasthan&lt;/pre&gt;&lt;br /&gt;In terms of Spearman's rank correlation, rho (1: perfectly right, -1: perfectly wrong), I won with a score of 0,33, followed by Q at -0,33 and Arjwiz at -0,76.  Arjwiz is the only one of us to get a significant result.  Unfortunately for him, it's in the wrong direction.  &lt;br /&gt;&lt;br /&gt;Now let's compare the two halves of the IPL.  There are various ways of doing this, and I'm not really sure which is the best.  
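For anyone wanting to check the prediction scores, here is a short Python sketch. It computes Spearman's rho as the Pearson correlation of the ranks, giving tied ("=") teams the average of the positions they share. That tie handling is my assumption about how the "=" entries were treated, but with it the sketch reproduces 0,33, -0,33 and -0,76.

```python
from math import sqrt

ACTUAL = ["Rajasthan", "Punjab", "Chennai", "Delhi",
          "Mumbai", "Kolkata", "Bangalore", "Deccan"]

# Each prediction is a list of groups; teams in the same group were tied ("=").
ME = [["Rajasthan"], ["Chennai"], ["Delhi"], ["Deccan"],
      ["Bangalore"], ["Kolkata"], ["Punjab"], ["Mumbai"]]
Q = [["Delhi", "Kolkata"], ["Deccan"], ["Chennai", "Punjab"],
     ["Mumbai"], ["Bangalore"], ["Rajasthan"]]
ARJWIZ = [["Bangalore"], ["Delhi", "Kolkata", "Deccan"], ["Chennai"],
          ["Punjab"], ["Mumbai"], ["Rajasthan"]]


def ranks_from_groups(groups):
    """Map team to rank, giving tied teams the average of their positions."""
    ranks, position = {}, 1
    for group in groups:
        shared = position + (len(group) - 1) / 2  # mean of positions used
        for team in group:
            ranks[team] = shared
        position += len(group)
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def spearman_rho(actual_order, predicted_groups):
    """Spearman's rho: Pearson correlation of actual and predicted ranks."""
    predicted = ranks_from_groups(predicted_groups)
    actual_ranks = list(range(1, len(actual_order) + 1))
    predicted_ranks = [predicted[team] for team in actual_order]
    return pearson(actual_ranks, predicted_ranks)


for name, groups in [("Me", ME), ("Q", Q), ("Arjwiz", ARJWIZ)]:
    print(name, round(spearman_rho(ACTUAL, groups), 2))
# Me 0.33, Q -0.33, Arjwiz -0.76
```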
First up, home and away wins:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;team    home  away&lt;br /&gt;Ban     1     3&lt;br /&gt;Che     3     5&lt;br /&gt;Dec     0     2&lt;br /&gt;Del     4,5   3&lt;br /&gt;Kol     4     2,5&lt;br /&gt;Mum     4     3&lt;br /&gt;Pun     6     4&lt;br /&gt;Raj     7     4&lt;/pre&gt;&lt;br /&gt;If IPL matches are essentially just coin tosses, then the correlation between the two columns should be around zero.  The results are actually correlated more strongly than I would have expected &amp;mdash; r = 0,49.  To minimise any potential differences in home advantage between teams (not that you'd really expect too many; overall, home teams won 29 out of 55 games), I also split the matches into two round-robins, with one group having four home games and the other three (for each team).  That gave r = 0,28, though there are many more ways of splitting up the games.  Probably I should get the computer to do all of them and find the average.&lt;br /&gt;&lt;br /&gt;Anyway, it looks like IPL cricket is not just a coin-toss game, though just how much of the results is luck-based will take a few years to work out properly.  The positive correlations that we've seen would happen by chance about once every six or so tournaments.&lt;br /&gt;&lt;br /&gt;Of the 55 matches, the team batting second won 32 times.  That's a bit more than a standard deviation above the expectation of 50%, so nothing significant.  
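Both quick checks in this post can be sketched in a few lines of Python: the correlation between the home and away columns, and how many standard deviations 32 second-innings wins out of 55 sit above a fair coin. The helper names are hypothetical, and using the normal approximation to the binomial is my choice of test, not necessarily the one used here.

```python
from math import sqrt

# Home and away wins per team, from the table above.
HOME = {"Ban": 1, "Che": 3, "Dec": 0, "Del": 4.5,
        "Kol": 4, "Mum": 4, "Pun": 6, "Raj": 7}
AWAY = {"Ban": 3, "Che": 5, "Dec": 2, "Del": 3,
        "Kol": 2.5, "Mum": 3, "Pun": 4, "Raj": 4}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def coin_toss_z(successes, trials, p=0.5):
    """Standard deviations above the binomial expectation (normal approx.)."""
    mean = trials * p
    sd = sqrt(trials * p * (1 - p))
    return (successes - mean) / sd


teams = sorted(HOME)
r = pearson([HOME[t] for t in teams], [AWAY[t] for t in teams])
print(round(r, 2))                    # 0.49
print(round(coin_toss_z(32, 55), 2))  # 1.21
```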
Until a more detailed analysis comes along, it seems safest to bowl first.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-8884467473579389186?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/8884467473579389186/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=8884467473579389186' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8884467473579389186'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8884467473579389186'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/ipl-results-bits-and-pieces.html' title='IPL results bits and pieces'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1355825524283422551</id><published>2008-05-25T04:29:00.000+02:00</published><updated>2008-05-25T04:34:05.464+02:00</updated><title type='text'>Batting well with a batsman</title><content type='html'>That's right people, a new post!  
Now that I'm back at uni, I have less time for cricket analysis, so I'll be aiming to get about one post per week, maybe two if I find something simple and interesting on Statsguru.&lt;br /&gt;&lt;br /&gt;A long time ago at &lt;a href="http://www.wellpitched.com/"&gt;Well Pitched&lt;/a&gt;, there was a discussion on great batsmen and how they supposedly "lift" their teammates when batting with them.  I was sceptical about this being a real effect.  Analysing it properly will take at least a couple of posts, and this is the first one.&lt;br /&gt;&lt;br /&gt;Getting data on partnerships from summary scorecards always carries with it the problem of retired hurts.  It's not just a question of definition (if an opener retires hurt before the fall of the first wicket, do you have two first-wicket partnerships or a three-way partnership?).  The problem is that retired hurts are not always recorded on scorecards if the batsman in question returned to the crease.  (Certainly in my lazy database, this is never recorded.)  So it sometimes happens that you look at the FOW's to work out which partnerships happened and how many runs each was worth, subtract, and find that a batsman contributed negative runs during some passages of play.&lt;br /&gt;&lt;br /&gt;So before I started gathering partnership data, I did my best to get rid of innings where there was a retired hurt.  Innings were deleted if:&lt;br /&gt;&lt;br /&gt;- a batsman finished retired hurt;&lt;br /&gt;&lt;br /&gt;- the number three was the first wicket to fall, etc.;&lt;br /&gt;&lt;br /&gt;- reconstructing the FOW's from the minutes batted by each batsman (where possible) disagreed with the actual FOW's;&lt;br /&gt;&lt;br /&gt;- any partnerships required negative runs from one batsman to make sense.&lt;br /&gt;&lt;br /&gt;Point three is an interesting one, because carefully tracing through the minutes batted can reveal both retired hurts and errors in the recorded minutes.  
One curious error is in &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/61/61211.html"&gt;this Test&lt;/a&gt;, in which Manoj Prabhakar apparently batted for 304 minutes, while the rest of the batsmen combined for only 274.&lt;br /&gt;&lt;br /&gt;Anyway, the above procedure isn't perfect &amp;mdash; it won't pick up all retired hurts, especially if the minutes aren't recorded, and there are probably some innings where the anomalous minutes are just scorer/Cricinfo/CricketArchive errors and not actually showing retired hurts.  But it seems to do a reasonable job, and about 430 innings were removed.&lt;br /&gt;&lt;br /&gt;Now to the analysis proper.  For each batsman and each innings, I took the runs in his partnerships and subtracted off his own score, so that we're left with the runs scored by his partners and extras while he was at the crease.  Then count how many times he saw his partners get out, and you have the average of his partners (plus extras) when he was at the crease.&lt;br /&gt;&lt;br /&gt;To get an expected average, I added up his partners' career averages over all of his partnerships, and divided by the total number of partnerships.&lt;br /&gt;&lt;br /&gt;Then I divided the actual partner-average by the expected partner-average, which gives a measure of how well people bat with him, relative to their careers.&lt;br /&gt;&lt;br /&gt;When you do this, you find that players with short careers show much more variation in this ratio than players with longer careers.  Graph (qual. 20 innings, batsman's average at least 30):&lt;br /&gt;&lt;br /&gt;&lt;img title="Look at those ugly decimal points." 
src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/partneravgs.png" alt="Look at those ugly decimal points."&gt;&lt;br /&gt;&lt;br /&gt;(The average ratio across all these batsmen is about 1,1.)&lt;br /&gt;&lt;br /&gt;Now, what I think I &lt;i&gt;should&lt;/i&gt; do at this point is to work out how reliable the statistic is (i.e., how much of it is skill, and how much just luck), and then regress each player to the mean appropriately.  (I'm learning from the baseballers, who do this sort of thing a lot.)  But working out how reliable this stat is will require some thought (you're welcome to do the thinking for me).  One problem is that part of what it measures might be called flat-track-bully-ness.  If a batsman does disproportionately well on flat tracks, then he may be part of many big partnerships that inflate his partner average.  &lt;br /&gt;&lt;br /&gt;But I will ignore this for now, and instead find z-scores.  I sorted the batsmen by innings batted, computed a moving standard deviation over each run of 30 consecutive ratios, and then fitted a curve to it.  It goes roughly as 1,3/sqrt(no. inns), for those interested.  Then for each batsman, I used this as the standard deviation and found how many standard deviations his ratio lies from the overall mean.&lt;br /&gt;&lt;br /&gt;(In terms of the reliability, the question is: Does being a standard deviation above the mean after 20 innings mean that you'll probably be a standard deviation above the mean after 200 innings?)&lt;br /&gt;&lt;br /&gt;The table below gives each batsman's average, innings batted (having excised team innings probably involving retired hurts), runs by partners (incl. extras), total number of partnerships, expected partner average, actual partner average, ratio, and z-score.  
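The ratio and z-score described above can be sketched in Python as follows. The `Partnership` record format and the sample numbers are hypothetical, made up purely to exercise the code. The 1,3/sqrt(innings) spread and the overall mean of about 1,1 are the approximate values quoted in this post, so z-scores from this sketch will not exactly match the table.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Partnership:
    """One partnership, seen from the batsman being rated (hypothetical format)."""
    partner_runs: int          # runs by the partner plus extras
    partner_career_avg: float  # the partner's career average
    partner_dismissed: bool    # did the batsman see this partner get out?


def partner_ratio(pships):
    """Actual partner average divided by expected partner average.

    Assumes at least one partner dismissal was seen.
    """
    runs = sum(p.partner_runs for p in pships)
    outs = sum(p.partner_dismissed for p in pships)
    actual = runs / outs  # divide by dismissals seen, not by partnerships
    expected = sum(p.partner_career_avg for p in pships) / len(pships)
    return actual / expected


def ratio_z(ratio, innings, overall_mean=1.1, spread=1.3):
    """z-score using the fitted spread of roughly 1,3 / sqrt(innings)."""
    return (ratio - overall_mean) / (spread / sqrt(innings))


# Hypothetical batsman with three partnerships over 20 innings.
sample = [Partnership(40, 35.0, True),
          Partnership(12, 25.0, True),
          Partnership(30, 45.0, False)]
ratio = partner_ratio(sample)
print(round(ratio, 2), round(ratio_z(ratio, innings=20), 2))
```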
Note that the partnership average is not just the partner runs divided by the number of partnerships &amp;mdash; it's the partner runs divided by the number of times the batsman saw partners dismissed.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                                            partner-avg&lt;br /&gt;name            avg   inns  p-runs  pships  exp   act   ratio z&lt;br /&gt;RT Ponting      58,6  183   9622    308     43,7  63,7  1,46  3,94&lt;br /&gt;RL Dias         36,7  33    1260    54      28,9  54,8  1,89  3,67&lt;br /&gt;DS Lehmann      45,0  42    1733    62      43,8  78,8  1,80  3,66&lt;br /&gt;DJ Bravo        33,0  44    1601    70      33,7  59,3  1,76  3,52&lt;br /&gt;RWT Key         31,0  25    895     36      39,1  74,6  1,91  3,24&lt;br /&gt;RT Robinson     36,4  45    1971    72      37,0  61,6  1,67  3,06&lt;br /&gt;HH Dippenaar    30,1  60    2237    88      43,0  67,8  1,58  2,96&lt;br /&gt;ME Trescothick  43,8  136   5890    234     38,8  54,5  1,41  2,88&lt;br /&gt;Shoaib Mohammad 44,3  65    3673    123     37,1  56,5  1,52  2,73&lt;br /&gt;G Pullar        43,9  44    1898    67      43,7  70,3  1,61  2,70&lt;br /&gt;CL Cairns       33,5  97    2815    158     30,0  42,7  1,42  2,55&lt;br /&gt;FA Iredale      36,7  22    930     42      25,1  44,3  1,76  2,48&lt;br /&gt;V Sehwag        53,8  82    2794    128     39,7  57,0  1,44  2,44&lt;br /&gt;Javed Miandad   52,6  172   8715    335     35,8  47,6  1,33  2,42&lt;br /&gt;MLC Foster      30,5  23    624     26      45,2  78,0  1,72  2,39&lt;br /&gt;GC Smith        49,5  107   4905    191     40,0  55,1  1,38  2,31&lt;br /&gt;Habibul Bashar  30,9  96    2775    189     21,3  29,5  1,38  2,22&lt;br /&gt;M Prabhakar     32,7  57    2064    89      35,8  51,6  1,44  2,06&lt;br /&gt;CG Greenidge    44,7  175   7559    301     41,8  54,0  1,29  2,00&lt;br /&gt;AH Jones        44,3  71    3560    144     31,9  44,5  1,39  1,97&lt;/pre&gt;&lt;br /&gt;Make of that what you will....&lt;br /&gt;&lt;br 
/&gt;At the bottom end are those who apparently make their partners bat badly:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                                            partner-avg&lt;br /&gt;name            avg   inns  p-runs  pships  exp   act   ratio z&lt;br /&gt;Saeed Anwar     45,5  84    3287    194     34,3  29,3  0,86  -1,92&lt;br /&gt;DJ Cullinan     44,2  111   3965    217     38,3  33,9  0,89  -1,95&lt;br /&gt;WR Hammond      58,5  129   6160    284     40,1  36,0  0,90  -1,98&lt;br /&gt;RA McLean       30,3  66    1049    105     31,0  25,0  0,81  -2,02&lt;br /&gt;FE Woolley      36,1  92    2404    162     37,0  31,2  0,84  -2,10&lt;br /&gt;SM Pollock      32,3  151   3704    249     30,3  27,2  0,90  -2,16&lt;br /&gt;JT Tyldesley    30,8  54    1655    129     29,0  21,8  0,75  -2,16&lt;br /&gt;AG Chipperfield 32,5  20    431     52      24,4  12,3  0,50  -2,19&lt;br /&gt;MA Noble        30,3  70    2277    162     29,7  23,0  0,77  -2,30&lt;br /&gt;WJ Cronje       36,4  105   3764    219     36,7  30,6  0,83  -2,32&lt;/pre&gt;&lt;br /&gt;Well, if Hansie Cronje coming last on this statistic isn't the most appropriate thing I've ever put on this blog, then I don't know what is!  Good to see his worshipper Shaun Pollock also down there.&lt;br /&gt;&lt;br /&gt;Two names mentioned in the Well Pitched discussion were Steve Waugh and Inzamam-ul-Haq.  
They are at z = -1,22 and z = -0,40 respectively.&lt;br /&gt;&lt;br /&gt;When I next attack this problem, I will also check to see if there are any patterns with batting position, and also look at batting with the tail.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1355825524283422551?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1355825524283422551/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1355825524283422551' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1355825524283422551'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1355825524283422551'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/batting-well-with-batsman.html' title='Batting well with a batsman'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_partneravgs.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1270460557817012331</id><published>2008-05-17T23:49:00.000+02:00</published><updated>2008-05-17T23:51:49.204+02:00</updated><title type='text'>Back in Australia</title><content type='html'>Sorry for the interruption to posting.  
I'm now back in Brisbane, and things should be organised enough to return to cricket blogging in a couple of days.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1270460557817012331?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1270460557817012331/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1270460557817012331' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1270460557817012331'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1270460557817012331'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/back-in-australia.html' title='Back in Australia'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-9140525931416927281</id><published>2008-05-12T13:19:00.002+02:00</published><updated>2008-05-12T14:19:44.294+02:00</updated><title type='text'>Learning from baseball: Pitchf/x</title><content type='html'>While we're on the subject of baseball, I thought I'd outline a simple idea used in baseball that would be useful and fun in cricket.  In short: put Hawkeye data on the web for anyone to download.&lt;br /&gt;&lt;br /&gt;In Major League Baseball, they have a system called Pitchf/x, which we can basically think of as Hawkeye.  
They don't have it at every game (only about a quarter, I think), but since there are over a thousand games a season, that's still a lot of pitching data.  The raw data gets put on the MLB website, and you can download big pitch-by-pitch tables, with each pitch described by release point, start speed, end speed, break length, break angle, etc.&lt;br /&gt;&lt;br /&gt;Classifying the pitch type can be difficult, but by using enough of the variables in the table, people who've studied this problem are getting reasonable results (for an introduction to it, see &lt;a href="http://www.hardballtimes.com/main/article/pitch-identification-tutorial/"&gt;here&lt;/a&gt;).  Here's an example, taken from &lt;a href="http://www.hardballtimes.com/main/article/anatomy-of-a-player-jake-peavy/"&gt;this article&lt;/a&gt; on Jake Peavy by Pitchf/x'er Josh Kalk:&lt;br /&gt;&lt;br /&gt;&lt;img title="Pretty colours!" src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/Jake_Peavy.gif" alt="Pretty colours!"&gt;&lt;br /&gt;&lt;br /&gt;If you have a look at the linked article, you'll see other graphs, plotting different variables.&lt;br /&gt;&lt;br /&gt;It's a gold mine for baseball analysis, and it would be the same in cricket.  
There are all sorts of things you could look at, either at the level of an individual bowler or at the characteristics of the ground &amp;mdash; length of the ball, amount of swing, amount of turn, how much bounce there is in the pitch, etc.&lt;br /&gt;&lt;br /&gt;To get it to work, we'd want something like the following recorded for each ball (it may be possible to make this more efficient with some knowledge of cricket ball physics, but this should give the idea):&lt;br /&gt;&lt;br /&gt;bowler, batsman, age of ball, did ball hit bat?, number of runs scored off the ball or type of wicket, and then x-, y-, z-components of position and velocity at: release point, just before pitching, just after pitching, contact with bat/batsman, crossing the stumps (projected if necessary).&lt;br /&gt;&lt;br /&gt;I will happily plug the first broadcaster that puts this data on the web.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-9140525931416927281?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/9140525931416927281/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=9140525931416927281' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/9140525931416927281'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/9140525931416927281'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/learning-from-baseball-pitchfx.html' title='Learning from baseball: Pitchf/x'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' 
height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_Jake_Peavy.gif' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3510683110889370953</id><published>2008-05-10T09:57:00.001+02:00</published><updated>2008-05-10T09:59:36.791+02:00</updated><title type='text'>John Buchanan and The Guardian article</title><content type='html'>Hello to those of you who've come here from &lt;a href="http://www.guardian.co.uk/sport/2008/may/08/cricket1"&gt;Andy Bull's piece&lt;/a&gt; in &lt;i&gt;The Guardian&lt;/i&gt;.  I hope you find something interesting here.  &lt;br /&gt;&lt;br /&gt;There are a couple of ideas in that article that I think are worthy of more detailed discussion.&lt;br /&gt;&lt;br /&gt;What John Buchanan says is interesting, but it seems to me that he's taking a purely coaching perspective.  He says:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;1) Ignore existing cricket statistics - these are just the 'outcome numbers' of a process of getting there.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;If I were a coach, I would probably agree with this.  Buchanan goes on to give the example of strike rate.  It would be no good a coach saying to a player, "Hey, you're averaging 35 at a strike rate of 70.  I want you to average 40 at a strike rate of 80."  You need to break batting down into its parts and make improvements at that level.&lt;br /&gt;&lt;br /&gt;That's where the ball-by-ball analysis comes in &amp;mdash; what Buchanan calls 'process numbers'.  (Buchanan is very big on processes, I gather.  I've seen him talk about them elsewhere.)  You look at the dot balls, try to improve shot selection on them, etc.  You hope that you'll end up scoring more runs at a higher rate.&lt;br /&gt;&lt;br /&gt;That's what the coach does.  
From a &lt;i&gt;selection&lt;/i&gt; perspective, the outcome numbers are still going to be important.  No-one cares what your percentage of dot balls is if you average 25, and no batsman will hold down a spot in the national side with such a low outcome number.  Cricket games are won by the team that scores the most runs, and we shouldn't lose sight of that.  All the 'processes' work is no good if it doesn't improve averages (or strike rates, in limited overs cricket).  &lt;br /&gt;&lt;br /&gt;Now, there are times when process numbers might be useful in selection &amp;mdash; if a batsman has bad process numbers, then perhaps with coaching he might improve a lot more than a batsman who's already largely optimised his game.  I don't know.  Without seeing the figures involved and knowing what improvements are usually made, it's hard to say how useful such an approach would be.&lt;br /&gt;&lt;br /&gt;Now on to one of the questions Bull posed at the end of the column:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Could we see teams selected through statistical proof rather than the current woolly combination of gut instinct, vague notions about character and compromised measures such as batting averages?&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I will be very surprised if, in the foreseeable future, detailed statistics prove better at team selection than human experts with regular stats.  In terms of working out when to drop players, they might be.  (I said &lt;a href="http://pappubahry.blogspot.com/2008/04/slumps-is-there-problem-or-is-he-just.html"&gt;here&lt;/a&gt; that selectors are probably best off with their gut on dropping players.  Perhaps with detailed process stats you could do better, I don't know.)  
&lt;br /&gt;&lt;br /&gt;But when it comes to finding the best players in domestic cricket, I doubt that a computer would do better than Duncan Fletcher, for example (if you haven't read &lt;a href="http://www.timesonline.co.uk/tol/sport/cricket/article3700869.ece"&gt;Andrew Strauss's thoughts&lt;/a&gt; on Fletcher, I recommend doing so).  Fletcher famously picked Michael Vaughan for the 1999/2000 tour of South Africa on 'temperament'.  His record in county cricket was not great &amp;mdash; his first-class averages in the previous two seasons were 34 and 41.  His average for Yorkshire is still well under 40.  But despite that, in England colours he turned himself into a good batsman, doing better against Test sides than against county sides.&lt;br /&gt;&lt;br /&gt;Now, it's possible that with sufficient process numbers from his county games, you would be able to tell him apart from the rest of the county hacks averaging high 30's.  But I'd be surprised if it were so.  &lt;br /&gt;&lt;br /&gt;Obviously you'll want to be paying attention to stats when picking national sides &amp;mdash; you won't consider batsmen averaging under 30, and you'll certainly be looking at those averaging 60 &amp;mdash; but since the quality of the players is significantly lower in domestic cricket, you'll want humans watching them, gauging their technique and judging if they'll hold up against 90mph pace bowling or top-class spinners.&lt;br /&gt;&lt;br /&gt;They don't always get it right, of course, but I think that they do better than a computer (or a person looking only at numbers) would do.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3510683110889370953?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3510683110889370953/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3510683110889370953' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3510683110889370953'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3510683110889370953'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/john-buchanan-and-guardian-article.html' title='John Buchanan and The Guardian article'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1832924859393185501</id><published>2008-05-06T12:01:00.001+02:00</published><updated>2008-05-06T12:01:30.139+02:00</updated><title type='text'>Australia batting first in ODI's</title><content type='html'>There's an interesting comment by &lt;a href="http://nestaquin.wordpress.com/"&gt;Nesta&lt;/a&gt; on my &lt;a href="http://pappubahry.blogspot.com/2008/05/maximising-runs-or-wins.html"&gt;rambly post&lt;/a&gt; about batting-first strategies.  Essentially, Nesta reckons that Australia have come close to perfecting the art of batting first in 50-over cricket.  &lt;br /&gt;&lt;br /&gt;Since there's much more scope for variation in batting-first strategies than batting-second strategies (in the latter, everyone knows how many runs they need), you might conjecture that this will show up in the results.  And it looks like it does.&lt;br /&gt;&lt;br /&gt;I considered ODI's between the top eight sides in the 2000's.  
I split them into day games and day-night games, because the two are markedly different (day games strongly favour the team batting second; day-night games favour the team batting first).&lt;br /&gt;&lt;br /&gt;In day games, Australia has won 73% of matches when batting first (ignoring no-results).  Second is Sri Lanka at 49% &amp;mdash; a whopping 24 percentage points behind!  Australia has won 78% of matches batting second, with South Africa second at 71% &amp;mdash; only seven percentage points behind.&lt;br /&gt;&lt;br /&gt;In day-night games, batting first: Aus 76%, South Africa 63%; batting second: Aus 62%, South Africa and Pakistan 55%.  Once again, the gap is bigger when batting first.&lt;br /&gt;&lt;br /&gt;So it does look like Australia have an advantage over their rivals when it comes to batting first, above and beyond their general cricket superiority.&lt;br /&gt;&lt;br /&gt;Now for some tables.  For each team, I give the number of matches (actually this column includes no-results because I was lazy when doing the copy-paste), the win fraction batting first, the win fraction batting second, and the ratio of the second to the first.  First up, day games:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;team          mats  1st   2nd   ratio&lt;br /&gt;Pakistan      41    0,40  0,37  0,92&lt;br /&gt;Australia     42    0,73  0,78  1,07&lt;br /&gt;Sri Lanka     53    0,49  0,61  1,25&lt;br /&gt;India         47    0,39  0,58  1,49&lt;br /&gt;West Indies   47    0,29  0,50  1,73&lt;br /&gt;South Africa  39    0,38  0,71  1,85&lt;br /&gt;New Zealand   39    0,29  0,62  2,14&lt;br /&gt;England       38    0,19  0,56  2,94&lt;/pre&gt;&lt;br /&gt;Only Pakistan does better batting first in day games, but that is probably noise, given where Pakistan is on the next table.  
Australia is second, with only a small improvement when chasing.&lt;br /&gt;&lt;br /&gt;Day-nighters:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;team          mats  1st   2nd   ratio&lt;br /&gt;Sri Lanka     59    0,58  0,34  0,59&lt;br /&gt;Australia     70    0,76  0,62  0,82&lt;br /&gt;England       42    0,37  0,31  0,85&lt;br /&gt;South Africa  43    0,63  0,55  0,88&lt;br /&gt;India         50    0,42  0,39  0,92&lt;br /&gt;Pakistan      55    0,58  0,55  0,93&lt;br /&gt;West Indies   22    0,25  0,25  1,00&lt;br /&gt;New Zealand   41    0,43  0,43  1,01&lt;/pre&gt;&lt;br /&gt;Australia once again second &amp;mdash; it's interesting to see Sri Lanka in the top three in both tables as well.  Only New Zealand have a better record chasing in day-nighters.&lt;br /&gt;&lt;br /&gt;It's worth pointing out that this could do with a more detailed analysis &amp;mdash; Australian grounds may be more bat-first-friendly in day-nighters than others, which would explain Australia's high position in the second table.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1832924859393185501?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1832924859393185501/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1832924859393185501' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1832924859393185501'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1832924859393185501'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/australia-batting-first-in-odis.html' title='Australia batting first in ODI&apos;s'/><author><name>David 
Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-5771091905884286314</id><published>2008-05-04T22:13:00.001+02:00</published><updated>2008-05-04T22:15:10.396+02:00</updated><title type='text'>Luck</title><content type='html'>I thought I'd simulate a double-round-robin tournament with eight teams, to model the IPL.  So Teams A to H each play 14 games.  Here is the final ladder, ordered by number of wins:&lt;br /&gt;&lt;br /&gt;C: 10&lt;br /&gt;F: 10&lt;br /&gt;B: 9&lt;br /&gt;G: 7&lt;br /&gt;H: 7&lt;br /&gt;D: 6&lt;br /&gt;A: 4&lt;br /&gt;E: 3&lt;br /&gt;&lt;br /&gt;Team E'll be looking for a new coach &amp;mdash; only three wins out of fourteen.... Anyway, as the title of this post will suggest, the result of each match was decided by a (virtual) coin toss.  The point here is that if all teams are perfectly evenly matched and results come down to the luck of the day, you'll still end up with teams at the top of the ladder having much better records, over 14 matches, than the teams at the bottom.&lt;br /&gt;&lt;br /&gt;Now of course there is skill involved in cricket, and some teams in the IPL are better than others.  But can we tell which team is the best just from the results?  Probably not from just one season (unless they put in a really dominant performance &amp;mdash; lots of wins, by big margins).  And more importantly, it'll be impossible to say how good each team actually is.  
To explain this point, I'm going to borrow the notation from American sports (since that's how I think of it in my head &amp;mdash; much of what I write here can be found somewhere in the archives of &lt;a href="http://sabermetricresearch.blogspot.com/"&gt;this blog&lt;/a&gt; and &lt;a href="http://insidethebook.com/ee/"&gt;this blog&lt;/a&gt;).  A .500 team ("five hundred") is a team that wins 50% of its matches.  A .600 team wins 60%, and so on.&lt;br /&gt;&lt;br /&gt;To prove that a team is really a .600 team (say), you'd need an infinite number of matches.  Of course, we could get by with a large number &amp;mdash; just how large depends on how much luck is involved in each game.  The problem with T20 is that we don't know how much luck there is.  So we're going to be fumbling around in the dark somewhat &amp;mdash; once we've had a few seasons (to get enough data), we'll be able to look at the win-loss records of the teams and see how much greater their variance is than that expected by chance.&lt;br /&gt;&lt;br /&gt;I worked out some numbers for ODI's and Tests &lt;a href="http://pappubahry.blogspot.com/2008/03/meaningfulness-of-tests-and-odis.html"&gt;here&lt;/a&gt;; T20 will have more luck involved than fifty-over cricket, but the IPL complicates things as the foreigners are dominant, and there are only four of them per side.  If the long-term variance in win percentage is the same in the IPL as it is for ODI's (a big if), then you'll need each team to play about 17 or more games before the skill will demonstrably be playing a part in the results.&lt;br /&gt;&lt;br /&gt;One season of IPL isn't going to be enough.  In the coin-toss example above, every team was a .500 team.  Only G and H ended with .500 records.  Teams above them were lucky; teams below (especially E) were unlucky.&lt;br /&gt;&lt;br /&gt;If we look at the IPL table today, Rajasthan are at .833.  Are they genuinely an .833 team?  They could be.  
Or they could be a .900 team that happened to lose one of their first six matches, or a .500 team that's had a bit of luck.&lt;br /&gt;&lt;br /&gt;Let's not forget, Zimbabwe beat Australia not long ago in a T20 game.  We should expect bad teams to win matches.  And sometimes, mediocre teams will string together a few wins on the trot.  Conversely, good teams will lose some.  Does anyone really believe that Deccan (Gilchrist, Afridi, et al.) is a .167 team?&lt;br /&gt;&lt;br /&gt;One way of seeing how much skill is involved will be to compare the first half of the tournament with the second, and see what correlation there is.  Unfortunately, the coming and going of lots of big stars will make this really muddy, but I'll still do it at the end of the tournament.&lt;br /&gt;&lt;br /&gt;So my message is, don't read too much into individual results.  Don't say that the team on top of the ladder is the best simply because they're coming first &amp;mdash; they might be the best team, but they might just be lucky.  
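&lt;br /&gt;&lt;br /&gt;The coin-toss experiment from the start of this post is easy to try for yourself. Below is a minimal Python sketch; it is an illustration under stated assumptions, not necessarily how the ladder above was generated. It assumes eight teams labelled A to H, each pair meeting twice, and every match decided by a fair coin, so it will not reproduce that exact ladder. It also prints the binomial standard deviation of wins for a genuine .500 team over 14 games, which is the spread you would expect from luck alone.

```python
import math
import random


def coin_toss_ladder(n_teams=8, seed=None):
    """Simulate a double round robin where every match is a fair coin toss.

    Returns a dict mapping each team letter to its number of wins.
    """
    rng = random.Random(seed)
    teams = [chr(ord("A") + i) for i in range(n_teams)]
    wins = {team: 0 for team in teams}
    for home in teams:
        for away in teams:
            if home == away:
                continue
            # Every ordered (home, away) pair is one match, so each pair meets twice.
            winner = home if rng.random() >= 0.5 else away
            wins[winner] += 1
    return wins


ladder = coin_toss_ladder(seed=1)
for team, w in sorted(ladder.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {w}")

# Luck alone: wins for a genuine .500 team follow Binomial(games, 0.5).
games = 2 * (8 - 1)
sd_wins = math.sqrt(games * 0.5 * 0.5)
print(f"{games} games each; sd of wins from luck alone: {sd_wins:.2f}")
```

With a standard deviation of nearly two wins either side of the expected seven, a spread like the 3 to 10 wins in the ladder above is not especially surprising among identical teams.&lt;br /&gt;&lt;br /&gt;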
Go and read &lt;a href="http://blogs.cricinfo.com/tourdiaries/archives/2008/04/momentum_is_ove.php"&gt;this excellent piece&lt;/a&gt; by Lawrence Booth at Cricinfo.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-5771091905884286314?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/5771091905884286314/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=5771091905884286314' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5771091905884286314'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5771091905884286314'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/luck.html' title='Luck'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-8282947723026062076</id><published>2008-05-03T12:41:00.002+02:00</published><updated>2008-05-03T13:01:34.706+02:00</updated><title type='text'>The IPL so far</title><content type='html'>Each team in the IPL has now played five matches.  I thought I'd have a look at the points table.  
Really the only reason I'm doing this is that I end up looking prescient, and if I let it go too long the results might start turning against me.&lt;br /&gt;&lt;br /&gt;Near the end of &lt;a href="http://pappubahry.blogspot.com/2008/02/ipl-player-auction.html"&gt;this post&lt;/a&gt;, I came up with some half-baked ratings on how clever each team's bidding was.  Contrary to just about everyone else, Jaipur (i.e., Rajasthan) came out best.  I didn't even really believe it myself, so I don't want you to go back and read the paragraph that follows the ratings in that post.  Just pay attention to the numbers.&lt;br /&gt;&lt;br /&gt;Q gave his auction ratings &lt;a href="http://www.wellpitched.com/2008/02/bid-o-meter-who-were-smartest-bidders.html"&gt;here&lt;/a&gt;, while Arjwiz gave his &lt;a href="http://cricketstatistics.in/2008/03/ipl-auction-ratings-teams.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;b&gt;Actual           Me          Q           Arjwiz&lt;/b&gt;&lt;br /&gt;1. Delhi         Rajasthan   =Delhi      Bangalore&lt;br /&gt;2. Chennai       Chennai     =Kolkata    =Delhi&lt;br /&gt;3. Rajasthan     Delhi       Deccan      =Kolkata&lt;br /&gt;4. Punjab        Deccan      =Chennai    =Deccan&lt;br /&gt;5. Kolkata       Bangalore   =Punjab     Chennai&lt;br /&gt;6. Deccan        Kolkata     Mumbai      Punjab&lt;br /&gt;7. Mumbai        Punjab      Bangalore   Mumbai&lt;br /&gt;8. Bangalore     Mumbai      Rajasthan   Rajasthan&lt;/pre&gt;&lt;br /&gt;I've got the top three!  Albeit in the wrong order, because of net run rate.  
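&lt;br /&gt;&lt;br /&gt;Scoring each prediction against the actual ladder comes down to a rank correlation: take each team's predicted and actual positions and compute the Pearson correlation of the two lists of ranks, which is Spearman's rho. A minimal Python sketch follows. The handling of ties is an assumption: teams marked "=" are given the average of the places they span, so Q's joint-first Delhi and Kolkata each get 1,5, and Arjwiz's three-way tie for second to fourth gets 3 each.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Actual ladder positions after five matches, from the table above.
actual = {"Delhi": 1, "Chennai": 2, "Rajasthan": 3, "Punjab": 4,
          "Kolkata": 5, "Deccan": 6, "Mumbai": 7, "Bangalore": 8}

# Predicted positions; tied ("=") teams share the average of the places they span.
predictions = {
    "Me": {"Rajasthan": 1, "Chennai": 2, "Delhi": 3, "Deccan": 4,
           "Bangalore": 5, "Kolkata": 6, "Punjab": 7, "Mumbai": 8},
    "Q": {"Delhi": 1.5, "Kolkata": 1.5, "Deccan": 3, "Chennai": 4.5,
          "Punjab": 4.5, "Mumbai": 6, "Bangalore": 7, "Rajasthan": 8},
    "Arjwiz": {"Bangalore": 1, "Delhi": 3, "Kolkata": 3, "Deccan": 3,
               "Chennai": 5, "Punjab": 6, "Mumbai": 7, "Rajasthan": 8},
}


def rank_correlation(predicted):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    teams = sorted(actual)
    return pearson([actual[t] for t in teams], [predicted[t] for t in teams])


for who, ranks in predictions.items():
    print(f"{who}: {rank_correlation(ranks):.2f}")
```

With that tie convention, the sketch agrees with the figures quoted below to two decimal places.&lt;br /&gt;&lt;br /&gt;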
In terms of Spearman's rank correlation (i.e. the Pearson correlation of the two rankings; -1: perfectly wrong, 1: perfectly right), I'm at 0.62, Q is at 0.34, and Arjwiz is at -0.27.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-8282947723026062076?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/8282947723026062076/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=8282947723026062076' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8282947723026062076'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8282947723026062076'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/ipl-so-far.html' title='The IPL so far'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3225235788062736038</id><published>2008-05-01T21:33:00.000+02:00</published><updated>2008-05-01T21:34:17.107+02:00</updated><title type='text'>Maximising runs or wins</title><content type='html'>In a &lt;a href="http://nestaquin.wordpress.com/2008/05/01/bangalore-buckle/"&gt;post&lt;/a&gt; at 99.94, I took the comments thread off on a long tangent that was only just related to the original post.&lt;br /&gt;&lt;br /&gt;It got me thinking about batting strategies (at a conceptual level) in limited-overs cricket.  
Batting second, it's simple: choose the strategy to maximise your chance of reaching the target.  Every team does this instinctively &amp;mdash; chasing 350, they go for broke, and often end up losing by a lot.&lt;br /&gt;&lt;br /&gt;Batting first, I'm not sure what the optimal strategy is.  Instinctively, I at first thought that you should choose the strategy to maximise the expected number of runs that you score.  But scoring runs isn't actually the end goal &amp;mdash; it's winning the game.  And increasing the average number of runs you score won't always improve your win/loss ratio.&lt;br /&gt;&lt;br /&gt;To take an extreme example, suppose you're a really bad team like Bangladesh, up against a team like Australia.  Whenever Bangladesh bats first, they choose the run-maximising strategy.  The results might be a bell curve centred around 180.  So a lot of scores around 170-190, a few past 200, a few below 160, etc.  &lt;br /&gt;&lt;br /&gt;Now Australia has no problem chasing any of those.  Australia's only going to have problems when the target's up over 250.  So while the Bangladeshi averages will be best-served by going with the run-maximising strategy, they may end up losing every game.&lt;br /&gt;&lt;br /&gt;On the other hand, if they play more aggressively, then sometimes their batsmen will have a bit of luck and they'll end up with a big score.  In their long series of matches with Australia, they'll have loads of heavy defeats, after making scores like 120 and 150 and so on, but every now and then, they'll make 250 and have a chance at winning.  So their averages will suffer, but their win/loss ratio will improve.&lt;br /&gt;&lt;br /&gt;It'd be a public relations disaster, of course &amp;mdash; all those thrashings.&lt;br /&gt;&lt;br /&gt;If you've got two more evenly-matched sides, choosing the win-maximising strategy when batting first becomes problematic.  
Maybe you've studied the opposition's batting and concluded that you're best-off aiming for 270+.  But maybe the pitch is not so good, and you don't know how to adjust that 270 score.  You'll probably go back to a run-maximising strategy.&lt;br /&gt;&lt;br /&gt;Nevertheless, I think with a very careful analysis, there's scope for improving win/loss ratios.  I think it's most applicable in T20, because it's so short.  If you bat first and lose early wickets, what do you do?  Go for broke (hoping for 140 but probably getting 90), perhaps, rather than slowly batting out the overs (and getting 120)?  It'll probably need a few years of IPL before we have enough data to say.&lt;br /&gt;&lt;br /&gt;On an unrelated topic, the latest post &lt;a href="http://sportstats.com.au/bloghome.html"&gt;chez Z-Score&lt;/a&gt; has a teaser question: What is the highest Test partnership for a pair who only batted once together in Tests?  The hints are that they aren't Australian, and that the partnership is higher than 320.  
For those who don't want to search for it themselves, feed this into &lt;a href="http://www.rot13.com/index.php"&gt;ROT13&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;yrauhggbanaqznhevpryrlynaqjuraratynaqznqrbireavaruhaqerq.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3225235788062736038?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3225235788062736038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3225235788062736038' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3225235788062736038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3225235788062736038'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/05/maximising-runs-or-wins.html' title='Maximising runs or wins'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-949720148590725460</id><published>2008-04-27T17:14:00.001+02:00</published><updated>2008-04-27T17:14:45.917+02:00</updated><title type='text'>WG Grace</title><content type='html'>WG Grace had a very long career &amp;mdash; he played a long time after his peak.  That's why, when looking at his career averages (unadjusted first-class averages from CricketArchive: bat 39,45; bowl 18,14), you don't see why he's such a huge figure in the history of the game.  
His aggregates are huge, sure, but it looks like he was a great who played for a long time, rather than a rival to Bradman as the greatest ever.&lt;br /&gt;&lt;br /&gt;To see where this latter perception comes from, I plotted his cumulative adjusted averages (weighting innings according to the quality of the attack, relative to an overall average of 24,5) against time.  I considered only first-class matches in England (since I know that that part of my database works &amp;mdash; I didn't want to spend half a week debugging Australian matches that I haven't tested yet).&lt;br /&gt;&lt;br /&gt;&lt;img title="I plotted all matches for each season at the same x-value, which is why the curve is funny." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/gracebatting.png" alt="I plotted all matches for each season at the same x-value, which is why the curve is funny."&gt;&lt;br /&gt;&lt;br /&gt;&lt;img title="I don't know why Excel made a funny loop at around 1866." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/gracebowling.png" alt="I don't know why Excel made a funny loop at around 1866."&gt;&lt;br /&gt;&lt;br /&gt;To give a feel for the batting scale: Bradman is at 98,9; Headley 65,8; Ranji 60,5; Merchant 56,3 (those are the top four); Mike Hussey 52,2; Barry Richards 45,3.&lt;br /&gt;&lt;br /&gt;Bowling scale: Murali 13,7; Lindwall 14,1 (top two); Darren Gough 20,9; Eddie Hemmings 25,6.&lt;br /&gt;&lt;br /&gt;All references to averages below are adjusted ones.&lt;br /&gt;&lt;br /&gt;Grace's (adjusted) batting average peaked at the end of the 1873 season at 92,8; at this time he had scored over 10000 first-class runs.  Batting doesn't get much more Bradmanesque.  By 1880, he had 19560 runs at 74,5.  This also marks the start of his decline as a bowler.  By the end of 1880, he had 1335 wickets at 21,9.  
If Grace had stopped playing then, his ratio of batting average to bowling average (3,4) would have been well clear of second place (Keith Miller at 2,8).  &lt;br /&gt;&lt;br /&gt;Even in 1886, though (more than 20 years after the start of his first-class career), his batting average was higher than Headley's.&lt;br /&gt;&lt;br /&gt;The decline in batting average is very marked &amp;mdash; it almost falls to 50 by the end of his career.  The rise in his bowling average is much more gentle, because as he got older he bowled less, not bowling more than 4000 balls in a season after 1888.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-949720148590725460?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/949720148590725460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=949720148590725460' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/949720148590725460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/949720148590725460'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/wg-grace.html' title='WG Grace'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_gracebatting.png' height='72' 
width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-251174080037498252</id><published>2008-04-26T20:21:00.003+02:00</published><updated>2008-04-26T20:24:45.254+02:00</updated><title type='text'>Bowler workloads</title><content type='html'>Over at &lt;a href="http://cricketfansforum.net/"&gt;CFF&lt;/a&gt;, there was a debate over spinners' averages.  One poster said that spinners bowl disproportionately many overs on flat pitches, while lazy pacemen rotate at the other end.  This has the effect of bloating spinners' averages unfairly.  Another poster responded by saying that spinners also bowl disproportionately many overs on raging turners, which would help their averages.&lt;br /&gt;&lt;br /&gt;Which factor is the dominant one?  To answer this, I considered every innings, and scaled each bowler's figures so that he effectively bowled a quarter of the overs.  So, for instance, if a team batted for 100 overs, and one bowler took 1/80 from 20 overs, that would be scaled up to 1,25/100 from 25 overs.  If another bowler took 1/60 from 30 overs, that would be scaled down to 0,83/50 from 25 overs.  So each bowler's average in any given innings won't change, but we'll see any effects of bowling or not bowling in tough or easy conditions.&lt;br /&gt;&lt;br /&gt;Now, sometimes a bowler might, say, bowl one over in an innings and take a wicket in it, or bowl one over and get hit for 15 runs.  Obviously it's not realistic that he would have taken 25 wickets or gone for 375 runs from 25 overs, so if the number of balls bowled by a bowler was less than 60, I didn't do any adjustment.  It's a bit arbitrary where you put the cut-off, but lowering it to 30 balls doesn't change the overall trends.&lt;br /&gt;&lt;br /&gt;One stunning result comes out of this analysis.  Every major wicket-taker (at least 100 Test wickets), except for Vanburn Holder, has his average increase.  
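&lt;br /&gt;&lt;br /&gt;The quarter-of-the-innings scaling described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the actual code behind these results: the Spell container and the function names are hypothetical, workloads are counted in balls (so 8-ball overs need no special handling), and only the regular average is shown, not the weighted one. Runs and wickets are multiplied by the same factor, so each innings' average is untouched, and spells under 60 balls are left alone.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    """One bowler's figures in one innings (hypothetical container)."""
    balls: float
    runs: float
    wickets: float


def scale_to_quarter(spell, innings_balls, min_balls=60):
    """Rescale a spell as if the bowler had bowled a quarter of the innings.

    Runs and wickets get the same factor, so the innings average is unchanged.
    Spells shorter than min_balls are returned as they are.
    """
    if spell.balls >= min_balls:
        target = innings_balls / 4
        return Spell(
            balls=target,
            runs=spell.runs * target / spell.balls,
            wickets=spell.wickets * target / spell.balls,
        )
    return spell


def scaled_career_average(spells, min_balls=60):
    """Career average after scaling; spells is a list of (Spell, innings_balls)."""
    runs = 0.0
    wickets = 0.0
    for spell, innings_balls in spells:
        scaled = scale_to_quarter(spell, innings_balls, min_balls)
        runs += scaled.runs
        wickets += scaled.wickets
    return runs / wickets


# The two examples from the text, each in a 100-over (600-ball) innings.
career = [(Spell(120, 80, 1), 600), (Spell(180, 60, 1), 600)]
raw = sum(s.runs for s, _ in career) / sum(s.wickets for s, _ in career)
print(f"raw average {raw:.1f}, scaled average {scaled_career_average(career):.1f}")
```

Treating those two example spells as one career, the raw average is 70 and the scaled one comes out at about 72: the cheaper spell, which was also the longer one, is scaled down, while the expensive one is scaled up.&lt;br /&gt;&lt;br /&gt;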
I'm still half-wondering whether this result is due to a bug in my code, but since I can't find one, and the code passes various sanity checks, I'm reasonably confident that these results are real.&lt;br /&gt;&lt;br /&gt;Assuming that I haven't made some silly mistake, there's a simple explanation for this phenomenon &amp;mdash; captains can tell which bowlers are being effective on a given day and which ones aren't, and they make the effective bowlers bowl more overs.  There could also be a bit of luck involved &amp;mdash; say a bowler is a bit unlucky and goes for fifteen wicketless overs.  If he'd been given another ten, he might have picked up a wicket or two.  But since he hadn't taken any, he didn't get to bowl again.&lt;br /&gt;&lt;br /&gt;Let's have a look at the top and bottom of the table, ordered by the difference in weighted (by the average of the batsmen dismissed) average.  Qualification: 100 wickets.  Columns of the table are: wickets, regular average, weighted average, scaled regular average, scaled weighted average, difference between scaled regular average and regular average, difference between scaled weighted average and weighted average.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                          avg      scaled avg    diff&lt;br /&gt;name             wkts  reg   wtd   reg   wtd   reg   wtd&lt;br /&gt;VA Holder        109   33,3  37,0  33,1  36,8  -0,2  -0,2&lt;br /&gt;DA Allen         122   31,0  30,1  32,4  30,6  1,4   0,4&lt;br /&gt;PM Pollock       116   24,2  26,5  24,9  27,0  0,7   0,5&lt;br /&gt;M Dillon         131   33,6  31,4  34,3  32,1  0,7   0,6&lt;br /&gt;AN Connolly      102   29,2  27,1  29,3  27,9  0,0   0,7&lt;br /&gt;CEH Croft        125   23,3  23,3  24,3  24,1  1,0   0,8&lt;br /&gt;NAT Adcock       104   21,1  23,5  22,1  24,3  1,0   0,9&lt;br /&gt;M Muralitharan   724   21,8  24,6  22,5  25,5  0,6   0,9&lt;br /&gt;MW Tate          155   26,2  26,3  26,9  27,3  0,8   1,0&lt;br /&gt;WJ O'Reilly      144   22,6  22,8  23,1  23,8  0,5   1,0&lt;br 
/&gt;---&lt;br /&gt;C Blythe         100   18,6  23,8  22,5  28,9  3,9   5,1&lt;br /&gt;H Trumble        141   21,8  26,5  25,8  31,6  4,0   5,1&lt;br /&gt;Mohammad Rafique 100   40,8  40,6  45,4  45,8  4,7   5,2&lt;br /&gt;N Boje           100   42,7  38,9  48,4  44,0  5,8   5,2&lt;br /&gt;Intikhab Alam    125   36,0  37,4  41,8  42,9  5,8   5,5&lt;br /&gt;W Rhodes         127   27,0  32,7  31,4  38,3  4,4   5,6&lt;br /&gt;J Briggs         118   17,8  32,4  21,9  38,3  4,1   5,9&lt;br /&gt;AF Giles         143   40,6  37,7  46,9  43,7  6,3   6,0&lt;br /&gt;TE Bailey        132   29,2  30,6  35,6  37,1  6,4   6,4&lt;br /&gt;AL Valentine     139   30,3  33,2  35,9  39,6  5,6   6,4&lt;/pre&gt;&lt;br /&gt;Those near the top of the table are the ones who bowl in the tough conditions or don't bowl so often in favourable ones; those near the bottom don't bowl so much in the tough conditions but do when things are going well.&lt;br /&gt;&lt;br /&gt;The results aren't what I would have expected.  Murali's position near the top is easy to explain &amp;mdash; he does a huge amount of bowling for Sri Lanka come what may.  Generally, though, it's pacemen at the top and spinners at the bottom.&lt;br /&gt;&lt;br /&gt;The keen-eyed amongst you will note that, with the exception of Murali, none of the bowlers listed above took more than 155 wickets.  There is much more variation for bowlers with lower numbers of wickets:&lt;br /&gt;&lt;br /&gt;&lt;img title="I should probably bin these and find z-scores." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/scaledbowlingworkload.png" alt="I should probably bin these and find z-scores."&gt;&lt;br /&gt;&lt;br /&gt;I've shown a quadratic fit because it looks better than a linear one.  
The general trend is clearly downward &amp;mdash; bowlers who take lots of wickets tend to get hidden less from flat pitches than bowlers who don't.&lt;br /&gt;&lt;br /&gt;Now of course, considering only bowlers with large career wicket hauls will pick out good bowlers, but what about great bowlers from the olden days who didn't play so many Tests?  If you take the ratio of scaled weighted average to weighted average, and plot it against the weighted average, you get only a very slight positive correlation (y = 0,0008x + 1,07; R-squared = 0,01).  &lt;br /&gt;&lt;br /&gt;(If you plot the difference, rather than the ratio, you get a strong positive correlation.  I think it's more accurate to work with ratios here.)&lt;br /&gt;&lt;br /&gt;So, my conclusions so far are:&lt;br /&gt;&lt;br /&gt;- Most of the variation is down to small samples.&lt;br /&gt;- But better bowlers do have slightly smaller differences (or ratios) when you scale their workloads to one quarter of each innings' overs.&lt;br /&gt;&lt;br /&gt;Murali definitely belongs near the top of the table &amp;mdash; such a small difference after taking over 700 wickets is clearly a genuine (and easily explained) trait of his bowling, and not statistical noise.  This looks to me like a good way of trying to answer the question, "How would Warne have done if he had Murali's workload?"  I don't think it's reasonable to do this for most pairs of bowlers, but if both have a large number of wickets (as with Warne and Murali), then the difference between them is likely to be genuine.  And in this case, Murali comes out easily the better.  &lt;br /&gt;&lt;pre&gt;&lt;br /&gt;               wtd avg   sc wtd avg&lt;br /&gt;M Muralitharan 24,6      25,5&lt;br /&gt;SK Warne       27,9      30,7&lt;/pre&gt;&lt;br /&gt;Now to the question that started all this &amp;mdash; spinners v pacemen.  For pacemen, the average ratio of scaled weighted average to weighted average is 1,08.  For spinners it's 1,11.  
So it looks like spinners getting to bowl on raging turners is a bigger factor for their averages than having to shoulder the workload on flat tracks.  Doing the diff v wkts plot as above for spinners and quicks separately clearly shows the quicks (on average) having smaller differences across all lengths of career.&lt;br /&gt;&lt;br /&gt;Lastly, if you only scale downwards (i.e., if they bowled more than a quarter of the overs, then scale back to a quarter, else do nothing), then the ratios become 1,05 for quicks and 1,08 for spinners.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-251174080037498252?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/251174080037498252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=251174080037498252' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/251174080037498252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/251174080037498252'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/bowler-workloads.html' title='Bowler workloads'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_scaledbowlingworkload.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1758299359642342375</id><published>2008-04-22T21:44:00.001+02:00</published><updated>2008-04-22T21:51:56.053+02:00</updated><title type='text'>More on getting your eye in</title><content type='html'>I've left things too late for a comprehensive post, but since I'm disappearing again for a few days (Toulouse and Carcassonne this time), I thought I'd do a little post on those effective average curves without coming to any sweeping conclusions.&lt;br /&gt;&lt;br /&gt;Some of the fits seem to have fallen into a local minimum that doesn't describe the effective average curve very well.  Most obvious of these is Mark Richardson, who apparently always bats like he averages 57 but really only averages 45 (eyeballing his empirical hazard function, I think his effective average starts high and decreases).  Such clearly wrong curves seem to be pretty rare, so hopefully they don't poison the overall trends too much.&lt;br /&gt;&lt;br /&gt;Best and worst on nought (qualifications: average of 35, and some minimum number of innings, probably 50):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          µ(0)  µ(1)  µ(10) µ(30) avg&lt;br /&gt;MH Richardson 57,4  57,4  57,4  57,4  44,8&lt;br /&gt;AN Cook       53,3  49,5  46,5  44,3  43,5&lt;br /&gt;AL Hassett    52,1  52,1  52,1  48,0  46,6&lt;br /&gt;CL Walcott    47,1  47,2  50,1  62,8  56,7&lt;br /&gt;CH Lloyd      42,6  46,7  47,7  48,3  46,7&lt;br /&gt;H Sutcliffe   41,0  73,8  75,0  75,6  60,7&lt;br /&gt;Saeed Ahmed   39,1  41,3  41,3  41,3  40,4&lt;br /&gt;CC McDonald   36,8  41,1  41,1  41,1  39,3&lt;br /&gt;PE Richardson 35,8  36,8  36,8  36,8  37,5&lt;br /&gt;GM Ritchie    35,8  16,9  33,4  34,9  35,2&lt;br /&gt;---&lt;br /&gt;DL Amiss      7,8   29,4  39,6  46,2  46,3&lt;br /&gt;NS Sidhu      7,7   37,0  46,8  52,6  42,1&lt;br /&gt;MW Gatting    7,6   26,2  37,1  44,4  35,6&lt;br /&gt;VL Manjrekar  7,4   42,5  42,5  42,5  39,1&lt;br 
/&gt;DM Jones      7,1   35,7  46,1  52,2  46,6&lt;br /&gt;FMM Worrell   6,9   37,6  51,5  60,2  49,5&lt;br /&gt;JA Rudolph    6,9   23,0  33,9  41,4  36,2&lt;br /&gt;FE Woolley    6,5   41,1  44,3  46,0  36,1&lt;br /&gt;CG Borde      6,5   36,7  41,1  43,5  35,6&lt;br /&gt;MS Atapattu   6,1   32,5  36,4  38,4  39,0&lt;/pre&gt;&lt;br /&gt;But as I said in my previous post, much of the variation in how players do on nought can be put down to chance.  Not all of it, though, so some of those names in the table above probably do belong around where you see them.&lt;br /&gt;&lt;br /&gt;Here's a plot of effective average at nought against regular average:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/mu0avg.png"&gt;&lt;br /&gt;&lt;br /&gt;Now the effective average at 1 against regular average (there's much less scatter because there's basically no smoothing done at nought, but there is at 1):&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/mu1avg.png"&gt;&lt;br /&gt;&lt;br /&gt;Effective average at 10:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/mu10avg.png"&gt;&lt;br /&gt;&lt;br /&gt;Effective average at 30:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/mu30avg.png"&gt;&lt;br /&gt;&lt;br /&gt;The effective average at 10 is very good at predicting the overall average &amp;mdash; the scatter is noticeably less than at 1 or 30 (I haven't tried numbers in between to see where the minimum actually is).  Part of this might be an artefact of the model, but I think some of it is a real effect, especially the larger scatter at 30.  Good players should have their eye in by 30 and do much better than their overall average.  Bad players don't improve so much from how they did earlier in their innings.  Why don't they improve?  
Why do they get their eye in as much as they ever will (which is not very much) so early in their innings?  Questions for psychologists, perhaps.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1758299359642342375?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1758299359642342375/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1758299359642342375' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1758299359642342375'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1758299359642342375'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/more-on-getting-your-eye-in.html' title='More on getting your eye in'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_mu0avg.png' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-6680613293682823461</id><published>2008-04-14T11:12:00.001+02:00</published><updated>2008-04-14T11:12:53.307+02:00</updated><title type='text'>London trip</title><content type='html'>Tomorrow I'll be heading to London, where I'll be until the end of the week.  For the first time in almost a year, I'll &lt;i&gt;actually watch a day of cricket&lt;/i&gt;.  
I don't mean that I haven't been to a cricket ground for a year (it's been longer than that) &amp;mdash; I'm including television as well.  It's been a while.  So let's hope that the rains stay away on Wednesday for the first day of the Championship, and in particular Surrey v Lancashire.&lt;br /&gt;&lt;br /&gt;So before I disappear for a week, here's a quick run-down of what I've done on getting your eye in over the last couple of days.  I worked out how to script gretl, so I've now got effective average curves for pretty much all major batsmen (there are a few with really bizarre hazard functions that refused to have my curve type fitted to them).&lt;br /&gt;&lt;br /&gt;There don't appear to be any substantial differences in effective average on nought between openers, all-rounders, and others.&lt;br /&gt;&lt;br /&gt;All-rounders &lt;i&gt;might&lt;/i&gt; behave differently in terms of how they perform once they're off the mark, but I need to take a more careful look.&lt;br /&gt;&lt;br /&gt;I'm wondering how much of the variation in effective average on nought is due to luck.  Looking at batsmen who average over 40, the average proportion of ducks is about 0,061.  Using that and applying the binomial distribution with 120 innings (the average number of innings across the dataset), you get an expected standard deviation in the proportion of ducks of sqrt(0,061 * 0,939 / 120) = 0,022.  The actual standard deviation is 0,024.  About 60% are within one standard deviation of the mean, a little less than the 68,3% predicted by the normal distribution.  (I'm assuming there are enough innings for the normal distribution to be a good approximation.)  
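&lt;br /&gt;&lt;br /&gt;For anyone who wants to check the arithmetic, here's a minimal Python sketch (decimal points rather than commas).  It assumes every innings is an independent trial with the same duck probability, which is exactly the "pure luck" hypothesis being tested; the function name is just for illustration:&lt;pre&gt;
import math

def binomial_sd_of_proportion(p, n):
    """Standard deviation of the observed duck rate over n innings,
    if every innings is an independent trial with duck probability p."""
    return math.sqrt(p * (1 - p) / n)

sd = binomial_sd_of_proportion(0.061, 120)
print(round(sd, 3))  # 0.022

# Fraction of a normal distribution lying within one standard deviation of the mean
within_one_sd = math.erf(1 / math.sqrt(2))
print(round(within_one_sd, 3))  # 0.683
&lt;/pre&gt;&lt;br /&gt;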
So it looks like there are real differences between batsmen in terms of ducks, but they might not be so big.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-6680613293682823461?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/6680613293682823461/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=6680613293682823461' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6680613293682823461'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6680613293682823461'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/london-trip.html' title='London trip'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-7436219366585148685</id><published>2008-04-11T21:57:00.004+02:00</published><updated>2008-08-31T09:59:30.444+02:00</updated><title type='text'>Getting your eye in</title><content type='html'>(&lt;b&gt;Update&lt;/b&gt;: See &lt;a href="http://pappubahry.blogspot.com/2008/04/london-trip.html"&gt;these&lt;/a&gt; &lt;a href="http://pappubahry.blogspot.com/2008/04/more-on-getting-your-eye-in.html"&gt;two&lt;/a&gt; followup posts.)&lt;br /&gt;&lt;br /&gt;An anonymous commenter pointed me to &lt;a href="http://arxiv.org/abs/0801.4408"&gt;arXiv:0801.4408v1&lt;/a&gt;, a paper by Brendon Brewer called "Getting Your 
Eye In: A Bayesian Analysis of Early Dismissals in Cricket".  &lt;br /&gt;&lt;br /&gt;Before starting the discussion, I'll define the hazard function.  Hazard functions seem to be used all over the place in the (pretty small) academic literature on cricket scores.  The hazard function, written H(x), is defined as the probability that the batsman will be dismissed on a score of x, given that he has reached x.&lt;br /&gt;&lt;br /&gt;Simple enough.  But Brewer points out a very neat interpretation of it (he may not be the first to do so, but it's the first time I've seen it).  If the hazard function is constant (i.e., the same probability of getting out on every score), you get a geometric distribution of scores (or, in the continuous limit, the exponential distribution that I mention every couple of weeks).  In particular, a hazard value H is related to a batting average µ by µ = 1/H - 1.&lt;br /&gt;&lt;br /&gt;So (here's the important bit), given a particular value of H for some batsman (say H(0) = 0,06 &amp;mdash; the probability of making a duck), we can say that, on zero, he bats like someone with an average of 1/0,06 - 1 = 15,67.  If you're not convinced that this is useful, you should be by the end of this post.&lt;br /&gt;&lt;br /&gt;Technical details follow.  Feel free to skip to the tables below.&lt;br /&gt;&lt;br /&gt;The methods used in the paper are too technical for me to be bothered understanding them all, but here is a brief summary:&lt;br /&gt;&lt;br /&gt;- Assume that the hazard function is of a particular type depending on various parameters.&lt;br /&gt;- Estimate what those parameters are.&lt;br /&gt;&lt;br /&gt;I think that the main problem with the assumptions is that they don't take into account how important getting off the mark is.  They assume a fairly smooth transition from zero to higher scores.  
But if you work out the hazard directly (I took all batsmen who average over 40 in Tests), you get this:&lt;br /&gt;&lt;br /&gt;&lt;img title="The bump at x=4 should be reversed for weaker batsmen, but I haven't checked." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/empiricalhazard.png" alt="The bump at x=4 should be reversed for weaker batsmen, but I haven't checked."&gt;&lt;br /&gt;&lt;br /&gt;There's an almost 20-run jump in effective average from a score of zero to a score of one.  (Also, the curve doesn't level off for a long time.)&lt;br /&gt;&lt;br /&gt;So instead I assumed that the effective average associated with the hazard goes like:&lt;br /&gt;&lt;br /&gt;µ(x) = a + k*(b - a)*x&lt;sup&gt;p&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;I originally intended for all of those parameters to have nice interpretations, but the actual results made a mockery of that idea.  Anyway, if you fit the graph above up to x = 30, you get the following (fit parameters: a = 15,2; b = 49,4; k = 0,63; p = 0,17):&lt;br /&gt;&lt;br /&gt;&lt;img title="Fit YAAAAA!  OK, so it has four parameters, so it *should* be a good fit.  But it took me hours to get it all working." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/hazardfit.png" alt="Fit YAAAAA!  OK, so it has four parameters, so it *should* be a good fit.  But it took me hours to get it all working."&gt;&lt;br /&gt;&lt;br /&gt;Technical notes: I did the fits using gretl's non-linear least squares abilities.  I fitted the hazard function and then converted to an average, since the hazard sometimes gets close to or equal to zero.  If you convert to an average before fitting, some data points head off to infinity (µ = 1/H - 1 blows up as H goes to zero) and nothing works.  I used scores from 0 to 49.  The way the equation's set up, the parameter 'a' basically picks out the empirical hazard at zero, through H(0) = 1/(a + 1).  I think that this is reasonable, since getting off the mark is so important.  But it's debatable.  
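&lt;br /&gt;&lt;br /&gt;As a rough illustration of the hazard-to-average conversion, here's a minimal Python sketch.  It assumes each innings is stored as a (runs, out) pair and counts a not-out as reaching its score without being dismissed there; the function names are hypothetical.  It only computes the empirical hazard and the effective average µ = 1/H - 1, and doesn't attempt the gretl non-linear least squares fit:&lt;pre&gt;
from collections import Counter

def empirical_hazard(innings, max_score=49):
    """H(x) = dismissals on exactly x runs / innings that reached x runs,
    for x = 0..max_score. innings is a list of (runs, out) pairs; a not-out
    counts as reaching its score but not as a dismissal there."""
    dismissed_on = Counter(runs for runs, out in innings if out)
    hazard = {}
    for x in range(max_score + 1):
        reached = sum(1 for runs, _ in innings if runs >= x)
        if reached:
            hazard[x] = dismissed_on[x] / reached
    return hazard

def effective_average(h):
    """Average of a batsman with constant hazard h: mu = 1/h - 1 (infinite if h is 0)."""
    return float("inf") if h == 0 else 1 / h - 1
&lt;/pre&gt;&lt;br /&gt;For example, a duck rate of 0,06 gives effective_average(0.06) = 15,67 to two decimal places, and a zero hazard gives an infinite effective average, which is why the fitting is done on H rather than µ.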
&lt;br /&gt;&lt;br /&gt;Unfortunately, I don't know how to automate everything, so I have to do one batsman at a time.  Maybe next time I'm listening to football on the radio I'll go through and process a bunch of batsmen.&lt;br /&gt;&lt;br /&gt;Now, the parameter 'a' does have a nice interpretation: it's the effective batting average at zero.  The parameter 'p' tells you how flat the curve is (p near zero means very flat).  While I give the values of the parameters for each player, the more important thing is the effective average µ at various scores.  I've given µ(0) (which is just 'a'), µ(1) (effective average at 1), µ(10), and µ(30).  There's a bit of round-off error (I went gretl -&gt; pen and paper -&gt; Excel), but it's nothing serious.  I've also given the regular average in the last column.&lt;br /&gt;&lt;br /&gt;To start with, here's the very atypical example of Don Bradman:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;player        b     k     p     µ(0)  µ(1)  µ(10) µ(30) avg&lt;br /&gt;DG Bradman    71,3  0,96  0,13  10,4  68,9  89,3  101,4 99,9&lt;/pre&gt;&lt;br /&gt;Don't let Bradman get off the mark!  On zero he batted like someone who averaged 10, but on one he was almost as good as Mike Hussey.  Very soon he was batting like the best batsman ever.  Bradman's apparent woefulness before he got off the mark seems pretty typical.  
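&lt;br /&gt;&lt;br /&gt;To see how the table columns follow from the parameters, here's a minimal Python sketch of the fitted curve µ(x) = a + k*(b - a)*x&lt;sup&gt;p&lt;/sup&gt;, using Bradman's row with a = µ(0) = 10,4.  The function name is just for illustration, and other rows may come out 0,1 off here and there because of the round-off mentioned above:&lt;pre&gt;
def effective_average_curve(x, a, b, k, p):
    """Fitted effective average at score x: mu(x) = a + k*(b - a)*x**p."""
    return a + k * (b - a) * x ** p

# Bradman's parameters from the table; a is the mu(0) column
bradman = dict(a=10.4, b=71.3, k=0.96, p=0.13)
for x in (0, 1, 10, 30):
    print(x, round(effective_average_curve(x, **bradman), 1))
&lt;/pre&gt;&lt;br /&gt;This reproduces Bradman's µ(0), µ(1), µ(10) and µ(30) columns: 10,4, 68,9, 89,3 and 101,4.&lt;br /&gt;&lt;br /&gt;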
Let's have a look at some others.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;player        b     k     p     µ(0)  µ(1)  µ(10) µ(30) avg&lt;br /&gt;SR Waugh      50,3  0,56  0,22  10,7  32,9  47,5  57,6  51,1&lt;br /&gt;N Hussain     38,2  0,53  0,29  11,3  25,6  39,1  49,5  37,2&lt;br /&gt;JL Langer     56,3  0,63  0,026 16,9  41,7  43,3  44,0  45,7&lt;br /&gt;G Kirsten     61,6  0,24  0,41  12,4  24,2  42,8  60,0  45,3&lt;br /&gt;BC Lara       199,0 0,11  0,25  12,5  33,0  49,0  60,5  53,2&lt;br /&gt;ME Waugh      70,6  0,39  0,18  10,0  33,6  45,8  53,6  41,8&lt;/pre&gt;&lt;br /&gt;Interestingly, the best batsman on nought is Langer, an opener, and another opener, Kirsten, is essentially level with Lara in second place.  Unfortunately, this is a (very) limited dataset, so we'll put this aside for further study.&lt;br /&gt;&lt;br /&gt;In Brewer's own small dataset, he saw that the two all-rounders had little change from "before eye in" to "eye in".  Part of that, I think, is due to the wrong shape of the assumed hazard function, but all-rounders doing well on nought does seem to be a real effect (again, pending a more thorough study):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;player        b     k     p     µ(0)  µ(1)  µ(10) µ(30) avg&lt;br /&gt;SM Pollock    44,2  0,53  0,026 16,2  31,0  32,0  32,4  32,3&lt;br /&gt;CL Cairns     43,2  0,39  0,28  14,2  25,5  35,8  43,5  33,5&lt;br /&gt;GStA Sobers   68,2  0,75  1E-08 12,4  54,3  54,3  54,3  57,8&lt;br /&gt;Imran Khan    48,7  0,55  0,052 14,9  33,5  35,9  37,1  37,7&lt;br /&gt;GA Faulkner   31,7  0,13  1,02  26,4  27,1  33,6  48,5  40,8&lt;br /&gt;JH Kallis     68,2  0,56  0,13  18,6  46,4  56,1  61,8  57,1&lt;br /&gt;KR Miller     50,8  0,48  0,086 16,6  33,0  36,6  38,6  37,0&lt;/pre&gt;&lt;br /&gt;In terms of flatness of the effective average (once off the mark), I think the most important factor is the regular average.  Sorta-all-right batsmen who average 30 will typically get lots of starts but not go on with them.  
Two examples (again it'd be nice to have more, but I didn't cherrypick &amp;mdash; they were the only two I looked at):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;player        b     k     p     µ(0)  µ(1)  µ(10) µ(30) avg&lt;br /&gt;MR Ramprakash 52,3  0,49  1E-08 6,7   29,1  29,1  29,1  27,3&lt;br /&gt;RS Mahanama   51,1  0,40  0,037 11,7  27,5  28,9  29,6  29,3&lt;/pre&gt;&lt;br /&gt;So players like Ramprakash and Mahanama have got their eye in once they're off the mark, but that's as far as it goes.  Better batsmen continue to improve, but these ones don't, for whatever reasons.&lt;br /&gt;&lt;br /&gt;That's all I have for now.  Feel free to make requests, and to make it interesting, say what you think the results will be for each batsman (e.g., terrible on zero, good on zero, etc.).  &lt;br /&gt;&lt;br /&gt;And I hope you're all convinced that the effective average is a wonderful number for this exercise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-7436219366585148685?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/7436219366585148685/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=7436219366585148685' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7436219366585148685'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7436219366585148685'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/getting-your-eye-in.html' title='Getting your eye in'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_empiricalhazard.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2878469408387270471</id><published>2008-04-10T12:32:00.002+02:00</published><updated>2008-04-10T13:28:52.368+02:00</updated><title type='text'>Adjusting ODI averages for not-outs</title><content type='html'>In my &lt;a href="http://pappubahry.blogspot.com/2008/03/adjusting-averages-for-not-outs-take.html"&gt;post&lt;/a&gt; on adjusting averages for not-outs in Tests, commenter Rich asked about doing the same in ODIs.  At first I thought that this would be too difficult, but I decided that with an hour of mindless copy-pasting from Statsguru, I could at least get it working for individual players.&lt;br /&gt;&lt;br /&gt;(Usually I'd rather watch paint dry than copy-paste Statsguru data for an hour, but it's not so bad when you're listening to the Champions League football on the radio.)&lt;br /&gt;&lt;br /&gt;There's one very important difference between ODIs and Tests for this exercise.  In Tests, pretty much all the top batsmen can expect to bat out their innings most of the time.  In ODIs, the top order can usually do this, but the middle order often have to slog at the end.  So whereas an opener can get a start of 50 and carry on to a century, the number six who gets to 50 will often get out soon afterwards.&lt;br /&gt;&lt;br /&gt;So I split the analysis into two parts: one for the top order (1-3), and one for the middle order (4-7).  
Perhaps 1-4 and 5-7 would have been better, but I can hardly be bothered re-gathering the data.&lt;br /&gt;&lt;br /&gt;I only considered batsmen with an average of 35 or more, and only considered innings in the last ten years, since there's been a big explosion in ODI run-scoring recently.&lt;br /&gt;&lt;br /&gt;First up, projected increases for the top order:&lt;br /&gt;&lt;br /&gt;&lt;img title="Lots of batsmen get out at 101, it seems." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/notoutprojodi1-3.png" alt="Lots of batsmen get out at 101, it seems." /&gt;&lt;br /&gt;&lt;br /&gt;This is similar to the Test graph (batsmen clearly get their eye in after scoring some runs), but the downward trend starts much earlier, as expected.  After about 60 runs, the average increases are less than the overall average for this dataset (almost exactly 40).  So not-outs tend to deflate averages when the score is below 60, but inflate them afterwards.&lt;br /&gt;&lt;br /&gt;And now for the middle order:&lt;br /&gt;&lt;br /&gt;&lt;img title="I reckon that looks like a three-fingered hand." src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/notoutprojodi4-7.png" alt="I reckon that looks like a three-fingered hand." /&gt;&lt;br /&gt;&lt;br /&gt;I wouldn't pay much attention to the curve out past 100, since there's not much data there.  It won't make too much difference, since there aren't that many unbeaten centuries in the middle order.&lt;br /&gt;&lt;br /&gt;The curve is quite different from that of the top order, in roughly the way we would expect.  Not-outs have a deflating effect on averages only up to 25 runs or so, and after that they inflate averages.&lt;br /&gt;&lt;br /&gt;Now, I haven't done a thorough analysis of all batsmen, since I don't have that data handy.  I've just done some selected cases.  Some caveats: some of the batsmen played before the last ten years, and perhaps the average-increase curves were different then.  
Also, I've applied either top-order or middle-order adjustments to each batsman, not both.  This won't have too much of an effect, but to do it properly you'd want to split each batsman's innings into top-order and middle-order innings and adjust them separately.  If a batsman's highest score was a not-out, I added the average increase to it.  (For Tests, I added the batsman's regular average, but doing so in ODIs is much less accurate.)&lt;br /&gt;&lt;br /&gt;In the table below there are four averages presented: the regular average, one adjusted based purely on the batsman's own scores, one based purely on the relevant graph above (shifted up or down to match the batsman's regular average), and a mixture of the two, giving more weight to the graph when the batsman doesn't have many scores greater than or equal to the not-out being projected.  I've called these &lt;b&gt;reg&lt;/b&gt;, &lt;b&gt;ind&lt;/b&gt;, &lt;b&gt;gph&lt;/b&gt;, &lt;b&gt;mix&lt;/b&gt;.  The last of these is the one I'd go with.  There are two openers; the rest are from the middle order.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;player        inns  no  reg   ind   gph   mix&lt;br /&gt;SR Tendulkar  407   38  44,3  43,4  43,9  43,5&lt;br /&gt;SC Ganguly    300   23  41,0  40,6  40,6  40,6&lt;br /&gt;---&lt;br /&gt;MG Bevan      196   67  53,6  48,1  52,4  48,8&lt;br /&gt;L Klusener    137   50  41,1  41,7  40,1  41,4&lt;br /&gt;MEK Hussey    64    26  55,6  55,0  54,4  54,9&lt;br /&gt;A Symonds     154   32  39,7  40,1  39,0  39,9&lt;br /&gt;DR Martyn     182   51  40,8  42,1  40,0  41,6&lt;br /&gt;RP Arnold     155   43  35,3  33,5  34,6  33,8&lt;/pre&gt;&lt;br /&gt;Bevan's average has, contrary to my expectations, been pulled back quite a bit, down below 49.  Nevertheless, it's still a lot higher than most Bevan sceptics would have it.  
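&lt;br /&gt;&lt;br /&gt;The &lt;b&gt;ind&lt;/b&gt; adjustment can be sketched roughly as follows.  This is one plausible reading rather than the exact method used for the table: it assumes the projected increase for a not-out on s is the mean of (final score - s) over the batsman's completed innings of at least s, stores innings as (runs, out) pairs, and uses hypothetical function names plus a hypothetical fallback value for when no completed innings got that far:&lt;pre&gt;
def projected_increase(innings, s, fallback):
    """Mean runs added beyond s by the batsman's completed innings of at least s.
    Uses the (hypothetical) fallback value when no completed innings got that far."""
    beyond = [runs - s for runs, out in innings if out and runs >= s]
    return sum(beyond) / len(beyond) if beyond else fallback

def adjusted_average(innings, fallback=0.0):
    """Replace each not-out on s by s plus its projected increase, then treat
    every innings as completed: total adjusted runs divided by innings."""
    total = 0.0
    for runs, out in innings:
        if out:
            total += runs
        else:
            total += runs + projected_increase(innings, runs, fallback)
    return total / len(innings)
&lt;/pre&gt;&lt;br /&gt;For the three innings 10, 50 and 30 not out, the not-out is projected to 50 and the adjusted average is 110/3, about 36,7, compared with a regular average of 45.&lt;br /&gt;&lt;br /&gt;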
I wouldn't want to draw too many general conclusions from what was a deliberately biased set of batsmen (all with fairly high not-out proportions), but it looks like ODI averages can be and sometimes are inflated by not-outs much more than Test averages.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2878469408387270471?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2878469408387270471/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2878469408387270471' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2878469408387270471'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2878469408387270471'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/adjusting-odi-averages-for-not-outs.html' title='Adjusting ODI averages for not-outs'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_notoutprojodi1-3.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3594035491211982855</id><published>2008-04-06T22:37:00.000+02:00</published><updated>2008-04-06T22:38:01.111+02:00</updated><title type='text'>Largest deficits to win</title><content type='html'>Something a little less full-on today, back to 
first-class trivia.  The largest first innings deficit leading to a win is 402 runs, in &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/100/100870.html"&gt;this match&lt;/a&gt;.  But it was a contrived result: Central Districts made 464, Northern Districts replied with 2dec/62, Central 0dec/26, Northern 8/429 won by two wickets.  Ignoring matches where the team batting second declared behind, and also ignoring the &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/85/85941.html"&gt;forfeited Test&lt;/a&gt;, the top five deficits to win are (up to the end of the 2007 English season):&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/12/12180.html"&gt;384&lt;/a&gt;: Barbados 175 &amp;amp; 7dec/726 def. Trinidad 559 &amp;amp; 217, 1926/7&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/56/56335.html"&gt;291&lt;/a&gt;: Australia 256 &amp;amp; 471 def. Sri Lanka 8dec/547 &amp;amp; 164, 1992/3&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/74/74212.html"&gt;287&lt;/a&gt;: Manicaland 9dec/513 &amp;amp; 146 lost to Mashonaland 226 &amp;amp; 506 (f/o), 2001/2&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/38/38698.html"&gt;279&lt;/a&gt;: Western Province 460 &amp;amp; 3dec/219 lost to South African Universities 181 &amp;amp; 7/500, 1978/9&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/25/25014.html"&gt;277&lt;/a&gt;: Nottinghamshire 4dec/431 &amp;amp; 0dec/94 lost to Lancashire 154 &amp;amp; 4/372, 1961&lt;br /&gt;&lt;br /&gt;I can only find two instances of a batsman being out stumped X bowled Y in the first innings, and stumped Y bowled X in the second.  
They are Frederick Thackeray in &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/602.html"&gt;this match&lt;/a&gt; (st: Cobbett b: Bayley in the first innings, st: Bayley b: Cobbett in the second) in 1839, and Ron Oxenham in &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/11/11382.html"&gt;this match&lt;/a&gt; (st: Strudwick b: Freeman in the first innings, st: Freeman b: Strudwick in the second) in 1924/5.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3594035491211982855?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3594035491211982855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3594035491211982855' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3594035491211982855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3594035491211982855'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/largest-deficits-to-win.html' title='Largest deficits to win'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1977758530389490205</id><published>2008-04-05T15:55:00.000+02:00</published><updated>2008-04-05T15:56:14.089+02:00</updated><title type='text'>Slumps - Is there a problem or is he just unlucky?</title><content type='html'>I've had a bit of a think about 
"form slumps", and how we might go about seeing if they're just due to chance (sometimes a batsman will be dismissed for a string of low scores) or due to some genuine problem (a technique flaw, or a weakness found by opposition bowlers).  This post becomes more technical than usual later on, so feel free to fall asleep after the first table.  This may not be the best or fastest way of going about this problem, but it's the way I &lt;i&gt;did&lt;/i&gt; go about it &amp;mdash; sort of like stream-of-consciousness statistics.  The important thing is that it works, and it doesn't require anything more than Excel.&lt;br /&gt;&lt;br /&gt;The first thing we need to know is the distribution of individual innings scores.  Actually, we don't need that &amp;mdash; we just need to know what the standard deviation is, relative to the mean.  So, for each batsman with 50 Test innings or more and an average of at least 35, I calculated the coefficient of variation (the standard deviation divided by the mean).  This is a measure of the consistency of a batsman.  My own opinion is that consistency is over-rated &amp;mdash; a batsman who goes 0, 100, 0, 100, etc. is just as useful as a batsman who goes 50, 50, 50, etc.  But it is still interesting to see which batsmen are more consistent than others, so here's the top and bottom of the table.  A higher coefficient of variation means less consistency.&lt;br /&gt;&lt;br /&gt;(Technical note: for not-outs, I just added the batsman's average to the score and treated it as a completed innings.  
Not the best thing to do, but it's close enough, and we'll be doing far worse later on.)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name            inns  avg     sd      cv&lt;br /&gt;MH Richardson   65    44,77   35,72   0,80&lt;br /&gt;H Sutcliffe     84    60,73   48,52   0,80&lt;br /&gt;JB Hobbs        102   56,95   48,53   0,85&lt;br /&gt;AN Cook         51    43,47   37,83   0,87&lt;br /&gt;PE Richardson   56    37,47   32,91   0,88&lt;br /&gt;A Ranatunga     155   35,70   31,40   0,88&lt;br /&gt;IR Redpath      120   43,46   38,62   0,89&lt;br /&gt;NC O'Neill      69    45,56   40,72   0,89&lt;br /&gt;KF Barrington   131   58,67   52,50   0,90&lt;br /&gt;JB Stollmeyer   56    42,33   37,98   0,90&lt;br /&gt;---&lt;br /&gt;Ijaz Ahmed      92    37,67   46,07   1,22&lt;br /&gt;DN Sardesai     55    39,24   48,54   1,24&lt;br /&gt;V Sehwag        90    53,76   66,76   1,24&lt;br /&gt;VT Trumper      89    39,05   48,57   1,24&lt;br /&gt;Zaheer Abbas    124   44,80   56,86   1,27&lt;br /&gt;Hanif Mohammad  97    43,99   55,81   1,27&lt;br /&gt;DL Amiss        88    46,31   60,59   1,31&lt;br /&gt;JA Rudolph      63    36,21   47,99   1,33&lt;br /&gt;W Jaffer        54    35,68   48,09   1,35&lt;br /&gt;MS Atapattu     156   39,02   52,81   1,35&lt;/pre&gt;&lt;br /&gt;There is only a very slight trend (R-squared = 0,022) showing higher averages associated with lower coefficients of variation. &lt;br /&gt;&lt;br /&gt;The average coefficient of variation is about 1,05.  From now on we'll ignore individual differences between batsmen and just assume that the scores of each batsman can be treated as independent random variables, coming from a distribution with mean µ (i.e., his average) and standard deviation 1,05µ.&lt;br /&gt;&lt;br /&gt;Now, we're interested in slumps.  So instead of considering individual innings, we'll be considering groups of innings.  
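&lt;br /&gt;&lt;br /&gt;Going back to the coefficient of variation for a moment, here's a minimal Python sketch of that calculation, including the not-out treatment from the technical note.  It assumes innings are (runs, out) pairs and uses the sample standard deviation (the post doesn't say which was used); the function names are hypothetical.  A side effect of adding the average to each not-out is that the mean of the adjusted scores comes out exactly equal to the batting average, so dividing by either gives the same coefficient of variation:&lt;pre&gt;
import statistics

def adjusted_scores(innings):
    """Treat each not-out as a completed innings of (score + batting average).
    innings is a list of (runs, out) pairs; returns (adjusted scores, average)."""
    runs = sum(r for r, _ in innings)
    outs = sum(1 for _, out in innings if out)
    average = runs / outs
    return [r if out else r + average for r, out in innings], average

def coefficient_of_variation(innings):
    """Sample standard deviation of the adjusted scores divided by the batting average."""
    adjusted, average = adjusted_scores(innings)
    return statistics.stdev(adjusted) / average
&lt;/pre&gt;&lt;br /&gt;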
We don't know exactly what the distribution of individual innings is, but the distribution of the mean of a group of innings will be approximately normal, by the central limit theorem.  In particular, the mean of a group of n innings will be approximately normally distributed with mean µ (the same as the career average) and standard deviation 1,05µ/sqrt(n). &lt;br /&gt;&lt;br /&gt;Note that while I said that µ was the career average, in order for things to work out properly, we'll actually use the career average &lt;i&gt;apart from the slump in question&lt;/i&gt;. &lt;br /&gt;&lt;br /&gt;So, to define how bad a slump of n innings is, we'll calculate a z-score.  Let µ&lt;sub&gt;c&lt;/sub&gt; be the career average (excluding the slump) and µ&lt;sub&gt;s&lt;/sub&gt; be the average during the slump.  Then define z = (µ&lt;sub&gt;s&lt;/sub&gt; - µ&lt;sub&gt;c&lt;/sub&gt;) * sqrt(n) / (1,05 * µ&lt;sub&gt;c&lt;/sub&gt;).&lt;br /&gt;&lt;br /&gt;For example, suppose a batsman averaged 45 outside a slump in which he averaged 30 for 25 innings.  The z-score here would be (30 - 45) * sqrt(25) / (1,05 * 45) = -1,59.  I'll call this a z = -1,59 slump.&lt;br /&gt;&lt;br /&gt;How rare is this sort of slump?  We can look up the answer either in a cumulative normal distribution table or with Excel (or something fancier).  In Excel, the relevant function in French is LOI.NORMALE.STANDARD(z).  For that small minority of you who use computers set to English, the function is called NORMSDIST.&lt;br /&gt;&lt;br /&gt;Anyway, plug in -1,59 and you get 0,056.  So the probability that he'd have a slump that bad (or worse) right now is about 5,6%.  Does the theory match reality?  For each batsman, I considered non-overlapping blocks of 26 innings (26 and not 25 for reasons that were important to me when I was testing this but don't make any difference now).  It's important that they don't overlap, because otherwise the probabilities will be dependent.  
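&lt;br /&gt;&lt;br /&gt;Here's a minimal Python sketch of the z-score, the normal lookup, and the non-overlapping blocks.  It assumes scores are a plain list of runs with not-outs already adjusted, uses math.erf for the cumulative normal in place of LOI.NORMALE.STANDARD, and the function names are hypothetical:&lt;pre&gt;
import math

def slump_z(slump_avg, career_avg_excl, n, cv=1.05):
    """z-score of an n-innings slump; career_avg_excl is the average outside the slump."""
    return (slump_avg - career_avg_excl) * math.sqrt(n) / (cv * career_avg_excl)

def normal_cdf(z):
    """Standard normal cumulative distribution (NORMSDIST in English-language Excel)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def block_z_scores(scores, n=26, cv=1.05):
    """z-scores for consecutive, non-overlapping blocks of n innings,
    each compared with the average of all the other innings."""
    zs = []
    for start in range(0, len(scores) - n + 1, n):
        block = scores[start:start + n]
        rest = scores[:start] + scores[start + n:]
        if sum(rest) > 0:  # skip degenerate cases with no runs outside the block
            zs.append(slump_z(sum(block) / n, sum(rest) / len(rest), n, cv))
    return zs

z = slump_z(30, 45, 25)
print(round(z, 2), round(normal_cdf(z), 3))  # -1.59 0.056
&lt;/pre&gt;&lt;br /&gt;The worked example above comes out as z = -1,59 with a probability of 0,056.&lt;br /&gt;&lt;br /&gt;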
Add up the number of slumps worse than a particular z, divide by the total number of blocks sampled, and you get a probability.  Here are the results for various values of z, with the observed probability and the one derived from the normal distribution.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;z     obs     normal&lt;br /&gt;-3,0  0       0,001&lt;br /&gt;-2,9  0       0,002&lt;br /&gt;-2,8  0,000   0,003&lt;br /&gt;-2,7  0,001   0,003&lt;br /&gt;-2,6  0,001   0,005&lt;br /&gt;-2,5  0,002   0,006&lt;br /&gt;-2,4  0,004   0,008&lt;br /&gt;-2,3  0,006   0,011&lt;br /&gt;-2,2  0,008   0,014&lt;br /&gt;-2,1  0,012   0,018&lt;br /&gt;-2,0  0,018   0,023&lt;br /&gt;-1,9  0,023   0,029&lt;br /&gt;-1,8  0,032   0,036&lt;br /&gt;-1,7  0,041   0,045&lt;br /&gt;-1,6  0,053   0,055&lt;br /&gt;-1,5  0,066   0,067&lt;br /&gt;-1,4  0,080   0,081&lt;br /&gt;-1,3  0,095   0,097&lt;br /&gt;-1,2  0,111   0,115&lt;br /&gt;-1,1  0,134   0,136&lt;/pre&gt;&lt;br /&gt;That's not too bad.  The observed probabilities are low in the tail, but we'd be talking about really, really bad slumps out there, so that's not too important.  For the z = -1,59 slump in the example above (something reasonably typical), the two match pretty closely.  From now on, we'll just use the normal distribution instead of the empirically derived probabilities.&lt;br /&gt;&lt;br /&gt;So the theory works well enough.  But we can't stop here.  The probability of a z = -1,59 slump &lt;i&gt;right now&lt;/i&gt; is about 5,6%.  But that's not what we're interested in.  We'd like to know the probability that at some point during a batsman's career, he'll have a z = -1,59 slump.  This is a very different thing!  The probability of a batsman making a duck is about 6,5%, but you wouldn't call it a slump if he's just made a duck, because ducks are just going to happen sometimes.&lt;br /&gt;&lt;br /&gt;So how do we find the probability, given a career of N innings, that there'll be a z = -1,59 slump in there somewhere?  
To answer this (perhaps not in the best way), let's review some basic probability.&lt;br /&gt;&lt;br /&gt;Imagine you roll a die five times.  What's the probability that you get at least one 6?  You &lt;i&gt;can't&lt;/i&gt; say: "The probability of a 6 on any roll is 1/6, so the probability of getting a 6 after five rolls is 5/6."  That is the expected number of 6s, not the probability of getting at least one (with six rolls the same reasoning would give a "probability" of 1).  What you need to do is find the probability that you &lt;i&gt;won't&lt;/i&gt; get a 6 on each roll &amp;mdash; 5/6 &amp;mdash; raise that to the fifth power (probability that you won't get a 6 in five rolls), then subtract it from 1. The answer is 1 - (5/6)&lt;sup&gt;5&lt;/sup&gt;, about 0,598. &lt;br /&gt;&lt;br /&gt;So, suppose a career is 80 innings long.  A slump of 20 innings could start at innings 1, innings 2, ..., innings 61.  So, do we take our probability of 0,056 from above, subtract from 1, raise to the power of 61, subtract from 1?  No!  Because when you take blocks of innings that overlap with each other, you're looking at &lt;i&gt;dependent&lt;/i&gt; events.  Two rolls of the die won't affect each other.  But the average of innings 2 to 21 will be highly dependent on the average from innings 1 to 20 &amp;mdash; after all, 19 of the innings are in both blocks.&lt;br /&gt;&lt;br /&gt;So we can't just raise that probability to the 61st power.  So what can we do?  I don't know what the best way is, but I decided to numerically work out what power you &lt;i&gt;should&lt;/i&gt; raise (1 - 0,056) to.&lt;br /&gt;&lt;br /&gt;So, summary of the procedure so far:&lt;br /&gt;1. Find z.&lt;br /&gt;2. Get associated probability p&lt;sub&gt;now&lt;/sub&gt; = LOI.NORMALE.STANDARD(z).&lt;br /&gt;3. Raise (1 - p&lt;sub&gt;now&lt;/sub&gt;) to some power x, to be determined numerically.&lt;br /&gt;4. Find 1 - (1 - p&lt;sub&gt;now&lt;/sub&gt;)&lt;sup&gt;x&lt;/sup&gt;.  
This is the probability of having a z-slump at some point during the career.&lt;br /&gt;&lt;br /&gt;But, when I was working through this, I accidentally cut out the last step.  Happily, I get much better fits this way (I did try following the above procedure afterwards), but the exponent that you get doesn't have the same nice interpretation as the one above. &lt;br /&gt;&lt;br /&gt;So, actual procedure:&lt;br /&gt;1. Find z.&lt;br /&gt;2. Get associated probability p&lt;sub&gt;now&lt;/sub&gt; = LOI.NORMALE.STANDARD(z).&lt;br /&gt;3. Raise (1 - p&lt;sub&gt;now&lt;/sub&gt;) to some power x, to be determined numerically.  Because x is fitted to the observed data, this number is taken directly as the probability of having a z-slump at some point during the career.&lt;br /&gt;&lt;br /&gt;So, time to work out x.  The first thing we'll see is that the ratio of the career length to slump length is important &amp;mdash; for a given ratio, it doesn't matter so much what the length of the slump is.  So x will be about the same for a 40-innings slump in a 120-innings career as for a 20-innings slump in a 60-innings career.&lt;br /&gt;&lt;br /&gt;If N is the length of the career and n the length of the slump, then the ratio I actually worked with (just by historical accident) was (N + 1 - n) / n.&lt;br /&gt;&lt;br /&gt;Let's plot x against z for 20-innings slumps in 79-innings careers (a ratio of (79 + 1 - 20) / 20 = 3):&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/slump20block60.png" /&gt;&lt;br /&gt;&lt;br /&gt;It's a nice exponential fit.  
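&lt;br /&gt;&lt;br /&gt;The post doesn't say exactly how x was worked out numerically or how the exponential was fitted, so what follows is only one plausible way to do it, sketched in Python with made-up function names. Given an observed probability p_obs of a z-slump, x is the exact solution of (1 - p_now)&lt;sup&gt;x&lt;/sup&gt; = p_obs, namely x = ln(p_obs) / ln(1 - p_now). The form x = A * e&lt;sup&gt;k * z&lt;/sup&gt; (the form used in step 7 of the full procedure) then becomes a straight-line least-squares fit of ln x against z.

```python
import math
from statistics import NormalDist


def exponent_x(z, p_obs):
    """Solve (1 - p_now)**x = p_obs for x, where p_now = Phi(z)."""
    p_now = NormalDist().cdf(z)
    return math.log(p_obs) / math.log(1.0 - p_now)


def fit_exponential(zs, p_obs):
    """Least-squares fit of ln x = ln A + k*z, i.e. x = A * exp(k * z).

    Observed probabilities of exactly 0 or 1 are skipped, since ln(0) is
    undefined and p_obs = 1 gives x = 0.  Returns (A, k).
    """
    pairs = [(z, exponent_x(z, p)) for z, p in zip(zs, p_obs)
             if p not in (0.0, 1.0)]
    z_kept = [z for z, _ in pairs]
    ln_x = [math.log(x) for _, x in pairs]
    m = len(pairs)
    mean_z, mean_y = sum(z_kept) / m, sum(ln_x) / m
    s_zz = sum((z - mean_z) ** 2 for z in z_kept)
    s_zy = sum((z - mean_z) * (y - mean_y) for z, y in zip(z_kept, ln_x))
    k = s_zy / s_zz
    A = math.exp(mean_y - k * mean_z)
    return A, k
```

Fitting a straight line to ln x is simply the easiest choice; a least-squares fit to x itself would weight the tail differently and could give somewhat different A and k, which is one reason the coefficients are sensitive to how much of the tail you let in.&lt;br /&gt;&lt;br /&gt;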
Repeating the procedure for other block sizes (always with a length ratio of 3), you get the following table (fit parameters here are the A and k in x = Ae&lt;sup&gt;kz&lt;/sup&gt;):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;slump length  A       k&lt;br /&gt;20            0,0026  -4,97&lt;br /&gt;26            0,0018  -5,05&lt;br /&gt;30            0,0018  -5,08&lt;br /&gt;40            0,0033  -4,90&lt;/pre&gt;&lt;br /&gt;The coefficients can be sensitive to how much of the tail you let in, but basically they don't change too much.  You might be thinking, "Hey! There's almost a factor of 2 difference in A there!"  We'll ignore that and see what happens later.&lt;br /&gt;&lt;br /&gt;Now we'll hold the slump length constant (at 26) and vary the ratio.  Resulting graph of the fit parameter A:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/slumpfitA.png" /&gt;&lt;br /&gt;&lt;br /&gt;It's another exponential decay.&lt;br /&gt;&lt;br /&gt;Resulting graph for fit parameter k:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/slumpfitk.png" /&gt;&lt;br /&gt;&lt;br /&gt;It's linear.&lt;br /&gt;&lt;br /&gt;You can see that I've gone up to a ratio of 5.  Much past that and I start running into a lack of data, though I could probably have kept going with more thought and patience.&lt;br /&gt;&lt;br /&gt;So, now we have all we need.  Full procedure for finding the probability that a batsman will have a particular z-slump of length n during his career of length N:&lt;br /&gt;&lt;br /&gt;1. Calculate his average over the slump µ&lt;sub&gt;s&lt;/sub&gt;, and his career average excluding the slump µ&lt;sub&gt;c&lt;/sub&gt;.&lt;br /&gt;2. Calculate z = (µ&lt;sub&gt;s&lt;/sub&gt; - µ&lt;sub&gt;c&lt;/sub&gt;) * sqrt(n) / (1,05 * µ&lt;sub&gt;c&lt;/sub&gt;).&lt;br /&gt;3. Find p&lt;sub&gt;now&lt;/sub&gt; = LOI.NORMALE.STANDARD(z).&lt;br /&gt;4. Find q = (N + 1 - n) / n.&lt;br /&gt;5. 
Find A = 0,25 * e&lt;sup&gt;-1,4*q&lt;/sup&gt;.&lt;br /&gt;6. Find k = -(0,62 * q + 2,96).&lt;br /&gt;7. Find x = A * e&lt;sup&gt;k * z&lt;/sup&gt;.&lt;br /&gt;8. Find p = (1 - p&lt;sub&gt;now&lt;/sub&gt;)&lt;sup&gt;x&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;So does it work?  Let's have a look at the observed probabilities and the p as calculated above for a 20-innings slump in a 79-innings career:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;z     obs     p&lt;br /&gt;-3,0  0       0,000&lt;br /&gt;-2,9  0       0,000&lt;br /&gt;-2,8  0       0,001&lt;br /&gt;-2,7  0,008   0,003&lt;br /&gt;-2,6  0,008   0,008&lt;br /&gt;-2,5  0,039   0,018&lt;br /&gt;-2,4  0,086   0,038&lt;br /&gt;-2,3  0,117   0,071&lt;br /&gt;-2,2  0,148   0,120&lt;br /&gt;-2,1  0,211   0,186&lt;br /&gt;-2,0  0,289   0,265&lt;br /&gt;-1,9  0,336   0,354&lt;br /&gt;-1,8  0,398   0,447&lt;br /&gt;-1,7  0,531   0,538&lt;br /&gt;-1,6  0,602   0,623&lt;br /&gt;-1,5  0,695   0,699&lt;br /&gt;-1,4  0,734   0,764&lt;br /&gt;-1,3  0,813   0,818&lt;br /&gt;-1,2  0,859   0,861&lt;br /&gt;-1,1  0,969   0,896&lt;/pre&gt;&lt;br /&gt;Well off out in the tail (around z = -2,4 the observed probability is more than twice the predicted one), but pretty good for z greater than -2 or so.&lt;br /&gt;&lt;br /&gt;What about a 15-innings slump in a 134-innings career?  That's a ratio of q = (134 + 1 - 15) / 15 = 8, well outside the range (up to 5) we used to derive the fit parameters.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;z     obs     p&lt;br /&gt;-2,1  0,390   0,353&lt;br /&gt;-2,0  0,542   0,548&lt;br /&gt;-1,9  0,678   0,708&lt;br /&gt;-1,8  0,780   0,821&lt;br /&gt;-1,7  0,814   0,895&lt;br /&gt;-1,6  0,848   0,940&lt;br /&gt;-1,5  0,864   0,966&lt;br /&gt;-1,4  0,966   0,981&lt;br /&gt;-1,3  0,983   0,990&lt;br /&gt;-1,2  0,983   0,994&lt;/pre&gt;&lt;br /&gt;Pretty good.&lt;br /&gt;&lt;br /&gt;Now, if I were a selector who never actually watched the players and only looked at the scores they made, how would I use this in practice?  Well, if a particular batsman was in a slump, I'd want to know how likely it is that that sort of slump would happen in a career as long as his.  
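&lt;br /&gt;&lt;br /&gt;The eight steps above translate directly into a short Python function. This is a sketch rather than the code that produced the tables: it uses the rounded fit constants from steps 5 and 6 (0,25, 1,4, 0,62 and 2,96), NormalDist().cdf in place of LOI.NORMALE.STANDARD, and the function names p_from_z and slump_probability are hypothetical.

```python
import math
from statistics import NormalDist


def p_from_z(z, n, N):
    """Steps 3-8: probability of a z-slump of n innings in an N-innings career."""
    p_now = NormalDist().cdf(z)          # step 3
    q = (N + 1 - n) / n                  # step 4
    A = 0.25 * math.exp(-1.4 * q)        # step 5
    k = -(0.62 * q + 2.96)               # step 6
    x = A * math.exp(k * z)              # step 7
    return (1.0 - p_now) ** x            # step 8


def slump_probability(mu_s, mu_c, n, N):
    """Steps 1-8, starting from the slump average mu_s and the career
    average excluding the slump, mu_c."""
    z = (mu_s - mu_c) * math.sqrt(n) / (1.05 * mu_c)   # step 2
    return p_from_z(z, n, N)
```

As a check, p_from_z(-2.0, 20, 79) comes out at about 0,266, close to the 0,265 listed for z = -2,0 in the first table above. The constants were only fitted for ratios q up to 5, so anything beyond that (like the 134-innings example) is an extrapolation.&lt;br /&gt;&lt;br /&gt;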
If that probability is more than 50%, then I'd let him keep going.  If it's less than 50%, I'd drop him.&lt;br /&gt;&lt;br /&gt;A practical example: Andrew Strauss.  Before his much-publicised recent slump, he averaged 46,39.  During the slump, up to the second Test against New Zealand, he averaged 28,10.  The slump lasted 29 innings; his career up to then had lasted 83 innings.&lt;br /&gt;&lt;br /&gt;So:&lt;br /&gt;1. µ&lt;sub&gt;s&lt;/sub&gt; = 28,10; µ&lt;sub&gt;c&lt;/sub&gt; = 46,39.&lt;br /&gt;2. z = (28,10 - 46,39) * sqrt(29) / (1,05 * 46,39) = -2,02.&lt;br /&gt;3. p&lt;sub&gt;now&lt;/sub&gt; = LOI.NORMALE.STANDARD(-2,02) = 0,0217.&lt;br /&gt;4. q = (83 + 1 - 29) / 29 = 1,90.&lt;br /&gt;5. A = 0,25 * e&lt;sup&gt;-1,4 * 1,90&lt;/sup&gt; = 0,0176.&lt;br /&gt;6. k = -(0,62 * 1,90 + 2,96) = -4,138.&lt;br /&gt;7. x = A * e&lt;sup&gt;k * z&lt;/sup&gt; = 0,0176 * e&lt;sup&gt;(-4,138) * (-2,02)&lt;/sup&gt; = 0,0176 * e&lt;sup&gt;8,359&lt;/sup&gt; = 75,1.&lt;br /&gt;8. p = (1 - 0,0217)&lt;sup&gt;75,1&lt;/sup&gt; = 0,192.&lt;br /&gt;&lt;br /&gt;So slumps as bad as Strauss's should only happen to about one player in five, in a career as long as his.  So based on numbers alone, I would have dropped him.  Of course, he hit 177 in his next Test.&lt;br /&gt;&lt;br /&gt;Now I think that this has been a useful exercise, but I'm not sure how useful it is in practice.  You don't pick cricket teams based purely on statistics &amp;mdash; you have to watch the players as well.  If (say) a batsman is regularly getting out LBW early in his innings, you don't want to let it keep happening until p falls to 0,5 before dropping him.  
You want to get in early, and either drop him or work on his technique.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1977758530389490205?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1977758530389490205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1977758530389490205' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1977758530389490205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1977758530389490205'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/04/slumps-is-there-problem-or-is-he-just.html' title='Slumps - Is there a problem or is he just unlucky?'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_slump20block60.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3636634042728068866</id><published>2008-03-30T22:16:00.001+02:00</published><updated>2008-03-30T22:18:36.327+02:00</updated><title type='text'>The largest winning margins in first-class cricket</title><content type='html'>I've done the bulk of the work in putting together a comprehensive first-class database, containing all matches up until the end of the 2007 season.  
(There are a few holes near the end; some of the 2007/8 seasons started before the 2007 season finished.)&lt;br /&gt;&lt;br /&gt;I will no doubt soon go back to statistical analysis, but for the next few posts I'll probably do statistical lists.  Usually I find these boring, but since I don't know of such first-class lists online, I thought that they'd be of interest.  If they are already on the Internet somewhere and you know where they are, please let me know.&lt;br /&gt;&lt;br /&gt;We'll start with the largest margins of victory, first of all by innings.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/27/27171.html"&gt;Railways v Dera Ismail Khan&lt;/a&gt;, 1964/5: Railways 6dec/910 def. Dera Ismail Khan 32 &amp;amp; 27 (f/o) by an innings and 851 runs.&lt;br /&gt;&lt;br /&gt;I have no idea how Dera Ismail Khan came to be classified as a first-class team.  They clearly weren't one.  In the 1980's they had another string of losses by an innings before their best performance, a loss by only a handful of runs &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/45/45282.html"&gt;to Hazara&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/10/10684.html"&gt;Victoria v Tasmania&lt;/a&gt;, 1922/3&lt;br /&gt;Tas 217 &amp;amp; 176 lost to Vic 1059 by an innings and 666 runs.  This was the match when Bill Ponsford made his 429.&lt;br /&gt;&lt;br /&gt;There are quite a few Australian matches from the timeless era in these lists.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/12/12150.html"&gt;Victoria v New South Wales&lt;/a&gt;, 1926/7&lt;br /&gt;NSW 221 &amp;amp; 230 lost to Vic 1107 by an innings and 656 runs.  Victoria's innings remains the highest ever in first-class cricket.  
Ponsford made 352.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/5/5597.html"&gt;New South Wales v South Australia&lt;/a&gt;, 1900/1&lt;br /&gt;SA 157 &amp;amp; 156 lost to NSW 918 by an innings and 605 runs.  Remarkably enough, the highest score in NSW's 918 was Syd Gregory's 168.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/16/16792.html"&gt;England v Australia&lt;/a&gt;, 1938&lt;br /&gt;Eng 7dec/903 def. Aus 201 &amp;amp; 123 (f/o) by an innings and 579 runs.  Hutton 364.&lt;br /&gt;&lt;br /&gt;And now the top five by runs.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/13/13341.html"&gt;New South Wales v Queensland&lt;/a&gt;, 1929/30.&lt;br /&gt;NSW 235 &amp;amp; 8dec/761 def. Qld 227 &amp;amp; 84 by 685 runs.  Bradman 452 not out.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/12/12925.html"&gt;Australia v England&lt;/a&gt;, 1928/9.&lt;br /&gt;Eng 521 &amp;amp; 8dec/342 def. Aus 122 &amp;amp; 66 by 675 runs.  Bradman's Test debut.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/10/10009.html"&gt;South Australia v New South Wales&lt;/a&gt;, 1920/1&lt;br /&gt;NSW 304 &amp;amp; 770 def. SA 265 &amp;amp; 171 by 638 runs.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/37/37874.html"&gt;Muslim Commercial Bank v Water and Power Development Authority&lt;/a&gt;&lt;br /&gt;MCBA 575 &amp;amp; 0dec/282 def. WPDA 98 &amp;amp; 150 by 609 runs.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/38/38930.html"&gt;Sargodha v Lahore Municipal Corporation&lt;/a&gt;&lt;br /&gt;Sar 336 &amp;amp; 416 def. 
LMC 77 &amp;amp; 90 by 585 runs.&lt;br /&gt;&lt;br /&gt;On an unrelated note, there are a few games from the early parts of the 19th century where either some players, whole teams, or even the team scores are unknown.  Such gaps in the first-class record looked to have ended in 1829, but they made a re-appearance in Sri Lanka in the 1990's.  &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/55/55903.html"&gt;1&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/56/56265.html"&gt;2&lt;/a&gt; (Colombo only had eight players!), &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/56/56896.html"&gt;3&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/67/67440.html"&gt;4&lt;/a&gt;.  That last match was in 1999.&lt;br /&gt;&lt;br /&gt;I won't be counting any of these incomplete scorecards for my stats.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3636634042728068866?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3636634042728068866/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3636634042728068866' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3636634042728068866'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3636634042728068866'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/largest-winning-margins-in-first-class.html' title='The largest winning margins in first-class cricket'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2024025285869908340</id><published>2008-03-25T14:17:00.000+01:00</published><updated>2008-03-25T14:19:42.116+01:00</updated><title type='text'>Toiling away</title><content type='html'>&lt;a href="http://tcwj.blogspot.com/"&gt;Soulberry&lt;/a&gt; asked for the best fast bowlers on flat pitches since 1970.  Defining a flat pitch is not easy, so I've taken a short-cut to make my life easier.  I've just totted up the averages for bowlers in innings where the opposition scores at least 450.  It's a bit artificial, but it should do. &lt;br /&gt;&lt;br /&gt;Here we go.  This is actually all Test bowlers, qualification 15 wickets in these high-scoring innings.  They're ranked by the weighted averages, where wickets are weighted by the average of the batsmen dismissed.  This is particularly useful in this exercise, as we're not interested in who picks up cheap tail-end wickets.  I've bolded the bowlers who satisfy Soulberry's criteria. 
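&lt;br /&gt;&lt;br /&gt;The weighted columns can be read as follows: each wicket counts for more or less than one depending on the average of the batsman dismissed, "wtd wkts" is the sum of those weights, and "wtd avg" is runs divided by weighted wickets (Pollock's 1376 / 40,2 is about 34,2, for example). The post doesn't say what each batsman's average is measured against, so in this small Python sketch the reference average is a hypothetical parameter, and the function name is made up too.

```python
def weighted_bowling(runs_conceded, dismissed_averages, reference_avg):
    """Return (weighted wickets, weighted average).

    Each wicket is weighted by the dismissed batsman's average divided by
    reference_avg, an assumed normalising constant (not given in the post).
    """
    wtd_wkts = sum(avg / reference_avg for avg in dismissed_averages)
    return wtd_wkts, runs_conceded / wtd_wkts
```

For instance, weighted_bowling(100, [40, 20], 30) gives 2 weighted wickets and a weighted average of 50: a top-order wicket and a tail-end wicket together count as two average ones, while two tail-end wickets alone would count for less than two.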
&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name              runs  wkts  wtd wkts  avg     wtd avg&lt;br /&gt;Mushtaq Mohammad  629   21    21,2      29,95   29,61&lt;br /&gt;WJ O'Reilly       932   24    29,6      38,83   31,53&lt;br /&gt;MW Tate           1124  30    34,2      37,47   32,83&lt;br /&gt;&lt;b&gt;C White           792   17    23,6      46,59   33,63&lt;/b&gt;&lt;br /&gt;&lt;b&gt;SM Pollock        1376  40    40,2      34,40   34,20&lt;/b&gt;&lt;br /&gt;NJN Hawke         629   18    18,3      34,94   34,32&lt;br /&gt;&lt;b&gt;MHN Walker        552   15    15,3      36,80   36,01&lt;/b&gt;&lt;br /&gt;&lt;b&gt;RGD Willis        1386  33    38,4      42,00   36,09&lt;/b&gt;&lt;br /&gt;&lt;b&gt;BA Reid           867   24    23,9      36,13   36,31&lt;/b&gt;&lt;br /&gt;&lt;b&gt;JN Gillespie      871   17    22,9      51,24   38,07&lt;/b&gt;&lt;br /&gt;&lt;b&gt;M Dillon          1413  28    37,0      50,46   38,21&lt;/b&gt;&lt;br /&gt;&lt;b&gt;DK Lillee         956   26    24,8      36,77   38,52&lt;/b&gt;&lt;br /&gt;&lt;b&gt;B Lee             1191  27    30,4      44,11   39,23&lt;/b&gt;&lt;br /&gt;JC Laker          846   20    21,5      42,30   39,29&lt;br /&gt;&lt;b&gt;CEL Ambrose       667   16    16,9      41,69   39,40&lt;/b&gt;&lt;br /&gt;&lt;b&gt;CE Cuffy          571   15    14,5      38,07   39,49&lt;/b&gt;&lt;br /&gt;DA Allen          966   20    24,4      48,30   39,66&lt;br /&gt;AJ Bell           503   15    12,5      33,53   40,21&lt;br /&gt;DR Hadlee         890   19    22,1      46,84   40,25&lt;br /&gt;FS Trueman        644   15    16,0      42,93   40,38&lt;/pre&gt;&lt;br /&gt;I'm not sure how much I'd want to read into these numbers, since the wicket tallies are generally quite low.  
But Shaun Pollock looks like he deserves to be near the top.&lt;br /&gt;&lt;br /&gt;And now the bottom end:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;EAS Prasanna      1573  21    21,8      74,90   72,02&lt;br /&gt;GS Sobers         1520  21    20,9      72,38   72,73&lt;br /&gt;&lt;b&gt;Mohammad Sami     1507  17    20,6      88,65   73,26&lt;/b&gt;&lt;br /&gt;&lt;b&gt;SJ Harmison       1825  24    24,9      76,04   73,32&lt;/b&gt;&lt;br /&gt;SP Gupte          1120  18    15,1      62,22   74,17&lt;br /&gt;DL Underwood      1593  18    21,2      88,50   75,11&lt;br /&gt;&lt;b&gt;FH Edwards        1136  15    15,1      75,73   75,38&lt;/b&gt;&lt;br /&gt;PCR Tufnell       1339  15    16,3      89,27   82,35&lt;br /&gt;Mushtaq Ahmed     1266  15    15,1      84,40   83,62&lt;br /&gt;RJ Shastri        1719  17    19,4      101,12  88,52&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2024025285869908340?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2024025285869908340/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2024025285869908340' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2024025285869908340'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2024025285869908340'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/toiling-away.html' title='Toiling away'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-688994204735849212</id><published>2008-03-21T10:11:00.000+01:00</published><updated>2008-03-21T10:12:00.094+01:00</updated><title type='text'>What is a chuck?</title><content type='html'>This is a bit different from my usual fare, but I thought it deserved its own post, rather than just being in a couple of comments threads around the place.  Thanks to &lt;a href="http://aftergrogblog.blogs.com/agb/"&gt;AGB&lt;/a&gt; commenter Professor Rosseforp for bringing my attention to the Ferdinands and Kersting paper.&lt;br /&gt;&lt;br /&gt;For a long time, the definition of an illegal delivery action was that the ball couldn't be thrown or jerked.  (Stuart gave us details &lt;a href="http://historyofcricket.blogspot.com/2007/11/history-of-chucking-part-one.html"&gt;here&lt;/a&gt; and &lt;a href="http://historyofcricket.blogspot.com/2007/11/history-of-chucking-part-ii.html"&gt;here&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;A recent paper suggests that we might be able to return to a sort of 'jerkiness' definition, only this time backed up by some science.  I'll go through a bit of background first.  The key goal that we want is for science to come up with a criterion whereby bowlers who look like chuckers &lt;i&gt;are&lt;/i&gt; chuckers.  An exception to this is Murali, who can bowl in a brace (so he can't possibly chuck) and still look bad.  But the problem of a chucking definition is much bigger than just Murali, and it should be possible to get science to agree with the naked eye, at least most of the time, for bowlers with 'normal' arms.&lt;br /&gt;&lt;br /&gt;The ICC's current tolerance of 15 degree elbow straightening (or 'extension'? not really sure of the difference) was based on a study done by Porter, Elliott, and Hurrion during the 2004 Champions Trophy.  
Unfortunately, the full details of the study haven't been released to the general public.  The reason given by the ICC is confidentiality: "We do not think it would be correct to release the figures publicly without the prior consent of the individual bowlers and the researchers themselves. This consent has not been obtained."&lt;br /&gt;&lt;br /&gt;I don't understand why the researchers, who have published many biomechanics studies publicly, should care.  I can see why the ICC might not want to give names and elbow straightenings, because there'd be a torrent of allegations of chucking all over the place.&lt;br /&gt;&lt;br /&gt;Nevertheless, the ICC's secrecy over the matter means that we don't really know much about elbow straightenings, except that Sarwan's was zero, and that Pollock's and McGrath's went up to 12 degrees or so.&lt;br /&gt;&lt;br /&gt;But luckily there was a paper published (in &lt;i&gt;Sports Biomechanics&lt;/i&gt;, 'Fast Bowling Arm Actions and the Illegal Delivery Law in Men's High Performance Cricket Matches') by Portus, Rosemond, and Rath in 2006 which does give us some numbers.  They also don't name names, but they did study thirty-four deliveries by twenty-one fast bowlers, from Test, ODI, and tour matches.  None of the bowlers had had any questions raised over their actions.  These sorts of analyses take a long time, which is why so few balls were studied.&lt;br /&gt;&lt;br /&gt;The errors in the measurements are +/- 1 degree.&lt;br /&gt;&lt;br /&gt;Of the thirty-four balls bowled, three were by two bowlers with hyperextended elbows, so we'll ignore them.  Of the remaining thirty-one, six had elbow straightenings larger than 15 degrees.  These were spread across four bowlers out of nineteen.  If you go by the 15-degree rule, then those are chucks.  Many of the bowlers only had one ball recorded, so it's not clear if there were any more chuckers-under-the-15-degree-rule in the sample.  
Looking at the numbers in the table, you'd guess that at least a couple of them go past 15 degrees sometimes.&lt;br /&gt;&lt;br /&gt;So, to summarise: &lt;b&gt;Under the 15-degree rule, one in four or five fast bowlers sometimes chuck, often more than once per over.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;These are, remember, bowlers whose actions haven't been questioned.&lt;br /&gt;&lt;br /&gt;So here we have the first problem of the 15-degree rule: many bowlers who should be deemed legitimate are breaking the rule.&lt;br /&gt;&lt;br /&gt;Now we get onto the other problem of the 15-degree rule: you can chuck without straightening your elbow 15 degrees.  &lt;br /&gt;&lt;br /&gt;This takes us to the paper by Ferdinands and Kersting, also published in &lt;i&gt;Sports Biomechanics&lt;/i&gt;, in 2007 ('An evaluation of biomechanical measures of bowling action legality in cricket').  The technical details are a bit over my head, but this is what they say:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;If bowlers adopted a similar action to throwing, where the elbow remains flexed at release, then it may be possible to utilize effective humeral internal rotation in bowling while satisfying the current 15° elbow extension angle limit.  For instance, if in bowling the elbow can be flexed 26° at release, which has been achieved in professional Cuban baseball pitchers (Escamilla et al., 2001), then a bowler can theoretically have an elbow flexion angle of 41° and extend 15° before ball release.  Any amount of elbow extension is allowed after release.  Such a bowling technique would use a bowling arm with a lower absolute elbow angle about the flexion-extension axis (more flexed), which would extend rapidly before and after release.  
This technique would share some of the characteristics of a throwing-type action, but still remain legal according to the current elbow extension angle limit.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;But it's not just speculation about new 'bowling' techniques that would be blatant chucks without breaking 15 degrees.  Ferdinands and Kersting studied bowlers (in the lab, not in match conditions) from club level in New Zealand to international level, some of whom had had their actions reported.  There were 'fast' bowlers and spinners.  One of the limitations of the study is that their 'fast' bowlers weren't fast by international standards, and this is the obvious place where the next bit of research should go.&lt;br /&gt;&lt;br /&gt;There were six bowlers studied with suspect actions.  Five of these had mean elbow extensions less than 15 degrees, and indeed at least 75% of the balls bowled by those with suspect actions passed the 15 degree test (it's not clear precisely how many from the graphs).&lt;br /&gt;&lt;br /&gt;Let's have a look at the box-and-whisker plot for the various groups considered:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/ferdinands07-1.png"&gt;&lt;br /&gt;&lt;br /&gt;The boxes represent the middle 50% of deliveries, in terms of how much elbow extension there is.  The horizontal lines in them show the medians.&lt;br /&gt;&lt;br /&gt;The suspect box starts higher and finishes higher than the others (so there is some correlation between elbow straightening and apparent chucking), but there's a big overlap with spinners and fast bowlers.  So just going by the elbow straightening isn't very good at distinguishing those with bad-looking actions from those with clean actions.&lt;br /&gt;&lt;br /&gt;Now (at last!) here comes the key point.  In addition to just measuring the total straightening, they also measured the &lt;i&gt;rate&lt;/i&gt; of the elbow extension, the 'elbow extension angular velocity'.  
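&lt;br /&gt;&lt;br /&gt;To make the difference between the two measures concrete, here is a rough Python sketch that computes both from a series of elbow flexion angles sampled at a fixed rate up to ball release. Everything about the input is an assumption for illustration (the sampling rate, the window, the made-up angles), and "angular velocity" is taken here as the peak frame-to-frame rate of extension, which may not be exactly how Ferdinands and Kersting defined it. The 15-degree limit is the ICC's; the 200-degrees-per-second cutoff is the one the paper suggests.

```python
def elbow_measures(flexion_deg, sample_rate_hz):
    """Return (total extension in degrees, peak extension rate in deg/s).

    flexion_deg: elbow flexion angles (0 = straight arm), at least two
    samples at a fixed rate over an assumed window ending at ball release.
    Straightening shows up as a fall in the flexion angle.
    """
    total_extension = flexion_deg[0] - flexion_deg[-1]
    peak_rate = max((before - after) * sample_rate_hz
                    for before, after in zip(flexion_deg, flexion_deg[1:]))
    return total_extension, peak_rate


def judge(flexion_deg, sample_rate_hz, angle_limit=15.0, rate_limit=200.0):
    """Which of the two criteria would flag this delivery?"""
    extension, rate = elbow_measures(flexion_deg, sample_rate_hz)
    return {"over 15 degrees": extension > angle_limit,
            "over 200 deg/s": rate > rate_limit}


# Hypothetical delivery: 8 degrees of straightening, most of it in one frame.
jerky = [30.0, 29.5, 29.0, 28.5, 24.0, 22.0]
print(judge(jerky, sample_rate_hz=100))
```

With 100 samples per second, this made-up delivery straightens only 8 degrees in total, so it passes the 15-degree test, but the 4,5-degree jump in a single frame is a rate of 450 degrees per second, so the rate criterion flags it.&lt;br /&gt;&lt;br /&gt;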
Now the box-and-whisker plot clearly shows up the chuckers:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/ferdinands07-2.png"&gt;&lt;br /&gt;&lt;br /&gt;The bottom of the suspect box is at around 200 degrees per second, while the tops of all the other boxes are at around 100 degrees per second.&lt;br /&gt;&lt;br /&gt;The implication is clear: measuring the elbow extension angular velocity gives much, much better agreement with the naked eye than just going by the total straightening.  It's not perfect, and there were some deliveries in the non-suspect groups that went above their suggested threshold of 200 degrees per second, but it's a lot better than what we currently do.  And importantly, it gives us an objective definition that generally agrees with what our eyes tell us is a clean action and what is not.&lt;br /&gt;&lt;br /&gt;As I said earlier, we need to see this research done on a large group of international-class bowlers before applying it to international cricket.  But the results are very promising.  At the very least, the 200-degree-per-second cutoff is better than the 15-degree cutoff, which clearly doesn't work.  &lt;br /&gt;&lt;br /&gt;I'll finish by noting that this gets back to the old 'jerk' definition.  
A gradual straightening caused by general stress on the elbow during the bowling action isn't jerky, but a rapid bit of straightening just before release &lt;i&gt;is&lt;/i&gt; jerky.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-688994204735849212?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/688994204735849212/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=688994204735849212' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/688994204735849212'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/688994204735849212'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/what-is-chuck.html' title='What is a chuck?'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_ferdinands07-1.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-369060881977624895</id><published>2008-03-20T10:32:00.000+01:00</published><updated>2008-03-20T10:33:34.656+01:00</updated><title type='text'>Adjusting averages for not-outs, take three</title><content type='html'>This is my third attempt to implement a good method to deal with not-outs when calculating batting averages.  
The first two happened before this blog started (I've backdated one &lt;a href="http://pappubahry.blogspot.com/2007/11/not-outs-and-batting-averages.html"&gt;here&lt;/a&gt;), but both had flaws.  The flaw in the one I just linked to is a subtle one, and I only realised it after reading &lt;a href="http://blogs.cricinfo.com/itfigures/archives/2008/03/hanging_in_there_after_a_hundr.php"&gt;this post&lt;/a&gt; by Charles Davis.&lt;br /&gt;&lt;br /&gt;He was interested in calculating the average number of runs scored once you reach a century.  This is basically the same question I have in projecting not-outs forward.  If a batsman who averages 40 finishes 100 not out, how many extra runs would he have scored?&lt;br /&gt;&lt;br /&gt;Originally I did it like this:&lt;br /&gt;1. Take all innings greater than or equal to 100.&lt;br /&gt;2. Take their average.&lt;br /&gt;3. Subtract 100.&lt;br /&gt;&lt;br /&gt;This seems reasonable, but Davis points out an anomaly.  Suppose a batsman has scores of 100 not out, 100 not out, and 100.  Then his average calculated by this method is 300 - 100 = 200, because the three innings contain 300 runs but only one dismissal.  But he's never scored a run past 100.  So the procedure should be:&lt;br /&gt;1. Take all innings greater than or equal to 100.&lt;br /&gt;2. Subtract 100 from each.&lt;br /&gt;3. Take their average.&lt;br /&gt;&lt;br /&gt;Now this example is extreme, but the problem is a significant one when you do this over all batsmen at all scores, because there are a lot of not-outs at each score.  If you're interested, compare the graph below to the bad one in my earlier post.&lt;br /&gt;&lt;br /&gt;For this graph, I took each batsman with a Test average of at least 40, computed his average increase from each score (up to his highest score), and then took the average over all players at each score.  
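&lt;br /&gt;&lt;br /&gt;To make the difference between the two procedures described above concrete, here is a minimal Python sketch of both. It is only an illustration, not the code behind this post: the (runs, dismissed) layout and the function names are assumptions, and "average" is taken to mean runs divided by dismissals, which is what produces 300 - 100 = 200 in Davis's example. Running the corrected version at every score from zero up to the highest score gives one batsman's average-increase curve (its value at zero is just his ordinary batting average); the averaging over players is not included.&lt;br /&gt;&lt;pre&gt;
```python
# Two ways of estimating how many more runs a batsman scores once he has
# reached a given score.  Each innings is an illustrative (runs, dismissed)
# pair, with dismissed False for a not-out.  "Average" means runs divided by
# dismissals, as for an ordinary batting average.


def old_runs_beyond(innings, threshold):
    """Original method: average the qualifying innings, then subtract."""
    kept = [(runs, out) for runs, out in innings if runs >= threshold]
    outs = sum(1 for _, out in kept if out)
    if outs == 0:
        return None  # no dismissals, so no average
    return sum(runs for runs, _ in kept) / outs - threshold


def new_runs_beyond(innings, threshold):
    """Corrected method: subtract the threshold from each, then average."""
    kept = [(runs, out) for runs, out in innings if runs >= threshold]
    outs = sum(1 for _, out in kept if out)
    if outs == 0:
        return None
    return sum(runs - threshold for runs, _ in kept) / outs


def increase_curve(innings):
    """Average increase from every score between 0 and the highest score."""
    top = max(runs for runs, _ in innings)
    return [new_runs_beyond(innings, s) for s in range(top + 1)]


# Davis's example: 100 not out, 100 not out, 100.
example = [(100, False), (100, False), (100, True)]
print(old_runs_beyond(example, 100))  # 200.0
print(new_runs_beyond(example, 100))  # 0.0
```
&lt;/pre&gt;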
If a batsman's highest score was a not-out, I added the batsman's average to it and turned it into an 'out'.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/notoutproj.png"&gt;&lt;br /&gt;&lt;br /&gt;The average increase from zero (i.e., the overall average) is 47,5.  The average increase from 1 is 49,8.  So in a sense, your first run is worth about three: it lifts the average final score from 47,5 to 1 + 49,8 = 50,8.  This, along with the steady rise in the curve up to about 85, is just the effect of getting your eye in, and batting becoming easier as you continue to score runs.  &lt;br /&gt;&lt;br /&gt;The dip either side of 100 is what you might call a psychological feature &amp;mdash; it's there because batsmen often drop their concentration once reaching a century and get out soon afterwards.  The curve rises again until about 125, and then there's a pretty steady downward trend, with two more psychological dips around 200 and 250.  There also looks to be one around 300, but there aren't many data points there.&lt;br /&gt;&lt;br /&gt;The curve has a lot of noise in it, and before using it to project not-outs forward, it's worth smoothing out the non-psychological bits.  I didn't spend too much time doing this, and there are a couple of ugly splices, and in one place scoring a run actually sends you backwards by one run.  That shouldn't be too serious in the grand scheme of things.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/notoutprojsmoothed.png"&gt;&lt;br /&gt;&lt;br /&gt;Now, you wouldn't want to just use this curve to project not-outs, because obviously some batsmen are better than others at making large scores.  Steve Waugh v Mark Waugh is an obvious example.  
On the other hand, if you're projecting a not-out, and there's only one innings higher to work with, then that higher innings is probably not representative, and it's useful to use the overall average increase given in the graph.&lt;br /&gt;&lt;br /&gt;Note that when using the graph on an individual batsman, I move it up or down so that the average increase from zero matches his average.&lt;br /&gt;&lt;br /&gt;To compromise between just going by the individual and just going by the graph, I used the following formula, where &lt;i&gt;n&lt;/i&gt; is the number of innings larger than the not-out to be projected:&lt;br /&gt;proj = (1 - 1/sqrt(n+1)) * proj_by_individual + 1/sqrt(n+1) * proj_by_overall.&lt;br /&gt;&lt;br /&gt;The coefficients here are arbitrary, but I think they look OK.  If there's one innings to work with, the individual projection gets about a 30% weight, and the graph gets 70%.  If there are three innings, it's 50-50.&lt;br /&gt;&lt;br /&gt;Now for some results.  In the following table I've listed the top 20 batsmen as measured by this adjusted average.  There's no adjustment for era or quality of bowling.  The 'diff' is the regular average minus the adjusted average.  It's positive if the regular average is higher (i.e., inflated by not-outs), and negative if the regular average is lower (deflated by not-outs).  The rank is the rank by regular average, which lets you see how the batsmen have shuffled around.  
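&lt;br /&gt;&lt;br /&gt;The compromise between the individual and the graph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the table: the function names are hypothetical, the all-player curve is passed as a list indexed by score, and the individual's weight is taken as 1 - 1/sqrt(n+1), the assignment that gives the quoted figures of about 30% with one higher innings and 50-50 with three.&lt;br /&gt;&lt;pre&gt;
```python
import math


def individual_weight(n):
    """Weight on the batsman's own projection when n innings exceed the not-out."""
    return 1 - 1 / math.sqrt(n + 1)


def shifted_overall(curve, score, player_average):
    """All-player average-increase curve (indexed by score), moved up or down
    so that its value at zero equals this batsman's average."""
    return curve[score] + (player_average - curve[0])


def blended_projection(proj_individual, proj_overall, n):
    """Weighted compromise between the individual and the overall projection."""
    w = individual_weight(n)
    return w * proj_individual + (1 - w) * proj_overall


for n in (0, 1, 3, 8):
    print(n, round(individual_weight(n), 2))
```
&lt;/pre&gt;With no higher innings at all (n = 0) the individual gets zero weight, so the projection comes entirely from the shifted overall curve; the individual's share then grows slowly as more higher innings become available.&lt;br /&gt;&lt;br /&gt;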
Qualification: 20 innings.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          inns  no  runs  avg    adj    diff   rank&lt;br /&gt;DG Bradman    80    10  6996  99,94  99,38  +0,57  1&lt;br /&gt;MEK Hussey    36    8   2188  78,14  74,25  +3,90  2&lt;br /&gt;RG Pollock    41    4   2256  60,97  61,60  -0,63  3&lt;br /&gt;GA Headley    40    4   2190  60,83  60,95  -0,12  4&lt;br /&gt;WR Hammond    140   16  7249  58,46  60,04  -1,58  10&lt;br /&gt;H Sutcliffe   84    9   4555  60,73  59,62  +1,11  5&lt;br /&gt;GS Sobers     160   21  8032  57,78  59,02  -1,24  11&lt;br /&gt;E Paynter     31    5   1540  59,23  58,80  +0,44  6&lt;br /&gt;RT Ponting    191   26  9676  58,64  58,76  -0,11  8&lt;br /&gt;ED Weekes     81    5   4455  58,62  58,49  +0,13  9&lt;br /&gt;KF Barrington 131   15  6806  58,67  57,83  +0,84  7&lt;br /&gt;KC Sangakkara 114   10  5914  56,87  57,18  -0,32  14&lt;br /&gt;SR Tendulkar  236   26  11851 56,43  57,06  -0,63  18&lt;br /&gt;L Hutton      138   15  6971  56,68  57,06  -0,38  16&lt;br /&gt;JH Kallis     195   32  9394  57,63  56,78  +0,85  12&lt;br /&gt;CL Walcott    74    7   3798  56,69  56,53  +0,16  15&lt;br /&gt;JB Hobbs      102   7   5410  56,95  56,51  +0,43  13&lt;br /&gt;RS Dravid     202   25  10015 56,58  56,45  +0,13  17&lt;br /&gt;Mohd Yousuf   138   12  7009  55,63  55,59  +0,04  19&lt;br /&gt;VG Kambli     21    1   1084  54,20  54,55  -0,35  22&lt;/pre&gt;&lt;br /&gt;Overall there's not much change.  Hammond and Sobers move up several places, but otherwise we're dealing with fairly small adjustments to the average.  &lt;br /&gt;&lt;br /&gt;Mike Hussey's adjustment is the largest of any batsman with an average over 40.  That adjustment will likely come down as his career continues and his stats become more like those of other players.&lt;br /&gt;&lt;br /&gt;Considering only batsmen who average over 40 with at least 50 innings, the average difference is -0,16.  
So on average, not-outs deflate averages by about a sixth of a run.  There's a very slight (and noisy) trend saying that batsmen with a high proportion of not-outs have their averages deflated more, which also agrees with the idea that not-outs tend to deflate averages.&lt;br /&gt;&lt;br /&gt;As I said, there's a lot of noise.  In that dataset (over 40, at least 50 innings), there are 52 batsmen whose averages seem to be inflated by not-outs and 75 whose averages are deflated.  But in almost all cases the differences are pretty small.&lt;br /&gt;&lt;br /&gt;The moral of the story is not to worry about not-outs when looking at a batsman's stats.&lt;br /&gt;&lt;br /&gt;One last comment.  A paper by Clive Loader in 1996 considered Allan Border's career and the effects of not-outs on his average.  It was only one example in the paper, which looked at something called censoring in various contexts.  Using some kind of binomial model, he estimated that not-outs had deflated Border's average by between 1 and 2 runs.  My numbers say that his average was inflated by about two fifths of a run.  
An unfortunate disagreement, and I probably won't get to the bottom of it without learning a good deal more statistics, because that paper uses methods beyond my current knowledge.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-369060881977624895?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/369060881977624895/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=369060881977624895' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/369060881977624895'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/369060881977624895'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/adjusting-averages-for-not-outs-take.html' title='Adjusting averages for not-outs, take three'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_notoutproj.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4942612618560799622</id><published>2008-03-16T21:28:00.000+01:00</published><updated>2008-03-16T21:30:29.972+01:00</updated><title type='text'>Teams with good spinners bat well against spin and pace.</title><content type='html'>It's useful to check the common wisdom, or things that seem obvious.  
Often they're true, but sometimes they're not, and even if true, sometimes an analysis reveals surprising related results.  This post falls into the latter category.  &lt;br /&gt;&lt;br /&gt;If a team has good spinners, then their batsmen should do better when playing against spinners.  This might be because the country generally produces lots of spinners and so batsmen grow up playing lots of them, or because the batsmen get to practise in the nets against good spinners, or some combination of the two.  The same principle should apply to pace bowling.  &lt;br /&gt;&lt;br /&gt;So let's check.  I went to Statsguru and got each of the top eight teams' batting averages against pacemen, and each team's overall pacemen's bowling average.  To start, I considered only the 2000's.  The results are plotted below:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/pacepace.png"&gt;&lt;br /&gt;&lt;br /&gt;That's quite a strong trend, and the direction agrees with common sense &amp;mdash; teams whose pace bowlers have high averages don't score as many runs against pace bowlers, and vice versa.  As I said, the trend is very strong, and much of this is due to the luck of world cricket over the past decade.  Australia has had excellent batsmen and excellent pacemen, the West Indies the opposite.  I repeated the exercise for earlier decades (90's, 80's, and 70's; the results for the 1960's go a bit haywire because three out of the seven teams were weak), and the direction of the trend is the same, but the magnitude varies from a slope of -0.17 to -0.8.&lt;br /&gt;&lt;br /&gt;Still, a verification of what we thought we knew, and some idea of how strong the effect is.  Now let's do the same for spinners.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/spinspin.png"&gt;&lt;br /&gt;&lt;br /&gt;The same basic trend, though with a gentler slope.  
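&lt;br /&gt;&lt;br /&gt;The slopes quoted in this post are slopes of straight lines fitted through the eight teams' points. The post doesn't say how the lines were fitted; assuming an ordinary least-squares fit, the calculation is the short Python function below, shown on made-up numbers purely to illustrate the mechanics. A slope of -0.3 would mean that each extra run on a team's spinners' average goes with about 0.3 fewer runs per wicket for its batsmen against spin.&lt;br /&gt;&lt;pre&gt;
```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up numbers, only to show the mechanics: x is a team's bowling average
# for one type of bowler, y is its batting average against a type of bowler.
bowling = [20.0, 25.0, 30.0, 35.0]
batting = [45.0, 43.0, 41.0, 39.0]
print(ols_slope(bowling, batting))  # -0.4
```
&lt;/pre&gt;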
The results for the earlier decades are similar, with the slopes ranging from -0.3 (the one shown above) to -0.75.&lt;br /&gt;&lt;br /&gt;Now let's see if we can get null results.  If a team has good pacemen, that should tell us nothing about how well they play spin, right?&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/pacespin.png"&gt;&lt;br /&gt;&lt;br /&gt;Right.  No trend at all.&lt;br /&gt;&lt;br /&gt;Similarly, if a team has good spinners, that should tell us nothing about how they play pace.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/spinpace.png"&gt;&lt;br /&gt;&lt;br /&gt;Erm, wrong.  Teams with good spinners tend to play pace well.  The same trend exists for the earlier decades, with slopes ranging from -0.18 to -0.33 (above).  I don't know what the p-value is, but it looks like it's not just luck.&lt;br /&gt;&lt;br /&gt;I don't know how to explain this.  I have two ideas:&lt;br /&gt;1) learning how to play high-quality spin makes you a better batsman in general;&lt;br /&gt;2) spinners with good pacemen in the team do better than those without, and so the trend is really just "average against pace" v "pace average" in disguise.&lt;br /&gt;&lt;br /&gt;I'm inclined to think that the second of these two ideas is a strong factor, but there's no correlation between spin average and pace average in either of the last two decades.  
For the 2000's:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/spinpaceavgavg.png"&gt;&lt;br /&gt;&lt;br /&gt;For the 1980's and 1970's, there is a positive correlation between the two.&lt;br /&gt;&lt;br /&gt;From my post on &lt;a href="http://pappubahry.blogspot.com/2008/02/bowler-support.html"&gt;bowler support&lt;/a&gt;, a useful rule of thumb in dealing with aggregated data like this is that for every run lower a bowler's teammates' bowling averages are, the bowler's average should go down by a quarter of a run.  This works pretty well for the 1980's and 1970's graphs, but doesn't help explain what's going on in the last two decades.&lt;br /&gt;&lt;br /&gt;It's a bit of a puzzle, and I don't know what the answer is.  Teams that have good spinners tend to play pace well.&lt;br /&gt;&lt;br /&gt;(A post-script: I was going to do another post on captaincy today, based on some suggestions from &lt;a href="http://leftarmchinaman.blogspot.com/"&gt;The Atheist&lt;/a&gt;, but my regression failed rather miserably.  
I'll have another try later.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4942612618560799622?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4942612618560799622/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4942612618560799622' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4942612618560799622'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4942612618560799622'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/teams-with-good-spinners-bat-well.html' title='Teams with good spinners bat well against spin and pace.'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_pacepace.png' height='72' width='72'/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1734342390260325865</id><published>2008-03-11T21:35:00.000+01:00</published><updated>2008-03-11T21:36:18.477+01:00</updated><title type='text'>Evaluating captaincy</title><content type='html'>Captaincy is one area of cricket that does not receive much statistical scrutiny.  
It is not hard to figure out why &amp;mdash; about the only thing you could easily compute would be how many wins the team had, and this figure is strongly dependent on the quality of the team.  To remedy this, I've come up with a way of estimating how many wins, draws, and losses a captain would be expected to have, given the strength of his side, the strength of the opponents, and whether they're at home or away.  Once we have these, we can compare them to the captain's actual record, and see who does better or worse.&lt;br /&gt;&lt;br /&gt;First, I'll explain how to get the expected results.  This method is a bit rough and can certainly be improved in places, but overall I think it does a good job.  For each team, I calculate the average batting average (averages &lt;a href="http://pappubahry.blogspot.com/2007/12/modified-batting-averages.html"&gt;weighted by the averages of the bowlers faced&lt;/a&gt;), and average bowling average (with &lt;a href="http://pappubahry.blogspot.com/2007/11/weighted-bowling-averages.html"&gt;wickets weighted by the batting average of the batsman dismissed&lt;/a&gt;).  The latter is a little bit tricky &amp;mdash; some teams use more bowlers than others.  So, for each innings, I weighted the bowling averages by the number of balls bowled by each bowler.  Then, if two innings were bowled, I took the average of the two.&lt;br /&gt;&lt;br /&gt;Then you subtract the average bowling average from the average batting average, and you get a rating for the team.  Do the same for the other side, and you get a measure of the difference in strength between the two sides.&lt;br /&gt;&lt;br /&gt;Next you go through all Tests, calculate the difference in strength (to make things consistent, I did home team rating minus away team rating), and find how many wins, draws, and losses there are at various differences in strength.  I did this by binning all Tests into 20 bins by strength difference.  
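&lt;br /&gt;&lt;br /&gt;A minimal Python sketch of the method follows. It skips the binning and curve-fitting step and uses the piecewise-linear fits quoted in the next few paragraphs directly. The weighted batting and bowling averages are taken as inputs, and all function names and data layouts are illustrative assumptions rather than the code used here. The post doesn't give the flat draw fraction for strength differences below -27, so the sketch makes the caller supply it (its value doesn't matter for the evenly-matched example).&lt;br /&gt;&lt;pre&gt;
```python
# Sketch of the expected-results method for captains.  The weighted batting
# and bowling averages are assumed to be computed elsewhere; names and data
# layouts are illustrative.


def clamp01(t):
    """Clamp t to the interval [0, 1]."""
    return min(max(t, 0.0), 1.0)


def ball_weighted_bowling_average(spells):
    """spells: (bowling average, balls bowled) for each bowler in one innings."""
    total_balls = sum(balls for _, balls in spells)
    return sum(avg * balls for avg, balls in spells) / total_balls


def team_rating(batting_averages, bowling_innings):
    """Average batting average minus average bowling average.

    bowling_innings holds the spells for each innings the team bowled (one or
    two); the ball-weighted averages of those innings are then averaged.
    """
    batting = sum(batting_averages) / len(batting_averages)
    per_innings = [ball_weighted_bowling_average(s) for s in bowling_innings]
    bowling = sum(per_innings) / len(per_innings)
    return batting - bowling


def expected_home_result(x, draw_left_plateau):
    """Expected (wins, draws, losses) for the home side.

    x is home rating minus away rating.  draw_left_plateau is the flat draw
    fraction for x below -27; the post does not give it, so it is supplied.
    """
    wins = 0.031 + (0.785 - 0.031) * clamp01((x + 13.7) / (17.2 + 13.7))
    if x >= 0:
        draws = 0.424 + (0.185 - 0.424) * clamp01(x / 17.0)
    else:
        rise = clamp01((x + 27.0) / 27.0)
        draws = draw_left_plateau + (0.424 - draw_left_plateau) * rise
    return wins, draws, 1.0 - wins - draws


def expected_result(x, is_home, draw_left_plateau):
    """Expected (wins, draws, losses) for either side; away swaps wins and losses."""
    wins, draws, losses = expected_home_result(x, draw_left_plateau)
    return (wins, draws, losses) if is_home else (losses, draws, wins)


def captaincy_ratio(actual_w, actual_l, expected_w, expected_l):
    """Actual w/(w+l) divided by expected w/(w+l); draws are ignored."""
    actual = actual_w / (actual_w + actual_l)
    expected = expected_w / (expected_w + expected_l)
    return actual / expected


# Evenly matched sides; draw_left_plateau has no effect at x = 0.
print(expected_result(0.0, True, draw_left_plateau=0.3))
# Abdul Kardar's line from the table: 6-6 actual against 5,1-10,8 expected.
print(round(captaincy_ratio(6, 6, 5.1, 10.8), 2))  # 1.56
```
&lt;/pre&gt;The evenly-matched case comes out at roughly 0,365 wins, 0,424 draws and 0,211 losses for the home side, close to the figures quoted below. Summing expected_result over all of a captain's Tests gives the expected wins, draws, and losses that go into the ratio.&lt;br /&gt;&lt;br /&gt;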
Plotted on the graph below are the expected fractions of wins and draws.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/windrawfractions.png"&gt;&lt;br /&gt;&lt;br /&gt;The fraction of wins does basically what we'd expect &amp;mdash; it starts out flat and very low for teams that are outclassed, before rising steadily and then plateauing.  There are always going to be some draws (because of rain), so the fraction of wins won't hit zero or one.  Even the weakest of home teams can achieve a draw rate of about 30% (well, maybe not Bangladesh), whereas very weak teams away can only draw about 20% of Tests.&lt;br /&gt;&lt;br /&gt;The trend in draws is a bit different.  It seems to go gently upwards until the teams are evenly matched, and then more sharply downwards as the home team becomes stronger.&lt;br /&gt;&lt;br /&gt;I approximated these curves with piecewise linear functions of x, the home team's rating minus the away team's rating.  For the draws, it's flat for x less than -27, then upwards so that it hits the y-axis at y = 0,424, then downwards until x = 17, and then flat, at a value of 0,185.&lt;br /&gt;&lt;br /&gt;For the wins, it's flat at 0,031 below x = -13,7, then upwards until x = 17,2, and then flat at a value of 0,785.&lt;br /&gt;&lt;br /&gt;So now, for each Test, I calculate the difference in strength.  Then I plug that number into the fitted graphs to get a fraction of a win, draw, and loss.  For example, suppose that the teams are evenly matched.  Then the home side gets 0,366 wins; 0,424 draws; and 1 - 0,366 - 0,424 = 0,21 losses.  The wins and losses for the away side are flipped: 0,21 wins and 0,366 losses.&lt;br /&gt;&lt;br /&gt;You do this for each Test that a captain plays, and add up the expected wins, draws, and losses.  Now we can compare to the actual record.&lt;br /&gt;&lt;br /&gt;There's a question here about how to deal with draws.  I decided to ignore them, for a couple of reasons.  
The first is that teams which score runs faster should have fewer draws, but I didn't take strike rate into account when doing the regressions above (I don't have strike rate data for all Test batsmen).  Also, all Tests in Australia (as well as some elsewhere) were played to a finish between 1882/3 and World War II &amp;mdash; no draws in a major cricketing country for over sixty years!  &lt;br /&gt;&lt;br /&gt;So instead I calculated the fraction of wins out of matches that ended in a result, that is: wins / (wins + losses).  Compute this for both the actual and the expected record, divide the actual by the expected, and you get a ratio saying how much better or worse the captain's record is compared to what would be expected.&lt;br /&gt;&lt;br /&gt;Whether or not it is reasonable to ascribe all the difference to the captain is certainly debatable, but it seems the best thing to do for now.  Here are the best captains, as measured by this statistic.  Qualification: 20 Tests as captain.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                    ----expected----  --actual--  exp   act&lt;br /&gt;name            mat w     d     l     w   d   l   w%    w%    ratio&lt;br /&gt;Abdul Kardar    23  5,1   7,1   10,8  6   11  6   0,32  0,50  1,56&lt;br /&gt;GP Howarth      30  7,9   11,1  11,1  11  12  7   0,42  0,61  1,47&lt;br /&gt;J Darling       21  5,8   7,8   7,4   7   10  4   0,44  0,64  1,44&lt;br /&gt;JM Brearley     31  11,3  12,0  7,7   18  9   4   0,60  0,82  1,37&lt;br /&gt;Inzamam-ul-Haq  33  6,8   11,9  14,3  10  10  13  0,32  0,43  1,35&lt;br /&gt;MP Vaughan      41  15,0  14,5  11,5  21  11  9   0,57  0,70  1,24&lt;br /&gt;RB Richardson   24  8,4   8,2   7,4   11  7   6   0,53  0,65  1,21&lt;br /&gt;GA Gooch        34  8,1   12,6  13,3  10  12  12  0,38  0,45  1,20&lt;br /&gt;CA Walsh        22  5,7   7,6   8,7   6   9   7   0,40  0,46  1,16&lt;br /&gt;DG Bradman      24  11,7  7,7   4,6   15  6   3   0,72  0,83  1,16&lt;br /&gt;SP Fleming      80  23,6  27,1  29,3  28  25  27  0,45  0,51  1,14&lt;br /&gt;RB Simpson      39  10,9  14,2  13,9  12  15  12  0,44  0,50  1,14&lt;br /&gt;IVA Richards    50  21,7  18,2  10,1  27  15  8   0,68  0,77  1,13&lt;br /&gt;CH Lloyd        74  31,3  27,0  15,7  36  26  12  0,67  0,75  1,13&lt;br /&gt;N Hussain       45  14,0  15,5  15,5  17  13  15  0,48  0,53  1,12&lt;/pre&gt;&lt;br /&gt;Abdul Kardar, Pakistan's first Test captain, comes out on top.  You can see that he didn't actually win many Tests, but his team managed to draw a lot that they "should" have lost.  It is reassuring to see Mike Brearley so high up.  Inzamam and Michael Vaughan are fifth and sixth.&lt;br /&gt;&lt;br /&gt;Most of those high in the table did not have extended careers as captain, so perhaps some of them were just lucky and are higher than they should be.  Of those with at least 50 Tests, Stephen Fleming is the best, just shading Viv Richards and Clive Lloyd.&lt;br /&gt;&lt;br /&gt;At the other end, we have those who took their teams and un-inspired them to ineptitude:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                    ----expected----  --actual--  exp   act&lt;br /&gt;name            mat w     d     l     w   d   l   w%    w%    ratio&lt;br /&gt;M Azharuddin    47  18,5  17,2  11,3  14  19  14  0,62  0,50  0,80&lt;br /&gt;HH Streak       21  5,0   6,1   9,9   4   6   11  0,33  0,27  0,80&lt;br /&gt;MW Gatting      23  5,1   8,9   9,1   2   16  5   0,36  0,29  0,79&lt;br /&gt;CL Hooper       22  5,2   7,8   8,9   4   7   11  0,37  0,27  0,72&lt;br /&gt;BS Bedi         22  7,0   7,8   7,2   6   5   11  0,49  0,35  0,72&lt;br /&gt;DI Gower        32  7,0   12,0  13,0  5   9   18  0,35  0,22  0,62&lt;br /&gt;JR Reid         34  5,3   11,1  17,6  3   13  18  0,23  0,14  0,62&lt;br /&gt;AC MacLaren     22  6,6   7,9   7,5   4   7   11  0,47  0,27  0,57&lt;br /&gt;KJ Hughes       28  7,3   10,3  10,4  4   11  13  0,41  0,24  0,57&lt;br /&gt;A Flower        20  3,5   6,4   10,1  1   9   10  0,26  0,09  0,36&lt;/pre&gt;&lt;br /&gt;Kim Hughes 
was pretty lucky to get to captain Australia in 28 Test matches.&lt;br /&gt;&lt;br /&gt;A couple of others to finish (feel free to request others).&lt;br /&gt;&lt;br /&gt;Mark Taylor 0,93, but probably disproportionately many of his losses were in dead rubbers.  He also only had 11 draws (expected 17,5), in the era before Waugh made draws almost extinct for Australia.&lt;br /&gt;&lt;br /&gt;Sunil Gavaskar 1,10 is the best Indian captain.  He achieved this record by turning five expected wins into draws, and seven expected losses into draws.  Not exciting stuff, but it gave him an overall positive record (9 wins, 30 draws, 8 losses).&lt;br /&gt;&lt;br /&gt;Imran Khan at 1,08 didn't do that much better than he should have, and was also pretty drawish (14-26-8, expected 17,6-18,0-12,3).  He did a bit better than Javed, who scores 0,99.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-1734342390260325865?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1734342390260325865/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1734342390260325865' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1734342390260325865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1734342390260325865'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/evaluating-captaincy.html' title='Evaluating captaincy'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_windrawfractions.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4323880689785038775</id><published>2008-03-09T12:39:00.000+01:00</published><updated>2008-03-09T12:40:05.790+01:00</updated><title type='text'>Wicket-keepers and byes</title><content type='html'>In my &lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;post&lt;/a&gt; on first-class wicket-keepers in England, I pointed out a curious trend &amp;mdash; there were quite a lot of keepers from the 1980's near the top of the table of byes as a percentage of team runs.  I guessed that it was because of the higher run rates during this time, compared with previous decades (and because of the trend afterwards to choose keepers based on their batting).  Using byes per 600 balls gives more reasonable results, but it should still be biased by the prevailing run rate &amp;mdash; if the run rate is higher, the batsmen are probably hitting more balls, so the keeper's bye rate should be lower.  &lt;br /&gt;&lt;br /&gt;Ideally, we'd have a stat for byes per ball that passes the batsman.  But since we'd need ball-by-ball data to find this, we have to make do without.  Yesterday I had an idea of how to adjust for run rates.  I thought it was a good idea, but it didn't work.  First, I'll explain what I wanted to do and how it should have worked.&lt;br /&gt;&lt;br /&gt;You can't just take bye rates and run rates over all matches and look for a correlation &amp;mdash; the average standard of wicket-keeping can (and does) change with era.  
But the standard of an individual keeper should be fairly consistent over his career, and every keeper will keep in innings where the opposition scores heavily, and in innings where the scoring is slow.  Since some keepers in England play many hundreds of matches, there should be enough data at an individual level to work out what the trend is.&lt;br /&gt;&lt;br /&gt;So for each keeper, I took all innings kept and ordered them by the run rate (when calculating the run rate, I ignored byes).  To avoid the problem of what to do with very short innings (and to make the graphs nicer), I aggregated the data into ten bins, and found the overall average number of byes per 600 balls for each bin.  The result for Alan Knott is below.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/knottbyesrunrate.png"&gt;&lt;br /&gt;&lt;br /&gt;It's not perfect, but the overall trend is pretty clear &amp;mdash; at higher run rates, Knotty gave away fewer byes.  &lt;br /&gt;&lt;br /&gt;That was the idea.  It turns out that not all keepers have this trend.  Here's Jack Russell:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/russellbyesrunrate.png"&gt;&lt;br /&gt;&lt;br /&gt;It's a nice trend, &lt;i&gt;in the wrong direction&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Among keepers who kept in at least 100 innings, there are actually more with positive slopes than with negative ones.  Some of this effect might be noise, so I repeated the calculation, fitting the regression lines using only the middle eight bins (perhaps short innings come up disproportionately often in the first and last bins, and so the data's less reliable).  &lt;br /&gt;&lt;br /&gt;Then, I considered only keepers whose careers began after World War II, and who kept in at least 300 innings.  The result?  Twenty-nine keepers with positive slope, twenty-eight with negative, one flat.  
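&lt;br /&gt;&lt;br /&gt;The per-keeper calculation can be sketched in Python as follows. The data layout is an assumption (each innings as a tuple of runs excluding byes, byes, and balls), as are equal-count bins and an ordinary least-squares fit, since the post doesn't say exactly how the bin edges or the regression lines were chosen. A negative slope is the Knott pattern (fewer byes at higher run rates); a positive one is the Russell pattern.&lt;br /&gt;&lt;pre&gt;
```python
# Sketch of the per-keeper binning and regression.  Each innings is an
# illustrative (runs excluding byes, byes, balls) tuple; equal-count bins and
# a least-squares slope are assumptions about how binning and fitting were done.


def binned_bye_rates(innings, n_bins=10):
    """Order innings by run rate (byes ignored) and pool them into n_bins bins.

    Returns (runs per over, byes per 600 balls) for each bin, computed from the
    pooled totals, so very short innings carry little weight.
    """
    if n_bins > len(innings):
        raise ValueError("need at least one innings per bin")
    ordered = sorted(innings, key=lambda inn: inn[0] / inn[2])
    bins = []
    for i in range(n_bins):
        lo = i * len(ordered) // n_bins
        hi = (i + 1) * len(ordered) // n_bins
        chunk = ordered[lo:hi]
        runs = sum(r for r, _, _ in chunk)
        byes = sum(b for _, b, _ in chunk)
        balls = sum(n for _, _, n in chunk)
        bins.append((6 * runs / balls, 600 * byes / balls))
    return bins


def ls_slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def keeper_slope(innings, middle_eight=False):
    """Slope of byes per 600 balls against run rate, optionally without end bins."""
    bins = binned_bye_rates(innings)
    if middle_eight:
        bins = bins[1:-1]  # drop the slowest and fastest bins
    return ls_slope(bins)
```
&lt;/pre&gt;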
The average slope for this set of keepers was 0,01.&lt;br /&gt;&lt;br /&gt;There does appear to be a slight tendency towards negative slopes for those with very long careers (i.e., more than 725 innings), but that might just be noise, and it's still not a hard-and-fast rule &amp;mdash; Bob Taylor kept in 976 innings, and has a slope of 0,235.&lt;br /&gt;&lt;br /&gt;So the idea ends in a somewhat surprising dead end.  Wicket-keepers don't generally give away fewer byes when the batsmen score runs faster.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4323880689785038775?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4323880689785038775/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4323880689785038775' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4323880689785038775'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4323880689785038775'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/wicket-keepers-and-byes.html' title='Wicket-keepers and byes'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_knottbyesrunrate.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-522247967036790067</id><published>2008-03-08T08:41:00.001+01:00</published><updated>2008-03-08T08:46:06.631+01:00</updated><title type='text'>The meaningfulness of Tests and ODI's</title><content type='html'>Today I want to statistically show what is obvious and logical &amp;mdash; that there's a lot more luck in ODI's than in Tests.  But something not quite so obvious comes up later.&lt;br /&gt;&lt;br /&gt;I got the idea for this sort of analysis from &lt;a href="http://www.insidethebook.com/ee/index.php/site/comments/true_talent_levels_for_sports_leagues/"&gt;this post&lt;/a&gt; by Tangotiger, who usually studies baseball statistics.  (In that post, he gives a method for finding the standard deviation of the talent distribution in a league.  I think that he actually estimates a lower bound for this quantity.)&lt;br /&gt;&lt;br /&gt;I took all Tests and all ODI's between the top eight nations since 2003.  (I did this in Statsguru.  I didn't realise it before, but in the advanced filter, if you click on the 'view' near the right edge of the table, you get checkboxes rather than a dropdown menu.  Do this for Team and Opposition, check the major nations, and in eighteen clicks you've excluded Bangladesh, Zimbabwe, and all the weird teams.)  I then threw away any draws, ties, or no-results, and calculated the fraction of wins for each team.  
(So, e.g., a team with five wins, five losses, and five draws will have a fraction of wins of 0,500 &amp;mdash; not 0,333.)&lt;br /&gt;&lt;br /&gt;Tests:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;team         w   l   w%&lt;br /&gt;Australia    35  7   0,833&lt;br /&gt;England      25  16  0,610&lt;br /&gt;South Africa 21  18  0,538&lt;br /&gt;India        14  11  0,560&lt;br /&gt;Pakistan     13  16  0,448&lt;br /&gt;Sri Lanka    10  14  0,417&lt;br /&gt;New Zealand  5   14  0,263&lt;br /&gt;West Indies  4   31  0,114&lt;/pre&gt;&lt;br /&gt;ODI's:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;team         w   l   w%&lt;br /&gt;Australia    93  34  0,732&lt;br /&gt;India        61  69  0,469&lt;br /&gt;South Africa 53  41  0,564&lt;br /&gt;New Zealand  50  52  0,490&lt;br /&gt;Pakistan     50  53  0,485&lt;br /&gt;Sri Lanka    50  56  0,472&lt;br /&gt;England      38  61  0,384&lt;br /&gt;West Indies  30  59  0,337&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The gap in winning percentage between top and bottom is much larger in Tests than in ODI's, which is what we would expect &amp;mdash; Tests are better at separating teams of different quality.  The standard deviation of the win percentage for Tests is 0,219; that for ODI's is 0,119.&lt;br /&gt;&lt;br /&gt;So far, nothing you wouldn't have guessed.  But it's interesting to compare this to what you'd expect from chance.  That is, if every match that ends in a result were decided by a coin toss, what standard deviation would you expect?  The SD for the number of wins out of n games would be sqrt(n/4), from the binomial distribution.  The variance, being the square of the SD, would be n/4.  The fraction of wins is the number of wins divided by n.  Now, Var(aX) = a²Var(X); with a = 1/n, the variance in the fraction of wins would be (1/n²)(n/4) = 1/(4n).  
So the SD would be sqrt(1/(4n)).&lt;br /&gt;&lt;br /&gt;If you take n as the average number (for each team) of result matches in each set (31,75 for Tests; 106,25 for ODI's), you get the SD's expected from chance as 0,088 for Tests and 0,049 for ODI's.&lt;br /&gt;&lt;br /&gt;What you'd like in a distribution of winning percentages is that it's clearly wider than what you'd expect from chance (so that you can conclude that the differences between teams are due to differences in the quality of their play, not just the luck of the day).  Since the SD you'd expect from chance for ODI's is smaller than that for Tests (because more ODI's are played), the observed SD for ODI's doesn't need to be as large as that for Tests in order to sort the teams out.&lt;br /&gt;&lt;br /&gt;A simple way to quantify this (there may be a better way) is to divide the observed SD by the SD expected from chance.  For Tests, this is 0,219/0,088 = 2,47.  For ODI's, it's 0,119/0,049 = 2,46.  (The ratios come from the unrounded SD's, so they differ slightly from what the rounded figures give.)&lt;br /&gt;&lt;br /&gt;Almost exactly the same!  That's probably a little bit lucky &amp;mdash; those numbers would probably be a bit further apart if I'd picked a different period &amp;mdash; but it shows that, in terms of sorting out the ranking of the teams, the balance between Tests and ODI's is about right.  There were, over this period, about 2,4 ODI's played between these teams for every Test.&lt;br /&gt;&lt;br /&gt;That doesn't mean I like all these ODI's!  Each one of them is, in itself, much less meaningful than a Test match (at least outside World Cups, and except for draws, which are happily a minority of Tests these days).  And even though they take up fewer playing days in total than Tests, each match is independent of the others.  A bad day for a team doesn't matter &amp;mdash; the teams start from scratch again in two days' time.  
In a Test match, of course, a bad day directly affects the remainder of the match.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-522247967036790067?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/522247967036790067/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=522247967036790067' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/522247967036790067'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/522247967036790067'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/meaningfulness-of-tests-and-odis.html' title='The meaningfulness of Tests and ODI&apos;s'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2568310472753364869</id><published>2008-03-06T11:00:00.000+01:00</published><updated>2008-03-06T11:01:13.980+01:00</updated><title type='text'>Left-handers</title><content type='html'>There are a handful of papers in academic journals that analyse cricket statistics.  The methods used in these papers tend to be far more sophisticated than what I use (and usually I don't even understand them), but often the results are interesting and/or useful.  Unfortunately, they tend to languish in academic journals, unknown to the average cricket fan.  
To try to remedy this, every now and then I'll have a look at one of these papers and discuss the methods and results.&lt;br /&gt;&lt;br /&gt;The first paper I'll look at is by Robert Brooks et al.  It's called &lt;i&gt;Sinister strategies succeed at the cricket World Cup&lt;/i&gt;, and was published in the Proceedings of the Royal Society Series B (Biology Letters, Supplement)  271: S64.  You can get a copy from the website of one of the authors &lt;a href="http://www.anu.edu.au/BoZo/hunt/publications/Hunt20.pdf"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The authors studied the 2003 World Cup, in an attempt to work out why left-handers are more prevalent in top-level cricket than in the general population.  Cricket's not unique in this regard &amp;mdash; most sports involving one-on-one contests have a higher proportion of left-handers.  Individual sports (such as athletics or golf) do not.  &lt;br /&gt;&lt;br /&gt;My own feeling was that a large part of left-handed batsmen's success comes from the fact that the stock ball of right-arm pacemen usually swings into them, and inswingers are easier to play than outswingers.  But this paper by Brooks gives strong evidence to suggest that it's more a case of bowlers not being used to bowling to left-handers.&lt;br /&gt;&lt;br /&gt;The paper draws out two effects.  The first is that weaker countries have a lower proportion of left-handers.  The suggested reason is that when the domestic competition is weak, natural talent is the biggest factor in getting selected for the national side &amp;mdash; the variation in talent is large enough that the left-handers' natural advantage is not important.  But at a stronger level of competition, where there is less variation in players' ability, the extra advantage that left-handers have becomes more important, leading to disproportionately many of them in national teams.  
Their figure below shows the trend:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/brooks04-1.png"&gt;&lt;br /&gt;&lt;br /&gt;On the vertical axis is the team's net run rate (i.e., how good they are), and on the horizontal axis is the percentage of innings by left-handers.  They've fitted a quadratic to the data, which gives a pretty good fit.  The interesting feature is that the quadratic peaks at close to 50% left-handers, suggesting that the ideal batting line-up should have an equal number of left- and right-handers.&lt;br /&gt;&lt;br /&gt;Now, the obvious explanation for this is that teams with equal numbers of right- and left-hand batsmen enjoy lots of opposite-handed partnerships, and it is an accepted piece of wisdom that bowlers struggle to keep adjusting their line when the batsmen rotate the strike.&lt;br /&gt;&lt;br /&gt;But this does not appear to be a significant factor.  The authors looked at each batsman when they were in partnership with someone of the same hand or with someone of the opposite hand, and found no significant difference.  The evidence on the usefulness of left-right partnerships is mixed.  In &lt;i&gt;The Best of the Best&lt;/i&gt;, Charles Davis says that left-right opening partnerships (in Tests) average about 15% more runs than would be expected based on the individual averages, whereas same-handed partnerships are about average.  My own figures, based on a regression on opening batsmen's averages, put left-right combinations at 6% better than they should be, and same-handed partnerships 4% worse.  But there is plenty of individual variation.  
There does seem to be a real effect, but you need a large dataset to see it &amp;mdash; much larger than just one World Cup &amp;mdash; which would explain why the authors of the paper didn't find anything significant.&lt;br /&gt;&lt;br /&gt;Nevertheless, if there is an advantage to having 50% of the team left-handed, and the benefit of left-right partnerships is small or insignificant, then there has to be something else.  The authors show us the following graph.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/brooks04-2.png"&gt;&lt;br /&gt;&lt;br /&gt;On the horizontal axis is the percentage of "bowlers' wickets" for each team, and on the vertical axis is the difference between the balls faced by the batting team's left-handers and right-handers.  Bowlers' wickets are defined as catches from edges, LBW's, and bowleds.  They had to do a lot of trawling through Cricinfo's commentary archives to find catches that were at slip!  &lt;br /&gt;&lt;br /&gt;The trend here is pretty obvious.  When there are more bowlers' wickets (suggesting stronger bowling attacks... or really bad fieldsmen), left-handers don't enjoy as much of an advantage over right-handers.  The explanation offered by the authors is that weaker bowlers tend to come from weaker competitions, where there are not so many left-handed batsmen.  So these bowlers aren't as used to bowling to left-handers.&lt;br /&gt;&lt;br /&gt;This gives us a reason for the optimum of 50% left-handers.  Any more than 50%, and the bowlers would be so used to lefties that right-handers would start to have an advantage.  &lt;br /&gt;&lt;br /&gt;So it looks like most of the left-handers' advantage comes down to bowlers not being used to bowling at them.  But the overall story is certainly more complicated.  In &lt;i&gt;The Best of the Best&lt;/i&gt;, Davis shows that players who bowl right and bat left do better, on average, than players who bowl left and bat left.  
Is this because the top hand is more important, so that's where you want your dominant hand?  Who knows?  Players who bowl left and bat right do worse than players who bowl right and bat right.  I don't understand.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2568310472753364869?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2568310472753364869/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2568310472753364869' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2568310472753364869'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2568310472753364869'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/left-handers.html' title='Left-handers'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_brooks04-1.png' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-8829392678387183943</id><published>2008-03-02T21:00:00.003+01:00</published><updated>2008-10-21T06:27:47.476+02:00</updated><title type='text'>Partly explaining all these double-centuries</title><content type='html'>It's pretty obvious that there are a lot more double-centuries being scored these days than in previous eras.  
This isn't just because there are more Tests being played now than ever before: Charles Davis calculated the percentage of centuries converted into double-centuries &lt;a href="http://www.sportstats.com.au/blognov04tojun05.html"&gt;in early 2005&lt;/a&gt; (search for the "Double the fun for Ponting" post).  Whereas this was between 7 and 8 percent from the 1960's through to the 1990's, it jumped to 11.4% from 2000 to 2005.  That is, more than one in ten centuries were turned into doubles.&lt;br /&gt;&lt;br /&gt;Davis said that the overall batting average had risen a little bit, but not enough to account for this jump.  He suggested that while bowlers overall were a little bit weaker now, they're particularly weak when a batsman gets well set &amp;mdash; "once bowling attacks are beaten down, there is less capacity for comeback."&lt;br /&gt;&lt;br /&gt;The key question here is, if the overall batting average rises by some amount, how much should the proportion of centuries-that-are-doubles rise?  &lt;br /&gt;&lt;br /&gt;The answer to that question depends on the distribution of individual scores.  Let's assume that the distribution is exponential.  It's not, but if we compare decades, hopefully the errors will roughly cancel out, giving us a meaningful comparison.&lt;br /&gt;&lt;br /&gt;If the overall average is µ, then the fraction of scores greater than or equal to 200 is exp(-200/µ).  The fraction of scores greater than or equal to 100 is exp(-100/µ).  So the fraction of centuries turned into doubles is exp(-200/µ)/exp(-100/µ) = exp(-100/µ).  Note that this is the same as the overall fraction of scores that are centuries; this is the memoryless property of the exponential distribution.&lt;br /&gt;&lt;br /&gt;Then we take µ&lt;sub&gt;1&lt;/sub&gt; for the 1990's, and µ&lt;sub&gt;2&lt;/sub&gt; for the 2000's, and use these to find the expected fraction of centuries that are doubles.  
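&lt;br /&gt;&lt;br /&gt;The calculation above is easy to check numerically.  Here is a minimal Python sketch, assuming (as the post does) that individual scores follow an exponential distribution.  It plugs in the averages for batsmen in positions 1 to 7 that are given in the next paragraph (35.35 for the 1990's, 38.04 for the 2000's); the function name is just an illustrative label, not anything from the original analysis.

```python
import math

def frac_centuries_to_doubles(mu):
    """Fraction of scores of 100+ that go on to reach 200, assuming scores
    are exponentially distributed with mean mu.

    P(score >= 200) / P(score >= 100) = exp(-200/mu) / exp(-100/mu),
    which simplifies to exp(-100/mu) (the memoryless property).
    """
    return math.exp(-200.0 / mu) / math.exp(-100.0 / mu)

# Averages for batsmen in positions 1 to 7, as quoted in this post.
mu_1990s = 35.35
mu_2000s = 38.04

p_1990s = frac_centuries_to_doubles(mu_1990s)
p_2000s = frac_centuries_to_doubles(mu_2000s)
ratio = p_2000s / p_1990s  # expected growth factor between the decades

print(f"1990s: {p_1990s:.1%}  2000s: {p_2000s:.1%}  ratio: {ratio:.2f}")
# Scale the observed 1990s proportion (7.6%) by the expected growth factor.
print(f"expected 2000s proportion: {0.076 * ratio:.1%}")
```

This should print 5.9% and 7.2% for the two decades, an expected growth factor of about 1.22, and an expected 2000's proportion of about 9.3%, in line with the figures quoted in the post.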
&lt;br /&gt;&lt;br /&gt;I considered only batsmen in positions 1 to 7, since that's where most centuries come from.  The overall average for these batsmen in the 1990's was 35.35, and for the 2000's it was 38.04.  &lt;br /&gt;&lt;br /&gt;Plug these numbers in, and you expect 5.9% for the 1990's, and 7.2% for the 2000's.  These values will be wrong (and they are), because the distribution isn't really exponential.  But dividing one by the other should mostly cancel this out, and so we expect that the proportion of centuries turned into doubles should rise by a factor of 1.22.  A rise of less than 10% in the batting average leads to a rise of more than 20% in the proportion of centuries turned into doubles.&lt;br /&gt;&lt;br /&gt;The proportion for the 1990's was 7.6% (I lumped not-outs and outs together; Davis says 7.9%).  For the 2000's it's 10.5%.  The proportion increased by a factor of 1.38.  Just based on the averages, you'd have expected it to rise to 9.3%.  The difference here is about 9 extra double-centuries over the course of the decade to date, or about one extra double-century per calendar year.&lt;br /&gt;&lt;br /&gt;That sounds like something you can blame on the minnows, but that's not the case.  If you re-do the analysis excluding Bangladesh and Zimbabwe, then the figures become 1.24 expected and 1.39 observed.  &lt;br /&gt;&lt;br /&gt;So there does appear to be a real effect, but it's not large.  The general rise in batting averages is the most important factor, but there is this extra double-century a year that "shouldn't" happen.&lt;br /&gt;&lt;br /&gt;There's no point stopping here.  What's special about 200?  I did a similar analysis for Tests from the 1950's onwards.  Then, grouping by decade, I found the fraction of scores greater than or equal to 1, greater than or equal to 2, etc. up to 240.  
For a reference case, I also did this for all Tests during this period.&lt;br /&gt;&lt;br /&gt;Then for each decade and each score, I calculated the observed change, relative to the reference case, in the fraction of scores greater than or equal to that score (as a factor, e.g., 6% to 9%, an increase by a factor of 1.5), and the expected change based on the decade average compared with the reference average (e.g., 5% to 7%, an increase by a factor of 1.4).&lt;br /&gt;&lt;br /&gt;Take the observed factor minus the expected factor (1.5 - 1.4 = 0.1), and you get a measure of how common scores greater than or equal to the given score are, compared with what they "should" be.  A positive value tells you that scores greater than or equal to the given score are more common than you would expect, based on the decade average.&lt;br /&gt;&lt;br /&gt;It's graph time.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/propscoresnotexpected.png"&gt;&lt;br /&gt;&lt;br /&gt;I hope you can make out the different colours.  The most striking feature of the graph is the curve for the 1950's.  Despite the overall average being much lower (only 32.43), the fraction of large scores was comparable to what we see today, when the overall average is 38!  &lt;br /&gt;&lt;br /&gt;The curve for the 1960's is kind of a damped mirror image of the previous decade &amp;mdash; fewer centuries than you would expect.  The curve does start to come back up after 200, but I'd be sceptical about reading too much into the curves much past 200, since those scores are pretty rare and statistical noise becomes more prevalent.&lt;br /&gt;&lt;br /&gt;The 1970's are similar to the 1960's, though closer to expected.&lt;br /&gt;&lt;br /&gt;The 1980's are almost dead on expected all the way up to 200.&lt;br /&gt;&lt;br /&gt;The 1990's are a bit below expected for large scores.&lt;br /&gt;&lt;br /&gt;The 2000's are a bit above expected, particularly above about 175.  
There's an amusing dip just past 200: the number of scores greater than or equal to 204 is almost exactly as expected.&lt;br /&gt;&lt;br /&gt;So there you go.  We have more double-centuries than we should (not by much), but a bigger phenomenon is the number of scores greater than 175.  I blame Michael Vaughan.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-8829392678387183943?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/8829392678387183943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=8829392678387183943' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8829392678387183943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/8829392678387183943'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/03/partly-explaining-most-of-all-double.html' title='Partly explaining all these double-centuries'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_propscoresnotexpected.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4876570438959864336</id><published>2008-02-25T08:30:00.000+01:00</published><updated>2008-02-25T08:31:24.132+01:00</updated><title type='text'>Scheduling cricket around the 
IPL</title><content type='html'>I had planned a couple more blog entries before my holiday to Italy, but I got distracted.  So before I disappear for a week, here are some thoughts on the IPL and cricket scheduling.  This is a bit of a change from my usual fare, but since everyone's talking about the IPL and its consequences, I thought it was worth sharing.  &lt;br /&gt;&lt;br /&gt;I am a big supporter of the IPL, and I hope it's a big success.  But I see one big danger: that it expands too much.  This year, the IPL teams will play in a double round-robin &amp;mdash; fourteen matches each &amp;mdash; before semi-finals and a final.  There's huge scope to make the tournament bigger.  Teams in Major League Baseball (where games last about as long as T20's) play 162 games a season.  Now I'm not saying that anyone wants a cricket tournament in which each team plays 162 times, but the point is clear: the IPL could get much bigger.&lt;br /&gt;&lt;br /&gt;I've done some scribbling, and I think that Test cricket can survive in much the same form as it has now, as long as the IPL season is no longer than four months.  IPL teams could play, say, five games a fortnight, and so the home-and-away season could be around 35 matches per team, followed by a finals series.  &lt;br /&gt;&lt;br /&gt;To make it concrete, let's assume that the IPL could fill up February to May.&lt;br /&gt;&lt;br /&gt;England's home Tests will have to be between June and September, much as they are now.  Most other Tests would be between October and January, though Tests in June or July are possible in, e.g., Australia.  (The West Indies never plays Tests before January &amp;mdash; does anyone know why?  
Their domestic season starts in November.)&lt;br /&gt;&lt;br /&gt;I see a five-Test tour looking like this:&lt;br /&gt;&lt;br /&gt;tour match: days 1-4&lt;br /&gt;tour match: days 7-10&lt;br /&gt;Test 1: days 14-18&lt;br /&gt;Test 2: days 21-25&lt;br /&gt;tour match: days 31-34&lt;br /&gt;Test 3: days 38-42&lt;br /&gt;tour match: days 46-50&lt;br /&gt;Test 4: days 56-60&lt;br /&gt;Test 5: days 63-67&lt;br /&gt;&lt;br /&gt;Three-Test tours would be truncated after the third Test.  It should be clear what my thoughts on first-class tour matches are!  There are no one-day matches, though you could probably squeeze one or two in at either the beginning or the end.  Or you could shorten the tour matches, or get rid of one, to make space.&lt;br /&gt;&lt;br /&gt;I would like to see Australia play five-Test series against England, South Africa, and India.  Australia's home schedule would look something like:&lt;br /&gt;&lt;br /&gt;Eng in summer, WI in winter&lt;br /&gt;NZ, SL in summer, Bd in winter&lt;br /&gt;Ind in summer, Pak in winter&lt;br /&gt;SA in summer&lt;br /&gt;Eng in summer&lt;br /&gt;&lt;br /&gt;The Ashes stay on a four-year cycle, and everyone else is on five-year cycles.  This could easily be relaxed to six-year cycles.  The non-England teams could be shuffled around from winter to summer, depending on other schedules or commercial considerations.&lt;br /&gt;&lt;br /&gt;Other countries may also want five-Test series (India-Pakistan would be a good candidate, for example), but they would have to host Tests outside the October-to-January window, have touring teams accept the loss of some tour-match days, or give up rest days between matches.  
The alternative (probably the more reasonable one) is to have six-year cycles.&lt;br /&gt;&lt;br /&gt;England's home schedule would look like this (five-year cycle):&lt;br /&gt;Bd (2 Tests to be squeezed in), Aus&lt;br /&gt;WI, SL&lt;br /&gt;SA, NZ&lt;br /&gt;Ind, Pak&lt;br /&gt;Aus&lt;br /&gt;&lt;br /&gt;If they had a six-year cycle, it could be:&lt;br /&gt;Aus&lt;br /&gt;WI, SL&lt;br /&gt;SA&lt;br /&gt;Ind, Pak&lt;br /&gt;Aus&lt;br /&gt;NZ, Bd&lt;br /&gt;&lt;br /&gt;Or something like that.  There are fiddly details that I haven't worked out, but if you spend enough time fiddling, every team should get to play every other team, there'll be more five-Test series than there are now, the IPL can go for four months, and Test cricket will survive.  The boards would make less money from international cricket under this proposal, because there aren't any ODI's, but hopefully active and popular domestic T20 competitions will boost the coffers.  Picture a cricket fan in Australia in February, watching the NSW v Victoria T20 game and then flicking over to watch Kolkata play Mumbai.  Something like that.  (I'm optimistic about domestic T20 being viable, because I was part of the crowd of over 27000 who watched Queensland play New South Wales at the Gabba in 2006/7.  
Now, no other recent Australian domestic crowd has come close to that outside finals, but if domestic T20 were at the forefront of the cricketing calendar, I think large crowds would be common.)&lt;br /&gt;&lt;br /&gt;This is obviously a bit Utopian (and I've ignored the Champions League), but at least it shows that, in principle, things can all work out.&lt;br /&gt;&lt;br /&gt;I should be back blogging next Monday or Tuesday.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4876570438959864336?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4876570438959864336/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4876570438959864336' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4876570438959864336'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4876570438959864336'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/scheduling-cricket-around-ipl.html' title='Scheduling cricket around the IPL'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4856190929960408693</id><published>2008-02-23T17:03:00.000+01:00</published><updated>2008-02-23T17:04:17.249+01:00</updated><title type='text'>The IPL player auction</title><content type='html'>Sorry for the delay in updating &amp;mdash; I've just got back from 
a short holiday in Amsterdam.  On Tuesday morning I'm heading off to Florence and Rome, so there'll be another break in posting soon.  &lt;br /&gt;&lt;br /&gt;I was asked whether Indian players were valued more than non-Indians in the IPL auction.  The answer is that they were, by roughly $265k each.  My analysis is a bit rough, since I didn't want to get bogged down in details in the couple of days I have before Italy.&lt;br /&gt;&lt;br /&gt;Firstly, it's important to note that it's not a free market &amp;mdash; there were rules about young players and international players, the icon players would have distorted the market, and so on.  But we'll see what the numbers tell us.&lt;br /&gt;&lt;br /&gt;I took all the non-icon players who had ODI stats (or, failing that, List A stats) that included batting strike rate.  Because it seemed a reasonable thing to do, I gave each player a batting rating, defined as the batting average multiplied by the strike rate, divided by 100, divided by 20 (roughly).  For bowlers (and I chose bowlers by looking at them and deciding whether I'd consider their bowling in buying them; there's a grey area of course, but for most players it's pretty obvious) I gave a bowling rating: bowling average times economy rate, divided by 6, divided by 25.&lt;br /&gt;&lt;br /&gt;I might be biasing the ratings towards batsmen or towards bowlers, but it shouldn't be too bad.  Then I added the batting and bowling ratings for an overall player rating.&lt;br /&gt;&lt;br /&gt;I put three other variables into the regression model: number of matches (a bit dodgy in one or two cases, where I used List A rather than ODI's), and dummy variables for Indians and wicket-keepers.  
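&lt;br /&gt;&lt;br /&gt;The post doesn't show the fitting step itself.  As a hedged illustration of the kind of model described (ordinary least squares on matches, player rating, and 0/1 dummy variables for Indians and wicket-keepers), here is a small Python sketch using NumPy.  The data are entirely synthetic and the "true" coefficients are invented, chosen only to be of a similar size to those reported below, so the output says nothing about the real auction.

```python
import numpy as np

# Synthetic, hypothetical players: NOT the real IPL auction data.
rng = np.random.default_rng(seed=1)
n = 70
matches = rng.integers(10, 300, size=n)   # ODI (or List A) matches played
rating = rng.uniform(0.5, 2.5, size=n)    # combined batting + bowling rating
indian = rng.integers(0, 2, size=n)       # dummy: 1 if Indian, else 0
keeper = rng.integers(0, 2, size=n)       # dummy: 1 if wicket-keeper, else 0

# Invented "true" coefficients, used only to generate the fake salaries.
names = ["const", "mat", "rating", "indian", "keeper"]
true_beta = np.array([50_000.0, 650.0, 160_000.0, 265_000.0, 135_000.0])

# Design matrix: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones(n), matches, rating, indian, keeper])
salary = X @ true_beta + rng.normal(0.0, 1_000.0, size=n)

# Ordinary least squares: choose beta to minimise the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
resid = salary - X @ beta
r2 = 1.0 - (resid @ resid) / np.sum((salary - salary.mean()) ** 2)

for name, b in zip(names, beta):
    print(f"{name:8s} {b:12.1f}")
print(f"R^2 = {r2:.3f}")
```

Because the fake salaries are generated from known coefficients with very little noise, the fit recovers them closely and R^2 comes out close to 1; the real regression explains only about a quarter of the variance (R^2 of about 0,25).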
&lt;br /&gt;&lt;br /&gt;I probably should have done something about the Australians and West Indians, who are only available for half the tournament, but I couldn't be bothered.&lt;br /&gt;&lt;br /&gt;Here are the results of the regression:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Model 1: OLS estimates using 70 observations 1-70&lt;br /&gt;Dependent variable: salary&lt;br /&gt;&lt;br /&gt;      VARIABLE       COEFFICIENT        STD. ERROR       T-STAT      P-VALUE&lt;br /&gt;  const             46927,8         114520             0,410   0,68332&lt;br /&gt;  mat                 673,801          361,921         1,862   0,06716 *&lt;br /&gt;  rating           163109            62931,6           2,592   0,01178 **&lt;br /&gt;  indian           267326            71927,6           3,717   0,00042 ***&lt;br /&gt;  keeper           136852            93199,5           1,468   0,14682&lt;br /&gt;&lt;br /&gt;  Mean of dependent variable = 504357&lt;br /&gt;  Standard deviation of dep. var. = 286130&lt;br /&gt;  Sum of squared residuals = 4,25589e+012&lt;br /&gt;  Standard error of residuals = 255881&lt;br /&gt;  Unadjusted R-squared = 0,246619&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Key points:&lt;br /&gt;&lt;br /&gt;- There is a slight positive correlation between matches (i.e., experience) and salary.  For every hundred extra ODI's, the salary goes up by about $65000.&lt;br /&gt;&lt;br /&gt;- My hastily calculated player ratings are positively correlated with salary.  Increase the batting average times the strike rate (divided by 100) by 10, and the rating rises by 0,5, so your salary goes up by about $80000.&lt;br /&gt;&lt;br /&gt;- If you're Indian, you get a bonus of $265000, presumably because Indian cricketers can expect to be part of marketing campaigns.&lt;br /&gt;&lt;br /&gt;- Wicket-keepers get an extra $135000, although the p-value of 0,147 says this isn't statistically significant.  
The extra money is to be expected, since I didn't incorporate wicket-keeping skills into the player ratings.&lt;br /&gt;&lt;br /&gt;- These factors explain 25% of the statistical variance (R² of about 0,25); taking the square root, that corresponds to a correlation of about 0,5 between predicted and actual salaries.&lt;br /&gt;&lt;br /&gt;Now just for a bit of fun, I decided to use the player ratings to work out how many dollars each team spent per player rating point.  I've rescaled the results to run from 3 to 9 so that I can compare them with the ratings from &lt;a href="http://www.wellpitched.com/2008/02/bid-o-meter-who-were-smartest-bidders.html"&gt;Q&lt;/a&gt;.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Team        Me   Q&lt;br /&gt;Jaipur      9    3&lt;br /&gt;Chennai     6,3  7&lt;br /&gt;Mumbai      3    6&lt;br /&gt;Bangalore   5,3  5&lt;br /&gt;Hyderabad   5,8  8&lt;br /&gt;Mohali      3,1  7&lt;br /&gt;Kolkata     3,5  9&lt;br /&gt;Delhi       6,0  9&lt;/pre&gt;&lt;br /&gt;The conclusion here is that at least one of us has no idea what he's doing.  Of course, my analysis is based purely on ODI numbers (possibly out of date &amp;mdash; several people have said that T20 is a young man's game, with the play very fast), whereas Q's looked at T20 form and crowd-drawing power.  Even so!  I suspect the difference in our ratings of Jaipur arises because they didn't actually spend much money on players.  So they got quality for what they spent, but the overall team isn't all that good.  
The point of the bidding process is to get the best team (including marketing, etc.), not to get the most player rating points per dollar.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4856190929960408693?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4856190929960408693/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4856190929960408693' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4856190929960408693'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4856190929960408693'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/ipl-player-auction.html' title='The IPL player auction'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-6785332091350160139</id><published>2008-02-17T19:36:00.002+01:00</published><updated>2008-02-17T20:21:20.819+01:00</updated><title type='text'>1800's first-class cricket in England: wicket-keepers</title><content type='html'>This is Part 9, and also the final instalment, in my series on first-class cricket in the 1800's in England.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a 
href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In this post I look at wicket-keepers.  I've decided to focus on pure wicket-keeping, and so I've ignored batting.  It's a bit boring-listy, but there's a graph below all the tables.&lt;br /&gt;&lt;br /&gt;Many of the early scorecards do not indicate who the wicket-keeper was.  If there were one or more stumpings, whoever effected the first one was deemed to be the keeper.  I don't think that this is much of a problem in terms of the tables below, since most of the record-getters played once the scorecards became more complete.  Nevertheless, there are probably minor errors, since teams sometimes changed wicket-keepers, and I've allocated all of an innings' byes to the first keeper (that I know of).&lt;br /&gt;&lt;br /&gt;To begin, let's have a look at the leading keepers by dismissals in the 1800's.  
The last column is the percentage of team runs conceded as byes.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          start end   mat ct   st  dis   b %&lt;br /&gt;M Sherwin     1876  1896  308 577  205 782   3,05&lt;br /&gt;EW Pooley     1864  1883  289 420  352 772   2,29&lt;br /&gt;H Wood        1881  1899  290 532  114 646   2,87&lt;br /&gt;D Hunter      1888  1899  252 437  180 617   2,98&lt;br /&gt;R Pilling     1877  1889  226 418  187 605   3,04&lt;br /&gt;H Phillips    1869  1891  209 335  184 519   2,97&lt;br /&gt;HR Butt       1890  1899  204 334  130 464   2,46&lt;br /&gt;JH Board      1891  1899  185 329  115 444   2,60&lt;br /&gt;T Lockyer     1849  1866  154 234  112 346   1,78&lt;br /&gt;J Hunter      1878  1888  153 220  118 338   3,91&lt;/pre&gt;&lt;br /&gt;Mordecai Sherwin tops the list.  He was, amusingly, a keeper in both cricket and professional soccer.  Both Hunter brothers make the top ten.  &lt;br /&gt;&lt;br /&gt;The leader in terms of dismissals per match (with at least 20 matches) is Charles Smith at 2,69.  Behind him are Pilling and Pooley.&lt;br /&gt;&lt;br /&gt;If you look down the right-hand column of the above table, one man stands out &amp;mdash; Tom Lockyer.  He was easily the best in terms of byes in the 1800's.  
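&lt;br /&gt;&lt;br /&gt;The byes column is byes conceded as a percentage of the runs conceded by the fielding side. A minimal Python sketch of that figure, and of ranking keepers by it above a minimum number of matches, follows; the function names are hypothetical, and the sample rows are copied from the table above (matches and byes % only).
&lt;pre&gt;
```python
from dataclasses import dataclass


@dataclass
class Keeper:
    name: str
    matches: int
    byes_pct: float  # byes as a percentage of runs conceded


def byes_percent(byes, runs_conceded):
    """Byes as a percentage of all runs conceded while the keeper kept."""
    return 100.0 * byes / runs_conceded


def rank_by_byes(keepers, min_matches):
    """Keepers with at least `min_matches` matches, lowest byes % first."""
    qualified = [k for k in keepers if k.matches >= min_matches]
    return sorted(qualified, key=lambda k: k.byes_pct)


# Rows from the leading-dismissals table above.
table = [
    Keeper("M Sherwin", 308, 3.05),
    Keeper("EW Pooley", 289, 2.29),
    Keeper("H Wood", 290, 2.87),
    Keeper("HR Butt", 204, 2.46),
    Keeper("T Lockyer", 154, 1.78),
    Keeper("J Hunter", 153, 3.91),
]

ranked = rank_by_byes(table, min_matches=50)
for k in ranked:
    print(f"{k.name:12s} {k.byes_pct:.2f}")

# Hypothetical totals: 61 byes out of 2000 runs conceded.
print(byes_percent(61, 2000))
```
&lt;/pre&gt;&lt;br /&gt;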
With a qualification of 50 matches:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          start end   mat ct   st  dis   b %&lt;br /&gt;T Lockyer     1849  1866  154 234  112 346   1,78&lt;br /&gt;AP Wickham    1878  1899  65  68   42  110   2,03&lt;br /&gt;JA Bush       1870  1890  145 205  93  298   2,13&lt;br /&gt;AE Newton     1885  1899  83  130  50  180   2,21&lt;br /&gt;A Pike        1894  1899  63  99   28  127   2,27&lt;br /&gt;EFS Tylecote  1871  1886  62  90   48  138   2,29&lt;br /&gt;EW Pooley     1864  1883  289 420  352 772   2,29&lt;br /&gt;FH Huish      1895  1899  88  208  23  231   2,36&lt;br /&gt;JP Whiteside  1888  1899  108 159  44  203   2,45&lt;br /&gt;HR Butt       1890  1899  204 334  130 464   2,46&lt;/pre&gt;&lt;br /&gt;Fred Huish, early in his career, was showing signs of his greatness as a wicket-keeper.  He figures prominently in the next table, which shows the leading keepers by number of dismissals for all first-class matches in England.  I've added an extra column &amp;mdash; byes per 600 balls.  
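&lt;br /&gt;&lt;br /&gt;Byes per 600 balls needs the number of balls bowled, which the early scorecards don't always record; the Lockyer figure quoted later in the post, for instance, covers only the innings where the overs are known. The two bye columns are also tied together: byes per 600 balls equals the byes percentage times the runs conceded per 600 balls, divided by 100. The Python sketch below shows both ideas. The function names and innings data are hypothetical; the East and Stephenson figures are taken from the byes tables further down.
&lt;pre&gt;
```python
def byes_per_600(innings):
    """Byes per 600 balls, using only innings whose ball count is known.

    `innings` is a list of (byes, balls) pairs; `balls` is None when the
    scorecard doesn't record how many overs were bowled.
    """
    known = [(b, n) for b, n in innings if n is not None]
    total_byes = sum(b for b, _ in known)
    total_balls = sum(n for _, n in known)
    return 600.0 * total_byes / total_balls


def implied_runs_per_600(byes_pct, byes_600):
    """Invert b/600 = (b% / 100) * (runs conceded per 600 balls)."""
    return 100.0 * byes_600 / byes_pct


# Hypothetical innings as (byes, balls); the second has no recorded overs.
sample = [(4, 480), (7, None), (2, 360), (9, 600)]
print(byes_per_600(sample))

# (byes %, byes per 600 balls) from the two byes tables later in the post.
for name, pct, b600 in [("DE East", 0.81, 2.40), ("GR Stephenson", 0.85, 2.23)]:
    print(name, round(implied_runs_per_600(pct, b600)))
```
&lt;/pre&gt;On the table figures, East's keeping comes with roughly 296 runs conceded per 600 balls against Stephenson's 262, which is at least consistent with the guess later in the post that more balls were being hit in the 1980's. The table figures are rounded, so these implied run rates are approximate.&lt;br /&gt;&lt;br /&gt;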
Players from the 19th century are in bold.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          start end   mat ct   st  dis   b %   b/600&lt;br /&gt;RW Taylor     1960  1988  547 1257 155 1412  1,03  2,76&lt;br /&gt;H Strudwick   1902  1927  610 1133 220 1353  2,97  8,63&lt;br /&gt;JT Murray     1952  1975  546 1116 219 1335  2,05  5,50&lt;br /&gt;&lt;b&gt;FH Huish      1895  1914  493 922  377 1299  2,91  8,35&lt;/b&gt;&lt;br /&gt;&lt;b&gt;D Hunter      1888  1909  543 910  347 1257  2,59  6,69&lt;/b&gt;&lt;br /&gt;B Taylor      1949  1973  520 1036 200 1236  1,54  4,28&lt;br /&gt;&lt;b&gt;HR Butt       1890  1912  543 949  275 1224  3,05  8,89&lt;/b&gt;&lt;br /&gt;H Elliott     1920  1947  517 886  292 1178  1,84  4,67&lt;br /&gt;&lt;b&gt;JH Board      1891  1913  482 810  348 1158  2,63  7,82&lt;/b&gt;&lt;br /&gt;RC Russell    1981  2004  405 1033 111 1144  0,93  2,92&lt;/pre&gt;&lt;br /&gt;Huish never played a Test match, mostly because of Bert Strudwick, two places ahead of him on that table.  Bob Taylor is the leader all-time, as we would expect (since he holds the overall first-class record).&lt;br /&gt;&lt;br /&gt;My last two tables show a curious phenomenon.  The first has the leading keepers by byes percentage.  
Qualification for both: 60 matches.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          start end   mat ct   st  dis   b %   b/600&lt;br /&gt;DE East       1981  1989  189 479  53  532   0,81  2,40&lt;br /&gt;RJ Turner     1988  2005  233 666  49  715   0,83  2,88&lt;br /&gt;P Whitticase  1984  1995  129 309  14  323   0,84  2,63&lt;br /&gt;GR Stephenson 1967  1980  270 584  77  661   0,85  2,23&lt;br /&gt;BJM Maher     1981  1993  125 279  14  293   0,87  2,85&lt;br /&gt;RC Russell    1981  2004  405 1033 111 1144  0,93  2,92&lt;br /&gt;SJ Rhodes     1984  2004  390 1009 101 1110  0,95  3,02&lt;br /&gt;CP Metson     1981  2001  230 556  51  607   0,95  3,03&lt;br /&gt;CMW Read      1998  2007  151 462  23  485   0,99  3,45&lt;br /&gt;APE Knott     1964  1985  411 1012 101 1113  1,02  2,76&lt;/pre&gt;&lt;br /&gt;Lots and lots of keepers from the 1980's, with David East the best.  Now again, but for byes per 600 balls:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          start end   mat ct   st  dis   b %   b/600&lt;br /&gt;GR Stephenson 1967  1980  270 584  77  661   0,85  2,23&lt;br /&gt;DE East       1981  1989  189 479  53  532   0,81  2,40&lt;br /&gt;P Whitticase  1984  1995  129 309  14  323   0,84  2,63&lt;br /&gt;RW Taylor     1960  1988  547 1257 155 1412  1,03  2,76&lt;br /&gt;APE Knott     1964  1985  411 1012 101 1113  1,02  2,76&lt;br /&gt;BJM Maher     1981  1993  125 279  14  293   0,87  2,85&lt;br /&gt;RJ Turner     1988  2005  233 666  49  715   0,83  2,88&lt;br /&gt;RC Russell    1981  2004  405 1033 111 1144  0,93  2,92&lt;br /&gt;BSV Timms     1959  1971  231 456  70  526   1,24  3,00&lt;br /&gt;SJ Rhodes     1984  2004  390 1009 101 1110  0,95  3,02&lt;/pre&gt;&lt;br /&gt;Now Bob Stephenson moves up to first, part of a general movement of 1960's keepers up the rankings.  
It would appear as though he was unlucky not to play a Test, but with his career coinciding with Taylor's and Alan Knott's, he was kept in county cricket.&lt;br /&gt;&lt;br /&gt;I'm guessing that the difference between byes percentage and byes per 600 balls is due to batsmen hitting the ball more often in the 1980's, so that fewer balls got through to the keeper.  Ideally, we'd have a "byes per ball that passed the batsman" figure.  It should be possible to come up with a correction factor based on the run rate (so that you'd use run rate as a proxy for balls hit), but I haven't tried to do so, and in the absence of ball-by-ball data I don't know how accurate it would be.&lt;br /&gt;&lt;br /&gt;We know the number of overs in about 95% of the innings that Tom Lockyer kept.  In these, he averaged just under 10 byes per 600 balls, almost exactly the same as Paul Nixon's figure.&lt;br /&gt;&lt;br /&gt;I don't want to do era adjustments for wicket-keepers.  Batting and bowling averages need adjusting because the balance between bat and ball changes, and talent levels can only be compared once that is accounted for.  But the rate of letting through byes shouldn't change much just because an era is low-scoring.  Still, you might want an idea of how far away from the average keeper of the era someone like Lockyer was, so here's a graph showing the overall byes percentage for each season.  There's a lot of early noise because of the low number of matches.  There's a huge peak in the era of very low scoring around the 1830's.  This isn't just because the byes were constant and the runs were decreasing &amp;mdash; there's a peak in the byes per match as well, suggesting that keepers had just as much trouble with the round-arm bowling on those pitches as batsmen did.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/yearlybyespercent.png"&gt;&lt;br /&gt;&lt;br /&gt;Noteworthy are the jumps that follow the World Wars, suggesting that keepers were out of practice and skill levels had dropped.  
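&lt;br /&gt;&lt;br /&gt;Reading "overall byes percentage for each season" as an aggregate (total byes divided by total runs conceded across the whole season, rather than an average of per-match percentages), the series behind a graph like this can be sketched in Python as follows. The function name, record layout and sample records are hypothetical.
&lt;pre&gt;
```python
from collections import defaultdict


def byes_pct_by_season(records):
    """Aggregate byes % per season: 100 * total byes / total runs conceded.

    `records` holds one (season, byes, runs_conceded) tuple per innings.
    """
    byes = defaultdict(int)
    runs = defaultdict(int)
    for season, b, r in records:
        byes[season] += b
        runs[season] += r
    return {s: 100.0 * byes[s] / runs[s] for s in sorted(runs)}


# Hypothetical innings records, not real scorecard data.
records = [
    (1838, 12, 95), (1838, 8, 120),
    (1890, 6, 250), (1890, 4, 310), (1890, 9, 190),
]
for season, pct in byes_pct_by_season(records).items():
    print(season, round(pct, 2))
```
&lt;/pre&gt;Seasons built from only a handful of matches give jumpy values, which is the early noise visible in the graph.&lt;br /&gt;&lt;br /&gt;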
The general trend after World War II was downward until about 1990, when it started to rise again, presumably as teams began placing more importance on keepers' batting ability.&lt;br /&gt;&lt;br /&gt;Looking before World War I, there are a couple of phases whose causes I don't know.  There is a clear rise from the 1860's to the 1880's, before the byes percentage falls again and then noisily flattens out around 1900.&lt;br /&gt;&lt;br /&gt;And that's the end of this post and this series on 1800's first-class cricket in England.  Thanks to anyone who actually read it all.&lt;br /&gt;&lt;br /&gt;FIN&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-6785332091350160139?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/6785332091350160139/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=6785332091350160139' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6785332091350160139'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/6785332091350160139'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html' title='1800&apos;s first-class cricket in England: wicket-keepers'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_yearlybyespercent.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2849121350647924660</id><published>2008-02-14T22:45:00.001+01:00</published><updated>2008-02-14T22:45:34.520+01:00</updated><title type='text'>Bowler support</title><content type='html'>One of the problems in comparing bowlers from different teams is that they often have different levels of support.  This is a recurring theme in Murali v Warne debates (in between expletive-laden rages) &amp;mdash; Murali took more wickets per Test, but that was because Warne followed McGrath and Gillespie, and Murali only ever had Vaas.  Warne had it easier, and that makes Murali's low average more remarkable.  But maybe the batsmen didn't try to score as much against Murali, because they could pick off runs easily at the other end.&lt;br /&gt;&lt;br /&gt;The debate can go on and on, and it's not clear which factors are the most important.  So I asked myself the question, if you swapped the two bowlers between the two teams, what would their records be?&lt;br /&gt;&lt;br /&gt;You can't answer this question perfectly, of course, but you can try.  For each innings in which the bowler bowled, I defined the support average as the mean of the averages of the four bowlers who bowled the most overs in that innings (three bowlers if the bowler himself would have been one of the four).  Note that, unless otherwise stated, the averages used are averages in which each wicket is weighted in proportion to the batting average of the batsman dismissed.  I use end-of-career averages (to make my life easier and the numerics more stable).&lt;br /&gt;&lt;br /&gt;I'll give an example of what I mean by the support average.  
Suppose that in one particular innings, the bowlers used were:&lt;br /&gt;&lt;br /&gt;bowler (bowler's average): # of overs&lt;br /&gt;A (25): 30&lt;br /&gt;B (24): 34&lt;br /&gt;C (33): 23&lt;br /&gt;D (31): 15&lt;br /&gt;E (45): 6&lt;br /&gt;&lt;br /&gt;Bowler A is one of the four bowlers with the most overs, so his support average uses the other three: (24 + 33 + 31)/3 = 29,33.  Bowler E is not in the top four, so his support average uses all four of them: (25 + 24 + 33 + 31)/4 = 28,25.&lt;br /&gt;&lt;br /&gt;To do the analysis, for each bowler I took all innings and sorted them by support average.  I then binned them into quartiles (to reduce the noise and make for easier interpretation), that is, the quarter of innings with the lowest support averages, the quarter with the next lowest support averages, and so on.&lt;br /&gt;&lt;br /&gt;Then for each quartile I calculated the bowler's average, and also the mean support average, with the latter weighted by the number of balls bowled in each innings (so that, for instance, an innings where the bowler only bowled one over would barely be counted).&lt;br /&gt;&lt;br /&gt;Then you can make tables like these ones:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;SK Warne      q1     q2     q3     q4     overall&lt;br /&gt;supp avg      26,32  27,81  29,61  35,24  29,92&lt;br /&gt;bowl avg      26,58  29,01  26,59  29,52  27,91&lt;br /&gt;&lt;br /&gt;M Muralidaran q1     q2     q3     q4     overall&lt;br /&gt;supp avg      33,86  37,19  39,26  47,28  39,45&lt;br /&gt;bowl avg      23,01  24,26  27,98  23,15  24,43&lt;/pre&gt;&lt;br /&gt;You can see that Murali's support average is indeed much higher than Warne's, as you would expect.  I don't know how much I want to read into individual trends &amp;mdash; four data points, even aggregated ones, aren't a lot.  That won't stop me trying.  The overall trend for Warne is for his average to increase as his support gets weaker.  In particular, for the upper quartile (the only one close to the level of support Murali had) his average is the highest, getting close to 30.  
Murali seems to turn it on when he has no-one to support him at all.&lt;br /&gt;&lt;br /&gt;How about another pair, this time from the 1980's?&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;RJ Hadlee     q1     q2     q3     q4     overall&lt;br /&gt;supp avg      34,51  36,63  38,49  45,75  38,81&lt;br /&gt;bowl avg      24,80  19,69  26,15  25,80  23,96&lt;br /&gt;&lt;br /&gt;MD Marshall   q1     q2     q3     q4     overall&lt;br /&gt;supp avg      24,96  27,13  30,01  36,55  29,73&lt;br /&gt;bowl avg      25,01  19,29  20,75  23,22  21,83&lt;/pre&gt;&lt;br /&gt;When Marshall didn't have Croft, Garner, Roberts, and/or Holding around him, he was still awesome.  &lt;br /&gt;&lt;br /&gt;And since I've apparently made comparing pairs of bowlers a theme for this post, here are the rather surprising results for McGrath and Gillespie:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;GD McGrath    q1     q2     q3     q4     overall&lt;br /&gt;supp avg      28,02  29,32  30,61  33,38  30,28&lt;br /&gt;bowl avg      22,19  20,41  21,92  25,78  22,36&lt;br /&gt;&lt;br /&gt;JN Gillespie  q1     q2     q3     q4     overall&lt;br /&gt;supp avg      26,47  27,43  28,49  34,30  29,28&lt;br /&gt;bowl avg      32,81  28,75  30,62  22,23  28,05&lt;/pre&gt;&lt;br /&gt;It seems that Gillespie did actually do pretty well when McGrath wasn't around, and it was McGrath who got worse (a little bit) when he didn't have support.  Well, perhaps &amp;mdash; it could just be an artifact of McGrath's career trajectory; I haven't checked.&lt;br /&gt;&lt;br /&gt;These are all well and good, but we'd like to do some more serious analysis with them.  If you fit a regression line to an individual player, you can get a rough guide of how their average will change when the support gets better or worse.  
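&lt;br /&gt;&lt;br /&gt;The procedure described above (a support average for each innings, ball-weighted quartiles, then a regression line through the four quartile points) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the tables: the function names are hypothetical, bowling averages here use plain rather than batsman-weighted wickets, and the slope is an unweighted least-squares fit to the four quartile points. The worked example reuses the five-bowler innings and Warne's quartile table from above; the per-innings rows are made up.
&lt;pre&gt;
```python
def support_average(bowler, innings):
    """Mean average of the bowler's main support in one innings.

    `innings` maps each bowler's name to (career average, overs bowled).
    Take the four bowlers with the most overs; if `bowler` is one of them,
    use the other three, otherwise use all four.
    """
    top_four = sorted(innings, key=lambda name: innings[name][1], reverse=True)[:4]
    support = [name for name in top_four if name != bowler]
    return sum(innings[name][0] for name in support) / len(support)


def quartile_summary(rows):
    """Bin innings into quartiles of support average.

    `rows` holds (support_avg, balls, runs_conceded, wickets) per innings.
    Returns (ball-weighted mean support average, bowling average) for each
    quartile, from strongest support (q1) to weakest (q4).
    """
    rows = sorted(rows)
    n = len(rows)
    cuts = [round(i * n / 4) for i in range(5)]
    summary = []
    for start, stop in zip(cuts, cuts[1:]):
        chunk = rows[start:stop]
        balls = sum(r[1] for r in chunk)
        supp = sum(r[0] * r[1] for r in chunk) / balls
        bowl = sum(r[2] for r in chunk) / sum(r[3] for r in chunk)
        summary.append((supp, bowl))
    return summary


def slope(points):
    """Ordinary least-squares slope of y on x for (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# The five-bowler innings from the worked example: (average, overs).
example = {"A": (25, 30), "B": (24, 34), "C": (33, 23), "D": (31, 15), "E": (45, 6)}
print(round(support_average("A", example), 2))
print(round(support_average("E", example), 2))

# Made-up innings rows: (support_avg, balls, runs_conceded, wickets).
rows = [
    (30.1, 120, 60, 2), (27.4, 90, 41, 1), (35.0, 150, 70, 3), (29.2, 60, 33, 0),
    (33.3, 180, 75, 4), (26.0, 84, 30, 2), (31.8, 132, 66, 2), (38.5, 102, 48, 1),
]
for supp, bowl in quartile_summary(rows):
    print(round(supp, 2), round(bowl, 2))

# Warne's quartile table: (support average, bowling average).
warne = [(26.32, 26.58), (27.81, 29.01), (29.61, 26.59), (35.24, 29.52)]
print(round(slope(warne), 2))
```
&lt;/pre&gt;For Warne's four quartile points this gives a slope of about 0,25.&lt;br /&gt;&lt;br /&gt;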
While it might be dubious to do this for just one player, if you do it for all players, the noise should largely cancel out and we'll be left with some solid numbers.&lt;br /&gt;&lt;br /&gt;So, I took all bowlers with at least 100 Test wickets, taken at a rate of at least 3 wickets per Test, calculated the slope of the regression line for each, and then took the mean of the slopes.  The result was 0,50.  That's a pretty hefty figure.  It means that, on average, if the support average goes down by a run, then the bowler's average will go down by half a run.  But it's inflated by the presence of some outliers.  If you exclude players with fewer than 50 Tests (remember that we're dealing with quartiles here, so you need a large number of Tests to get reasonable quartile results), that figure drops to 0,25.  So if the support average drops by four runs, the bowler's average drops by one run, on average.&lt;br /&gt;&lt;br /&gt;You might be wondering if there's a correlation between a bowler's average and his regression slope.  There isn't.  The scatterplot is equal parts scatter and plot.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/supportslopevavg.png"&gt;&lt;br /&gt;&lt;br /&gt;No trend emerges either if you raise the qualification to 50 Tests to get rid of the outliers.  I tried a few other variables, but I couldn't find anything with an R-squared better than about 0,004.  
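&lt;br /&gt;&lt;br /&gt;As used on this blog, "in cricket terms" means quoting the correlation r, the square root of R-squared, rather than R-squared itself (the convention discussed in the sabermetrics post linked below). A one-function Python sketch, with a hypothetical function name, applied to two R-squared values from this blog: the IPL salary regression and the best of the variables tried against the support slopes.
&lt;pre&gt;
```python
import math


def in_cricket_terms(r_squared):
    """Express a fit as r (the square root of R-squared), in percent."""
    return 100.0 * math.sqrt(r_squared)


# R-squared values quoted on this blog.
fits = [("IPL salary regression", 0.246619), ("best support-slope variable", 0.004)]
for label, r2 in fits:
    print(f"{label}: R-squared {100 * r2:.1f}%, r {in_cricket_terms(r2):.1f}%")
```
&lt;/pre&gt;An R-squared of 0,004 comes out at about 6% in cricket terms, within the 7% quoted.&lt;br /&gt;&lt;br /&gt;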
Even &lt;a href="http://sabermetricresearch.blogspot.com/2006/08/on-correlation-r-and-r-squared.html"&gt;in cricket terms&lt;/a&gt;, none of them explained more than 7% of the data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-2849121350647924660?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2849121350647924660/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2849121350647924660' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2849121350647924660'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2849121350647924660'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/bowler-support.html' title='Bowler support'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_supportslopevavg.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3608885934512704020</id><published>2008-02-12T13:07:00.002+01:00</published><updated>2008-02-17T19:38:42.365+01:00</updated><title type='text'>1800's first-class cricket in England: all-rounders (across eras)</title><content type='html'>This is Part 8 in my series on first-class cricket in the 1800's in England.&lt;br /&gt;&lt;br /&gt;&lt;a 
href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In this post, I look at all-rounders.  As I did &lt;a href="http://pappubahry.blogspot.com/2008/02/all-rounderness.html"&gt;for Test cricketers&lt;/a&gt;, I'll be ranking players by the ratio of batting average to bowling average, where the averages are weighted as in Parts 6 and 7.&lt;br /&gt;&lt;br /&gt;Let's start, as always, with the 1800's.  The averages below are with respect to 16,6, the overall average for the period in question.  The +/- percentage figure applies both to the bowling average (well, technically it applies to the regular bowling average; I fondly hope that it's accurate for the weighted average) and (approximately, since the ratio is inversely proportional to the bowling average) to the ratio as well.  
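&lt;br /&gt;&lt;br /&gt;Two properties of the batting-to-bowling ratio matter for these tables. A +/- p% uncertainty in the bowling average turns into roughly +/- p% in the ratio (exactly -p/(1+p) and +p/(1-p)), and rescaling both averages to a new overall average (16,6 here, 24,5 for the all-England table further down) leaves the ratio unchanged. A short Python check of both, with hypothetical function names and Lambert's and Steel's figures from the tables:
&lt;pre&gt;
```python
def ratio(bat_avg, bowl_avg):
    """All-rounder ratio: batting average divided by bowling average."""
    return bat_avg / bowl_avg


def ratio_range(bat_avg, bowl_avg, pct):
    """Ratio when the bowling average is off by +pct and -pct percent."""
    f = pct / 100.0
    return ratio(bat_avg, bowl_avg * (1 + f)), ratio(bat_avg, bowl_avg * (1 - f))


def rescale(avg, old_base, new_base):
    """Re-express an era-normalised average against a different base."""
    return avg * new_base / old_base


# Lambert (1800s table): bowling average uncertain by +/- 10%.
r = ratio(37.05, 16.3)
low, high = ratio_range(37.05, 16.3, 10)
print(round(r, 2), round(low, 2), round(high, 2))

# Steel: rescaling both averages from 16,6 to 24,5 leaves the ratio alone.
bat, bowl = 28.28, 14.53
bat2, bowl2 = rescale(bat, 16.6, 24.5), rescale(bowl, 16.6, 24.5)
print(round(bat2, 2), round(bowl2, 2))
print(round(ratio(bat, bowl), 2), round(ratio(bat2, bowl2), 2))
```
&lt;/pre&gt;The +/- 10% on Lambert's bowling average moves his ratio by about -9% and +11%, so the +/- figure carries over only approximately, which is fine at the one-decimal precision quoted. Steel's rescaled averages agree with the 41,74 and 21,45 in the second table to within rounding.&lt;br /&gt;&lt;br /&gt;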
I've given the wickets per match for those interested; recall that these are underestimates for bowlers whose wicket tallies are estimated.  Qualifications: 2000 runs and at least two (regular) wickets per match.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name             start end   mat  runs  avg     wkts   avg   wpm   ratio +/- %&lt;br /&gt;W Lambert        1801  1817  62   2961  37,05   318,1  16,3  5,1   2,3   10,0&lt;br /&gt;Lord F Beauclerk 1801  1825  94   4319  37,28   406,4  18,5  4,3   2,0   10,0&lt;br /&gt;WG Grace         1865  1908  838  46792 37,41   2495   18,91 2,98  1,98  0,0&lt;br /&gt;AG Steel         1877  1895  142  6184  28,28   699    14,53 4,92  1,95  0,0&lt;br /&gt;J Broadbridge    1814  1840  90   2368  26,82   407,6  14,2  4,5   1,9   9,9&lt;br /&gt;CT Studd         1879  1884  85   3928  30,35   426    16,89 5,01  1,80  0,2&lt;br /&gt;A Mynn           1832  1859  200  4749  27,02   1059,9 15,9  5,3   1,7   7,0&lt;br /&gt;CG Taylor        1836  1859  122  3020  33,56   292,0  20,5  2,4   1,6   7,0&lt;br /&gt;EH Budd          1803  1831  68   2597  30,74   285,8  20,5  4,2   1,5   10,0&lt;br /&gt;W Caffyn         1849  1873  180  5405  24,26   564    16,17 3,13  1,50  0,3&lt;br /&gt;T Hayward        1854  1872  108  4487  27,00   237    18,00 2,19  1,50  0,6&lt;br /&gt;J Wisden         1845  1863  175  4020  19,77   1037,5 13,9  5,9   1,4   3,4&lt;br /&gt;RG Barlow        1871  1891  321  10074 18,43   879    13,06 2,74  1,41  0,0&lt;br /&gt;G Giffen         1882  1896  158  5621  20,23   502    14,81 3,18  1,37  0,0&lt;br /&gt;GA Lohmann       1884  1896  256  6495  16,17   1590   11,99 6,21  1,35  0,0&lt;br /&gt;CTB Turner       1888  1893  93   2118  13,15   610    10,34 6,56  1,27  0,0&lt;br /&gt;GA Davidson      1886  1898  155  5338  18,45   605    15,35 3,90  1,20  0,0&lt;br /&gt;W Bates          1877  1887  257  8651  19,09   746    16,13 2,90  1,18  0,0&lt;br /&gt;WE Midwinter     1877  1884  127  3533  17,90   330    15,14 2,60  1,18  
0,0&lt;br /&gt;W Flowers        1877  1896  409  12035 17,61   1085   15,25 2,65  1,15  0,0&lt;/pre&gt;&lt;br /&gt;Lambert's pretty clear at the top.  Beauclerk is mildly ahead of WG Grace and Allan Steel, but the uncertainty means that all we can say is that he's likely to be somewhere between second and sixth.&lt;br /&gt;&lt;br /&gt;Before I started on this extended exercise in analysing old English players, I didn't know much at all about the cricketers of the era, apart from WG Grace.  One name I did know was Alfred Mynn, rated by John Woodcock as the fourth-greatest cricketer of all time.  Now, Woodcock's list has lots of problems (most notably, WG Grace is number one, ahead of Bradman), but I was interested to see how Mynn would fare after adjusting for eras.  He comes in at number seven (plus or minus one) on the table above.  But if (as might have happened) Woodcock ignored cricket before 1830, then you can see what his method was &amp;mdash; he placed all-rounders with huge aggregates near the top.  Mynn was not a special batsman, but he was a prolific wicket-taker, even if his bowling average wasn't remarkable for his time.  Add in his popularity, and his dominance of single-wicket matches, and you can see where Woodcock was coming from, even if number four is too high.&lt;br /&gt;&lt;br /&gt;Now let's move on to all first-class cricket in England.  Players whose career began in the 1800's are in bold.  
Averages are with respect to 24,5.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name             start end   mat  runs  avg     wkts   avg   wpm   ratio +/- %&lt;br /&gt;KR Miller        1945  1959  75   4253  49,22   164    17,59 2,19  2,80  0,0&lt;br /&gt;WW Armstrong     1902  1921  124  5641  41,87   407    16,36 3,28  2,56  0,0&lt;br /&gt;&lt;b&gt;W Lambert        1801  1817  62   2961  54,68   318,1  24,01 5,13  2,28  10,0&lt;/b&gt;&lt;br /&gt;RJ Hadlee        1973  1990  187  6887  30,48   780    14,13 4,17  2,16  0,0&lt;br /&gt;GStA Sobers      1957  1974  209  13491 48,01   548    23,38 2,62  2,05  0,0&lt;br /&gt;FE Woolley       1906  1938  886  54535 40,98   1893   20,11 2,14  2,04  0,0&lt;br /&gt;&lt;b&gt;WG Grace         1865  1908  838  52043 51,85   2675   25,62 3,19  2,02  0,0&lt;/b&gt;&lt;br /&gt;&lt;b&gt;Lord F Beauclerk 1801  1825  94   4319  55,02   406,4  27,3  4,3   2,0   10,0&lt;/b&gt;&lt;br /&gt;FA Tarrant       1903  1914  295  15925 36,93   1327   18,92 4,50  1,95  0,0&lt;br /&gt;&lt;b&gt;AG Steel         1877  1895  142  6184  41,74   699    21,45 4,92  1,95  0,0&lt;/b&gt;&lt;br /&gt;&lt;b&gt;J Broadbridge    1814  1840  90   2368  39,58   407,6  21,0  4,5   1,9   9,9&lt;/b&gt;&lt;br /&gt;JM Gregory       1919  1926  77   2869  34,26   281    18,49 3,65  1,85  0,0&lt;br /&gt;&lt;b&gt;GH Hirst         1891  1929  801  35378 35,52   2687   19,26 3,35  1,84  0,0&lt;/b&gt;&lt;br /&gt;&lt;b&gt;CT Studd         1879  1884  85   3928  44,80   426    24,92 5,01  1,80  0,2&lt;/b&gt;&lt;br /&gt;MJ Procter       1965  1981  264  14733 32,27   848    18,31 3,21  1,76  0,0&lt;br /&gt;&lt;b&gt;W Rhodes         1898  1930  1007 35015 30,35   3960   17,43 3,93  1,74  0,0&lt;/b&gt;&lt;br /&gt;GA Faulkner      1907  1924  74   3046  29,83   267    17,42 3,61  1,71  0,0&lt;br /&gt;&lt;b&gt;FS Jackson       1890  1907  301  15626 38,88   744    22,81 2,47  1,70  0,0&lt;/b&gt;&lt;br /&gt;JW Hearne        1909  1936  593  34438 41,25   1687   24,22 2,84  1,70  0,0&lt;br /&gt;&lt;b&gt;A 
Mynn           1832  1859  200  4749  39,88   1059,9 23,5  5,3   1,7   7,0&lt;/b&gt;&lt;br /&gt;TL Goddard       1955  1962  48   2549  32,85   140    19,39 2,92  1,69  0,0&lt;br /&gt;SG Smith         1906  1914  143  7575  33,87   606    20,48 4,24  1,65  0,0&lt;br /&gt;&lt;b&gt;CG Taylor        1836  1859  122  3020  49,52   292,0  30,3  2,4   1,6   7,0&lt;/b&gt;&lt;br /&gt;R Kilner         1911  1927  389  13722 29,48   917    18,53 2,36  1,59  0,0&lt;br /&gt;Imran Khan       1971  1988  240  11679 31,80   733    20,17 3,05  1,58  0,0&lt;br /&gt;&lt;b&gt;JR Mason         1893  1914  324  16619 35,92   817    23,71 2,52  1,52  0,0&lt;/b&gt;&lt;br /&gt;&lt;b&gt;EH Budd          1803  1831  68   2597  45,37   285,8  30,2  4,2   1,5   10,0&lt;/b&gt;&lt;br /&gt;&lt;b&gt;W Caffyn         1849  1873  180  5405  35,81   564    23,87 3,13  1,50  0,3&lt;/b&gt;&lt;br /&gt;&lt;b&gt;T Hayward        1854  1872  108  4487  39,85   237    26,58 2,19  1,50  0,6&lt;/b&gt;&lt;br /&gt;IJ Harvey        1999  2007  75   4044  28,43   219    19,11 2,92  1,49  0,0&lt;/pre&gt;&lt;br /&gt;Keith Miller comes out on top, ahead of (surprisingly) the Big Ship Warwick Armstrong.  Lambert leads a host of 19th century players, who are vastly over-represented in the table &amp;mdash; half of the top thirty spots!  Given the number of players since 1900, you'd expect only about five or six from the 1800's.  Alfred Mynn is a long way down the table (20th place), but if you give more weighting to wickets per match, he would rank higher.&lt;br /&gt;&lt;br /&gt;At number nine is Frank Tarrant, someone I'd never heard of.  He never played a Test, which, at first glance, is extraordinary for someone with his first-class record.  
His lack of Test cricket is explained by his being Australian and playing for Middlesex, which barred him from playing for Australia (though he did play for the MCC at times).&lt;br /&gt;&lt;br /&gt;The abundance of 19th century all-rounders tells us something about the nature of the game and/or its players.  I'm not sure exactly what factors contributed to it, but I would suggest the following.  When cricket was less developed, and had fewer top-level players, a talented athlete was more likely to dominate with both bat and ball.  As batting and bowling techniques became more sophisticated, and the number of players increased, there were more specialists in both disciplines, making it harder for the talented cricketer to be good (relative to his peers) with both bat and ball.&lt;br /&gt;&lt;br /&gt;Next up (and the last instalment in this series): wicket-keepers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3608885934512704020?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3608885934512704020/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3608885934512704020' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3608885934512704020'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3608885934512704020'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html' title='1800&apos;s first-class cricket in England: all-rounders (across eras)'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4531332943667469652</id><published>2008-02-10T22:12:00.000+01:00</published><updated>2008-02-10T22:14:09.818+01:00</updated><title type='text'>All-rounderness</title><content type='html'>The statistical judging of all-rounders is usually not done in what I would consider a satisfactory way.  I'm not really going to remedy this problem in this post, though I will present some ranked lists.  It's more a case of looking at the stats and seeing why it's hard to get them to agree well with common sense (without making some arbitrary decisions).&lt;br /&gt;&lt;br /&gt;I'll only be considering batsman-bowler all-rounders.  If you want to comment about wicket-keepers, it should be about Tim Zoehrer.&lt;br /&gt;&lt;br /&gt;The base of this analysis will be the bowling averages with wickets &lt;a href="http://pappubahry.blogspot.com/2007/11/weighted-bowling-averages.html"&gt;weighted by the batting averages&lt;/a&gt; of the batsmen dismissed, and the batting averages with runs &lt;a href="http://pappubahry.blogspot.com/2007/12/modified-batting-averages.html"&gt;weighted by the strength of the bowling attack&lt;/a&gt;.  This gives a comparison across all eras, and rewards those players who performed well against stronger sides.  Both the averages I use in this post are normalised to 31,48, which is the overall batting average for all Tests.  All references to averages below are these weighted averages.&lt;br /&gt;&lt;br /&gt;My main ranking tool will be the batting average divided by the bowling average.  I prefer this to the difference, which is more commonly used, because I think it gets closer to a definition of "all-rounder-ness".  
So, for instance, a batting average of 60 and a bowling average of 30 gives a ratio of 2.  A batting average of 40 and a bowling average of 20 also gives a ratio of 2.  I think this is fair &amp;mdash; in the first case, you have an all-time great batsman who was a good bowler, and in the second you have an all-time great bowler who was a good batsman.  You might think that one is a better player than the other, but I'm trying to get at the all-rounder-ness.  I hope that's clear.&lt;br /&gt;&lt;br /&gt;So, let's think of what qualities we'd like and qualifications we'll use in ranking the best genuine all-rounders of all time.  &lt;br /&gt;&lt;br /&gt;1. At least 20 Test innings&lt;br /&gt;2. At least 2 wickets per Test (I don't use weighted wickets here; I just want to make sure that they bowled regularly)&lt;br /&gt;3. A batting average above average (i.e., higher than 31,48)&lt;br /&gt;4. A bowling average below average (i.e., lower than 31,48)&lt;br /&gt;&lt;br /&gt;The top seven all-rounders of all time are then as follows.  (Runs and wickets are the regular runs and wickets; wpm is wickets per match; ratio is the ratio of batting average to bowling average.)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name             mat runs  avg     wkts avg   wpm   ratio&lt;br /&gt;KR Miller        55  2958  35,37   170  23,20 3,09  1,52&lt;br /&gt;Imran Khan       88  3807  36,07   362  24,01 4,11  1,50&lt;br /&gt;W Bates          15  656   36,05   50   26,96 3,33  1,34&lt;br /&gt;TL Goddard       41  2516  35,60   123  27,06 3,00  1,32&lt;br /&gt;IT Botham        102 5200  33,43   383  30,51 3,75  1,10&lt;br /&gt;TE Bailey        61  2290  32,23   132  30,33 2,16  1,06&lt;br /&gt;JM Gregory       24  1146  32,89   85   31,24 3,54  1,05&lt;/pre&gt;&lt;br /&gt;Why only seven?  Because that's all there is.  No other players satisfy the four conditions above.&lt;br /&gt;&lt;br /&gt;I'm actually pretty happy with that list.  
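&lt;br /&gt;&lt;br /&gt;For anyone who wants to play with the ranking rule, here is a minimal Python sketch of it. It is only an illustration under stated assumptions, not the code that produced these tables. The helper names (ratio, all_rounders) and the data layout are hypothetical, and the weighted averages are copied from the tables in this post. Wickets per match are worked out from the regular wickets and matches. Condition 1 (20 Test innings) is left out because the tables don't list innings.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical sketch of the all-rounder ranking: weighted batting average
# divided by weighted bowling average. Condition 1 (20 Test innings) is not
# checked, because innings aren't listed in the tables.

OVERALL_AVG = 31.48  # overall Test batting average; both averages are normalised to it

# (name, matches, weighted batting avg, wickets, weighted bowling avg),
# copied from the tables in this post
PLAYERS = [
    ("KR Miller", 55, 35.37, 170, 23.20),
    ("Imran Khan", 88, 36.07, 362, 24.01),
    ("IT Botham", 102, 33.43, 383, 30.51),
    ("GStA Sobers", 93, 54.62, 235, 34.63),
    ("JH Kallis", 113, 50.52, 223, 31.50),
    ("RJ Hadlee", 86, 26.09, 431, 23.76),
]


def ratio(bat_avg, bowl_avg):
    """All-rounder-ness: batting average divided by bowling average."""
    return bat_avg / bowl_avg


def all_rounders(players, min_wpm=2.0, require_above_average=True):
    """Return (name, ratio rounded to 2 places) for qualifiers, best first."""
    ranked = []
    for name, matches, bat_avg, wickets, bowl_avg in players:
        if min_wpm > wickets / matches:
            continue  # failed condition 2: didn't bowl regularly enough
        if require_above_average and not (
            bat_avg > OVERALL_AVG and OVERALL_AVG > bowl_avg
        ):
            continue  # failed condition 3 or 4
        ranked.append((name, ratio(bat_avg, bowl_avg)))
    # sort on the unrounded ratio so near-ties keep their true order
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, round(r, 2)) for name, r in ranked]


print(all_rounders(PLAYERS))
# [('KR Miller', 1.52), ('Imran Khan', 1.5), ('IT Botham', 1.1)]
print(all_rounders(PLAYERS, require_above_average=False))
print(all_rounders(PLAYERS, min_wpm=1.0, require_above_average=False))
```
&lt;/pre&gt;
On this small sample the three calls follow the same pattern as the three tables in this post. The strict version keeps only Miller, Imran and Botham. Dropping conditions 3 and 4 puts Sobers on top and lets Hadlee in. Lowering the bar to 1 wicket per Test puts Kallis first.&lt;br /&gt;&lt;br /&gt;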
It's obviously not the list of the best all-rounders ever, but as a list of the most all-rounder of all-rounders, I think it works.  Keith Miller just beats Imran Khan as the best ever.&lt;br /&gt;&lt;br /&gt;Now let's remove requirements 3 and 4 above and see what we get.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name             mat runs  avg     wkts avg   wpm   ratio&lt;br /&gt;GStA Sobers      93  8032  54,62   235  34,63 2,53  1,58&lt;br /&gt;KR Miller        55  2958  35,37   170  23,20 3,09  1,52&lt;br /&gt;Imran Khan       88  3807  36,07   362  24,01 4,11  1,50&lt;br /&gt;AG Steel         13  600   48,93   29   35,16 2,23  1,39&lt;br /&gt;W Bates          15  656   36,05   50   26,96 3,33  1,34&lt;br /&gt;TL Goddard       41  2516  35,60   123  27,06 3,00  1,32&lt;br /&gt;AK Davidson      44  1328  27,75   186  21,51 4,23  1,29&lt;br /&gt;GA Faulkner      25  1754  45,89   82   36,35 3,28  1,26&lt;br /&gt;SM Pollock       108 3781  30,31   421  24,18 3,90  1,25&lt;br /&gt;AW Greig         58  3599  39,76   141  33,45 2,43  1,19&lt;br /&gt;RJ Hadlee        86  3124  26,09   431  23,76 5,01  1,10&lt;br /&gt;IT Botham        102 5200  33,43   383  30,51 3,75  1,10&lt;br /&gt;TE Bailey        61  2290  32,23   132  30,33 2,16  1,06&lt;br /&gt;W Barnes         21  725   30,57   51   28,81 2,43  1,06&lt;br /&gt;JM Gregory       24  1146  32,89   85   31,24 3,54  1,05&lt;br /&gt;A Flintoff       66  3331  31,06   190  29,89 2,88  1,04&lt;br /&gt;MA Noble         42  1997  33,12   121  32,61 2,88  1,02&lt;br /&gt;CL Cairns        62  3320  32,53   218  32,07 3,52  1,01&lt;br /&gt;G Ulyett         25  949   32,25   50   31,93 2,00  1,01&lt;br /&gt;C Kelleway       26  1422  37,08   52   37,45 2,00  0,99&lt;/pre&gt;&lt;br /&gt;Now Sobers returns to number one, which is where most judges would put him.  You can see why he missed out on the previous list &amp;mdash; his bowling wasn't that good.  
Despite an average in the mid-30's (both weighted and regular), he was actually a very economical bowler.  A high average and low economy rate (2,22) means that his strike rate was appallingly bad, over 90.  Not the go-to man if you need a wicket!  But he's generally considered the second-greatest ever player because no other great batsman could bowl so well.  Even if "so well" is not so well.&lt;br /&gt;&lt;br /&gt;The cut-off of 2 wickets per Test is pretty arbitrary, and it would be unfair to stop here, because it would exclude Jacques Kallis.  For my last table, I've lowered the bar to just 1 wicket per Test.  This means that a bunch of part-timers are included.  While it would be silly to consider them as being as good as the more regular wicket-takers given here, the stats must tell some story &amp;mdash; perhaps it suggests that they were underbowled, or perhaps they were just lucky and dismissed a few good batsmen from time to time.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name             mat runs  avg     wkts avg   wpm   ratio&lt;br /&gt;JH Kallis        113 9331  50,52   223  31,50 1,97  1,60&lt;br /&gt;RM Cowper        27  2061  44,43   36   28,14 1,33  1,58&lt;br /&gt;GStA Sobers      93  8032  54,62   235  34,63 2,53  1,58&lt;br /&gt;FS Jackson       20  1415  55,40   24   35,89 1,20  1,54&lt;br /&gt;KR Miller        55  2958  35,37   170  23,20 3,09  1,52&lt;br /&gt;Imran Khan       88  3807  36,07   362  24,01 4,11  1,50&lt;br /&gt;CG Macartney     35  2131  44,52   45   30,30 1,29  1,47&lt;br /&gt;AG Steel         13  600   48,93   29   35,16 2,23  1,39&lt;br /&gt;A Symonds        19  1031  38,34   22   28,16 1,16  1,36&lt;br /&gt;W Bates          15  656   36,05   50   26,96 3,33  1,34&lt;br /&gt;TL Goddard       41  2516  35,60   123  27,06 3,00  1,32&lt;br /&gt;AK Davidson      44  1328  27,75   186  21,51 4,23  1,29&lt;br /&gt;EJ Barlow        30  2516  40,24   40   31,36 1,33  1,28&lt;br /&gt;GA Faulkner      25  1754  45,89   82   36,35 3,28  
1,26&lt;br /&gt;SM Pollock       108 3781  30,31   421  24,18 3,90  1,25&lt;br /&gt;BM McMillan      38  1968  37,59   75   30,46 1,97  1,23&lt;br /&gt;ER Dexter        62  4502  43,58   66   35,48 1,06  1,23&lt;br /&gt;FMM Worrell      51  3860  49,45   69   40,78 1,35  1,21&lt;br /&gt;AW Greig         58  3599  39,76   141  33,45 2,43  1,19&lt;br /&gt;JDP Oram         25  1380  37,47   49   33,21 1,96  1,13&lt;br /&gt;RJ Hadlee        86  3124  26,09   431  23,76 5,01  1,10&lt;br /&gt;IT Botham        102 5200  33,43   383  30,51 3,75  1,10&lt;br /&gt;TE Bailey        61  2290  32,23   132  30,33 2,16  1,06&lt;br /&gt;W Barnes         21  725   30,57   51   28,81 2,43  1,06&lt;br /&gt;JM Gregory       24  1146  32,89   85   31,24 3,54  1,05&lt;br /&gt;NWD Yardley      20  812   23,53   21   22,63 1,05  1,04&lt;br /&gt;A Flintoff       66  3331  31,06   190  29,89 2,88  1,04&lt;br /&gt;Mushtaq Mohammad 57  3643  37,45   79   36,35 1,39  1,03&lt;br /&gt;MA Noble         42  1997  33,12   121  32,61 2,88  1,02&lt;br /&gt;CL Cairns        62  3320  32,53   218  32,07 3,52  1,01&lt;/pre&gt;&lt;br /&gt;And Kallis actually slots in at number one!  Bob Cowper, with his part-time offies, will probably surprise most of you (it surprised me, even though I was vaguely aware of his handy bowling).  Andrew Symonds' career is definitely on the improve.  His regular bowling average is now under 35.  That his weighted bowling average is much lower, just over 28, tells us that he's dismissing some good batsmen.&lt;br /&gt;&lt;br /&gt;So there you go.  I don't know what features you'd want in an ideal ranking of all-rounders.  You could set a boundary of 2 wickets per Test and penalise players (such as Kallis) who take fewer wickets, but the choice of boundary would be arbitrary.&lt;br /&gt;&lt;br /&gt;Charles Davis, when rating bowlers, actually gives equal weighting to wickets per Test and bowling average.  
While I see the arguments for doing so (and it would eliminate the problem of setting that boundary), I still like to fall back on the average, so as not to unduly reward bowlers with no support.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4531332943667469652?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4531332943667469652/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4531332943667469652' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4531332943667469652'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4531332943667469652'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/all-rounderness.html' title='All-rounderness'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4198734835967913664</id><published>2008-02-09T12:10:00.001+01:00</published><updated>2008-02-17T19:39:20.878+01:00</updated><title type='text'>1800's first-class cricket in England: batsmen across eras</title><content type='html'>This is Part 7 in my series on first-class cricket in the 1800's in England.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a 
href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In this post, I do a comparison of batsmen across eras, by weighting each innings by the strength of the bowling attack.  The latter is taken as the "average average" of the bowlers in an innings, weighted by the number of balls bowled by each.  (You can see the results for Test matches &lt;a href="http://pappubahry.blogspot.com/2007/12/modified-batting-averages.html"&gt;here&lt;/a&gt;.)  The same effects (only now for batsmen) occur here as in the weighted bowling averages in Part 6 &amp;mdash; batsmen are rewarded for scoring runs against better bowling attacks, and batsmen in low-scoring eras are rewarded because typically the bowlers will have correspondingly low averages.&lt;br /&gt;&lt;br /&gt;For innings where the bowlers' overs are not recorded, I've instead used the overall batting average for that season.  
I couldn't see an easy way of getting an unbiased estimate of the bowling strength, when we don't know who bowled.  Using the season average will in general inflate the modified batting average (since typically a batsman will score more heavily against weaker attacks, but using the season average counts that as the same as scoring against a strong attack).  But for the players near the top of the tables below, this is not so likely &amp;mdash; these batsmen tend to "rise to the occasion" and perform disproportionately better against stronger bowling attacks.  Nevertheless, perhaps you might want to put a mental asterisk next to players whose careers included matches from before 1855.&lt;br /&gt;&lt;br /&gt;Also, some of the bowling averages are estimates (even where we know the overs bowled), so some of the weighted averages should only be given to one decimal place.  But I'm getting lazy.&lt;br /&gt;&lt;br /&gt;It's interesting to see a graph of the season averages:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/englishoverallavgbyyear.png"&gt;&lt;br /&gt;&lt;br /&gt;There's a lot of noise in the early years because not many first-class matches were played (sometimes only one).  The lowest-scoring season was 1831, when the average of runs off the bat (that is, excluding extras) was just 7,35.  You can have a look at the scorecards for that season &lt;a href="http://www.cricketarchive.co.uk/Archive/Seasons/Seasonal_Averages/ENG/1831_f_Match_List.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To start our comparisons of batsmen, we look at just the 1800's.  I've given the weighted runs, regular average, and (once again) two weighted averages, one with respect to 16,6 (the overall average for the 1800's) and one with respect to 24,5 (the overall average from 1801 to 2007).  One is just a scaling of the other.  
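&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of the weighting just described. The function names are hypothetical. The post doesn't spell out the exact formula, so this sketch makes four assumptions. First, each innings' runs are multiplied by 16,6 divided by the strength of the attack, taken as the balls-weighted average of the bowlers' averages. Second, when the overs weren't recorded, the season average stands in for the attack strength. Third, the weighted average is weighted runs per dismissal. Fourth, the 24,5 version is just the 16,6 version rescaled.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical sketch of era-adjusted batting averages. Assumption (not
# spelled out in the post): each innings' runs are multiplied by
# REF_1800S / (strength of the bowling attack in that innings).

REF_1800S = 16.6  # overall average for the 1800's
REF_ALL = 24.5    # overall average from 1801 to 2007


def attack_strength(bowlers, season_avg):
    """Balls-weighted mean of the bowlers' averages.

    bowlers is a list of (balls bowled, bowling average) pairs. If the
    overs weren't recorded (empty list), use the season average instead.
    """
    total_balls = sum(balls for balls, _ in bowlers)
    if total_balls == 0:
        return season_avg
    return sum(balls * avg for balls, avg in bowlers) / total_balls


def weighted_average(innings):
    """innings is a list of (runs, was_out, bowlers, season_avg) tuples."""
    weighted_runs = 0.0
    outs = 0
    for runs, was_out, bowlers, season_avg in innings:
        weighted_runs += runs * REF_1800S / attack_strength(bowlers, season_avg)
        if was_out:
            outs += 1
    if outs == 0:
        raise ValueError("no dismissals, so no average")
    return weighted_runs / outs


def rescale(avg_wrt_1800s):
    """Turn an average relative to 16.6 into one relative to 24.5."""
    return avg_wrt_1800s * REF_ALL / REF_1800S


# Made-up career of two innings
example = [
    (40, True, [(120, 10.0), (60, 16.0)], 15.0),  # attack strength 12.0
    (10, True, [], 16.6),                         # overs unknown: season average
]
print(round(weighted_average(example), 2))  # 32.67
print(round(rescale(39.43), 2))             # 58.19
```
&lt;/pre&gt;
As a check on the rescaling, Ranji's 39,43 becomes 39,43 × 24,5 / 16,6 ≈ 58,19, which matches the last column of the table below.&lt;br /&gt;&lt;br /&gt;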
Qualification: 2000 runs.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                                                                   wtd avg&lt;br /&gt;name             start end   mat inns  no  runs  wtd runs  avg   wrt 16,6 wrt 24,5&lt;br /&gt;KS Ranjitsinhji  1893  1899  291 232   28  10411 8042,8    51,03  39,43  58,19&lt;br /&gt;RM Poore         1898  1899  42  47    6   2277  1584,2    55,54  38,64  57,03&lt;br /&gt;F Pilch          1820  1854  213 389   30  6797  13668,9   18,93  38,08  56,20&lt;br /&gt;WG Grace         1865  1899  838 1250  89  46792 43431,9   40,30  37,41  55,21&lt;br /&gt;Lord F Beauclerk 1801  1825  94  172   14  4319  5890,3    27,34  37,28  55,02&lt;br /&gt;W Lambert        1801  1817  62  112   5   2961  3964,3    27,67  37,05  54,68&lt;br /&gt;N Wanostrocht    1830  1852  134 242   12  4392  8027,7    19,10  34,90  51,51&lt;br /&gt;CG Taylor        1836  1859  122 222   11  3020  7080,2    14,31  33,56  49,52&lt;br /&gt;G Parr           1844  1870  187 321   26  6116  9137,6    20,73  30,97  45,72&lt;br /&gt;EH Budd          1803  1831  68  119   9   2597  3381,4    23,61  30,74  45,37&lt;br /&gt;CT Studd         1879  1884  85  145   23  3928  3702,8    32,20  30,35  44,80&lt;br /&gt;A Shrewsbury     1875  1899  459 654   66  20837 17552,8   35,44  29,85  44,06&lt;br /&gt;J Guy            1837  1854  136 244   11  3090  6723,5    13,26  28,86  42,59&lt;br /&gt;AG Steel         1877  1895  142 227   21  6184  5826,0    30,02  28,28  41,74&lt;br /&gt;W Ward           1810  1845  116 210   21  3517  5341,6    18,61  28,26  41,71&lt;br /&gt;EG Wenman        1825  1854  135 241   15  3088  6382,4    13,66  28,24  41,68&lt;br /&gt;CB Fry           1892  1899  381 209   8   7364  5597,2    36,64  27,85  41,10&lt;br /&gt;R Robinson       1801  1819  57  111   9   2039  2811,1    19,99  27,56  40,68&lt;br /&gt;TW Hayward       1893  1899  671 283   26  9558  7014,5    37,19  27,29  40,28&lt;br /&gt;W Gunn           1880  1899  505 716   65  21520 
17612,3   33,06  27,05  39,93&lt;/pre&gt;&lt;br /&gt;Ranji's high average wasn't just because batting was getting easier towards the end of the century &amp;mdash; even allowing for that he still comes out on top.  Once again, Robert Poore is lucky that the table stops at 1899, since his career went downhill after he fought in the Boer War.&lt;br /&gt;&lt;br /&gt;Fuller Pilch was described a few years after his retirement as the best batsman ever, and he kept this tag until WG Grace came along.  From the little I've read about him, he seems to have been the first man to consistently get his foot to the pitch of the ball.  In an era where pitches were of very low quality, smothering any turn or uneven bounce was very important.&lt;br /&gt;&lt;br /&gt;It's worth commenting on the discrepancy between the rankings in the table (i.e., Pilch ahead of Grace) and the opinion of the time (Grace ahead of Pilch).  Grace was considered a better batsman than Pilch because he could play attacking shots off a wider range of deliveries.  But since Grace's innovations to batting technique spread to the other cricketers of the time, he didn't stand out as much as Pilch &amp;mdash; scoring for most batsmen improved after Grace.  &lt;br /&gt;&lt;br /&gt;It is sad that Nicholas Felix is so called, since his actual surname was Wanostrocht.  He wanted to be known as Felix, but Wanostrocht is such a cool name for a cricketer that I've gone against his wishes in these tables.  In addition to being an excellent batsman in a low-scoring era, he also invented a type of bowling machine.&lt;br /&gt;&lt;br /&gt;Now let's compare batsmen in England across all eras.  I was unsure as to how useful this would be &amp;mdash; we all know of batsmen who have excellent records in domestic cricket but do terribly in Tests.  But weighting runs by the strength of the bowling attack does a pretty good job of filtering out that breed of batsman.  
Of course, it also doesn't allow for players such as Marcus Trescothick, who have mediocre county records but respectable Test numbers.  Players who played in the 1800's are in bold.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                                                                   wtd avg&lt;br /&gt;name             start end   mat inns  no  runs  wtd runs  avg   wrt 16,6 wrt 24,5&lt;br /&gt;DG Bradman       1930  1948  92  120   18  9837  6835,1    96,44  67,01  98,90&lt;br /&gt;GA Headley       1933  1954  47  74    9   4460  2897,8    68,62  44,58  65,80&lt;br /&gt;&lt;b&gt;KS Ranjitsinhji  1893  1920  291 473   58  23341 17015,4   56,24  41,00  60,51&lt;/b&gt;&lt;br /&gt;VM Merchant      1936  1946  49  81    15  4130  2517,8    62,58  38,15  56,30&lt;br /&gt;&lt;b&gt;F Pilch          1820  1854  213 389   30  6797  13668,9   18,93  38,08  56,20&lt;/b&gt;&lt;br /&gt;WH Ponsford      1926  1934  67  86    12  4110  2812,8    55,54  38,01  56,10&lt;br /&gt;WM Woodfull      1926  1934  72  87    9   4374  2956,3    56,08  37,90  55,94&lt;br /&gt;NC O'Neill       1961  1964  44  71    8   3350  2379,6    53,17  37,77  55,75&lt;br /&gt;AF Kippax        1930  1934  42  55    11  2412  1648,0    54,82  37,46  55,28&lt;br /&gt;DR Martyn        1991  2005  30  44    11  2549  1231,1    77,24  37,31  55,06&lt;br /&gt;&lt;b&gt;Lord F Beauclerk 1801  1825  94  172   14  4319  5890,3    27,34  37,28  55,02&lt;/b&gt;&lt;br /&gt;&lt;b&gt;CB Fry           1892  1921  381 635   42  30490 22042,7   51,42  37,17  54,86&lt;/b&gt;&lt;br /&gt;&lt;b&gt;W Lambert        1801  1817  62  112   5   2961  3964,3    27,67  37,05  54,68&lt;/b&gt;&lt;br /&gt;AR Morris        1948  1953  46  66    5   3224  2249,2    52,85  36,87  54,42&lt;br /&gt;J Cook           1989  1991  71  124   19  7604  3863,8    72,42  36,80  54,31&lt;br /&gt;RB Simpson       1961  1966  49  84    14  3702  2574,0    52,89  36,77  54,27&lt;br /&gt;W Bardsley       1909  1926  126 175   17  7866  5603,0    49,78  
35,46  52,34&lt;br /&gt;MEK Hussey       2001  2005  60  105   13  6710  3253,5    72,93  35,36  52,19&lt;br /&gt;SR Waugh         1987  2002  75  109   28  5290  2855,4    65,31  35,25  52,03&lt;br /&gt;WR Hammond       1920  1951  515 828   88  40733 26039,2   55,04  35,19  51,93&lt;br /&gt;&lt;b&gt;WG Grace         1865  1908  838 1428  97  52043 46760,0   39,10  35,13  51,85&lt;/b&gt;&lt;br /&gt;&lt;b&gt;N Wanostrocht    1830  1852  134 242   12  4392  8027,7    19,10  34,90  51,51&lt;/b&gt;&lt;br /&gt;SG Barnes        1938  1948  34  46    5   2074  1417,3    50,59  34,57  51,02&lt;br /&gt;DS Lehmann       1991  2006  89  139   8   8894  4525,7    67,89  34,55  50,99&lt;br /&gt;AL Hassett       1938  1953  73  100   11  4684  3063,3    52,63  34,42  50,80&lt;br /&gt;CL Walcott       1950  1957  49  78    12  3271  2268,0    49,56  34,36  50,72&lt;br /&gt;WM Lawry         1961  1968  65  105   12  4590  3182,2    49,35  34,22  50,50&lt;br /&gt;G Boycott        1962  1986  492 814   127 38981 23425,2   56,74  34,10  50,33&lt;br /&gt;JB Hobbs         1905  1934  740 1178  98  53843 36714,9   49,85  34,00  50,17&lt;br /&gt;L Hutton         1934  1960  425 676   75  32306 20354,7   53,75  33,87  49,99&lt;/pre&gt;&lt;br /&gt;The top spot should be pretty uncontroversial.  Vijay Merchant is at number four &amp;mdash; he is perhaps not famous enough for having the second-highest first-class average of all time (71,64).  Pilch slots in at five, followed by a string of Australians.  You'll note that WG Grace fell significantly in the years between 1899 and his retirement in 1908.  It's that old story of a player hanging on too long.  Thirty-five years of first-class cricket and he still wanted another decade.&lt;br /&gt;&lt;br /&gt;Beauclerk and Lambert, those all-round giants of the first quarter of the 19th century, just miss out on the top ten.&lt;br /&gt;&lt;br /&gt;Jimmy Cook is perhaps the oddest name in the list.  
He was a South African who played most of his career during isolation in South Africa, but played three seasons with Somerset before retiring.  He did get to play three Test matches.&lt;br /&gt;&lt;br /&gt;Next up: all-rounders.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4198734835967913664?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4198734835967913664/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4198734835967913664' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4198734835967913664'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4198734835967913664'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html' title='1800&apos;s first-class cricket in England: batsmen across eras'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_englishoverallavgbyyear.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-4719909543962423782</id><published>2008-02-07T14:25:00.000+01:00</published><updated>2008-02-07T14:26:13.835+01:00</updated><title type='text'>Increasing or decreasing scores</title><content type='html'>Imagine a scorecard in which 
each batsman's score is less than the previous one.  So, for example, the first opener scores 131, the second 82, the number three makes 75, the number four makes 56, and so on.  What would you guess is the longest such sequence of declining scores in a Test innings?  What would you guess is the longest &lt;i&gt;increasing&lt;/i&gt; sequence?&lt;br /&gt;&lt;br /&gt;These are, I admit, not the most important questions in cricket, but I found them fun.  I would have thought that, while a long decreasing sequence would be rare, once you hit number 7 and the tail it should be common enough to get down to number eight or nine.  The problem is that once a batsman makes a duck, the sequence has to stop, so it'd be pretty unlikely to get all the way to number eleven.&lt;br /&gt;&lt;br /&gt;An increasing sequence should be rarer, but perhaps when a captain reversed his batting order on a sticky wicket, you could get a sequence up to seven or so.  Certainly I thought the longest increasing sequence would be shorter than the longest decreasing sequence.&lt;br /&gt;&lt;br /&gt;As it happens, the longest increasing sequence goes to number six, but so does the longest decreasing sequence!  
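&lt;br /&gt;&lt;br /&gt;Checking a single scorecard for these runs is simple. Here is a minimal Python sketch, with a hypothetical function name, that measures how far a run of decreasing (or increasing) scores extends from the first opener. It treats not-out scores the same as completed innings. The first example borrows the 131, 82, 75, 56 opening from above, but the rest of both scorecards are made up for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
def opening_run(scores, decreasing=True, strict=True):
    """How far a run of decreasing (or increasing) scores extends from the
    first opener. scores lists the innings in batting order; with
    strict=False, a score equal to the previous one continues the run."""
    if not scores:
        return 0
    length = 1
    for prev, cur in zip(scores, scores[1:]):
        if not decreasing:
            # an increasing run is a decreasing run with the pair swapped
            prev, cur = cur, prev
        if prev > cur or (not strict and prev == cur):
            length += 1
        else:
            break
    return length


# Made-up scorecards, in batting order (not-outs counted as plain scores)
card = [131, 82, 75, 56, 60, 12, 30, 4, 0, 0, 1]
print(opening_run(card))                    # 4: 131 > 82 > 75 > 56, then 60
print(opening_run(card, decreasing=False))  # 1: 82 is lower than 131
tail_collapse = [45, 30, 30, 20, 10, 5, 3, 0, 0, 0, 0]
print(opening_run(tail_collapse))                # 2: the tie at 30 ends it
print(opening_run(tail_collapse, strict=False))  # 11: non-increasing to number eleven
```
&lt;/pre&gt;
One way to find record sequences like the ones below is to run a check like this over every innings in a database and keep the longest runs.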
&lt;br /&gt;&lt;br /&gt;Decreasing sequences: &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/6/6814.html"&gt;Eng v Aus, 1905&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/21/21646.html"&gt;Eng v SA, 1955&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/21/21646.html"&gt;Aus v Eng, 1990/1&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/21/21646.html"&gt;Zim v NZ, 1992/3&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Increasing sequences: &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/2/2927.html"&gt;Aus v Eng, 1884/5&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/2/2927.html"&gt;Aus v Eng, 1932/3&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/2/2927.html"&gt;Eng v WI, 1957&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you allow a score to equal the previous score, the longest non-decreasing sequences still only reach number six, but there are two non-increasing sequences that reach number seven: &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/19/19168.html"&gt;SA v Aus, 1949/50&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/37/37378.html"&gt;Eng v Aus, 1977&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;What about first-class matches?  
I only have a database of matches played in England (about 28000 of them, more than half of all the first-class matches ever played), but the longest sequences (not allowing equal scores) in this dataset are of length eight:&lt;br /&gt;&lt;br /&gt;Decreasing: &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/9/9138.html"&gt;Lancs v Middlesex, 1913&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/12/12611.html"&gt;Lancs v Glamorgan, 1928&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/13/13518.html"&gt;Wales v Minor Counties, 1930&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/20/20190.html"&gt;Yorks v Lancs, 1952&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/50/50530.html"&gt;Kent v Surrey, 1988&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Increasing: &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/49/49850.html"&gt;Lancs v Kent, 1988&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The Kent v Surrey innings (Surrey's first) is actually non-increasing right the way down to number eleven (it finishes with three ducks and a nought not out).  There are two other cases of non-increasing sequences from one to eleven: &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/3/3354.html"&gt;Yorks v Lancs, 1888&lt;/a&gt; (finishes with five ducks and a nought not out), &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/13/13110.html"&gt;Gloucs v Leics, 1929&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The longest non-decreasing sequence is still of length eight.&lt;br /&gt;&lt;br /&gt;Is there a way we could have guessed how rare these sequences are?  Any maths-phobes may wish to stop reading now.  There's some calculus below.&lt;br /&gt;&lt;br /&gt;A naïve approach would be to say that, since each score is either higher or lower than the previous one, it's just like flipping a coin.  (Not exactly: it could be equal, but it'd be pretty close.)  What's the probability of getting seven heads in a row? 
 One in 128.  But there have been over 1800 Test matches, so sequences of length seven should have occurred dozens of times!&lt;br /&gt;&lt;br /&gt;The error in the above reasoning is at the start: it's not like flipping a coin.  The first step is like a coin flip (the second opener could score more or less than the first, with close to a 50% chance each way), but after that it's no longer 50-50.  Suppose the first opener makes 40.  It might be 50-50 as to whether the second opener scores less than 40.  He might make 30, say.  But then the probability that the number three will score less than 30 is less than 50%.  If the number three makes 20, the chance that the number four will make less than this is even smaller than the previous probability, and so on.&lt;br /&gt;&lt;br /&gt;A better way would be to assume that individual innings follow an exponential distribution, which says that scores of 0 are more common than scores of 1, which are more common than scores of 2, etc.  (This isn't the real distribution &amp;mdash; in reality scores are skewed even more heavily towards zero &amp;mdash; but it's a reasonable approximation for these fun purposes.  Also, runs are scored in whole numbers &amp;mdash; 1, 2, 3, etc. &amp;mdash; but the exponential distribution allows for any positive real number of runs, such as sqrt(2) or 4.9.)  Assume further (to make the maths easier) that each batsman has the same average.&lt;br /&gt;&lt;br /&gt;The probability that a batsman with average 1/k makes a score less than x is given by:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;x                 &lt;br /&gt;/                 &lt;br /&gt;| ds k*exp(-k*s)             (*)&lt;br /&gt;/                 &lt;br /&gt;0&lt;/pre&gt;&lt;br /&gt;Sorry for the ugly formatting.  That's the integral from 0 to x of k*exp(-k*s) with respect to s, which works out to 1 - exp(-k*x).&lt;br /&gt;&lt;br /&gt;Now let the first n batsmen's scores be s1, s2, ..., s(n-1), sn.  We want the probability P that s1 &gt; s2 &gt; ... &gt; s(n-1) &gt; sn.  
To start, we use (*) on the last link in the chain of inequalities:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;s(n-1)                 &lt;br /&gt;/                 &lt;br /&gt;| dsn k*exp(-k*sn)&lt;br /&gt;/                 &lt;br /&gt;0&lt;/pre&gt;&lt;br /&gt;Once we have that, we feed the result into the integral for the next inequality up the chain, and so on, until we have all of them.  The outermost integral is from 0 to infinity, since the first score s1 can be anything:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;oo                 s1                     s(n-1)&lt;br /&gt;/                  /                      /&lt;br /&gt;| ds1 k*exp(-k*s1) | ds2 k*exp(-k*s2) ... | dsn k*exp(-k*sn)&lt;br /&gt;/                  /                      /&lt;br /&gt;0                  0                      0&lt;/pre&gt;&lt;br /&gt;Now we evaluate these integrals!  Trust me when I say that, when you expand it all out, most of the terms cancel, and you're left with a term that comes from integrating the exponentials which multiply together, so that the probability is 1/n!.  (There's also a shortcut that avoids the calculus. The n scores are independent and identically distributed, and ties have zero probability. So each of the n! possible orderings of the scores is equally likely, and exactly one of those orderings is strictly decreasing.)&lt;br /&gt;&lt;br /&gt;So, a decreasing sequence of length six should happen about once every 720 innings, a sequence of length seven about once every 5040 innings, and a sequence of length eight about once every 40320 innings.&lt;br /&gt;&lt;br /&gt;Essentially the same argument gives the same probabilities for increasing sequences.&lt;br /&gt;&lt;br /&gt;It's not perfect (and we wouldn't expect it to be so, given that the real distribution isn't exponential, and batsmen's averages aren't all equal), but it gets the right order of magnitude at least.  
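&lt;br /&gt;&lt;br /&gt;The 1/n! result is also easy to check by simulation. Here is a minimal Python sketch under the same assumptions as the model above: independent exponential scores, with every batsman having the same average. The function name and the average of 30 are arbitrary choices. The average doesn't affect the answer, since multiplying every score by the same factor doesn't change their order.&lt;br /&gt;
&lt;pre&gt;
```python
import math
import random


def decreasing_fraction(n, trials=100_000, average=30.0, seed=1):
    """Fraction of simulated innings whose first n scores are strictly
    decreasing, each score drawn independently from an exponential
    distribution with the given average (so k = 1 / average)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = [rng.expovariate(1.0 / average) for _ in range(n)]
        if all(a > b for a, b in zip(scores, scores[1:])):
            hits += 1
    return hits / trials


for n in range(2, 7):
    # simulated fraction next to the 1/n! prediction
    print(n, round(decreasing_fraction(n), 5), round(1 / math.factorial(n), 5))
```
&lt;/pre&gt;
The simulated fractions should come out close to 1/2, 1/6, 1/24, 1/120 and 1/720. The printed values will wobble a little because of sampling noise, especially for n = 6, where only around 140 of the 100000 simulated innings are expected to qualify.&lt;br /&gt;&lt;br /&gt;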
It gives us a good idea of why we haven't seen a sequence of length seven in Test cricket yet, though we should get one eventually, perhaps in the next ten or twenty years.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-4719909543962423782?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/4719909543962423782/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=4719909543962423782' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4719909543962423782'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/4719909543962423782'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/increasing-or-decreasing-scores.html' title='Increasing or decreasing scores'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-7334803282374877016</id><published>2008-02-05T21:34:00.001+01:00</published><updated>2008-02-17T19:39:55.312+01:00</updated><title type='text'>1800's first-class cricket in England: bowlers across eras</title><content type='html'>This is Part 6 in my series on first-class cricket in the 1800's in England.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a 
href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In this post, I do a comparison of bowlers across eras, by weighting each wicket that a bowler takes by the batting average of the batsman dismissed.  (You can see the results for Test matches &lt;a href="http://pappubahry.blogspot.com/2007/11/weighted-bowling-averages.html"&gt;here&lt;/a&gt;.)  This has two effects.  The first is that bowlers who dismiss the best batsmen are rewarded for doing so (and bowlers who just pick up tailenders are punished).  The second is that, since most batsmen will have proportionally lower averages in an era of low scoring (and vice versa), the resulting weighted averages will be comparable across eras.&lt;br /&gt;&lt;br /&gt;For scorecards where the caught etc. wickets are estimated rather than known, each "fractional wicket" is weighted by the batting average times the fraction.  
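&lt;br /&gt;&lt;br /&gt;To make the weighting concrete, here is a minimal Python sketch. It assumes (my reading, not something spelled out above) that each wicket counts as the dismissed batsman's average divided by the reference average of 16,6, multiplied by the credited fraction for an estimated wicket, and that the weighted average is runs conceded divided by these weighted wickets. That reading is consistent with the tables below: for Charlie Turner, 7869 / 805,2 gives 9,77, and multiplying by 24,1/16,6 gives 14,19. The function names and the example dismissals are hypothetical, and the code uses decimal points rather than commas.&lt;pre&gt;
```python
ERA_AVERAGE_1800S = 16.6  # overall batting average for the 1800's (from the text)
ALL_TIME_AVERAGE = 24.1   # overall batting average, 1801 to 2007 (from the text)


def weighted_wickets(dismissals, reference=ERA_AVERAGE_1800S):
    """dismissals: list of (batting average of the batsman dismissed, fraction
    of the wicket credited). The fraction is 1.0 for a known dismissal and
    less than 1 for an estimated fractional wicket. Dividing by the reference
    average is an assumption about how the weights are normalised."""
    return sum(bat_avg * fraction / reference for bat_avg, fraction in dismissals)


def weighted_average(runs_conceded, wtd_wickets):
    """Runs conceded per weighted wicket."""
    return runs_conceded / wtd_wickets


def rescale(wtd_avg, old_reference=ERA_AVERAGE_1800S, new_reference=ALL_TIME_AVERAGE):
    """Express a weighted average relative to a different reference average."""
    return wtd_avg * new_reference / old_reference


# Check against the CTB Turner row of the 1800's table.
turner = weighted_average(7869, 805.2)
print(round(turner, 2), round(rescale(turner), 2))

# A made-up bowler: a top-order wicket, a tailender, and half an estimated wicket.
example = [(40.0, 1.0), (8.3, 1.0), (33.2, 0.5)]
print(round(weighted_wickets(example), 2))
```
&lt;/pre&gt;&lt;br /&gt;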
For these scorecards, the bowler will still be rewarded for bowling the best batsmen, but after that, he is simply rewarded for doing well against teams with good batsmen who he might have dismissed.&lt;br /&gt;&lt;br /&gt;In each of the tables below, I give the wickets, weighted wickets, runs conceded, the usual average, and then two resulting weighted averages (one being just a scaling of the other).  One is with respect to 16,6 (the overall average for the 1800's), and one is with respect to 24,1 (the overall average from 1801 to 2007).&lt;br /&gt;&lt;br /&gt;First up, the top bowlers from the 1800's, as ordered by the weighted average.  Qualification: 200 wickets.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                                wtd                     wtd avg&lt;br /&gt;name          start end   wkts  wkts    runs    avg   wrt 16,6 wrt 24,1&lt;br /&gt;CTB Turner    1888  1893  610   805,2   7869    12,90  9,77   14,19&lt;br /&gt;E Jones       1896  1899  256   490,0   4789    18,71  9,77   14,19&lt;br /&gt;T Richardson  1892  1899  1455  2183,6  22980   15,79  10,52  15,28&lt;br /&gt;W Rhodes      1898  1899  333   495,0   5311    15,95  10,73  15,58&lt;br /&gt;JT Hearne     1888  1899  1635  2406,6  25986   15,89  10,80  15,68&lt;br /&gt;W Mead        1892  1899  746   1229,8  13500   18,10  10,98  15,94&lt;br /&gt;GA Lohmann    1884  1896  1590  1963,4  21968   13,82  11,19  16,24&lt;br /&gt;AE Trott      1896  1899  453   715,7   8102    17,89  11,32  16,43&lt;br /&gt;G Freeman     1865  1880  288   250,2   2849,2  9,9    11,4   16,5  +/- 0,2%&lt;br /&gt;FR Spofforth  1878  1897  675   807,9   9204    13,64  11,39  16,54&lt;br /&gt;W Attewell    1881  1899  1809  2453,7  27955   15,45  11,39  16,54&lt;br /&gt;AW Mold       1889  1899  1486  1980,2  23044   15,51  11,64  16,89&lt;br /&gt;H Trumble     1890  1899  450   676,9   7883    17,52  11,65  16,91&lt;br /&gt;HF Boyle      1878  1890  259   299,8   3600    13,90  12,01  17,44&lt;br /&gt;A Shaw        
1864  1897  1881  1912,0  23108,4 12,29  12,09  17,55 +/- 0,01%&lt;br /&gt;WH Lockwood   1886  1899  902   1277,2  15484   17,17  12,12  17,60&lt;br /&gt;S Haigh       1895  1899  364   552,2   6722    18,47  12,17  17,67&lt;br /&gt;WR Cuttell    1896  1899  324   523,6   6419    19,81  12,26  17,80&lt;br /&gt;J Briggs      1879  1899  1907  2386,0  29384   15,41  12,32  17,88&lt;br /&gt;AW Hallam     1895  1899  207   322,8   3995    19,30  12,37  17,96&lt;/pre&gt;&lt;br /&gt;Australians take places one and two!  It's always good to pass such sanity checks.  Charlie Turner and Ernie Jones have remarkably similar weighted averages &amp;mdash; they only diverge at the fourth decimal place.  (On a slightly more serious note: we would expect that international bowlers come near the top of these tables, since they are Test-class bowlers, and should generally be as good as the best English bowlers.)&lt;br /&gt;&lt;br /&gt;It's also good to see that this analysis makes Tom Richardson the top Englishman of the 1800's.  He was considered the best fast bowler of all time at his peak, which lasted from 1893 to 1897.  A young Wilfred Rhodes gets into fourth spot, based on his spectacular first two seasons of county cricket.&lt;br /&gt;&lt;br /&gt;The absence of bowlers from the first part of the century is explained by there not being many matches played.  Only a handful of players played a decent number of matches (that is, enough to reach 200 wickets), and most of these were batsmen or all-rounders.&lt;br /&gt;&lt;br /&gt;That James Broadbridge, James Cobbett et al. are missing from the top of the table is more of a surprise.  It seems that, while the round-arm era saw the lowest scoring in first-class history, none of the bowlers really stood out.  They had spectacular averages, but so did everyone else at the time.  
(It is also possible that some of these bowlers would rank higher if their wickets were known; perhaps some dismissed more top-order batsmen than the estimates give them credit for.)  Broadbridge is 49th, with a weighted average of 14,2 with respect to 16,6.  Alfred Mynn is a long way down the table (94th; weighted average 15,9 wrt 16,6).&lt;br /&gt;&lt;br /&gt;Now let us move on to the comparison for all time.  Players whose first-class careers started in the 1800's are in bold.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                                wtd                     wtd avg&lt;br /&gt;name          start end   wkts  wkts    runs    avg   wrt 16,6 wrt 24,1&lt;br /&gt;M Muralidaran 1991  2007  306   556,4   5163    16,87  9,28   13,47&lt;br /&gt;RR Lindwall   1948  1956  218   385,0   3667    16,82  9,53   13,83&lt;br /&gt;RJ Hadlee     1973  1990  780   1327,1  12707   16,29  9,57   13,90&lt;br /&gt;WJ O'Reilly   1934  1938  213   367,9   3584    16,83  9,74   14,14&lt;br /&gt;H Verity      1930  1939  1732  2534,0  24816   14,33  9,79   14,22&lt;br /&gt;J Garner      1977  1986  426   728,4   7428    17,44  10,20  14,81&lt;br /&gt;&lt;b&gt;CTB Turner    1888  1893  610   764,4   7869    12,90  10,29  14,95&lt;/b&gt;&lt;br /&gt;WE Bowes      1928  1947  1591  2533,9  26201   16,47  10,34  15,01&lt;br /&gt;MD Marshall   1979  1994  994   1773,1  18369   18,48  10,36  15,04&lt;br /&gt;JB Statham    1950  1968  1999  3007,7  31533   15,77  10,48  15,22&lt;br /&gt;R Appleyard   1950  1958  664   979,1   10309   15,53  10,53  15,29&lt;br /&gt;CEL Ambrose   1988  2000  447   852,8   9301    20,81  10,91  15,83&lt;br /&gt;CV Grimmett   1926  1934  358   587,0   6441    17,99  10,97  15,93&lt;br /&gt;TM Alderman   1981  1989  370   679,7   7477    20,21  11,00  15,97&lt;br /&gt;IR Bishop     1988  1995  225   417,7   4609    20,48  11,03  16,02&lt;br /&gt;WW Armstrong  1902  1921  407   620,5   6880    16,90  11,09  16,10&lt;br /&gt;ST Clarke     1979  1988  591   1009,3  
11226   18,99  11,12  16,15&lt;br /&gt;DW Carr       1909  1914  334   499,8   5585    16,72  11,17  16,22&lt;br /&gt;H Larwood     1924  1938  1336  2023,9  22766   17,04  11,25  16,33&lt;br /&gt;&lt;b&gt;G Freeman     1865  1880  288   249,9   2849,2  9,9    11,4   16,6  +/- 0,2%&lt;/b&gt;&lt;br /&gt;&lt;b&gt;SF Barnes     1894  1930  461   708,2   8080    17,53  11,41  16,56&lt;/b&gt;&lt;br /&gt;S Ramadhin    1950  1965  399   581,2   6662    16,70  11,46  16,64&lt;br /&gt;Waqar Younis  1990  2003  436   801,7   9251    21,22  11,54  16,75&lt;br /&gt;CA Walsh      1984  2000  1013  1839,4  21241   20,97  11,55  16,76&lt;br /&gt;&lt;b&gt;FR Spofforth  1878  1897  675   792,1   9204    13,64  11,62  16,87&lt;/b&gt;&lt;br /&gt;&lt;b&gt;H Trumble     1890  1902  587   837,8   9804    16,70  11,70  16,99&lt;/b&gt;&lt;br /&gt;HL Jackson    1947  1963  1730  2557,9  30066   17,38  11,75  17,06&lt;br /&gt;CIJ Smith     1930  1939  824   1322,9  15565   18,89  11,77  17,08&lt;br /&gt;GA Faulkner   1907  1924  267   374,7   4423    16,57  11,80  17,14&lt;br /&gt;&lt;b&gt;W Rhodes      1898  1930  3960  5489,5  64836   16,37  11,81  17,15&lt;/b&gt;&lt;/pre&gt;&lt;br /&gt;A good cross-section of eras and countries is represented, with Murali leading the pack.  Hedley Verity is, surprisingly enough, the top English bowler, and George Freeman sneaks into the top 20.  It is curious that SF Barnes is so far down the list (number 21).  When I did this exercise for Test bowlers, he was the best of all bowlers with at least 100 wickets.&lt;br /&gt;&lt;br /&gt;In amongst the international stars, there are a few surprising names.  Bob Appleyard, who bowled both off-breaks and fast-medium, is the first of these.  He may well have become a great of the game had it not been for a terrible run of illness (including being diagnosed with TB) and injury.  His rise for Yorkshire was spectacular, as he took 200 wickets in his first full season.  
But after just nine Tests (in which he took 31 wickets at under 18), his fall was just as spectacular.&lt;br /&gt;&lt;br /&gt;Douglas Carr played just one Test for England in 1909.  His career ended with the outbreak of war, but it surprises me that he didn't get any more Tests.  &lt;br /&gt;&lt;br /&gt;Les Jackson was also unlucky, being selected only twice for England.  In an era when England had fewer pacemen, he probably would have played more international cricket.&lt;br /&gt;&lt;br /&gt;Big Jim Smith is best remembered for his slogging (he once hit a half-century in eleven minutes), but his numbers suggest that he could have done well as a Test bowler.  As it is, he got just five Tests, taking 15 wickets at 26,2.  I suppose that if he had played some Tests against Bradman, his figures might not look so good.&lt;br /&gt;&lt;br /&gt;Next up: cross-era comparisons of batsmen.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-7334803282374877016?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/7334803282374877016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=7334803282374877016' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7334803282374877016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/7334803282374877016'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html' title='1800&apos;s first-class cricket in England: bowlers across eras'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-5793647683730254267</id><published>2008-02-03T21:22:00.002+01:00</published><updated>2008-02-17T19:41:40.520+01:00</updated><title type='text'>1800's first-class cricket in England: batsmen</title><content type='html'>This is Part 5 in my series on cricket in the 1800's in England.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;(&lt;b&gt;Edit&lt;/b&gt;: I've fixed a number of typos in the tables.  
The runs and averages were correct, but somehow the innings and not-outs had got all mixed up.)&lt;br /&gt;&lt;br /&gt;This post is nothing statistically special &amp;mdash; just a few lists of batsmen.  Batting data for first-class cricket is essentially complete, so there's no need for fancy estimation techniques for calculating averages etc.&lt;br /&gt;&lt;br /&gt;The leading batting averages for the 1800's come exclusively from players who played near the end of the century, when batting became easier.  Qualification (for all tables in this post): 2000 runs.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name            start end   mat  inns  no  runs  avg     wkts   runs  avg     +/- %&lt;br /&gt;RM Poore        1898  1899  26   47    6   2277  55,54   0      13    0,00    0&lt;br /&gt;KS Ranjitsinhji 1893  1899  128  232   28  10411 51,03   67     2340  34,93   0&lt;br /&gt;WG Grace        1865  1899  732  1250  89  46792 40,30   2495   43960 17,62   0&lt;br /&gt;WG Quaife       1894  1899  120  203   39  6122  37,33   55     1792  32,58   0&lt;br /&gt;TW Hayward      1893  1899  191  283   26  9558  37,19   364    7367  20,24   0&lt;br /&gt;CJ Burnup       1895  1899  77   137   12  4633  37,06   22     855   38,86   0&lt;br /&gt;PA Perrin       1896  1899  76   131   13  4336  36,75   2      109   54,50   0&lt;br /&gt;CB Fry          1892  1899  114  209   8   7364  36,64   149    4046  27,15   0 &lt;br /&gt;A Shrewsbury    1875  1899  400  654   66  20837 35,44   0      2     0,00    0&lt;br /&gt;J Darling       1896  1899  67   109   10  3496  35,31   1      38    38,00   0&lt;/pre&gt;&lt;br /&gt;Robert Poore's pretty lucky that I stopped at 1899.  In his last season of cricket before fighting in the Boer War, he scored a triple century and averaged over 90.  When he returned to first-class cricket in 1902, he did not return to those heights, and his first-class career finished with an average under 40.  
Ranjitsinhji, on the other hand, was able to sustain his average, and indeed it finished over 56 (with a Test average in the mid-40's as well).&lt;br /&gt;&lt;br /&gt;Joe Darling is the first Australian to make any of these lists.  His presence arises from his scores in the Australian tours of England in 1896 and 1899.&lt;br /&gt;&lt;br /&gt;Now let's look at the leading batsmen by runs scored.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name            start end   mat  inns  no  runs  avg     wkts   runs  avg     +/- %&lt;br /&gt;WG Grace        1865  1899  732  1250  89  46792 40,30   2495   43960 17,62   0&lt;br /&gt;R Abel          1881  1899  468  743   48  22846 32,87   246    5644  22,94   0&lt;br /&gt;W Gunn          1880  1899  437  716   65  21520 33,06   74     1660  22,43   0&lt;br /&gt;WW Read         1873  1897  450  723   50  21408 31,81   101    3339  33,06   0&lt;br /&gt;A Shrewsbury    1875  1899  400  654   66  20837 35,44   0      2     0       0&lt;br /&gt;G Ulyett        1873  1893  499  862   39  19031 23,12   598    11765 19,67   0&lt;br /&gt;AN Hornby       1867  1899  422  687   41  15752 24,38   7      179   25,57   0&lt;br /&gt;H Jupp          1862  1881  375  686   48  15244 23,89   7      316   45,14   0&lt;br /&gt;W Barnes        1875  1894  421  666   54  14108 23,05   803    13935 17,35   0&lt;br /&gt;AE Stoddart     1885  1899  255  449   54  13799 31,72   206    5553  26,96   0&lt;/pre&gt;&lt;br /&gt;We see that WG Grace, as well as taking comfortably more wickets than anyone else, scored more than twice as many first-class runs in England as any other batsman in the 1800's!  Of course, he did play many more matches (second on that list is George Ulyett with 499), but you can see why he was such a giant of the game, and why he remains famous to this day.&lt;br /&gt;&lt;br /&gt;Henry Jupp, who represented England in the first two Test matches, is the only player in that table to have ended his career before 1890.  
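&lt;br /&gt;&lt;br /&gt;For reference, the batting averages in these tables are runs divided by completed innings (innings minus not-outs). Here is a quick Python check against two rows above, plus Grace's run tally relative to Abel's (the helper name is my own):&lt;pre&gt;
```python
def batting_average(runs, innings, not_outs):
    """Runs per completed innings: a not-out adds runs but no dismissal."""
    return runs / (innings - not_outs)


# Rows from the tables above: runs, innings, not-outs
poore = batting_average(2277, 47, 6)
grace = batting_average(46792, 1250, 89)
print(round(poore, 2), round(grace, 2))

# Grace's runs as a multiple of the next-highest tally, R Abel's 22846
print(round(46792 / 22846, 2))
```
&lt;/pre&gt;&lt;br /&gt;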
Here are some other players to have passed 2000 runs, ordered by the start season of their careers:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name             start end   mat inns  no  runs  avg     wkts   runs    avg   +/- %&lt;br /&gt;Lord F Beauclerk 1801  1825  94  172   14  4319  27,34   406,4  5106,9  12,6  10&lt;br /&gt;W Lambert        1801  1817  62  112   5   2961  27,67   318,1  3960,3  12,5  10&lt;br /&gt;W Beldham        1801  1821  69  127   7   2265  18,88   96,0   1193,0  12,4  10&lt;br /&gt;R Robinson       1801  1819  57  111   9   2039  19,99   34,7   802,4   23,1  10&lt;br /&gt;EH Budd          1803  1831  68  119   9   2597  23,61   285,8  4200,8  14,7  10&lt;br /&gt;W Ward           1810  1845  116 210   21  3517  18,61   73,0   1511,4  20,7  10&lt;br /&gt;J Broadbridge    1814  1840  90  163   21  2368  16,68   405,6  3699,7  9,1   9,9&lt;br /&gt;F Pilch          1820  1854  213 389   30  6797  18,93   169,5  1666,3  9,8   9,4&lt;br /&gt;EG Wenman        1825  1854  135 241   15  3088  13,66   62,2   485,7   7,8   10&lt;br /&gt;FW Lillywhite    1825  1851  220 390   84  2203  7,20    1599,8 14181,1 8,9   8,5&lt;/pre&gt;&lt;br /&gt;Beauclerk and Lambert we met in Part 4.  Billy Beldham (whose photograph you can see &lt;a href="http://content-usa.cricinfo.com/england/content/player/9345.html"&gt;here&lt;/a&gt;) perhaps played his best cricket in the late 18th century, in matches that are not classified as first-class.  He is credited with being a founder of what might be called proto-modern batting technique.  Batting has evolved a long way since then!  While Beldham and his contemporaries stepped forward to meet the ball, it would take about another half-century before batsmen played attacking strokes off either the front foot or the back, and strokes off the pads were developed by Ranji close to 1900.&lt;br /&gt;&lt;br /&gt;Robert Robinson is a rather anonymous figure as far as the Internet is concerned.  
I haven't managed to find much out about him, despite him having been one of the leading batsmen of the day.  He also played in the 18th century, and scored a century in his first important match for which CricketArchive has a full scorecard (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/168.html"&gt;Kent v Hampshire&lt;/a&gt;, 1792).&lt;br /&gt;&lt;br /&gt;Next up: Adjusting bowling averages for era and the quality of wickets taken.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-5793647683730254267?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/5793647683730254267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=5793647683730254267' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5793647683730254267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5793647683730254267'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html' title='1800&apos;s first-class cricket in England: batsmen'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3606491731719760816</id><published>2008-01-31T22:36:00.000+01:00</published><updated>2008-02-01T12:48:29.495+01:00</updated><title type='text'>Falls of wicket</title><content type='html'>This is a 
follow-up to my &lt;a href="http://pappubahry.blogspot.com/2008/01/openers-and-falls-of-wicket.html"&gt;post&lt;/a&gt; on the "fow-average" of openers.  &lt;a href="http://tcwj.blogspot.com/"&gt;Soulberry&lt;/a&gt; wanted to see what the average fall of wicket was for non-openers.  To make things easier for me, I've split it up by position.&lt;br /&gt;&lt;br /&gt;Note that I found a bug in my earlier code, so the list of the top openers has shuffled around a little, though Russel Arnold remains on top.  In the tables below I give the number of innings (less any not-outs which didn't see a wicket fall) and the fow-average.  Note that for the non-openers, I've subtracted off the wicket that the batsman came in at.  Qualification: 15 innings.  (&lt;b&gt;Edit&lt;/b&gt;: Richie Richardson's figures below are wrong, and Michael Clarke's might be as well.  My lazy code got them mixed up with Viv Richards and Stuart Clark.)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;opener                            number 3    &lt;br /&gt;Russel Arnold           15  3,22  George Headley          32  3,71&lt;br /&gt;Raman Subba Row         16  2,97  George Gunn             16  2,94&lt;br /&gt;Ravi Shastri            26  2,95  Allan Border            37  2,92&lt;br /&gt;Bill Woodfull           43  2,90  Wally Hammond           50  2,91&lt;br /&gt;Glenn Turner            66  2,87  Richie Richardson       174 2,89&lt;br /&gt;Bruce Mitchell          48  2,78  Alvin Kallicharran      25  2,84&lt;br /&gt;Arthur Shrewsbury       18  2,78  Rahul Dravid            143 2,84&lt;br /&gt;Jackie McGlew           58  2,74  Ken Barrington          40  2,78&lt;br /&gt;Dennis Amiss            69  2,66  Eric Rowan              20  2,71&lt;br /&gt;Chris Tavaré            33  2,63  Lindsay Hassett         19  2,68&lt;br /&gt;&lt;br /&gt;number 4                          number 5    &lt;br /&gt;Dean Jones              18  3,20  Michael Clarke          31  3,12&lt;br /&gt;Rahul Dravid            18  3,11  Jimmy Adams     
        29  3,04&lt;br /&gt;Rajin Saleh             23  2,91  Shivnarine Chanderpaul  80  2,97&lt;br /&gt;Jacques Kallis          96  2,84  Kevin Pietersen         25  2,84&lt;br /&gt;Geoff Howarth           17  2,76  Dilip Vengsarkar        30  2,77&lt;br /&gt;Vijay Hazare            35  2,69  Yashpal Sharma          29  2,70&lt;br /&gt;Brian Hastings          29  2,58  Steve Waugh             138 2,69&lt;br /&gt;Monty Noble             23  2,57  Andy Flower             80  2,69&lt;br /&gt;Herbie Taylor           23  2,57  Ken Viljoen             16  2,69&lt;br /&gt;Richie Richardson       19  2,53  John Crawley            15  2,62&lt;br /&gt;&lt;br /&gt;number 6    &lt;br /&gt;Joe Solomon             16  3,41&lt;br /&gt;Trevor Bailey           38  3,37&lt;br /&gt;Imran Khan              22  3,23&lt;br /&gt;Nawab of Pataudi        21  2,86&lt;br /&gt;Hashan Tillakaratne     72  2,86&lt;br /&gt;Jimmy Adams             31  2,81&lt;br /&gt;Shivnarine Chanderpaul  39  2,75&lt;br /&gt;Allan Border            58  2,74&lt;br /&gt;Les Ames                17  2,68&lt;br /&gt;Dattu Phadkar           20  2,68&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;There's a rather conspicuous absentee amongst the number threes.  I went through Bradman's innings, and when he made a big score he was often part of a large partnership, so that he actually didn't see too many wickets fall.  His fow-average at number three is 2,45.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://eye-on-cricket.blogspot.com/"&gt;Samir&lt;/a&gt; also wanted to see average team runs scored while the batsman is at the crease.  I did mean to calculate this, but I forgot until after I'd made my spreadsheets.  I think the above tables are interesting enough as it is though.  
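&lt;br /&gt;&lt;br /&gt;For anyone wanting to compute something similar, here is a minimal Python sketch of a fow-average. The input format and the handling of not-outs are my assumptions, not the code used for these tables: each innings is given as (wickets down when the batsman came in, wickets down when he left, whether he was out), the wicket he came in at is subtracted as described above, and a not-out who saw no wicket fall is left out of the innings count. For openers the first number is simply 0.&lt;pre&gt;
```python
def fow_average(innings):
    """innings: list of (wickets_down_on_arrival, wickets_down_on_departure, dismissed).
    For a dismissal, the departure figure includes the batsman's own wicket, so a
    number three who comes in at 1 down and is out as the fourth wicket scores 3.
    Returns (number of innings counted, fow-average), or (0, None) if none count."""
    counted = []
    for arrived, departed, dismissed in innings:
        seen = departed - arrived
        if not dismissed and seen == 0:
            continue  # not out, and no wicket fell while he was in: skip it
        counted.append(seen)
    if not counted:
        return 0, None
    return len(counted), sum(counted) / len(counted)


# Hypothetical number three: out as the 4th wicket, out as the 2nd wicket,
# not out with no wicket falling, and not out after two more wickets fell.
print(fow_average([(1, 4, True), (1, 2, True), (1, 1, False), (1, 3, False)]))
```
&lt;/pre&gt;&lt;br /&gt;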
There are several players that you would expect to have "held the innings together" often.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3606491731719760816?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3606491731719760816/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3606491731719760816' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3606491731719760816'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3606491731719760816'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/falls-of-wicket.html' title='Falls of wicket'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-5268994873812314632</id><published>2008-01-29T18:07:00.002+01:00</published><updated>2008-02-17T19:41:23.090+01:00</updated><title type='text'>1800's first-class cricket in England: bowlers</title><content type='html'>This is Part 4 in my series on 1800's cricket in England.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a 
href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;(&lt;b&gt;Edit&lt;/b&gt;: My code at first counted "absent" as a nought not out.  This has been fixed.  All it does is decrease a few innings and not-out tallies.)&lt;br /&gt;&lt;br /&gt;In this post I apply the method detailed in Part 3 to all first-class scorecards with missing data.  But first I have to make a small confession &amp;mdash; the method I've used is surely not the best one.  The scorecards with missing data come in (mostly) two types.  The earliest scorecards only credit bowlers with bowled dismissals, and do not record the runs conceded by bowlers (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/342.html"&gt;this&lt;/a&gt; is a typical example).  Later scorecards give full credit to bowlers for their dismissals, but don't record the runs conceded (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/693.html"&gt;this&lt;/a&gt; is a typical example).  
There are also five matches where the runs conceded are recorded but bowlers aren't given credit for catches, etc.&lt;br /&gt;&lt;br /&gt;The method in Part 3 dealt only with the first type of scorecard.  With the second type of scorecard, you should be able to get better estimates of the bowling averages, since you have more data (namely, how many wickets each bowler took).  But when I tried to apply a similar method to these scorecards (finding the average percentage of team runs conceded by bowlers who took 1 wicket, bowlers who took 2 wickets, etc.), I got results that were biased in favour of regular wicket-takers.  The top 18 wicket-takers in the test dataset had estimates of bowling averages that were too low, with the errors ranging from 0,2% to almost 23%.  The (justified) fudge factor used in the previous method makes the estimates even lower!&lt;br /&gt;&lt;br /&gt;I don't know (yet?) how to fix this.  There must surely be a better, more sophisticated model to estimate runs conceded &amp;mdash; you shouldn't get worse results with more data!  But since that's what's happening for me, I've instead ignored all the non-bowled dismissals for these scorecards, and applied the method used on the early scorecards.  I've then scaled up the estimated runs conceded and estimated wickets so that the wicket tally matches reality.  &lt;br /&gt;&lt;br /&gt;So, onto the results!  In the various tables that follow, I give the start and end years of the career, matches (these may not agree with the usual sources, since I exclude matches that weren't eleven-a-side), wickets, runs conceded, bowling average, +/- %; and then batting stats (for which we have complete data): innings, not-outs, runs, average.&lt;br /&gt;&lt;br /&gt;Note 1: If there is a decimal comma in the wickets tally, then it is almost certainly an underestimate.  How big an underestimate I don't know.  In my test dataset, one bowler's estimated wicket tally was 47% below what it should have been.  
Despite this, the estimate of the average was only out by just over 7%.  For other bowlers, the wickets estimate was within 2% of reality.  The lesson here is not to rely on my wicket estimates.&lt;br /&gt;&lt;br /&gt;Note 2: One of the columns is called +/- %.  For about 80% of bowlers, the actual average should fall within the estimated average plus or minus the given percent.  If a bowler only ever had bowled dismissals credited to him, this value is 10%.&lt;br /&gt;&lt;br /&gt;The first table gives the leading bowlers of the 1800's in England by bowling average.  Qualification (for this table and all that follow): 200 wickets.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          start end   mat wkts    runs    avg   +/- %   inns  no  runs  avg&lt;br /&gt;J Cobbett     1826  1841  94  556,3   4598,7  8,3   9,7     162   16  1437  9,84&lt;br /&gt;FW Lillywhite 1825  1851  220 1599,8  14181,1 8,9   8,5     390   84  2203  7,20&lt;br /&gt;S Redgate     1830  1846  74  414,0   3775,2  9,1   8,0     133   23  957   8,70&lt;br /&gt;J Broadbridge 1814  1840  90  405,6   3699,7  9,1   9,9     163   21  2368  16,68&lt;br /&gt;J Bayley      1822  1850  81  358,7   3500,5  9,8   9,3     140   17  905   7,36&lt;br /&gt;G Freeman     1865  1880  44  288     2849,2  9,9   0,2     70    3   918   13,70&lt;br /&gt;WR Hillyer    1835  1853  216 1407,3  14061,5 10,0  7,1     386   62  2544  7,85&lt;br /&gt;J Wisden      1845  1863  175 1036,5  10356,9 10,0  3,4     305   29  4020  14,57&lt;br /&gt;T Nixon       1841  1859  50  250     2503,5  10,0  5,0     83    17  300   4,55&lt;br /&gt;A Mynn        1832  1859  200 1059,9  10940,1 10,3  7,0     372   24  4749  13,65&lt;/pre&gt;&lt;br /&gt;Note that this doesn't mean that James Cobbett had the lowest average of the 1800's &amp;mdash; if the estimate was particularly bad, it might be up around 10.  This would still be one of the lowest ever, of course.  
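&lt;br /&gt;&lt;br /&gt;To make the +/- % column concrete, here is a minimal Python sketch that turns an estimated average and its +/- % into the band that, per Note 2, should contain the actual average for roughly 80% of bowlers.  The function name likely_range is hypothetical, and the band is taken to be symmetric, as "plus or minus" suggests.&lt;pre&gt;
```python
def likely_range(est_avg, plus_minus_pct):
    """Return (low, high): the estimate plus or minus the given percent.

    Per Note 2, the actual average should fall inside this band for
    roughly 80% of bowlers; the band is assumed to be symmetric.
    """
    half_width = est_avg * plus_minus_pct / 100.0
    return est_avg - half_width, est_avg + half_width


# J Cobbett: estimated average 8,3 with +/- 9,7 %.
low, high = likely_range(8.3, 9.7)
print(f"{low:.2f} to {high:.2f}")
print("average of 10 inside band:", high >= 10 >= low)
```
&lt;/pre&gt;For Cobbett (8,3 with +/- 9,7%), the band runs from about 7,5 to 9,1, so an actual average of around 10 would lie outside it, which should happen for only about one bowler in five.  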
Cobbett was a round-arm spin bowler.&lt;br /&gt;&lt;br /&gt;Second on the table is William Lillywhite, a medium-pace round-arm bowler.  His wicket tally is enormous.&lt;br /&gt;&lt;br /&gt;Third is Samuel Redgate, a fast bowler whom we can thank for batting pads, along with Alfred Mynn (tenth on the table).  These two were the fastest bowlers of their day, but Mynn was also a pretty good batsman.  They squared off against each other in the &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/562.html"&gt;North v South&lt;/a&gt; game of 1836.  Mynn had hurt his ankle before play started, but nevertheless batted at 5 in South's second innings.  Redgate repeatedly hit Mynn on his unprotected legs, damaging them to the point where amputation was considered.  In what must be one of the most courageous innings of all time, Mynn struck an unbeaten century (the only century of his first-class career), before being sent to London for medical treatment.  After this, batsmen started wearing leg guards.  You can read about this innings in more detail &lt;a href="http://content-www.cricinfo.com/columns/content/story/135983.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;James Broadbridge comes in fourth.  This average-estimating exercise is particularly useful for the Sussex round-armer &amp;mdash; in the standard sources his average is given as 18,62.  This very wrong figure is based on just 14 of his career wickets, which total over 400!&lt;br /&gt;&lt;br /&gt;The ninth player in the table above is Thomas Nixon, a round-arm slow bowler whose first-class career comprised mostly matches for the MCC.  You'll note that the +/- % figure is given as 5,0; this means that, for roughly half of his runs conceded, we have the actual figures from matches where runs conceded were recorded.  This gives us a useful check: we know that his average in these matches was 10,12.  
Since the estimated average is 10,0, it looks like the estimate is pretty good.&lt;br /&gt;&lt;br /&gt;For what it's worth, the next table shows the leading bowlers by wickets taken.  Since the amount of first-class cricket increased over the course of the 19th century, the top of the list is dominated by people who played close to 1900.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name          start end   mat wkts    runs    avg   +/- %   inns  no  runs  avg&lt;br /&gt;WG Grace      1865  1899  732 2495    43960   17,62 0       1250  89  46792 40,30&lt;br /&gt;J Briggs      1879  1899  446 1907    29384   15,41 0       686   44  11593 18,06&lt;br /&gt;A Shaw        1864  1897  377 1881    23108,4 12,29 0,01    582   92  6244  12,74&lt;br /&gt;W Attewell    1881  1899  399 1809    27955   15,45 0       600   60  7577  14,03&lt;br /&gt;J Southerton  1854  1879  282 1674    24171   14,44 0       474   128 3136  9,06&lt;br /&gt;JT Hearne     1888  1899  258 1635    25986   15,89 0       390   118 3029  11,14&lt;br /&gt;R Peel        1882  1899  397 1606    25233   15,71 0       630   56  10837 18,88&lt;br /&gt;FW Lillywhite 1825  1851  220 1599,8  14181,1 8,86  8,5     390   84  2203  7,20&lt;br /&gt;GA Lohmann    1884  1896  256 1590    21968   13,82 0       371   36  6495  19,39&lt;br /&gt;T Emmett      1866  1888  405 1493    20081   13,45 0       664   87  8641  14,98&lt;/pre&gt;&lt;br /&gt;WG rather stands out in this list.  Not only did he take more than 500 more first-class wickets than anyone else in England in the 1800's, but he did it while averaging over 40 with the bat.&lt;br /&gt;&lt;br /&gt;Lillywhite's wickets estimate is almost certainly low, and he should be at least one rank higher.  
He might deserve to be higher still, but we can't know for sure.&lt;br /&gt;&lt;br /&gt;To have a look at some more early bowlers, here's a table with players ordered by the starting year of their careers.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name             start end   mat wkts    runs    avg   +/- %   inns  no  runs  avg&lt;br /&gt;Lord F Beauclerk 1801  1825  94  406,4   5106,9  12,6  10      172   14  4319  27,34&lt;br /&gt;W Lambert        1801  1817  62  318,1   3960,3  12,5  10      112   5   2961  27,67&lt;br /&gt;J Wells          1801  1815  44  271,1   3090,2  11,4  10      85    9   615   8,09&lt;br /&gt;TC Howard        1803  1828  81  462,3   5712,4  12,4  10      149   16  1454  10,93&lt;br /&gt;EH Budd          1803  1831  68  285,8   4200,8  14,7  10      119   9   2597  23,61&lt;br /&gt;W Ashby          1808  1830  37  209,5   2236,8  10,7  10      64    21  213   4,95&lt;br /&gt;J Broadbridge    1814  1840  90  405,6   3699,7  9,1   9,9     163   21  2368  16,68&lt;br /&gt;J Bayley         1822  1850  81  358,7   3500,5  9,8   9,3     140   17  905   7,36&lt;br /&gt;FW Lillywhite    1825  1851  220 1599,8  14181,1 8,9   8,5     390   84  2203  7,20&lt;br /&gt;W Clarke         1826  1855  129 714,1   7588,7  10,6  5,2     220   35  1966  10,63&lt;/pre&gt;&lt;br /&gt;William Lambert was, along with Beauclerk, one of the stand-out all-rounders of the early 19th century.  These two have similar averages, both for batting and bowling.  The bowling average of around 12,5 is about typical for the era, which was very low-scoring.  That should put a batting average of over 27 into some perspective.  Lambert was, however, banned for life for match-fixing.  &lt;br /&gt;&lt;br /&gt;Lord Frederick Beauclerk is perhaps my favourite character in cricket history.  
Not only was he a Lord, a title sadly absent from modern English cricketers, but he was the golden boy of the first part of the 19th century (see his picture &lt;a href="http://content-www.cricinfo.com/ci/content/player/16395.html"&gt;here&lt;/a&gt;).  Not only was he an outstanding all-rounder, but he embodied the spirit of cricket so lacking in today's players.  A clergyman, he claimed to make £600 a year from betting on cricket.  He was unassuming when batting &amp;mdash; (according to his Wikipedia article at least) he used to place an expensive watch on the middle stump.  He was a "foul-mouthed, dishonest man who was one of the most hated figures in society ... he bought and sold matches as though they were lots at an auction".&lt;br /&gt;&lt;br /&gt;You may have noticed that, along with the leading wicket-takers being from near 1900, the leading averages are mostly from around the second quarter of the century.  Adjusting the bowling averages for era will be the subject of Part 6.  To be continued!&lt;br /&gt;&lt;br /&gt;If your favourite 19th century bowler with missing data has been omitted from the tables above, you can find him in the table below, which lists all bowlers whose averages needed some estimating.  
They are ordered by the starting year of their first-class careers.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name               start end   mat wkts    runs    avg   +/- %   inns  no  runs  avg&lt;br /&gt;Lord F Beauclerk   1801  1825  94  406,4   5106,9  12,6  10      172   14  4319  27,34&lt;br /&gt;W Lambert          1801  1817  62  318,1   3960,3  12,5  10      112   5   2961  27,67&lt;br /&gt;J Wells            1801  1815  44  271,1   3090,2  11,4  10      85    9   615   8,09&lt;br /&gt;TC Howard          1803  1828  81  462,3   5712,4  12,4  10      149   16  1454  10,93&lt;br /&gt;EH Budd            1803  1831  68  285,8   4200,8  14,7  10      119   9   2597  23,61&lt;br /&gt;W Ashby            1808  1830  37  209,5   2236,8  10,7  10      64    21  213   4,95&lt;br /&gt;J Broadbridge      1814  1840  90  405,6   3699,7  9,1   9,9     163   21  2368  16,68&lt;br /&gt;J Bayley           1822  1850  81  358,7   3500,5  9,8   9,3     140   17  905   7,36&lt;br /&gt;FW Lillywhite      1825  1851  220 1599,8  14181,1 8,9   8,5     390   84  2203  7,20&lt;br /&gt;W Clarke           1826  1855  129 714,1   7588,7  10,6  5,2     220   35  1966  10,63&lt;br /&gt;J Cobbett          1826  1841  94  556,3   4598,7  8,3   9,7     162   16  1437  9,84&lt;br /&gt;T Barker           1826  1845  70  241,0   2543,2  10,6  9,0     128   12  1236  10,66&lt;br /&gt;S Redgate          1830  1846  74  414,0   3775,2  9,1   8,0     133   23  957   8,70&lt;br /&gt;FH Hervey-Bathurst 1831  1861  83  310,7   3676,5  11,8  7,5     142   19  755   6,14&lt;br /&gt;A Mynn             1832  1859  200 1059,9  10940,1 10,3  7,0     372   24  4749  13,65&lt;br /&gt;WR Hillyer         1835  1853  216 1407,3  14061,5 10,0  7,1     386   62  2544  7,85&lt;br /&gt;J Dean             1835  1861  296 1118,8  13358,0 11,9  4,9     533   63  4794  10,20&lt;br /&gt;CG Taylor          1836  1859  122 292,0   3281,1  11,2  7,0     222   11  3020  14,31&lt;br /&gt;W Martingell       1839  1860  170 516,3   5722,1  11,1  3,5     290   45  2258  9,22&lt;br /&gt;T Nixon            1841  1859  50  250     2503,5  10,0  5,0     83    17  300   4,55&lt;br /&gt;D Day              1842  1852  41  204,2   2253,5  11,0  6,4     71    14  352   6,18&lt;br /&gt;J Wisden           1845  1863  175 1036,5  10356,9 10,0  3,4     305   29  4020  14,57&lt;br /&gt;T Sherman          1846  1870  78  322     3986,8  12,4  3,6     133   32  704   6,97&lt;br /&gt;RC Tinley          1847  1874  113 287     4239,1  14,8  0,5     191   23  1890  11,25&lt;br /&gt;J Lillywhite       1848  1873  178 223     2573,4  11,5  0,4     312   26  5084  17,78&lt;br /&gt;W Caffyn           1849  1873  180 564     7654,1  13,6  0,3     314   20  5405  18,38&lt;br /&gt;E Willsher         1850  1875  247 1209    15600,8 12,9  0,3     435   60  4699  12,53&lt;br /&gt;J Grundy           1850  1869  282 1063    13202,8 12,4  1,9     477   37  5600  12,73&lt;br /&gt;D Buchanan         1850  1881  56  359     5552,6  15,5  1,0     96    34  224   3,61&lt;br /&gt;T Sewell           1851  1868  149 315     6161,4  19,6  0,1     250   51  2422  12,17&lt;br /&gt;FP Miller          1851  1868  134 253     5129,4  20,3  0,5     230   20  3053  14,54&lt;br /&gt;T Hayward          1854  1872  108 237     3890,9  16,4  0,6     182   11  4487  26,24&lt;br /&gt;FR Reynolds        1854  1874  65  208     3530,6  17,0  1,4     106   26  444   5,55&lt;br /&gt;J Jackson          1855  1867  107 613     7132,8  11,6  0,1     176   30  1821  12,47&lt;br /&gt;VE Walker          1856  1877  135 328     5039,3  15,4  0,9     213   31  3186  17,51&lt;br /&gt;T Hearne           1857  1876  165 287     4120,0  14,4  0,4     277   19  4807  18,63&lt;br /&gt;GF Tarrant         1860  1869  63  365     4539,6  12,4  0,4     106   8   1467  14,97&lt;br /&gt;G Wootton          1861  1873  175 904     12080,3 13,4  0,2     282   61  2343  10,60&lt;br /&gt;RD Walker          1861  1877  113 318     5468,0  17,2  0,5     186   7   3521  19,67&lt;br /&gt;ID Walker          1862  1884  269 208     4634,8  22,3  0,2     466   39  10470 24,52&lt;br /&gt;A Shaw             1864  1897  377 1881    23108,4 12,3  0,0     582   92  6244  12,74&lt;br /&gt;G Freeman          1865  1880  44  288     2849,2  9,9   0,2     70    3   918   13,70&lt;br /&gt;F Morley           1871  1883  212 1184    15748,8 13,3  0,0     324   84  1292  5,38&lt;br /&gt;A Hill             1871  1883  188 722     10392,8 14,4  0,0     303   33  2346  8,69&lt;br /&gt;CT Studd           1879  1884  85  426     7427,5  17,4  0,2     145   23  3928  32,20&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-5268994873812314632?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/5268994873812314632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=5268994873812314632' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5268994873812314632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/5268994873812314632'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html' title='1800&apos;s first-class cricket in England: bowlers'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3512141311331474660</id><published>2008-01-28T21:01:00.001+01:00</published><updated>2008-02-17T19:41:04.092+01:00</updated><title type='text'>1800's first-class cricket in England: filling in the gaps</title><content type='html'>This is Part 3 in my series on first-class cricket in England in the 1800's.  &lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In this post I detail a method of filling in all the gaps in those early scorecards.  
By doing so, we can get realistic estimates of bowling averages, despite only knowing about bowled dismissals and team totals.  This will mostly be a geek-interest post.  Though the maths isn't technically hard (it's really just the four basic arithmetic operators), it does go on for a bit.&lt;br /&gt;&lt;br /&gt;To begin, let's recall what the important gaps in the early scorecards are.  First, bowlers were only credited with wickets when they bowled a batsman &amp;mdash; catches, LBWs, stumpings, and hit wickets were not counted in bowlers' wicket tallies.  Second, the number of runs conceded by bowlers was not recorded.&lt;br /&gt;&lt;br /&gt;To fill in these gaps, I took a set of scorecards (as old as possible, to try to match the characteristics of the earlier eras) which &lt;i&gt;do&lt;/i&gt; contain the relevant information.  For each card, I broke the dismissals down into three types:&lt;br /&gt;&lt;br /&gt;A. bowled&lt;br /&gt;B. other wicket credited to the bowler (catches, etc.)&lt;br /&gt;C. wicket not credited to the bowler (run outs, etc.) or not-outs.&lt;br /&gt;&lt;br /&gt;For each bowler who took 1 wicket bowled, I counted how many other wickets he took out of the type B wickets available in that innings.  
I did the same for each bowler who took 2 wickets bowled, 3 wickets bowled, and so on.&lt;br /&gt;&lt;br /&gt;If you do this for all the scorecards in the sample and add up the corresponding numbers, you can get the probability that a batsman dismissed by a type B wicket was dismissed by a bowler who took 1 wicket bowled, or by a bowler who took 2 wickets bowled, etc.&lt;br /&gt;&lt;br /&gt;Put another way: you can get the average fraction of type B wickets taken by a bowler who took 1 wicket bowled, or 2 wickets bowled, etc.&lt;br /&gt;&lt;br /&gt;The actual numbers (based on matches with the relevant data until part-way through 1863) are as follows:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;wkts bowled       1      2      3      4      5      6      7&lt;br /&gt;frac other wkts   0,300  0,363  0,417  0,432  0,423  0,461  0,525&lt;/pre&gt;&lt;br /&gt;(The last value here was adjusted by hand, based on later matches.)  In this particular dataset, no player ever took 8 or more wickets bowled in an innings; I set the fractions for 8 and 9 wickets mildly arbitrarily at 0,5 (based on the equivalent numbers for later matches).&lt;br /&gt;&lt;br /&gt;Now comes the estimate of the wicket tally.  Suppose in a scorecard that Smith took 1 wicket bowled, and Jones took 3 wickets bowled.  There are four catches with bowler unknown, and there was one run out.&lt;br /&gt;&lt;br /&gt;There are four type B wickets, and Smith gets 4*0,300 = 1,200 of them, giving him 2,200 for the innings.  Jones gets 4*0,417 = 1,668, giving him 4,668 for the innings.  &lt;br /&gt;&lt;br /&gt;Of course, that means that the total wickets don't add up: Smith and Jones get 6,868 between them, while the card has 8 wickets credited to bowlers (four bowled and four caught).  If a bowler took no wickets bowled (only catches, stumpings and so on), then he's going to be ignored by this analysis.  This means that the estimated wicket tallies will be significantly lower than what they really were.  But bowlers who didn't get any wickets bowled will also not have any runs conceded estimated for them, as we will see shortly.  
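&lt;br /&gt;&lt;br /&gt;The wicket-estimating step can be sketched in Python as follows.  This is an illustration, not the code behind the tables in this series: the function name estimate_wickets is hypothetical, and the dictionary simply holds the fractions from the table above, with 8 and 9 wickets bowled set to 0,5.&lt;pre&gt;
```python
# Average fraction of an innings' type B wickets (catches etc. credited to
# a bowler whom the card does not name), keyed by the number of wickets
# that bowler took bowled.  8 and 9 are the hand-set 0.5 values.
TYPE_B_FRACTION = {1: 0.300, 2: 0.363, 3: 0.417, 4: 0.432, 5: 0.423,
                   6: 0.461, 7: 0.525, 8: 0.5, 9: 0.5}


def estimate_wickets(bowled_by_bowler, type_b_wickets):
    """Estimate each bowler's total wickets for one innings.

    bowled_by_bowler: dict mapping bowler name to wickets taken bowled.
    type_b_wickets: number of type B dismissals on the card.
    Bowlers with no wickets bowled are skipped entirely.
    """
    estimates = {}
    for name, bowled in bowled_by_bowler.items():
        if bowled == 0:
            continue
        estimates[name] = bowled + TYPE_B_FRACTION[bowled] * type_b_wickets
    return estimates


# Smith: 1 bowled; Jones: 3 bowled; four catches with bowler unknown.
# The run out is type C, so it plays no part in the estimate.
for name, wkts in estimate_wickets({"Smith": 1, "Jones": 3}, 4).items():
    print(name, round(wkts, 3))  # Smith 2.2, then Jones 4.668
```
&lt;/pre&gt;A bowler with no wickets bowled gets no entry at all, so neither wickets nor runs conceded are estimated for him.  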
We will hope that, by ignoring both wickets and runs conceded in these situations, the bowling averages over a career will be largely unaffected.&lt;br /&gt;&lt;br /&gt;(It is also possible, if three bowlers each took 3 wickets for instance, that the estimated wicket tally for an innings could be greater than 10.  This isn't a serious problem.)&lt;br /&gt;&lt;br /&gt;To estimate the runs conceded by each bowler, I followed a similar procedure to that for type B wickets, finding the average fraction of runs (ignoring byes etc.) that bowlers who took 1 wicket conceded, bowlers who took 2 wickets conceded, and so on.  The resulting table looks like this (the wickets now are total wickets, caught, bowled, the lot):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;wkts 1      2      3      4      5      6      7      8      9      10&lt;br /&gt;frac 0,164  0,223  0,277  0,322  0,359  0,368  0,405  0,401  0,424  0,5&lt;/pre&gt;&lt;br /&gt;(The last entry in that table was adjusted by hand, based on the corresponding number for later matches.)&lt;br /&gt;&lt;br /&gt;This tells us that, for instance, a bowler who took 4 wickets, on average, conceded 32,2% of the batting team's runs in an innings.&lt;br /&gt;&lt;br /&gt;So, for each scorecard, we estimate the number of wickets taken by each bowler, and then use this tally and the second table to estimate the number of runs conceded (based on the batting team's score).  We now have wickets and runs, so we can calculate an average!&lt;br /&gt;&lt;br /&gt;But there's a rather large assumption in this model, and that is that the characteristics of wicket-taking and conceding runs don't change much.  This is definitely not true in general: by taking a sample of matches from later, the fractions in the first table all decrease (suggesting that more bowlers were used in the latter part of the 19th century than in the 1850's).  This could cause a systematic error in the estimates.  
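&lt;br /&gt;&lt;br /&gt;Building on the wicket estimates, the runs-conceded step and the resulting average can be sketched in Python as below.  The function names are hypothetical, team totals are assumed to exclude byes and other extras, and interpolating the runs table for fractional wicket tallies is an assumption of this sketch: the method above does not specify how a tally such as 2,2 wickets was handled.&lt;pre&gt;
```python
# Average fraction of the batting side's runs (byes etc. excluded) conceded
# by a bowler, keyed by his total wickets in the innings (second table).
RUNS_FRACTION = {1: 0.164, 2: 0.223, 3: 0.277, 4: 0.322, 5: 0.359,
                 6: 0.368, 7: 0.405, 8: 0.401, 9: 0.424, 10: 0.5}


def fraction_for(wickets):
    """Runs fraction for a (possibly non-integer) wicket estimate.

    ASSUMPTION: linear interpolation between table entries, clamped to
    the range 1 to 10 wickets.
    """
    wickets = min(max(float(wickets), 1.0), 10.0)
    lower = int(wickets)
    if lower == 10:
        return RUNS_FRACTION[10]
    weight = wickets - lower
    return (1 - weight) * RUNS_FRACTION[lower] + weight * RUNS_FRACTION[lower + 1]


def estimated_average(innings, scale=1.0):
    """Bowling average from (estimated wickets, team runs) pairs.

    Team runs should exclude extras.  scale is an overall correction
    factor applied to the average (1.0 means no correction).
    """
    total_wickets = sum(w for w, _ in innings)
    if total_wickets == 0:
        raise ValueError("no estimated wickets, so no average")
    total_runs = sum(fraction_for(w) * runs for w, runs in innings)
    return scale * total_runs / total_wickets


# Hypothetical bowler: (estimated wickets, team runs excluding extras).
career = [(2.2, 120), (4.668, 95), (3.0, 150)]
print(round(estimated_average(career), 2))
# With a correction factor such as the 18,23/15,07 used in the 1888-96 test:
print(round(estimated_average(career, scale=18.23 / 15.07), 2))
```
&lt;/pre&gt;The optional scale argument is where an overall correction for this kind of systematic error can be applied.  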
To fudge my way around this systematic error, I take the overall bowling average (which we know from the team totals and the total number of wickets lost) and compare it to the overall estimated bowling average.  The estimated bowling averages are scaled up or down according to the ratio of the overall average to its estimate.  If that's not clear, I'll come to an example shortly.&lt;br /&gt;&lt;br /&gt;Before we dive in and start estimating averages from 1812, it would be prudent to check whether the method actually works.  I took a set of about 950 matches from 1888 to 1896 (well after the dataset I used to generate the fractions above), and pretended that I didn't have data on type B wickets or runs conceded.  I then did the estimates and compared them with the actual averages, which can be calculated exactly (since there's no missing information).  &lt;br /&gt;&lt;br /&gt;When I did this (before implementing the fudge factor), there was a clear systematic error: the estimates of the averages were almost always lower than the real averages.  According to the estimates, the overall average was 15,07.  In reality it was 18,23.  So I multiplied all of the estimated averages by 18,23/15,07 = 1,21.&lt;br /&gt;&lt;br /&gt;Here are the results, with players ordered by wickets taken (in real life).  Note that these are not career figures &amp;mdash; they are solely based on the sample of about 950 matches.  
The headings are &lt;b&gt;est&lt;/b&gt;imated and &lt;b&gt;act&lt;/b&gt;ual.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;                   wkts         runs           avg&lt;br /&gt;name          mat  est    act   est     act    est    act    % error&lt;br /&gt;J Briggs      198  754,4  1172  9243,2  15930  15,16  13,59  +11,5&lt;br /&gt;R Peel        237  744,3  1158  9830,5  17281  16,34  14,92  +9,5&lt;br /&gt;AW Mold       166  1063,1 1107  10992,9 15884  12,79  14,35  -10,9&lt;br /&gt;W Attewell    214  786,4  1087  10714,9 15960  16,86  14,68  +14,8&lt;br /&gt;GA Lohmann    147  766,1  1011  8168,1  13227  13,19  13,08  +0,8&lt;br /&gt;JT Hearne     149  887,9  956   11805,6 14476  16,45  15,14  +8,6&lt;br /&gt;F Martin      199  773,4  950   10852,5 15128  17,36  15,92  +9,0&lt;br /&gt;T Richardson  97   727,2  765   8332,7  10647  14,17  13,92  +1,8&lt;br /&gt;E Wainwright  199  666,6  730   8538,3  11870  15,85  16,26  -2,5&lt;br /&gt;SMJ Woods     156  617,6  729   9084,8  13795  18,20  18,92  -3,8&lt;br /&gt;WH Lockwood   156  561,1  618   7358,1  10067  16,22  16,29  -0,4&lt;br /&gt;JJ Ferris     163  402,0  616   5895,8  11155  18,14  18,11  +0,2&lt;br /&gt;CTB Turner    90   539,2  585   5830,0  7607   13,38  13,00  +2,9&lt;br /&gt;W Wright      149  498,9  577   6835,2  10637  16,95  18,44  -8,1&lt;br /&gt;EJ Tyler      95   274,2  522   4534,0  9947   20,45  19,06  +7,3&lt;br /&gt;JT Rawlin     116  431,2  487   6345,1  8806   18,20  18,08  +0,7&lt;br /&gt;FG Roberts    127  372,5  458   6046,0  9627   20,08  21,02  -4,5&lt;br /&gt;W Flowers     179  336,7  447   4739,3  8006   17,41  17,91  -2,8&lt;br /&gt;WA Humphreys  127  313,1  445   6196,0  9148   24,48  20,56  +19,1&lt;br /&gt;GH Hirst      107  353,5  418   4685,3  7171   16,40  17,16  -4,5&lt;br /&gt;A Hearne      172  358,5  399   5587,0  7641   19,28  19,15  +0,7&lt;br /&gt;FW Tate       103  328,8  362   5409,6  7836   20,35  21,65  -6,0&lt;br /&gt;FS Jackson    122  288,2  359   4412,3  6571   18,94  18,30  +3,5&lt;br /&gt;WG Grace      220  206,8  358   4062,1  8022   24,29  22,41  +8,4&lt;br /&gt;W Mead        53   254,7  351   3705,0  5605   17,99  15,97  +12,7&lt;br /&gt;FJ Shacklock  100  306,4  349   4053,6  6615   16,37  18,95  -13,6&lt;br /&gt;A Watson      87   328,4  332   3663,2  4928   13,80  14,84  -7,0&lt;br /&gt;JW Sharpe     75   312,1  321   3647,3  4922   14,45  15,33  -5,7&lt;br /&gt;AD Pougher    72   223,5  312   3279,3  5260   18,15  16,86  +7,7&lt;br /&gt;GA Davidson   74   273,6  309   3793,4  5241   17,15  16,96  +1,1&lt;/pre&gt;&lt;br /&gt;(The % error column is the difference between the estimated and actual averages, as a percentage of the actual average.)  
It's not spectacular, but it's pretty good considering the paucity of the data that went into the estimates.  Of the top 30 wicket-takers in the sample, only 6 have estimates of the bowling average that are wrong by more than 10%.  And while I've truncated the table at 30 entries here, the good estimates keep going for another 30-odd players.  The first really wild estimate is for Stephen Whitehead, who took 121 wickets (in the dataset) at an actual average of 21,39, but at an estimated average of 14,95. &lt;br /&gt;&lt;br /&gt;It is unfortunate, though understandable, that three of the six errors of over 10% belong to the top four wicket-takers (Briggs, Mold and Attewell).  The model used for the estimates was based on overall averages, and in general we would not expect the best bowlers to follow the same trends.&lt;br /&gt;&lt;br /&gt;I repeated this exercise for a similarly sized dataset containing matches from between 1877 and 1888.  The results were similar to those above &amp;mdash; again 6 errors of more than 10% in the top 30 players, including the third- and fourth-highest wicket-takers.  But further down the table the results are better, perhaps because the era in question is closer to that used to generate the parameters in the model.  
The first wild estimate was for a bowler who took only 71 wickets.&lt;br /&gt;&lt;br /&gt;While I'm emphasising the uncertainties in the estimates for the top bowlers, the estimates are still pretty useful.  Suppose that you knew that a modern-day Test bowler had an average between 17 and 23 (that is, 20 plus or minus 15%).  He could be one of the greatest of all-time or merely very good.  But you know that he's at least very good, and he's not someone like Brett Lee, taking plenty of wickets (until recently), but at an average of 30.&lt;br /&gt;&lt;br /&gt;Now we're almost ready to do the estimates for the first half of the 19th century!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22713811-3512141311331474660?l=pappubahry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3512141311331474660/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3512141311331474660' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3512141311331474660'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3512141311331474660'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html' title='1800&apos;s first-class cricket in England: filling in the gaps'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1713867002737497489</id><published>2008-01-26T18:48:00.002+01:00</published><updated>2008-02-17T19:40:47.581+01:00</updated><title type='text'>1800's first-class cricket in England: classification of matches</title><content type='html'>This is Part 2 of my series on first-class cricket in England in the 1800's.  &lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I think that if the match isn't played between two sides of eleven, then it is not first-class.  
Unfortunately (for people who share this opinion of mine), this principle was not obeyed when drawing up the list of first-class matches that we have today.  There were 149 matches played in the 1800's, classified as first-class at CricketArchive, in which one or both teams had more than eleven men.&lt;br /&gt;&lt;br /&gt;While some people might want a little flexibility on the size of the teams (at least for the early days), surely no-one can seriously suggest that a match between &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/557.html"&gt;a Gentlemen XVIII and a Players XI&lt;/a&gt; should be classified as first-class, no matter how amusingly long the Gentlemen's batting card looks.&lt;br /&gt;&lt;br /&gt;Also on the first-class record are two Gentlemen XVII v Players XI matches (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/442.html"&gt;1&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/444.html"&gt;2&lt;/a&gt;), seven matches of XVI v XI, three of XV v XI, eighteen of XIV v XI, eight of XIII v XI, three of XII v XI, and 107 twelve-a-side matches.&lt;br /&gt;&lt;br /&gt;There are also seven matches (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/493.html"&gt;1&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/603.html"&gt;2&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/951.html"&gt;3&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/1/1913.html"&gt;4&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/2/2424.html"&gt;5&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/2/2425.html"&gt;6&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/3/3017.html"&gt;7&lt;/a&gt;) classified as first-class in which one team played with eleven men and the other with fewer.  
Of these, three were odds games (one by Players against Gentlemen; two by the Australians on their 1880 tour), two were caused by player injuries, and two are unexplained by the CricketArchive scorecards.  The most amusing of these is the last one, Hampshire v Somerset in 1885.  The CricketArchive page simply says, "Somerset only brought nine men ...".  One of the Somerset players in that match was EW Bastard.  It is perhaps fortunate that India did not tour England during his brief first-class career.&lt;br /&gt;&lt;br /&gt;Since I don't believe that any of these matches should count as first-class, I will ignore them for my statistics.&lt;br /&gt;&lt;br /&gt;Note that while first-class matches should be XI v XI, full substitutes are permitted.  These have always been pretty rare, but are still seen in modern times &amp;mdash; a full substitute is permitted when a player gets called up to or released from England duty during a county game.  The most recent example in Australia that I know of is Brad Williams, who was replaced by Ben Edmondson during a match in 2003/4.&lt;br /&gt;&lt;br /&gt;I do not, however, think that, in the absence of a particular player, another should be allowed to bat twice.  This is what happened in &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/695.html"&gt;Hampshire v Nottingham&lt;/a&gt; in 1843.  One of the Notts players was injured, and so Francis Noyes was allowed to bat twice in each innings.  
I will ignore this match for my records as well.</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1713867002737497489/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1713867002737497489' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1713867002737497489'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1713867002737497489'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html' title='1800&apos;s first-class cricket in England: classification of matches'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3663815860619803798</id><published>2008-01-25T09:09:00.000+01:00</published><updated>2008-01-31T22:48:59.722+01:00</updated><title type='text'>Openers and falls of wicket</title><content type='html'>&lt;a href="http://eye-on-cricket.blogspot.com/"&gt;Samir Chopra&lt;/a&gt; asked me a question about openers: what is the average wicket that they're dismissed at?  For example, suppose an opener is the first wicket to fall in one innings, the second in another, and the first again in a third innings.  His fow-average would be 1,33.  
(I can't think of a better name for this; it's not really the fow, since that refers to the runs the team has scored when the wicket falls.)&lt;br /&gt;&lt;br /&gt;You'd expect that a player would do well on this statistic if they bat slowly or if they're a good batsman in a bad top order.&lt;br /&gt;&lt;br /&gt;There's a tricky question here about what to do with not-outs.  The way I treated them is as follows.&lt;br /&gt;&lt;br /&gt;Suppose the batsman was not out, with the team &lt;i&gt;n&lt;/i&gt; wickets down.  If he had never been dismissed with more than &lt;i&gt;n&lt;/i&gt; wickets down, I assigned him &lt;i&gt;n&lt;/i&gt;+1 for that innings.  In particular, this means that an opener who carries his bat gets a "score" of 11.&lt;br /&gt;&lt;br /&gt;If the batsman had lasted longer than &lt;i&gt;n&lt;/i&gt; wickets, then I replaced the not-out with his fow-average over all the times he lasted longer.  The not-outs are handled from the highest to the lowest, so a not-out that has already been replaced counts towards the lower ones.  An example:&lt;br /&gt;&lt;br /&gt;A batsman is dismissed at wickets: 1, 1, 3, 5, 9.&lt;br /&gt;He is not out with the team having lost: 3, 6 wickets.&lt;br /&gt;&lt;br /&gt;The "6 not out" is replaced by a 9.  Now the two rows of data look like:&lt;br /&gt;FOW's: 1, 1, 3, 5, 9, 9&lt;br /&gt;not-outs: 3&lt;br /&gt;&lt;br /&gt;The 3 is now replaced by (5 + 9 + 9)/3 = 7,67.&lt;br /&gt;&lt;br /&gt;So, the opener's fow-average is (1 + 1 + 3 + 5 + 7,67 + 9 + 9) / 7 = 5,1.&lt;br /&gt;&lt;br /&gt;Right!  With that out of the way, here are the openers with the highest fow-averages, the lowest, and some selected examples in between the two extremes.  Qualification of 15 innings.  (&lt;b&gt;Edit&lt;/b&gt;: The original version of this table had some errors.  
These have been fixed.)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;name              inns  fow avg&lt;br /&gt;Russel Arnold     15    3,22&lt;br /&gt;Raman Subba Row   16    2,97&lt;br /&gt;Ravi Shastri      26    2,95&lt;br /&gt;Bill Woodfull     43    2,90&lt;br /&gt;Glenn Turner      66    2,87&lt;br /&gt;Bruce Mitchell    48    2,78&lt;br /&gt;Arthur Shrewsbury 18    2,78&lt;br /&gt;Jackie McGlew     58    2,74&lt;br /&gt;Dennis Amiss      69    2,66&lt;br /&gt;Chris Tavaré      33    2,63&lt;br /&gt;Jack Robertson    15    2,60&lt;br /&gt;Billy Zulch       28    2,57&lt;br /&gt;Geoff Boycott     188   2,56&lt;br /&gt;Desmond Haynes    191   2,54&lt;br /&gt;Alec Bannerman    46    2,53&lt;br /&gt;----&lt;br /&gt;John Wright       145   2,29&lt;br /&gt;Mark Taylor       186   2,27&lt;br /&gt;Mike Atherton     197   2,25&lt;br /&gt;Graham Gooch      184   2,18&lt;br /&gt;Matthew Hayden    164   2,18&lt;br /&gt;Herbert Sutcliffe 83    2,09&lt;br /&gt;Jack Hobbs        97    1,97&lt;br /&gt;Gordon Greenidge  183   1,94&lt;br /&gt;Justin Langer     113   1,89&lt;br /&gt;Michael Slater    131   1,85&lt;br /&gt;Trevor Franklin   37    1,68&lt;br /&gt;----&lt;br /&gt;JJ Lyons          16    1,50&lt;br /&gt;William Shalders  18    1,50&lt;br /&gt;George Ulyett     15    1,47&lt;br /&gt;Bob Catterall     18    1,44&lt;br /&gt;Mushtaq Ali       16    1,44&lt;br /&gt;Boeta Dippenaar   18    1,39&lt;br /&gt;Syed Abid Ali     21    1,38&lt;br /&gt;Bruce Pairaudeau  16    1,38&lt;br /&gt;Alan Turner       26    1,35&lt;br /&gt;Saleem Elahi      19    1,21&lt;/pre&gt;&lt;br /&gt;I would have set the qualification at 20 innings, but I think that Russel Arnold deserves a moment in the sun.  He started his Test career as an opener, and really did nothing wrong.  Indeed, he averages over 50 as an opener (where he scored all three of his Test centuries), compared to under 30 overall.  He carried his bat once in a low-scoring draw against Zimbabwe.  
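&lt;br /&gt;&lt;br /&gt;Arnold's carried bat is exactly the case that the not-out rule scores as an 11. The whole rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a reading of the description above in which the not-outs are processed from the highest wicket count down and each replaced value joins the pool used for the lower not-outs. The function name `fow_average` is hypothetical and is not from the original analysis.

```python
from statistics import mean


def fow_average(dismissed_at, not_out_at):
    """Average wicket number at which an opener fell (hypothetical helper).

    dismissed_at: wicket numbers (1-10) at which he was out.
    not_out_at: wickets down when he finished not out (10 = carried his bat).
    Assumption: not-outs are processed from the highest to the lowest,
    and each replaced value joins the pool used for the lower not-outs.
    """
    values = list(dismissed_at)
    for n in sorted(not_out_at, reverse=True):
        # Every innings (completed or already adjusted) that lasted beyond n wickets.
        later = [v for v in values if v > n]
        if later:
            # He has lasted longer before: use his average over those innings.
            values.append(mean(later))
        else:
            # He has never lasted longer: credit him with the next wicket, n + 1.
            values.append(n + 1)
    return mean(values)


# The worked example: out at 1, 1, 3, 5, 9; not out at 3 and 6 wickets down.
print(round(fow_average([1, 1, 3, 5, 9], [3, 6]), 2))
```

This prints 5.1, matching the worked example. `fow_average([], [10])` gives 11 for a carried bat, and `fow_average([1, 2, 1], [])` gives the 1,33 from the opening example.&lt;br /&gt;&lt;br /&gt;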
But those muppets headed by a joker decided that Atapattu was a better opener.  And he did all right, of course, with six Test double-centuries.&lt;br /&gt;&lt;br /&gt;Anyway, make what you will of the list above.  It's a bit of a mixed bag.</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3663815860619803798/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3663815860619803798' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3663815860619803798'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3663815860619803798'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/openers-and-falls-of-wicket.html' title='Openers and falls of wicket'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3719938563859424049</id><published>2008-01-24T10:27:00.001+01:00</published><updated>2008-02-17T19:40:28.795+01:00</updated><title type='text'>1800's first-class cricket in England: the data</title><content type='html'>This is Part 1 in a series of posts analysing first-class cricket in England in the 1800's.  
The long-term goal is to compare first-class cricketers (in England) from all eras.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html"&gt;1 - data&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_26.html"&gt;2 - classification of matches&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_28.html"&gt;3 - filling in the gaps&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england_29.html"&gt;4 - bowlers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england.html"&gt;5 - batsmen&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_05.html"&gt;6 - bowlers across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_09.html"&gt;7 - batsmen across eras&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_12.html"&gt;8 - all-rounders (across eras)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://pappubahry.blogspot.com/2008/02/1800s-first-class-cricket-in-england_17.html"&gt;9 - wicket-keepers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;But before we can start calculating averages and so forth, we run into the problem of missing data.  
The &lt;a href="http://www.cricketarchive.co.uk/"&gt;CricketArchive&lt;/a&gt; website has the most comprehensive scorecard database on the Internet, but there are some gaps, of varying importance.&lt;br /&gt;&lt;br /&gt;- One match (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/471.html"&gt;Kent v Sussex, 1829&lt;/a&gt;) has only a result &amp;mdash; no record of which individuals played, what they scored, or even what the teams scored.&lt;br /&gt;&lt;br /&gt;- Four matches (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/354.html"&gt;1&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/370.html"&gt;2&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/383.html"&gt;3&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/438.html"&gt;4&lt;/a&gt;) contain only team scores, and no individual player details.  The last three of these scorecards involve only Cambridge teams.&lt;br /&gt;&lt;br /&gt;- Four matches (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/537.html"&gt;1&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/761.html"&gt;2&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/936.html"&gt;3&lt;/a&gt;, &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/1/1073.html"&gt;4&lt;/a&gt;) lack the names of players who did not bat.  The second of these matches was a Gentlemen v Players game (from 1845).&lt;br /&gt;&lt;br /&gt;- There is one further match, as late as 1877 (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/2/2154.html"&gt;here&lt;/a&gt;), in which one player who batted is unknown.  
It is known that the player was a full replacement, and that he scored 7 not out, but who he was is a mystery.&lt;br /&gt;&lt;br /&gt;- One match (&lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/368.html"&gt;here&lt;/a&gt;) does not contain the dismissals in the fourth innings.&lt;br /&gt;&lt;br /&gt;While these gaps are mildly annoying, their overall effect is not serious &amp;mdash; they are only 11 matches out of almost 4500 that were played in England in the 1800's.&lt;br /&gt;&lt;br /&gt;More serious are gaps resulting from changes in scoring style.  This concerns only the bowlers &amp;mdash; the batting scores are complete, apart from the examples listed above.&lt;br /&gt;&lt;br /&gt;The most serious problem is that, for a long time, catches were credited to the fieldsman but not to the bowler.  Only bowled dismissals counted towards a bowler's wicket tally.  The earliest match where bowlers did get credit for catches was &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/550.html"&gt;in 1836&lt;/a&gt;, and it was only from the 1838 season that it became common practice.  It was not always the case, however.  Even &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/0/807.html"&gt;in 1847&lt;/a&gt; there was a match where bowlers did not get credit for catches.&lt;br /&gt;&lt;br /&gt;Making calculation of bowling averages even more difficult is that runs conceded by bowlers were not regularly recorded until about 1854.  For the next decade or so, about 8% of matches contain gaps of this sort.  
After 1867, these scores are almost always recorded, but there is still a trickle of gaps, with the last gaps appearing in a match &lt;a href="http://www.cricketarchive.co.uk/Archive/Scorecards/2/2647.html"&gt;in 1882&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Recording the number of overs bowled follows a very similar pattern to that of runs conceded, but there are 50 matches, mostly from the early 1840's, in which overs bowled were recorded but not runs conceded.&lt;br /&gt;&lt;br /&gt;The plan, then, is to try to fill in the gaps with estimates.  I'll start by making estimates of wickets taken, and then do likewise for runs conceded.</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/3719938563859424049/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=3719938563859424049' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3719938563859424049'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/3719938563859424049'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/1800s-first-class-cricket-in-england.html' title='1800&apos;s first-class cricket in England: the data'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-1951791163479716494</id><published>2008-01-17T22:10:00.001+01:00</published><updated>2008-01-17T22:10:55.882+01:00</updated><title type='text'>No no-balls?</title><content type='html'>Talking about the poor over-rates in the current Australia-India Test, Sambit Bal &lt;a href="http://content-www.cricinfo.com/india/content/story/331427.html"&gt;suggests&lt;/a&gt; run penalties (a move I strongly disagree with), giving as justification: "See how no-balls have become scarce in Twenty20 after they introduced the free hit."&lt;br /&gt;&lt;br /&gt;There have been 50 T20I's, and in these matches, the average rate of no-balls has been 2,45 per 300 balls.  In all ODI's, the average rate is 2,94 per 300 balls.  So it does appear that the threat of a free-hit is causing at least some bowlers to stop pushing the popping crease.  (Note that those figures aren't just front-foot no-balls, but also include illegal bouncers and so on.)&lt;br /&gt;&lt;br /&gt;A more detailed look at no-ball rates in ODI's is revealing, however.  Here is a graph showing a 49-match moving average no-ball rate (per 300 balls).&lt;br /&gt;&lt;br /&gt;&lt;img src="http://i44.photobucket.com/albums/f35/pappubahry/cricket/odinoballsmovingavg.png"&gt;&lt;br /&gt;&lt;br /&gt;(Every match classified by the ICC as an ODI is included, even the silly Asia v Africa games, etc.  Some of the spike around February 2007 is caused by the associate nations, whose bowlers lacked some front-foot discipline in their lead-up tournaments to the World Cup.)&lt;br /&gt;&lt;br /&gt;A dramatic dip started about a month before the World Cup, and now we're at the lowest level of no-balling in ODI history &amp;mdash; it's lower than the rate in T20I's.  
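&lt;br /&gt;&lt;br /&gt;A 49-match moving average of the kind plotted above can be sketched as follows. The post does not say exactly how the average was taken. This sketch therefore assumes a trailing window in which no-balls and balls are pooled over the window before scaling to 300 balls. A centred window, or averaging the per-match rates, would give slightly different curves. The function name and the (no_balls, balls) input format are hypothetical.

```python
from collections import deque


def moving_noball_rate(matches, window=49):
    """No-balls per 300 balls over a trailing window of matches (hypothetical helper).

    matches: (no_balls, balls) pairs for each ODI, in date order.
    Returns one rate per complete window. Assumption: the window's no-balls
    and balls are pooled before scaling, rather than averaging per-match rates.
    """
    recent = deque()
    nb_total = 0
    balls_total = 0
    rates = []
    for no_balls, balls in matches:
        recent.append((no_balls, balls))
        nb_total += no_balls
        balls_total += balls
        if len(recent) > window:
            # Drop the oldest match so the window stays `window` matches long.
            old_nb, old_balls = recent.popleft()
            nb_total -= old_nb
            balls_total -= old_balls
        if len(recent) == window:
            rates.append(300 * nb_total / balls_total)
    return rates
```

For instance, with `window=2` the made-up sequence `[(1, 300), (3, 300), (0, 300)]` yields `[2.0, 1.5]`.&lt;br /&gt;&lt;br /&gt;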
Is it just a random blip that will right itself in the next year or two, or is it something else?  I'd like to think that, as bowlers started becoming more conservative with the position of their front feet (from playing T20 matches), they decided that any small advantage gained from getting really close to the popping crease is outweighed by the risk of a no-ball.&lt;br /&gt;&lt;br /&gt;In Test matches, the effect is not so dramatic, but we do seem to be close to a minimum for the front-foot no-ball era.&lt;br /&gt;&lt;br /&gt;Something to keep an eye on, anyway.  It could just be a blip.</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/1951791163479716494/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=1951791163479716494' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1951791163479716494'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/1951791163479716494'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/no-no-balls.html' title='No no-balls?'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://i44.photobucket.com/albums/f35/pappubahry/cricket/th_odinoballsmovingavg.png' height='72' 
width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-2312819027636704908</id><published>2008-01-13T10:26:00.000+01:00</published><updated>2008-01-13T10:31:01.521+01:00</updated><title type='text'>Opening partnerships, and a Kiwi record</title><content type='html'>This entry is inspired by a line from &lt;i&gt;The Best of the Best&lt;/i&gt;.  On Hobbs and Sutcliffe, Charles Davis writes that, "[e]ach was a great batsman in his own right, but even that is not quite enough to account for their performances together".  &lt;br /&gt;&lt;br /&gt;Given the individual averages of two openers, how much &lt;i&gt;would&lt;/i&gt; we expect their partnerships to average?  And which opening pairs do the "most better" than you would expect?&lt;br /&gt;&lt;br /&gt;To answer these questions, I took all opening pairs who opened the batting at least 15 times together.  I ordered each pair so that the first had the lower average of the two (so that, in the tables and equations below, avg1 is the lower individual average, and avg2 is the higher).  Common sense suggests that the average partnership should be more determined by the lower individual average, since that batsman is more likely to get out first.&lt;br /&gt;&lt;br /&gt;Note that I've used individual averages &lt;i&gt;as openers&lt;/i&gt; when doing this analysis.&lt;br /&gt;&lt;br /&gt;I then threw the data into gretl, an econometrics program.  Since there are two independent variables (one for each opening batsman), I can't easily make a pretty graph.  You'll just have to cope with equations and tables.  Here is some of the output:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Modèle 1: Estimation en MCO avec 97 observations 1-97&lt;br /&gt;Variable dépendante: avg_part&lt;br /&gt;&lt;br /&gt;      VARIABLE       COEFFICIENT        ERR. STD         T           p. critique&lt;br /&gt;  const                -7,60135          5,30561      -1,433   0,15526&lt;br /&gt;  avg1                  0,484575         0,144117      3,362   0,00112 ***&lt;br /&gt;  avg2                  0,766951         0,157437      4,871  &lt;0,00001 ***&lt;br /&gt;&lt;br /&gt;  Moyenne de la variable dépendante = 41,6219&lt;br /&gt;  Écart-type de la var. dép. = 12,6076&lt;br /&gt;  Somme des carrés des résidus = 7632,06&lt;br /&gt;  Erreur standard des résidus = 9,01067&lt;br /&gt;  R2 non-ajusté = 0,499844&lt;/pre&gt;&lt;br /&gt;You'll note that my computer is French.  The word &lt;i&gt;moyenne&lt;/i&gt; is 'mean', &lt;i&gt;écart-type&lt;/i&gt; is 'standard deviation', and the other words are close to their English counterparts.  If you don't know what they mean, that is not important.  &lt;br /&gt;&lt;br /&gt;The table tells us that, "on average", we expect that the average opening partnership (avg_part) should obey the following equation:&lt;br /&gt;&lt;br /&gt;avg_part = 0,484575*avg1 + 0,766951*avg2 - 7,60135.&lt;br /&gt;&lt;br /&gt;The R&lt;sup&gt;2&lt;/sup&gt; value says that roughly half of the variance in the data-set is explained by this model.&lt;br /&gt;&lt;br /&gt;Obviously the equation isn't valid everywhere &amp;mdash; if both openers average zero, you would not expect them to score negative runs!  But roughly 47 of the 97 opening pairs in the sample do better than the equation, and 50 do worse, so it appears to be pretty much "in the middle".&lt;br /&gt;&lt;br /&gt;It is surprising (to me, at least) that the coefficient of avg2 is so much higher than that of avg1.  This says that it is the opener with the &lt;i&gt;higher&lt;/i&gt; average who more determines the size of the average partnership.  I'm at a bit of a loss to explain this.  
Perhaps openers with lower averages have lower strike rates (so while they don't score as many runs, they don't get out first)?&lt;br /&gt;&lt;br /&gt;Now we get onto the pairs who do better than they should.  In the following table, I've given the individual averages-as-openers, the runs scored together, the number of partnerships, 'obs' the observed average partnership, 'exp' the expected average partnership based on the equation above, and the ratio of the observed to expected.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;opener1         opener2       avg1    avg2    runs  inns  obs     exp     ratio&lt;br /&gt;T Franklin      J Wright      23,00   38,12   1543  28    55,11   32,78   1,68&lt;br /&gt;Javed Omar      Nafees Iqbal  22,08   25,60   665   19    35,00   22,73   1,54&lt;br /&gt;P Roy           V Mankad      31,71   40,74   868   16    57,87   39,01   1,48&lt;br /&gt;J Stollmeyer    A Rae         41,94   46,18   1349  21    71,00   48,14   1,47&lt;br /&gt;B Murray        G Dowling     23,92   31,55   786   20    39,30   28,19   1,39&lt;br /&gt;C Cowdrey       G Pullar      42,42   43,84   906   15    64,71   46,58   1,39&lt;br /&gt;C McDonald      A Morris      39,40   45,69   949   15    63,27   46,53   1,36&lt;br /&gt;J Hobbs         H Sutcliffe   56,37   61,11   3249  38    87,81   66,58   1,32&lt;br /&gt;Imran Farhat    Taufeeq Umar  33,10   39,30   754   15    50,27   38,58   1,30&lt;br /&gt;Sadiq Mohammad  Majid Khan    34,93   42,23   1391  26    53,50   41,71   1,28&lt;/pre&gt;&lt;br /&gt;And it's a Kiwi pair who finish first!  I suppose that if you analyse enough Test data, you'll eventually find New Zealand coming first in something.&lt;br /&gt;&lt;br /&gt;My guess that openers with lower averages score slower certainly applies to Trevor Franklin, who is the fourth-slowest batsman of all-time according to &lt;a href="http://www.sportstats.com.au/hotscore.html"&gt;Davis's list&lt;/a&gt; (qual. 1000 runs or 2000 balls faced; average over 20).  
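&lt;br /&gt;&lt;br /&gt;The 'exp' and 'ratio' columns can be reproduced from the gretl coefficients with a short Python sketch. The helper names are hypothetical. The 'obs' figure is taken straight from the table rather than recomputed from runs and inns. For some pairs the table's figure evidently uses a different divisor: for Hobbs and Sutcliffe, 3249 / 38 is 85,50, whereas 3249 / 37 gives the table's 87,81. That divisor is presumably the number of completed partnerships, though this is an inference.

```python
# Coefficients of the OLS fit quoted above (97 opening pairs, Tests 1 to 1858).
CONST = -7.60135
COEF_LOWER = 0.484575   # multiplies avg1, the lower individual average
COEF_HIGHER = 0.766951  # multiplies avg2, the higher individual average


def expected_partnership(avg_a, avg_b):
    """Expected average opening partnership from two openers' averages."""
    avg1, avg2 = sorted((avg_a, avg_b))  # order the pair as in the post
    return COEF_LOWER * avg1 + COEF_HIGHER * avg2 + CONST


def obs_exp_ratio(observed, avg_a, avg_b):
    """Ratio of the observed to the expected average partnership."""
    return observed / expected_partnership(avg_a, avg_b)


# Rows copied from the table above: (pair, avg1, avg2, obs).
rows = [
    ("T Franklin / J Wright", 23.00, 38.12, 55.11),
    ("J Hobbs / H Sutcliffe", 56.37, 61.11, 87.81),
]
for pair, avg1, avg2, obs in rows:
    exp = expected_partnership(avg1, avg2)
    print(f"{pair:22s} exp {exp:6.2f}  ratio {obs_exp_ratio(obs, avg1, avg2):.2f}")
```

For Franklin and Wright this gives an expected partnership of 32,78 and a ratio of 1,68. For Hobbs and Sutcliffe it gives 66,58 and 1,32. Both agree with the table.&lt;br /&gt;&lt;br /&gt;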
&lt;br /&gt;&lt;br /&gt;It may just be coincidence that some opening pairs do well in that table &amp;mdash; perhaps they both had a good run of innings while batting together, or maybe they batted against weaker teams (I haven't tried adjusting for strength of bowling attack).  But it may also be that they bring out the best in each other.  Or, as Davis suggests in the case of Hobbs and Sutcliffe, that they held a psychological edge over their opponents when together.&lt;br /&gt;&lt;br /&gt;When &lt;a href="http://historyofcricket.blogspot.com/"&gt;Stuart&lt;/a&gt; sees the other end of the table, he will be happy to see Graeme Wood coming dead last.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;opener1         opener2       avg1    avg2    runs  inns  obs     exp     ratio&lt;br /&gt;M Elliott       M Taylor      35,32   43,50   721   23    31,35   42,88   0,73&lt;br /&gt;M Dekker        G Flower      15,86   29,30   357   22    16,23   22,56   0,72&lt;br /&gt;B Woodfull      B Ponsford    50,90   54,18   860   22    40,95   58,62   0,70&lt;br /&gt;G Gooch         T Robinson    43,88   44,97   621   19    32,68   48,15   0,68&lt;br /&gt;R Simpson       L Hutton      25,92   56,48   477   15    31,80   48,28   0,66&lt;br /&gt;E McMorris      C Hunte       26,86   45,07   548   21    26,10   39,98   0,65&lt;br /&gt;B Pocock        B Young       22,93   32,13   378   21    18,00   28,15   0,64&lt;br /&gt;Wasim Jaffer    V Sehwag      35,82   51,29   619   21    29,48   49,09   0,60&lt;br /&gt;Hannan Sarkar   Javed Omar    20,66   22,08   207   18    11,50   19,34   0,59&lt;br /&gt;A Hilditch      G Wood        31,56   33,61   354   18    19,67   33,47   0,59&lt;/pre&gt;&lt;br /&gt;It is also interesting that Javed Omar comes both second and second-last.  Mark Dekker has easily the worst average of any opening batsman who's opened the innings 15 times.&lt;br /&gt;&lt;br /&gt;For all the hugging, Langer and Hayden did slightly worse than would be expected, with a ratio of 0,92.  
Their average opening partnership of 52,08 is quite good (22nd on the list), but they each have excellent individual opening averages (48,94 and 52,66).  The Langer/Hayden and Boycott/Amiss pairs are the only ones to have an average partnership of over 50 and a ratio below 1.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Figures are based on Tests 1 to 1858, that is up to the first Test between New Zealand and Bangladesh.</content><link rel='replies' type='application/atom+xml' href='http://pappubahry.blogspot.com/feeds/2312819027636704908/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22713811&amp;postID=2312819027636704908' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2312819027636704908'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22713811/posts/default/2312819027636704908'/><link rel='alternate' type='text/html' href='http://pappubahry.blogspot.com/2008/01/opening-partnerships-and-kiwi-record.html' title='Opening partnerships, and a Kiwi record'/><author><name>David Barry</name><uri>http://www.blogger.com/profile/08378763233797445502</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://bp1.blogger.com/_gAHUdUeTIGI/SIk8T23_97I/AAAAAAAAAAM/9HpLeUPKexA/S220/monsteravatar'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22713811.post-3140257903993539747</id><published>2008-01-08T20:44:00.000+01:00</published><updated>2008-01-08T22:46:39.634+01:00</updated><title type='text'>Bradman v Gretzky v Orr</title><content type='html'>Yesterday I read Charles Davis' book &lt;i&gt;The Best of 
the Best&lt;/i&gt;.  Overall this is an excellent statistical study of cricket and cricketers through Test history.  But here I want to talk about one of the later chapters, in which he compares Don Bradman to greats from other sports.&lt;br /&gt;&lt;br /&gt;The technique used to compare players across sports is to find a suitable quantity to measure for each player, so that the resulting distribution for all players becomes a bell curve, at least in the high tail.  From this, you can compute each player's z-score (z = (x-µ)/&amp;sigma;, where µ is the mean, and &amp;sigma; is the standard deviation), which is directly comparable across different sports.&lt;br /&gt;&lt;br /&gt;Davis' analysis of cricketers gives Bradman a z-score of 5.0 when considering batsmen only, and 4.4 when combining batting, bowling, and fielding.  Keep in mind the batsmen-only score here, because that will be a fairer comparison to the ice hockey players.  It's worth pointing out, for those unfamiliar with statistics, that a z-score of 5 is truly phenomenal &amp;mdash; only one player in almost 3.5 million should be that good compared to all other players of the sport.  That's 3.5 million Test cricketers, in this case, not 3.5 million members of the general public.  There have only been about 2500 Test cricketers, so for Bradman to have existed makes us very lucky.&lt;br /&gt;&lt;br /&gt;Davis' analysis of other sports was not as detailed as for cricket, but the results are reasonably persuasive.  Pele is the closest to Bradman, with a z-score of 3.7 for goals per international game.  Ty Cobb's baseball batting average turns into a z-score of 3.6.  Though these numbers might not look so far away from Bradman's 4.4 or 5, you have to remember that larger z-scores become much, much rarer &amp;mdash; Pele's 3.7 makes him roughly a 1 in 9300 player.&lt;br /&gt;&lt;br /&gt;Unfortunately, Davis did not include ice hockey among the major international sports he examined.  
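&lt;br /&gt;&lt;br /&gt;The rarity figures above come from the upper tail of the standard normal distribution, P(Z > z). The following check uses only Python's standard `math.erfc`. It assumes a one-sided tail and a normal shape in the high tail, as the method described above requires. The helper name `upper_tail` is illustrative. For z = 5.0 the check gives about 1 in 3.5 million, matching the figure quoted. For z = 3.7 it gives roughly 1 in 9300.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z > z) for a standard normal Z: the share of players expected
    to be at least z standard deviations above the mean."""
    return 0.5 * erfc(z / sqrt(2))


# z-scores quoted in the post: Cobb, Pele, Bradman (all-round), Bradman (batting).
for z in (3.6, 3.7, 4.4, 5.0):
    print(f"z = {z}: about 1 in {1 / upper_tail(z):,.0f}")
```

&lt;br /&gt;&lt;br /&gt;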
If cricket is to be counted as an international sport, then so should ice hockey.  Most international cricket is sustained by relatively small population bases.  Ice hockey's international reach is similar to cricket's.  Wikipedia tells me th
