Comments on Pappus' plane - cricket stats: Accuracy of averages
David Barry (http://www.blogger.com/profile/08378763233797445502)
Feed updated 2023-05-18T10:02:56.564+02:00

sdrogers (https://www.blogger.com/profile/17363641724780714690), 2008-07-13T15:58:00.000+02:00:

True enough. It's a fascinating area though - I've loosely considered the problems of using the career average as a proxy for talent before, particularly as averages are increasing whilst there is no reason to believe that talent is. I'm tempted to try and create some kind of latent variable model to see if anything interesting comes from it!

David Barry, 2008-07-13T13:43:00.000+02:00:

Also note that when I work with the actual batting scores (not the exponential simulations), I split the innings into evens and odds. So even if there are significant changes in true talent, they'll be spread pretty much equally into the two buckets.

David Barry, 2008-07-13T13:21:00.000+02:00:

Thanks Simon. There have been a couple of studies into the notion of batting 'form', and at best, it appears to be a very weak effect. The career average is a much more accurate predictor of the next innings than an average over some recent period.

I might have another look at the problem. When I last did so, I took the last five or ten dismissals. Perhaps there might be a slightly larger effect if you take a year.

My suspicion is that you wouldn't see much though.

sdrogers, 2008-07-13T13:10:00.000+02:00:

Hi Dave,
A very interesting and enjoyable blog. Just a quick comment on the analysis of averages. Is there not also an issue of dependence here? Your exponential simulations assume that innings are independent whereas in reality there ought to be some conditioning on previous innings (i.e. form/confidence) - take Paul Collingwood in the last 6 months or so. Might incorporating some kind of smoothing reduce this 10,000 innings figure?
Simon

Anonymous, 2008-06-26T17:15:00.000+02:00:

i think my brain nearly exploded then lol

Anonymous, 2008-06-25T06:29:00.000+02:00:

Dave, the central limit theorem applies to means of normally distributed data. If you remove the average term it is essentially a percentile uncertainty, given the number of samples (in this case, for 100 innings, of 9%). Hence, the uncertainty is linear with respect to a player's average.

My thoughts yesterday were that players who averaged less (say, 25), didn't really have half the uncertainty of someone who averaged more (say, 50). That is, I was thinking that the uncertainty percentile would decline as averages increased. That is, it would be sqrt(average) rather than average. Today I am not so sure, and if anything it might go the other way - higher averages are much more dependent on a few big scores (ie. luck). Eye-balling the graph doesn't really say either way, so I'm content with what you've done.

Incidentally, exponential distributions have their own uncertainty calculators for maximum likelihood (http://en.wikipedia.org/wiki/Exponential_distribution). Using a chi-squared generator function (http://www.stat.tamu.edu/~west/applets/chisqdemo.html) they seem to give similar figures to what you are getting (Border: +/- 6-7%).

David Barry, 2008-06-23T13:30:00.000+02:00:

Thanks for that Russ.

I did indeed exclude tail-enders out of habit, though it wouldn't have occurred to me that putting them in would improve the R-squareds. Anyway, they look much better now - R^2 = 0.74 or so.

"The odd and even average will be reflections of each other around the actual average."
Aha! That makes things somewhat more palatable. I re-did it with comparing the odd average to the overall average and got k = 0.9, which is pretty close to 1.7/2.

"What values do you get with a variation of k * sqrt( average ) / sqrt( number innings )"
I'm not really sure what you're getting at here. If I look for a k so that the uncertainties are at 68%, then I get k = 4.8. The uncertainties tend to be about a run higher for long careers and a run lower for short careers. (Hussey, with such a high average, has an uncertainty about 3.5 runs lower, now +/- 6.1.)

But I don't understand where you get the sqrt(avg) from. From the central limit theorem, we expect that the standard deviation of the averages is roughly the standard deviation of the distribution the scores are coming from, divided by sqrt(inns).

The underlying distribution for each batsman is something close to an exponential distribution, skewed towards zero. The standard deviation is typically a bit over the mean.

I don't see where a sqrt(avg) would come in.

Anonymous, 2008-06-23T04:17:00.000+02:00:

Dave, a few quick comments - too busy to think at length on this right now.

Your r-value is mostly poor because you made a window of the averages (effectively 30-55) which is somewhat similar to the error. Technically, the graph should be y = x. Adding tail-enders would fix that. I am not sure why you excluded them - habit?

The odd and even average will be reflections of each other around the actual average. Hence, a k-value error of 1.7 is double the width of the variation from actual average to odd/even average. Or, to put it another way, correct to ~95% not ~68%.

Lastly, because averages are already aggregates, I don't think the variation will increase linearly. This matters a lot for estimating Hussey's average, even if it is much of a muchness for Waugh or Richards. What values do you get with a variation of k * sqrt( average ) / sqrt( number innings )
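
Two points argued in this thread can be checked with a small simulation: whether the uncertainty of an average scales with average / sqrt(innings) or with sqrt(average) / sqrt(innings), and whether the odd-versus-even spread is double the odd-versus-overall spread. The sketch below is only an illustration under a stated assumption, namely that every score is an independent draw from an exponential distribution (no not-outs, no form effects). It is not the code used for the original post, and the function name `simulate_spreads` and its parameters are made up for this example.

```python
import math
import random
import statistics


def simulate_spreads(true_average, innings, careers=2000, seed=0):
    """Simulate many careers of i.i.d. exponential scores (assumed model).

    Each spread is returned as a multiple k of true_average / sqrt(innings),
    so a k that stays the same for different averages supports linear scaling.
    `innings` should be even so the odd and even halves are the same size.
    """
    rng = random.Random(seed)
    overall, odd_vs_even, odd_vs_overall = [], [], []
    for _ in range(careers):
        # expovariate takes the rate, i.e. 1 / mean
        scores = [rng.expovariate(1.0 / true_average) for _ in range(innings)]
        odd = statistics.fmean(scores[0::2])   # 1st, 3rd, 5th ... innings
        even = statistics.fmean(scores[1::2])  # 2nd, 4th, 6th ... innings
        mean = statistics.fmean(scores)
        overall.append(mean)
        odd_vs_even.append(odd - even)
        odd_vs_overall.append(odd - mean)
    scale = true_average / math.sqrt(innings)
    return {
        "k_overall": statistics.pstdev(overall) / scale,
        "k_odd_vs_even": statistics.pstdev(odd_vs_even) / scale,
        "k_odd_vs_overall": statistics.pstdev(odd_vs_overall) / scale,
    }


if __name__ == "__main__":
    for average in (25, 50):
        print(average, simulate_spreads(average, innings=100, seed=average))
```

Under this assumed model the k for the overall average comes out close to 1 for both a 25 average and a 50 average, which is what linear scaling predicts; under sqrt(average) scaling the two k values would differ by a factor of about 1.4. The odd-versus-even k is twice the odd-versus-overall k, because with equal halves odd - overall = (odd - even) / 2. Real scores are not exactly exponential, so the k values fitted from actual careers (1.7 and 0.9 above) need not match the simulated ones.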