Sunday, June 29, 2008

Followup on accuracy of averages

Russ pointed out a couple of things in the previous post. For those who missed the comments thread, here are the revised formulas for calculating uncertainties.

Batting: 0.9 * average / sqrt(# innings)
Bowling: 0.9 * average / sqrt(# wickets)

So, e.g., Mike Hussey becomes 68.4 +/- 9.5. About 68% of 'true' averages will lie within the range given. You need to double the uncertainty (here, to +/- 19) to get the coverage up to 95%.
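For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the two formulas above. The function names are my own, and the 42 innings used in the example is an illustrative figure back-solved from the quoted +/- 9.5, not a count taken from a scorecard.

```python
import math

def batting_uncertainty(average, innings):
    """Rough one-sigma uncertainty on a batting average: 0.9 * average / sqrt(# innings)."""
    return 0.9 * average / math.sqrt(innings)

def bowling_uncertainty(average, wickets):
    """Rough one-sigma uncertainty on a bowling average: 0.9 * average / sqrt(# wickets)."""
    return 0.9 * average / math.sqrt(wickets)

# Illustrative input only: 42 innings is an assumption chosen to match the quoted 9.5.
hussey_sigma = batting_uncertainty(68.4, 42)
print(f"68.4 +/- {hussey_sigma:.1f}  (68% range)")
print(f"68.4 +/- {2 * hussey_sigma:.1f}  (roughly 95% range)")
```

Doubling the one-sigma figure for the 95% range treats the spread as roughly normal, which is an approximation.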

I haven't made much of an effort to work out the underlying distribution of Australian players that Hussey comes from. To get a rough idea of what should happen, I found the mean and standard deviation of averages of Australian batsmen at batting positions 1 through 7, over the last ten years. There's a bit of a problem about what to do with players who only played a couple of Tests and averaged (say) 5 — clearly they could have averaged up around 20 or 30 if given more opportunities.

Anyway, I bumped those guys up to 20, and the result was something like mean 42, standard deviation 12. So, carrying on with the Hussey example, we crunch the numbers like this:

regressed average = (68.4/9.5^2 + 42/12^2) / (1/9.5^2 + 1/12^2)

uncertainty = 1 / sqrt(1/9.5^2 + 1/12^2)

to estimate Hussey's 'true' average as about 58 +/- 7.
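The calculation above is an inverse-variance weighted mean of the player's observed average and the population mean. As a rough sketch (the function and variable names are mine, and it assumes both spreads can be treated as roughly normal), it looks like this in Python:

```python
import math

def regress_to_mean(avg, avg_sd, pop_mean, pop_sd):
    """Combine an observed average with a population mean, weighting each by 1/variance."""
    w_player = 1.0 / avg_sd ** 2   # weight on the player's own record
    w_pop = 1.0 / pop_sd ** 2      # weight on the population of batsmen
    regressed = (avg * w_player + pop_mean * w_pop) / (w_player + w_pop)
    uncertainty = 1.0 / math.sqrt(w_player + w_pop)
    return regressed, uncertainty

# Figures from the post: Hussey 68.4 +/- 9.5, population mean 42, standard deviation 12.
est, err = regress_to_mean(68.4, 9.5, 42.0, 12.0)
print(f"{est:.1f} +/- {err:.1f}")
```

The smaller a player's own uncertainty, the less his average gets pulled towards 42, so long careers are regressed less than short ones.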

Let's just hope that he can score runs in India.

hey db.
just stumbled onto your page.
interesting stuff, although hardcore statistics aren't my strong point.

i am a phd student in machine learning (mainly bioinformatics) at griffith uni in brisbane.
i'm quite good at MATLAB programming and am willing to offer my services, if you need them.
for one thing, the graphs are prettier than excel.
i'm interested in data mining and statistical learning, too, so that might help.

and i love cricket.

panicslowly at hotmail