Saturday, March 01, 2014
Highest scores
Recently I got thinking about Flappy Bird scores, and noted that they should be similar to cricket scores in the way they're distributed.
Translating the problem to cricket, the question is, "What is the relationship between a batsman's highest score, his average, and the number of innings he's batted in?" A follow-up question is how well such a formula predicts, say, the average based on the innings and highest score. I expected there to be quite a lot of scatter (assessing a batsman on just one innings!), but the correlation ended up being pretty decent. This isn't a particularly useful correlation, but I found it fun to play with.
Given the average, the probability that a batsman makes a score less than x is roughly (1 - exp(-x/avg)). Treating each innings as independent of the others, the probability that all N of a batsman's innings are less than x is therefore (1 - exp(-x/avg))^N.
A natural "expected" highest score, given the batsman's average, would be the value x such that the probability that all innings are less than x is 1/2. Calling this value HS, we have
(1 - exp(-HS/avg))^N = 1/2,
which is the basic relation between highest score, average, and number of innings batted.
Solving for HS (i.e., what highest score do we expect, given average and innings batted?):
HS = -avg * ln(1 - 0.5^(1/N)).
Solving for avg (given only highest score and innings batted, what do we think the batsman's average is?):
avg = -HS / ln(1 - 0.5^(1/N)).
Using the latter formula as a predictor works surprisingly (to me) well:
The weird qualification of 50 dismissals rather than innings is because I began by including all batsmen with at least 2 dismissals, and at sample sizes this small, I figured dismissals would be useful to avoid not-out irregularities. (The R-squared is actually a bit higher, about 0.78, in that expanded sample.)
I doubt I'll ever use this formula. Perhaps the most amusing individual result is Shane Warne: based on his average and innings batted, his "predicted" high score is... 98! So he didn't quite deserve that hundred after all.
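The two formulas above can be sketched in a few lines of Python. This is only an illustration of the exponential model described in the post; the function names are my own, and the inputs in the example are the rounded figures from the table (Warne: 199 innings, average 17.3; Bradman: 80 innings, highest score 334).

```python
import math


def predicted_high_score(avg, innings):
    """Score x such that all `innings` exponential innings fall below x with probability 1/2."""
    return -avg * math.log(1.0 - 0.5 ** (1.0 / innings))


def predicted_average(high_score, innings):
    """Invert the same relation: the average implied by a highest score and innings count."""
    return -high_score / math.log(1.0 - 0.5 ** (1.0 / innings))


if __name__ == "__main__":
    # Hypothetical usage with rounded table values.
    print("Warne predicted HS:", round(predicted_high_score(17.3, 199), 1))
    print("Bradman predicted avg:", round(predicted_average(334, 80), 1))
```

Because the table averages are rounded to one decimal place, the output can differ from the table's predicted columns by a few tenths of a run.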
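Full table, straight off Statsguru; I was so lazy that I didn't even turn off the ICC game. The columns are: player, innings, not outs, actual highest score, actual average, then the highest score predicted from the average and the average predicted from the highest score.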
predict one based real on the other Player Inns NO HS avg HS avg DG Bradman (Aus) 80 10 334 99.9 475.0 70.3 H Sutcliffe (Eng) 84 9 194 60.7 291.6 40.4 KF Barrington (Eng) 131 15 256 58.7 307.7 48.8 ED Weekes (WI) 81 5 207 58.6 279.3 43.4 WR Hammond (Eng) 140 16 336 58.5 310.4 63.3 KC Sangakkara (SL) 209 17 319 58.1 331.6 55.9 GS Sobers (WI) 160 21 365 57.8 314.5 67.0 JB Hobbs (Eng) 102 7 211 56.9 284.4 42.2 CL Walcott (WI) 74 7 220 56.7 265.0 47.1 L Hutton (Eng) 138 15 364 56.7 300.1 68.7 JH Kallis (ICC/SA) 280 40 224 55.4 332.4 37.3 GS Chappell (Aus) 151 19 247 53.9 290.1 45.9 AD Nourse (SA) 62 7 231 53.8 242.1 51.3 SR Tendulkar (India) 329 33 248 53.8 331.5 40.2 BC Lara (ICC/WI) 232 6 400 52.9 307.5 68.8 Javed Miandad (Pak) 189 21 280 52.6 294.9 49.9 R Dravid (ICC/India) 286 32 270 52.3 315.1 44.8 Mohammad Yousuf (Pak) 156 12 223 52.3 283.3 41.2 AB de Villiers (SA) 152 16 278 52.3 281.9 51.6 S Chanderpaul (WI) 261 45 203 51.9 308.1 34.2 RT Ponting (Aus) 287 29 257 51.9 312.5 42.6 HM Amla (SA) 130 11 311 51.6 270.0 59.4 A Flower (Zim) 112 19 232 51.5 262.2 45.6 MEK Hussey (Aus) 137 16 195 51.5 272.5 36.9 Younis Khan (Pak) 158 14 313 51.4 279.1 57.6 SM Gavaskar (India) 214 16 236 51.1 293.1 41.2 SR Waugh (Aus) 260 46 200 51.1 302.7 33.7 MJ Clarke (Aus) 178 19 329 50.8 282.0 59.3 ML Hayden (Aus) 184 14 380 50.7 283.2 68.1 AR Border (Aus) 265 44 205 50.6 300.7 34.5 DPMD Jayawardene (SL) 240 15 374 50.3 294.2 63.9 IVA Richards (WI) 182 12 291 50.2 279.9 52.2 DCS Compton (Eng) 131 15 278 50.1 262.5 53.0 Inzamam-ul-Haq (ICC/Pak) 200 22 329 49.6 281.1 58.1 FMM Worrell (WI) 87 9 261 49.5 239.3 54.0 V Sehwag (ICC/India) 180 6 319 49.3 274.4 57.4 B Mitchell (SA) 80 9 189 48.9 232.3 39.8 TT Samaraweera (SL) 132 20 231 48.8 256.1 44.0 Misbah-ul-Haq (Pak) 80 14 161 48.8 231.7 33.9 GC Smith (ICC/SA) 203 13 277 48.7 276.8 48.8 RN Harvey (Aus) 137 10 205 48.4 256.0 38.8 KD Walters (Aus) 125 14 250 48.3 250.8 48.1 SJ McCabe (Aus) 62 5 232 48.2 216.9 51.6 ER Dexter (Eng) 102 8 205 
47.9 239.2 41.0 G Boycott (Eng) 193 23 246 47.7 268.7 43.7 VS Hazare (India) 52 6 164 47.7 206.1 37.9 EH Hendren (Eng) 83 9 205 47.6 228.1 42.8 AC Gilchrist (Aus) 137 20 204 47.6 251.8 38.6 SM Nurse (WI) 54 1 258 47.6 207.6 59.1 RB Kanhai (WI) 137 6 256 47.5 251.4 48.4 KP Pietersen (Eng) 181 8 227 47.3 263.2 40.8 WM Lawry (Aus) 123 12 210 47.2 244.3 40.5 LRPL Taylor (NZ) 98 9 217 46.9 232.6 43.8 RB Simpson (Aus) 111 7 311 46.8 237.8 61.2 PBH May (Eng) 106 9 285 46.8 235.4 56.6 CH Lloyd (WI) 175 14 242 46.7 258.2 43.7 AL Hassett (Aus) 69 3 198 46.6 214.4 43.0 DM Jones (Aus) 89 11 216 46.6 226.2 44.5 AN Cook (Eng) 183 10 294 46.5 259.4 52.7 AR Morris (Aus) 79 3 206 46.5 220.3 43.5 IJL Trott (Eng) 87 6 226 46.5 224.7 46.7 DR Martyn (Aus) 109 14 165 46.4 234.7 32.6 DL Amiss (Eng) 88 10 262 46.3 224.5 54.0 AD Mathews (SL) 62 12 157 46.2 207.7 34.9 M Leyland (Eng) 65 5 187 46.1 209.4 41.1 WM Woodfull (Aus) 54 4 161 46.0 200.6 36.9 VVS Laxman (India) 225 34 281 46.0 265.9 48.6 EJ Barlow (SA) 57 2 201 45.7 202.0 45.5 NC O'Neill (Aus) 69 8 181 45.6 209.8 39.3 Saeed Anwar (Pak) 91 2 188 45.5 222.2 38.5 IR Bell (Eng) 170 22 235 45.4 250.0 42.7 MD Crowe (NZ) 131 11 299 45.4 237.9 57.0 JL Langer (Aus) 182 12 250 45.3 252.3 44.9 G Kirsten (SA) 176 15 275 45.3 250.7 49.6 CC Hunte (WI) 78 6 260 45.1 213.0 55.0 SM Katich (Aus) 99 6 157 45.0 223.6 31.6 M Azharuddin (India) 147 9 199 45.0 241.3 37.1 Zaheer Abbas (Pak) 124 11 274 44.8 232.4 52.8 MH Richardson (NZ) 65 3 145 44.8 203.5 31.9 CG Greenidge (WI) 185 16 226 44.7 249.9 40.4 GP Thorpe (Eng) 179 28 200 44.7 248.1 36.0 GM Turner (NZ) 73 6 259 44.6 208.1 55.6 AI Kallicharran (WI) 109 10 187 44.4 224.9 36.9 RB Richardson (WI) 146 12 194 44.4 237.6 36.2 TW Graveney (Eng) 123 13 258 44.4 230.0 49.8 Shoaib Mohammad (Pak) 68 7 203 44.3 203.6 44.2 AH Jones (NZ) 74 8 186 44.3 207.0 39.8 DI Gower (Eng) 204 18 215 44.3 251.6 37.8 DJ Cullinan (SA) 115 12 275 44.2 226.1 53.8 G Gambhir (India) 96 5 206 44.2 218.0 41.7 MC Cowdrey (Eng) 188 15 
182 44.1 246.9 32.5 Hanif Mohammad (Pak) 97 8 337 44.0 217.5 68.2 ME Trescothick (Eng) 143 10 219 43.8 233.5 41.1 Saleem Malik (Pak) 154 22 237 43.7 236.2 43.8 RA Smith (Eng) 112 15 175 43.7 222.2 34.4 EAB Rowan (SA) 50 5 236 43.7 187.1 55.1 DC Boon (Aus) 190 20 200 43.7 245.1 35.6 JH Edrich (Eng) 127 9 310 43.5 227.0 59.5 MA Taylor (Aus) 186 13 334 43.5 243.3 59.7 IR Redpath (Aus) 120 11 171 43.5 224.1 33.2 BF Butcher (WI) 78 6 209 43.1 203.8 44.2 PA de Silva (SL) 159 11 267 43.0 233.7 49.1 DA Warner (Aus) 54 3 180 42.9 187.0 41.3 HP Tillakaratne (SL) 131 25 204 42.9 224.8 38.9 MJ Slater (Aus) 131 7 219 42.8 224.6 41.8 C Washbrook (Eng) 66 6 195 42.8 195.3 42.7 GA Gooch (Eng) 215 6 333 42.6 244.4 58.0 M Amarnath (India) 113 10 138 42.5 216.6 27.1 RC Fredericks (WI) 109 7 169 42.5 215.0 33.4 IM Chappell (Aus) 136 10 196 42.4 224.1 37.1 JB Stollmeyer (WI) 56 5 160 42.3 186.2 36.4 DL Haynes (WI) 202 25 184 42.3 240.1 32.4 PR Umrigar (India) 94 8 223 42.2 207.4 45.4 SC Ganguly (India) 188 17 239 42.2 236.4 42.6 DB Vengsarkar (India) 185 22 166 42.1 235.5 29.7 NS Sidhu (India) 78 2 201 42.1 199.2 42.5 DJ McGlew (SA) 64 6 255 42.1 190.6 56.3 CH Gayle (WI) 174 9 333 42.0 232.2 60.2 HH Gibbs (SA) 154 7 228 42.0 226.8 42.2 GR Viswanath (India) 155 10 222 41.9 226.9 41.0 ME Waugh (Aus) 209 17 153 41.8 238.8 26.8 CG Macartney (Aus) 55 4 170 41.8 183.0 38.8 AG Prince (SA) 104 16 162 41.6 208.8 32.3 MP Vaughan (Eng) 147 9 197 41.4 222.1 36.8 JC Adams (WI) 90 17 208 41.3 200.9 42.7 GN Yallop (Aus) 70 3 268 41.1 190.0 58.0 GRJ Matthews (Aus) 53 8 130 41.1 178.4 29.9 KC Wessels (Aus/SA) 71 3 179 41.0 190.0 38.6 TM Dilshan (SL) 145 11 193 41.0 219.1 36.1 AJ Strauss (Eng) 178 6 177 40.9 227.1 31.9 PH Parfitt (Eng) 52 6 131 40.9 176.9 30.3 MJ Prior (Eng) 116 20 131 40.8 209.2 25.6 HW Taylor (SA) 76 4 176 40.8 191.7 37.4 LEG Ames (Eng) 72 12 149 40.6 188.5 32.1 PD Collingwood (Eng) 115 10 206 40.6 207.4 40.3 W Bardsley (Aus) 66 5 193 40.5 184.6 42.3 AW Greig (Eng) 93 4 148 40.4 198.2 
30.2 Saeed Ahmed (Pak) 78 4 172 40.4 191.0 36.4 B Sutcliffe (NZ) 76 8 230 40.1 188.5 48.9 ST Jayasuriya (SL) 188 14 340 40.1 224.6 60.7 BL D'Oliveira (Eng) 70 8 158 40.1 185.1 34.2 SP Fleming (NZ) 189 10 274 40.1 224.7 48.8 RR Sarwan (WI) 154 8 291 40.0 216.3 53.8 WJ Edrich (Eng) 63 2 219 40.0 180.6 48.5 KWR Fletcher (Eng) 96 14 216 39.9 196.9 43.8 HA Gomes (WI) 91 11 143 39.6 193.4 29.3 AJ Stewart (Eng) 235 21 190 39.5 230.4 32.6 BM McMillan (SA) 62 12 113 39.4 177.1 25.1 CC McDonald (Aus) 83 4 170 39.3 188.3 35.5 DN Sardesai (India) 55 4 212 39.2 171.8 48.4 C Hill (Aus) 89 2 191 39.2 190.5 39.3 Mushtaq Mohammad (Pak) 100 7 201 39.2 194.9 40.4 Azhar Ali (Pak) 60 4 157 39.1 174.8 35.1 VL Manjrekar (India) 92 10 189 39.1 191.4 38.6 VT Trumper (Aus) 89 8 214 39.0 189.7 44.0 MS Atapattu (SL) 156 15 249 39.0 211.4 46.0 AP Gurusinha (SL) 70 7 143 38.9 179.8 31.0 Majid Khan (Pak) 106 5 167 38.9 195.9 33.2 Asif Iqbal (Pak) 99 7 175 38.9 192.9 35.2 MS Dhoni (India) 130 15 224 38.8 203.0 42.8 Taufeeq Umar (Pak) 81 5 236 38.7 184.5 49.5 WW Armstrong (Aus) 84 10 159 38.7 185.7 33.1 CD McMillan (NZ) 91 10 142 38.5 187.7 29.1 PJP Burge (Aus) 68 8 181 38.2 175.2 39.4 Mudassar Nazar (Pak) 116 8 231 38.1 195.1 45.1 BB McCullum (NZ) 145 8 302 38.1 203.6 56.5 Shakib Al Hasan (Ban) 65 5 144 38.0 172.6 31.7 JG Wright (NZ) 148 7 185 37.8 202.9 34.5 Imran Khan (Pak) 126 25 136 37.7 196.2 26.1 MA Atherton (Eng) 212 7 185 37.7 215.8 32.3 Ijaz Ahmed (Pak) 92 4 211 37.7 184.3 43.1 JV Coney (NZ) 85 14 174 37.6 180.8 36.2 PE Richardson (Eng) 56 1 126 37.5 164.8 28.6 KR Stackpole (Aus) 80 5 207 37.4 177.9 43.6 KJ Hughes (Aus) 124 6 213 37.4 194.1 41.0 ND McKenzie (SA) 94 7 226 37.4 183.7 46.0 AN Petersen (SA) 52 3 182 37.3 161.3 42.1 N Hussain (Eng) 171 16 207 37.2 204.9 37.6 SV Manjrekar (India) 61 6 218 37.1 166.5 48.6 Mohsin Khan (Pak) 79 6 200 37.1 175.9 42.2 NJ Astle (NZ) 137 10 222 37.0 195.8 42.0 KR Miller (Aus) 87 7 147 37.0 178.8 30.4 Tamim Iqbal (Ban) 62 0 151 36.6 164.6 33.6 CL 
Hooper (WI) 173 15 233 36.5 201.3 42.2 WJ Cronje (SA) 111 9 135 36.4 184.9 26.6 KS Williamson (NZ) 56 2 135 36.4 160.0 30.7 SR Watson (Aus) 95 3 176 36.3 178.9 35.7 JDP Oram (NZ) 59 10 133 36.3 161.6 29.9 Wasim Raja (Pak) 92 14 125 36.2 176.9 25.6 AJ Lamb (Eng) 139 10 142 36.1 191.4 26.8 FE Woolley (Eng) 98 7 154 36.1 178.7 31.1 Sadiq Mohammad (Pak) 74 2 166 35.8 167.4 35.5 AL Logie (WI) 78 9 130 35.8 169.2 27.5 RJ Shastri (India) 121 14 206 35.8 184.9 39.9 A Ranatunga (SL) 155 12 135 35.7 193.2 24.9 JN Rhodes (SA) 80 9 117 35.7 169.5 24.6 CG Borde (India) 97 11 177 35.6 176.0 35.8 MW Gatting (Eng) 138 14 207 35.6 188.3 39.1 MN Samuels (WI) 90 6 260 35.5 172.9 53.4 BJ Haddin (Aus) 94 9 169 35.5 174.4 34.4 JA Rudolph (SA) 83 9 222 35.4 169.7 46.4 Aamer Sohail (Pak) 83 3 205 35.3 169.0 42.8 GM Ritchie (Aus) 53 5 146 35.2 152.9 33.6 MAK Pataudi (India) 83 3 203 34.9 167.2 42.4 JP Crawley (Eng) 61 9 156 34.6 155.2 34.8 MA Butcher (Eng) 131 7 173 34.6 181.3 33.0 TL Goddard (SA) 78 5 112 34.5 162.9 23.7 TW Hayward (Eng) 60 2 137 34.5 153.9 30.7 W Jaffer (India) 58 1 212 34.1 151.2 47.8 GS Blewett (Aus) 79 4 214 34.0 161.3 45.1 Mohammad Hafeez (Pak) 70 6 196 34.0 156.9 42.4 WR Endean (SA) 52 4 162 34.0 146.8 37.5 Yuvraj Singh (India) 62 6 169 33.9 152.6 37.6 AP Sheahan (Aus) 53 6 127 33.9 147.3 29.2 AC MacLaren (Eng) 61 4 140 33.9 151.8 31.2 IT Botham (Eng) 161 6 208 33.5 182.8 38.2 CL Cairns (NZ) 104 5 158 33.5 168.1 31.5 KD Mackay (Aus) 52 7 89 33.5 144.8 20.6 Yashpal Sharma (India) 59 11 140 33.5 148.8 31.5 Shoaib Malik (Pak) 54 6 148 33.5 145.9 33.9 AC Hudson (SA) 63 3 163 33.5 151.0 36.1 DW Randall (Eng) 79 5 174 33.4 158.2 36.7 JR Reid (NZ) 108 5 142 33.3 168.1 28.1 GR Marsh (Aus) 93 7 138 33.2 162.7 28.1 WW Hinds (WI) 80 1 213 33.0 156.9 44.8 L Klusener (SA) 69 11 174 32.9 151.3 37.8 APE Knott (Eng) 149 15 135 32.8 176.0 25.1 M Prabhakar (India) 58 9 120 32.7 144.7 27.1 NT Paranavitana (SL) 60 5 111 32.6 145.5 24.9 P Roy (India) 79 4 173 32.6 154.3 36.5 CJ Tavare 
(Eng) 56 2 149 32.5 142.9 33.9 GP Howarth (NZ) 83 5 147 32.4 155.4 30.7 Mushfiqur Rahim (Ban) 71 4 200 32.4 150.3 43.2 SL Campbell (WI) 93 4 208 32.4 158.8 42.4 SM Pollock (SA) 156 39 111 32.3 175.1 20.5 BE Congdon (NZ) 114 7 176 32.2 164.5 34.5 JM Parks (Eng) 68 7 108 32.2 147.7 23.5 MS Sinclair (NZ) 56 5 214 32.1 141.0 48.7 Imran Farhat (Pak) 77 2 128 32.0 150.9 27.1 PJL Dujon (WI) 115 11 139 31.9 163.4 27.2 Rameez Raja (Pak) 94 5 122 31.8 156.4 24.8 GM Wood (Aus) 112 6 172 31.8 162.0 33.8 BA Young (NZ) 68 4 267 31.8 145.9 58.2 A Flintoff (Eng/ICC) 130 9 167 31.8 166.4 31.9 RES Wyatt (Eng) 64 6 149 31.7 143.6 32.9 MJK Smith (Eng) 78 6 121 31.6 149.5 25.6 NJ Contractor (India) 52 1 108 31.6 136.6 25.0 CPS Chauhan (India) 68 2 97 31.6 144.9 21.1 MH Mankad (India) 72 5 231 31.5 146.3 49.7 DJ Bravo (WI) 71 1 113 31.4 145.6 24.4 GA Hick (Eng) 114 6 178 31.3 159.9 34.9 HAPW Jayawardene (SL) 77 11 154 31.2 147.2 32.7 MG Burgess (NZ) 92 6 119 31.2 152.6 24.3 GT Dowling (NZ) 77 3 239 31.2 146.9 50.7 FM Engineer (India) 87 3 121 31.1 150.3 25.0 AL Wadekar (India) 71 3 143 31.1 144.0 30.9 N Kapil Dev (India) 184 15 163 31.1 173.4 29.2 Habibul Bashar (Ban) 99 1 113 30.9 153.3 22.8 Kamran Akmal (Pak) 92 6 158 30.8 150.6 32.3 JT Tyldesley (Eng) 55 1 138 30.8 134.7 31.5 KLT Arthurton (WI) 50 5 157 30.7 131.6 36.6 ML Jaisimha (India) 71 4 129 30.7 142.2 27.8 MJ Greatbatch (NZ) 71 5 146 30.6 141.9 31.5 BA Edgar (NZ) 68 4 161 30.6 140.4 35.1 Salman Butt (Pak) 62 0 122 30.5 137.0 27.1 JHB Waite (SA) 86 7 134 30.4 146.9 27.8 T Taibu (Zim) 54 3 153 30.3 132.2 35.1 MV Boucher (ICC/SA) 206 24 125 30.3 172.6 21.9 RA McLean (SA) 73 3 142 30.3 141.2 30.5 MA Noble (Aus) 73 7 133 30.3 141.0 28.5 BF Hastings (NZ) 56 6 117 30.2 132.8 26.6 W Rhodes (Eng) 98 21 179 30.2 149.6 36.1 HH Dippenaar (SA) 62 5 177 30.1 135.6 39.3 DL Vettori (ICC/NZ) 173 23 140 30.1 166.2 25.4 AD Gaekwad (India) 70 4 201 30.1 138.9 43.5 K Srikkanth (India) 72 3 123 29.9 138.9 26.5 AW Nourse (SA) 83 8 111 29.8 142.6 
23.2 TE Bailey (Eng) 91 14 134 29.7 145.2 27.5 MJ Guptill (NZ) 59 1 189 29.6 131.8 42.5 GW Flower (Zim) 123 6 201 29.5 153.1 38.8 GJ Whittall (Zim) 82 7 203 29.4 140.6 42.5 Imtiaz Ahmed (Pak) 72 1 209 29.3 136.1 45.0 RS Mahanama (SL) 89 1 225 29.3 142.2 46.3 Rashid Latif (Pak) 57 9 150 28.8 127.0 34.0 Abdul Razzaq (Pak) 77 9 134 28.6 134.9 28.4 J Darling (Aus) 60 2 178 28.6 127.6 39.9 Moin Khan (Pak) 104 8 137 28.6 143.2 27.3 KG Viljoen (SA) 50 2 124 28.4 121.8 28.9 MJ Horne (NZ) 65 2 157 28.4 129.0 34.5 RD Jacobs (WI) 112 21 118 28.3 144.0 23.2 RP Arnold (SL) 69 4 123 28.0 129.0 26.7 IA Healy (Aus) 182 23 161 27.4 152.6 28.9 MR Ramprakash (Eng) 92 6 154 27.3 133.7 31.5 D Ramdin (WI) 95 13 166 27.3 134.2 33.7 ADR Campbell (Zim) 109 4 103 27.2 137.7 20.4 Sir RJ Hadlee (NZ) 134 19 151 27.2 143.1 28.7 RC Russell (Eng) 86 16 128 27.1 130.8 26.5 KR Rutherford (NZ) 99 8 107 27.1 134.5 21.6 SMH Kirmani (India) 124 22 102 27.0 140.3 19.7 SV Carlisle (Zim) 66 6 118 26.9 122.7 25.9 H Masakadza (Zim) 50 2 119 26.9 115.3 27.8 P Willey (Eng) 50 6 102 26.9 115.3 23.8 J Dyson (Aus) 58 7 127 26.6 118.1 28.6 PR Reiffel (Aus) 50 14 79 26.5 113.7 18.4 RW Marsh (Aus) 150 13 132 26.5 142.6 24.5 AC Parore (NZ) 128 19 110 26.3 137.2 21.1 JJ Crowe (NZ) 65 4 128 26.2 119.3 28.2 RS Kaluwitharana (SL) 78 4 132 26.1 123.5 27.9 G Miller (Eng) 51 4 98 25.8 111.1 22.8 D Ganga (WI) 86 2 135 25.7 124.0 28.0 KS More (India) 64 14 73 25.7 116.4 16.1 RG Nadkarni (India) 67 12 122 25.7 117.6 26.7 IDS Smith (NZ) 88 17 173 25.6 123.9 35.7 DA Allen (Eng) 51 15 88 25.5 109.8 20.4 MW Tate (Eng) 52 5 100 25.5 110.2 23.1 N Boje (SA) 62 10 85 25.2 113.5 18.9 SA Durani (India) 50 2 104 25.0 107.3 24.3 DS Smith (WI) 58 2 108 24.7 109.5 24.4 AK Davidson (Aus) 61 7 80 24.6 110.2 17.8 GS Ramchand (India) 53 5 109 24.6 106.8 25.1 JM Parker (NZ) 63 2 121 24.6 110.8 26.8 SE Gregory (Aus) 100 7 201 24.5 122.0 40.4 C White (Eng) 50 7 121 24.5 104.8 28.2 R Benaud (Aus) 97 7 122 24.5 120.9 24.7 V Pollard (NZ) 59 7 116 
24.3 108.3 26.1 WPUJC Vaas (SL) 162 35 100 24.3 132.7 18.3 DJ Richardson (SA) 64 8 109 24.3 109.9 24.1 SCJ Broad (Eng) 95 12 169 24.2 119.2 34.3 SC Williams (WI) 52 3 128 24.1 104.4 29.6 NR Mongia (India) 68 8 152 24.0 110.3 33.1 Mohammad Ashraful (Ban) 119 5 190 24.0 123.6 36.9 GO Jones (Eng) 53 4 100 23.9 103.8 23.0 G Giffen (Aus) 53 0 161 23.4 101.4 37.1 R Illingworth (Eng) 90 11 113 23.2 113.2 23.2 AC Bannerman (Aus) 50 2 94 23.1 98.9 21.9 CC Lewis (Eng) 51 3 117 23.0 99.1 27.2 DL Murray (WI) 96 9 91 22.9 113.0 18.4 JM Brearley (Eng) 66 3 91 22.9 104.4 19.9 DD Ebrahim (Zim) 55 1 94 22.7 99.3 21.5 WAS Oldfield (Aus) 80 17 65 22.7 107.7 13.7 S Madan Lal (India) 62 16 74 22.7 101.9 16.4 Wasim Akram (Pak) 147 19 257 22.6 121.3 48.0 JE Emburey (Eng) 96 20 75 22.5 111.2 15.2 MG Johnson (Aus) 87 14 123 22.4 108.4 25.4 CB Wishart (Zim) 50 1 114 22.4 96.0 26.6 HH Streak (Zim) 107 18 127 22.4 112.7 25.2 FJ Titmus (Eng) 76 11 84 22.3 104.8 17.9 Intikhab Alam (Pak) 77 10 138 22.3 105.0 29.3 GP Swann (Eng) 76 14 85 22.1 103.9 18.1 Javed Omar (Ban) 80 2 119 22.1 104.8 25.0 DJG Sammy (WI) 63 2 106 21.7 97.9 23.5 KJ Wadsworth (NZ) 51 4 80 21.5 92.5 18.6 KD Ghavri (India) 57 14 86 21.2 93.7 19.5 RR Lindwall (Aus) 84 13 118 21.2 101.6 24.6 AF Giles (Eng) 81 13 59 20.9 99.5 12.4 DN Patel (NZ) 66 8 99 20.7 94.3 21.7 AFA Lilley (Eng) 52 8 84 20.5 88.7 19.4 TG Evans (Eng) 133 14 104 20.5 107.8 19.8 JG Bracewell (NZ) 60 11 110 20.4 91.2 24.6 BR Taylor (NZ) 50 6 124 20.4 87.4 28.9 S Abid Ali (India) 53 3 81 20.4 88.4 18.6 B Lee (Aus) 90 18 64 20.2 98.1 13.1 H Trumble (Aus) 57 14 70 19.8 87.4 15.9 HDPK Dharmasena (SL) 51 7 62 19.7 84.9 14.4 B Yardley (Aus) 54 4 74 19.6 85.3 17.0 Khaled Mashud (Ban) 84 10 103 19.0 91.4 21.5 TG Southee (NZ) 50 5 77 19.0 81.4 18.0 MD Marshall (WI) 107 11 92 18.9 95.1 18.2 JN Gillespie (Aus) 93 28 201 18.7 91.8 41.0 Mohammad Rafique (Ban) 63 6 111 18.6 83.8 24.6 IWG Johnson (Aus) 66 12 77 18.5 84.4 16.9 Harbhajan Singh (India) 142 22 115 18.4 97.7 21.6 J 
Briggs (Eng) 50 5 121 18.1 77.6 28.2 DG Cork (Eng) 56 8 59 18.0 79.2 13.4 A Kumble (India) 173 32 110 17.8 98.1 19.9 Sarfraz Nawaz (Pak) 72 13 90 17.7 82.3 19.4 PH Edmonds (Eng) 65 15 64 17.5 79.6 14.1 SK Warne (Aus) 199 17 99 17.3 98.1 17.5 JJ Kelly (Aus) 56 17 46 17.0 74.9 10.5 HJ Tayfield (SA) 60 9 75 16.9 75.5 16.8 MG Hughes (Aus) 70 8 72 16.6 76.9 15.6 Nasim-ul-Ghani (Pak) 50 5 101 16.6 71.1 23.6 BL Cairns (NZ) 65 8 64 16.3 74.0 14.1 RW Taylor (Eng) 83 12 97 16.3 78.0 20.3 GF Lawson (Aus) 68 12 74 16.0 73.3 16.1 Wasim Bari (Pak) 112 26 85 15.9 80.8 16.7 WW Hall (WI) 66 14 50 15.7 71.8 11.0 JM Blackham (Aus) 62 11 74 15.7 70.5 16.4 Abdul Qadir (Pak) 77 11 61 15.6 73.5 12.9 DR Pringle (Eng) 50 4 63 15.1 64.7 14.7 ATW Grout (Aus) 67 8 74 15.1 69.0 16.2 AME Roberts (WI) 62 11 68 14.9 67.2 15.1 CM Old (Eng) 66 9 65 14.8 67.6 14.2 PAJ DeFreitas (Eng) 68 5 88 14.8 68.0 19.2 SB Doull (NZ) 50 11 46 14.6 62.6 10.7 Saqlain Mushtaq (Pak) 78 14 101 14.5 68.5 21.4 RO Collinge (NZ) 50 13 68 14.4 61.7 15.9 DW Steyn (SA) 89 21 76 14.3 69.2 15.6 PM Siddle (Aus) 76 11 51 14.2 67.0 10.8 J Srinath (India) 92 21 76 14.2 69.5 15.5 VA Holder (WI) 59 11 42 14.2 63.2 9.4 Fazal Mahmood (Pak) 50 6 60 14.1 60.4 14.0 JC Laker (Eng) 63 15 63 14.1 63.6 14.0 CV Grimmett (Aus) 50 10 50 13.9 59.7 11.7 FS Trueman (Eng) 85 14 39 13.8 66.5 8.1 MA Holding (WI) 76 10 73 13.8 64.8 15.5 GAR Lock (Eng) 63 9 89 13.7 62.0 19.7 DK Lillee (Aus) 90 24 73 13.7 66.8 15.0 JA Snow (Eng) 71 14 73 13.5 62.7 15.8 GR Dilley (Eng) 58 19 56 13.4 59.2 12.6 Iqbal Qasim (Pak) 57 15 56 13.1 57.7 12.7 HMRKB Herath (SL) 70 15 80 13.0 59.9 17.3 Mashrafe Mortaza (Ban) 67 5 79 12.9 58.8 17.3 JR Thomson (Aus) 73 20 49 12.8 59.7 10.5 AV Bedser (Eng) 71 15 79 12.8 59.1 17.0 D Gough (Eng) 86 18 65 12.6 60.6 13.5 J Garner (WI) 68 14 60 12.4 57.1 13.1 CEL Ambrose (WI) 145 29 53 12.4 66.3 9.9 M Morkel (SA) 65 10 40 12.3 55.8 8.8 GD McKenzie (Aus) 89 12 76 12.3 59.6 15.6 CJ McDermott (Aus) 90 13 42 12.2 59.4 8.6 IR Bishop (WI) 63 11 
48 12.2 54.9 10.6 Z Khan (India) 127 24 75 12.0 62.3 14.4 SJ Harmison (Eng/ICC) 86 23 49 11.8 56.9 10.2 Mushtaq Ahmed (Pak) 72 16 59 11.7 54.4 12.7 S Venkataraghavan (India) 76 12 64 11.7 54.9 13.6 M Muralitharan (ICC/SL) 164 56 67 11.7 63.8 12.3 AA Mallett (Aus) 50 13 43 11.6 49.8 10.0 Mohammad Sami (Pak) 56 14 49 11.6 51.0 11.1 DL Underwood (Eng) 116 35 45 11.6 59.2 8.8 RC Motz (NZ) 56 3 60 11.5 50.8 13.6 RGD Willis (Eng) 128 55 28 11.5 60.0 5.4 EAS Prasanna (India) 84 20 37 11.5 55.1 7.7 JB Statham (Eng) 87 28 38 11.4 55.3 7.9 AA Donald (SA) 94 33 37 10.7 52.5 7.5 MS Kasprowicz (Aus) 54 12 25 10.6 46.2 5.7 AR Caddick (Eng) 95 12 49 10.4 51.1 10.0 JM Anderson (Eng) 127 47 34 10.4 54.0 6.5 Waqar Younis (Pak) 120 21 45 10.2 52.6 8.7 Shahadat Hossain (Ban) 65 17 40 10.2 46.3 8.8 Shoaib Akhtar (Pak) 67 13 47 10.1 46.1 10.3 Umar Gul (Pak) 67 9 65 9.9 45.5 14.2 M Ntini (SA) 116 45 32 9.8 50.4 6.2 RM Hogg (Aus) 58 13 52 9.8 43.2 11.7 GP Wickramasinghe (SL) 64 5 51 9.4 42.6 11.3 I Sharma (India) 81 28 31 9.2 43.9 6.5 PR Adams (SA) 55 15 35 9.0 39.4 8.0 BS Bedi (India) 101 28 50 9.0 44.8 10.0 EJ Chatfield (NZ) 54 33 21 8.6 37.4 4.8 M Dillon (WI) 68 3 43 8.4 38.7 9.4 DK Morrison (NZ) 71 26 42 8.4 39.0 9.1 S Ramadhin (WI) 58 14 44 8.2 36.3 9.9 CD Collymore (WI) 52 27 16 7.9 34.1 3.7 DBL Powell (WI) 57 5 36 7.8 34.5 8.2 CA Walsh (WI) 185 61 30 7.5 42.1 5.4 ARC Fraser (Eng) 67 15 32 7.5 34.1 7.0 GD McGrath (Aus) 138 51 61 7.4 39.0 11.5 MJ Hoggard (Eng) 92 27 38 7.3 35.6 7.8 Danish Kaneria (Pak) 84 33 29 7.1 33.9 6.0 LR Gibbs (WI) 109 39 25 7.0 35.3 4.9 FH Edwards (WI) 88 28 30 6.6 31.8 6.2 TM Alderman (Aus) 53 22 26 6.5 28.4 6.0 DE Malcolm (Eng) 58 19 29 6.1 26.8 6.5 PCR Tufnell (Eng) 59 29 22 5.1 22.7 4.9 MS Panesar (Eng) 68 23 26 4.9 22.4 5.7 AL Valentine (WI) 51 21 14 4.7 20.2 3.3 BS Chandrasekhar (India) 80 39 22 4.1 19.3 4.6 CS Martin (NZ) 104 52 12 2.4 11.8 2.4
Sorting the table by estimated high score minus actual high score, we see that the biggest "underachievers" are Bradman, Kallis, Chanderpaul, S Waugh, Sutcliffe, Border, M Waugh, Tendulkar, Amarnath, Prior.
The fact that Bradman is on top shows a flaw of the exponential model: the other batsmen get out, and there are time constraints, both of which make batting for long enough to score 475 difficult. Some of the other batsmen on the list have excuses too: they are all-rounders or bat at number 5 etc. It does amuse me that S Waugh is seen as not going on with it from this data, whereas at the time it was M Waugh that was perceived as having this problem (maybe that was before Mark scored his 150?).
At the other end, the biggest overachiever? Wasim Akram.
I guess it would be better to compare the actual distribution of each player's scores with the theoretical one based on his average, so that you're not using just one data point to label a player as not-going-on-with-it, but I think this is interesting anyway.
If I calculate (estHS-HS)/HS the new list of underachievers is Willis, Collymore, Kasprowicz, Chatfield, Trueman, Giles, Oldfield, Kelly, Mackay, Prior. Also M Waugh is a bigger underachiever than S Waugh by this measure.
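Taking as a p-value the probability, under the exponential model, of a highest score at least as large as the actual one, i.e. 1 - (1 - exp(-HS/avg))^N, Wasim Akram comes out on top at 0.0017, followed by Gillespie at 0.0020. At the other end, Bob Willis has an absurdly high p-value of 0.999992, and Corey Collymore 0.9993. Probably they're hurt a bit by batting at number eleven and being left stranded so often.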
(Everyone else is pretty sensible, though the numbers aren't a perfect fit: about 10% of the dataset are at p > 0.95).
If you look at Willis's innings list on Statsguru, I think he has a high score of 56 runs between dismissals (for all-time records of this type see http://cricketarchive.com/Archive/Articles/1/1610.html although that's not very up to date) in 73 completed innings.
1-(1-exp(-56/11.5))^73 = 0.43, so his "high score" is about right.
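For anyone who wants to reproduce these numbers, here is a minimal sketch of the calculation, assuming the same exponential model with independent innings. The function name is hypothetical, and the inputs are the rounded table values quoted in this thread.

```python
import math


def prob_high_score_at_least(high_score, avg, innings):
    """P(max of `innings` independent exponential scores with mean `avg` >= high_score)."""
    p_single = math.exp(-high_score / avg)       # one innings reaches high_score
    return 1.0 - (1.0 - p_single) ** innings     # at least one innings does


if __name__ == "__main__":
    # (highest score, average, innings) taken from the table and the comment above.
    cases = {
        "Wasim Akram": (257, 22.6, 147),
        "Willis (innings)": (28, 11.5, 128),
        "Willis (runs between dismissals)": (56, 11.5, 73),
    }
    for name, (hs, avg, n) in cases.items():
        print(name, prob_high_score_at_least(hs, avg, n))
```

A very small value flags a highest score that is surprisingly large for the average, and a value close to 1 flags one that is surprisingly small.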
I'm not willing to go through any more innings lists, so I think I'll leave this investigation here.
I did a practically identical exercise to this a few years ago, but never got around to writing it up. There was a companion piece about how you just need to know how many innings a bowler has bowled in and how many wickets he's taken overall to estimate, e.g., his 5WI count using the Poisson distribution. That never saw the light of day, either.
A few notes:
(1) A simple extension is to calculate the expected number of, e.g., 100s in a career of a given length and a given mean, which is
exp(-1/avg*100)*N
So, e.g., someone with Tendulkar's average should have scored 51.25 100s (he actually got 51). I get r2 = 0.946 for your 50+ dismissals dataset for this.
(1a) As an aside, you could, of course, try to predict the number of ducks you should expect using this method, but that draws attention to the inadequacy of the constant hazard assumption and consequent exponential approximation when it comes to the beginning of batsmen's innings (Tendulkar should have 6 ducks; he has 14; r2 = 0.616 across the dataset).
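As a rough sketch of notes (1) and (1a), assuming the same constant-hazard exponential model: the expected number of scores of at least a threshold is N*exp(-threshold/avg), and in this continuous model a "duck" is taken to be a score below 1. The function names are my own, and the Tendulkar figures are the rounded ones from the table (329 innings, average 53.8).

```python
import math


def expected_scores_at_least(threshold, avg, innings):
    """Expected count of innings reaching `threshold` under the exponential model."""
    return innings * math.exp(-threshold / avg)


def expected_ducks(avg, innings):
    """Expected count of innings below 1 run (the continuous stand-in for a duck)."""
    return innings * (1.0 - math.exp(-1.0 / avg))


if __name__ == "__main__":
    print("Expected hundreds:", expected_scores_at_least(100, 53.8, 329))
    print("Expected ducks:", expected_ducks(53.8, 329))
```

With the rounded average the hundreds figure lands within a few hundredths of the 51.25 quoted above, and the ducks figure comes out near 6.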
(2) I'm not convinced that using the batsman's average as the mean of your distribution does you any favours when you then compare your model with reality. That's because the average, by accounting for not-out innings, effectively estimates a batsman's expectation of runs in a world in which not-outs don't exist. So you end up modelling that world and, when you compare it to the actual record, the not-outs muck things up ever so slightly. I can think of some quite involved ways of getting around this problem, but that would take away the fun of the really simple model providing an impressively good approximation of reality. So a quick fix would be to use RPI rather than avg as your mean. If you do that, you get a slightly improved fit (r2 = 0.753), though you move from systematically slightly overestimating the HS to systematically slightly underestimating it.
(3) Pedantically, I think I take issue with the suggestion that batsmen who have a lower HS than would be expected from their average are underachievers; a narrower range of achievement for a given average suggests a slightly more consistent career and, since innings-to-innings consistency is weakly a good thing, I think I say the lower the HS the better.
The under/over-achiever terminology was Martin's, not mine....
I did wonder about using RPI - I think it was you who first pointed out to me that it's a better predictor of the next innings than the average? But I'm too lazy to change my ways when all it does is squeeze a couple of R-squared points out of everything. (I feel that there has to be a better predictor than RPI, though again it wouldn't make large changes....)
On ducks and centuries: I've written on this topic before! e.g., here, here (this one has the next-simplest hazard function that I would use to incorporate ducks properly), here.
Without having thought about it much, I think I'd be pretty agnostic about whether I prefer consistent/inconsistent batsmen. Gabe, is there a reason that you would prefer to have a consistent batsman? Does that somehow lead to more won/drawn games?
"I think a more elegant way to look at this is to consider extreme value theory. The maximum of a set of N exponential random variables can be shown to approach a Gumbel distribution for N large. The expected value of that distribution is similarly given, for N large, by (G + Ln(N))/A where A is the average of the distribution and G is Euler’s constant (about 0.58). This gives a relationship between the expected maximum and the average of the individual distribution which doesn’t rely on your arbitrary factor of ½. Note that your expression for the maximum is approximately (ln 2 + Ln(N))/A for large N. Since ln(2) = 0.69 this is close to the expression above so perhaps there is a deeper result here that I haven’t noticed.
All this obviously begs the question of whether there is a correlation between the max/mean ratio and the number of innings that these formulations suggest, but that is the subject of another post."
My response:
This is really interesting - I didn't know anything about limiting distributions of maximum values and had never heard of the Gumbel distribution.
(Clerical note: you should be multiplying by the mean of the exponential distribution, not dividing.)
What I am effectively doing with my (not actually arbitrary!) 1/2 is asking for the median of the almost-Gumbel distribution, which here is avg*(-ln(ln(2)) + ln(N)). I haven't proved that my original expression is approximately equal to this for large N, but it is clearly true when I empirically plot one against the other (differences of "predicted" high scores are less than half a run). I guess you lost a log somewhere in your algebra!
Anyway, the Gumbel distribution is skewed to the right, with the mean larger than the median, so your suggested method of using the mean of the Gumbel distribution results in higher "predicted" highest scores.
median: avg*(-ln(ln(2)) + ln(N))
mean: avg*(gamma + ln(N)).
I prefer using the median - I like having a number here that'll have around half of all batsmen below it and half above. (Actually only 45% of the batsmen in the dataset are above the predicted-by-Gumbel-median highest score; 38% are above the predicted-by-Gumbel-mean highest score.) But I can imagine people's tastes being different here.
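The two Gumbel-based predictors are easy to put side by side. This is a sketch under the large-N approximation discussed here, with hypothetical function names; the constant is the Euler-Mascheroni constant written out by hand.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_median_high_score(avg, innings):
    """Median of the limiting Gumbel distribution for the maximum of exponential scores."""
    return avg * (math.log(innings) - math.log(math.log(2.0)))


def gumbel_mean_high_score(avg, innings):
    """Mean of the same limiting Gumbel distribution."""
    return avg * (EULER_GAMMA + math.log(innings))


if __name__ == "__main__":
    # Illustrative inputs only: an average of 40 over 100 innings.
    avg, innings = 40.0, 100
    print("median:", gumbel_median_high_score(avg, innings))
    print("mean:  ", gumbel_mean_high_score(avg, innings))
```

The gap between the two is avg*(gamma + ln(ln(2))), roughly 0.21 times the average, and it does not depend on N.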
My original post has
HS = -avg * ln(1 - 0.5^(1/N)).
Now, for N large, 1/N is small, and we can approximate 0.5^(1/N) by its Taylor series truncated after the linear term. I.e.,
0.5^(1/N) = exp((1/N)*ln(1/2))
~ 1 + (1/N)ln(1/2)
= 1 - ln(2) / N.
So,
-avg*ln(1 - 0.5^(1/N)) ~ -avg*ln(ln(2)/N)
= avg*[ln(N) - ln(ln(2))],
which is the median of the Gumbel distribution.
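The approximation can also be checked numerically. The sketch below (function names are my own) compares the exact expression from the original post with the Gumbel median; carrying the Taylor series one term further suggests the gap should be about avg*ln(2)/(2N), which is my own extension rather than part of the derivation above.

```python
import math


def exact_median_high_score(avg, innings):
    """Exact median of the maximum of `innings` exponential scores with mean `avg`."""
    return -avg * math.log(1.0 - 0.5 ** (1.0 / innings))


def gumbel_median_high_score(avg, innings):
    """Large-N approximation: avg * [ln(N) - ln(ln(2))]."""
    return avg * (math.log(innings) - math.log(math.log(2.0)))


def approximation_gap(avg, innings):
    """Exact value minus the Gumbel-median approximation, in runs."""
    return exact_median_high_score(avg, innings) - gumbel_median_high_score(avg, innings)


if __name__ == "__main__":
    # Illustrative average of 50; the gap shrinks as the number of innings grows.
    for n in (50, 100, 200, 500):
        print(n, approximation_gap(50.0, n), 50.0 * math.log(2.0) / (2 * n))
```

For the averages and innings counts in the table the gap stays under half a run, consistent with the empirical comparison mentioned earlier.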