## Learning from baseball: Pitchf/x

While we're on the subject of baseball, I thought I'd outline a simple idea used in baseball that would be useful and fun in cricket. In short: put Hawkeye data on the web for anyone to download.

In Major League Baseball, they have a system called Pitchf/x, which we can basically think of as Hawkeye. They don't have it at every game (only about a quarter, I think), but since there are over a thousand games a season, that's still a lot of pitching data. The raw data gets put on the MLB website, and you can download big pitch-by-pitch tables, with each pitch described by release point, start speed, end speed, break length, break angle, etc.

Classifying the pitch type can be difficult, but by using enough of the variables in the table, people who've studied this problem are getting reasonable results (for an introduction to it, see here). Here's an example, taken from this article on Jake Peavy by Pitchf/x'er Josh Kalk:

If you have a look at the linked article, you'll see other graphs, plotting different variables.

It's a gold mine for baseball analysis, and it would be the same in cricket. There are all sorts of things you could look at, at the level of an individual bowler, or looking at the characteristics of the ground — length of the ball, amount of swing, amount of turn, how much bounce there is in the pitch, etc.

To get it to work, we'd want something like the following recorded for each ball (it may be possible to make this more efficient with some knowledge of cricket ball physics, but this should give the idea):

bowler, batsman, age of ball, did ball hit bat?, number of runs scored off the ball or type of wicket, and then x-, y-, z-components of position and velocity at: release point, just before pitching, just after pitching, contact with bat/batsman, crossing the stumps (projected if necessary).
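As a rough sketch of what one row of such a table might look like, here is the list above written as a Python record. The class names, field names and units (metres, metres per second, ball age in overs) are my own assumptions for illustration, not any broadcaster's actual format.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) components


@dataclass
class BallState:
    """Position (m) and velocity (m/s) of the ball at one instant."""
    position: Vec3
    velocity: Vec3

    def speed(self) -> float:
        vx, vy, vz = self.velocity
        return sqrt(vx * vx + vy * vy + vz * vz)


@dataclass
class Delivery:
    """One ball, with the fields listed in the text (names are hypothetical)."""
    bowler: str
    batsman: str
    ball_age_overs: float                 # age of ball; overs is an assumed unit
    hit_bat: bool
    runs: int
    wicket_type: Optional[str]            # None if no wicket fell
    release: BallState
    before_pitching: Optional[BallState]  # None for a full toss
    after_pitching: Optional[BallState]   # None for a full toss
    contact: Optional[BallState]          # None if the ball hit nothing
    at_stumps: BallState                  # projected if necessary

    def speed_lost_on_pitching(self) -> Optional[float]:
        """Speed lost at the bounce, or None if the ball did not pitch."""
        if self.before_pitching is None or self.after_pitching is None:
            return None
        return self.before_pitching.speed() - self.after_pitching.speed()
```

Even a derived quantity as simple as `speed_lost_on_pitching` would let you compare how quick different grounds are.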

I will happily plug the first broadcaster that puts this data on the web.

I've often wondered if there's any content to the idea of late swing (or dip, for spinners). To get to that, you'd have to include all the flight data of course.

One thing that I guess even the baseball folks don't have is simply how much spin different bowlers/pitchers are able to put on the ball. That'd really open up spin bowling analysis. Perhaps, if the ball were marked in some way??

Actually, the Pitchf/x guys have got spin covered. I should actually work through the physics of it (it shouldn't be too hard, I am supposed to be a physics student...) some time. Essentially, they take the initial and final velocities, and look at how far the ball moved vertically or horizontally. Now horizontal movement's easy to pick out, since if there's no spin on the baseball then it won't move laterally. But the vertical movement is nifty - backspin on a baseball works against gravity somewhat, so the ball doesn't dip as much as it would without spin. I think I read somewhere that a sinker is just a fastball with less backspin.

Calculating the spin can actually help a lot with pitch classification sometimes.

Presumably we could do the same thing in cricket. Look at the release point and velocity, look at where it pitched and the angle it came in at, compare to a ball without spin.
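A minimal sketch of that comparison, under some loud assumptions: the reference ball has no spin and feels no air resistance (gravity only), the ground is flat at z = 0, and the axes are x down the pitch, y across it and z upwards. The function names are made up for illustration. Because drag is ignored, the length deviation mixes real dip with ordinary slowing through the air, so only the lateral number is reasonably clean.

```python
from math import sqrt

G = 9.81  # gravitational acceleration, m/s^2


def no_spin_pitching_point(release_pos, release_vel):
    """Where a spinless, drag-free ball would land (x, y), given release
    position (x0, y0, z0) in metres and velocity (vx, vy, vz) in m/s."""
    x0, y0, z0 = release_pos
    vx, vy, vz = release_vel
    # Positive root of z0 + vz*t - 0.5*G*t^2 = 0: time until the ball lands.
    t = (vz + sqrt(vz * vz + 2.0 * G * z0)) / G
    return (x0 + vx * t, y0 + vy * t)


def deviation_from_no_spin(release_pos, release_vel, observed_pitch_xy):
    """Observed pitching point minus the no-spin prediction.

    Returns (length deviation, lateral deviation) in metres. A negative
    length deviation means the ball landed shorter than predicted."""
    px, py = no_spin_pitching_point(release_pos, release_vel)
    ox, oy = observed_pitch_xy
    return (ox - px, oy - py)
```

With real tracking data you would want to fit the drag as well, which is presumably what the baseball people do.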

I thought about talking about late swing, but I decided against it. Once again, you've homed in on the thing I omitted. According to this article on the physics of swing bowling, there's not actually much difference in lateness between reverse swing and conventional swing.

Thanks for the link to the baseball data. Indeed, it would be interesting to analyze it and I wonder if any analysis has been done from a game-theoretic viewpoint. It has been done for tennis serving by Mark Walker and John Wooders and penalty kicks in football.

Basically, the idea is very simple. Let us say for simplicity that you can serve to the left or the right of the receiver in tennis. Clearly, you can't serve to one side all the time - otherwise, a good receiver will learn to anticipate you. In "Nash equilibrium" (assuming that both server and receiver are "rational"), it must be the case that the probability that the server wins the point by serving left equals the probability that the server wins the point by serving right. Otherwise, the server would gain by serving more often to the side with the higher win probability. This is a hypothesis which one can test by looking at serving data. Walker and Wooders figured that in professional tournaments like Wimbledon, where the stakes are high and the players are highly skilled professionals, the assumptions underlying the game-theory model should be more or less true, and so they looked at around 50 hours of Wimbledon tapes to see whether the hypothesis held. In short, it did, though not totally. A similar type of analysis was done for penalty kicks in football.
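To make the equal-win-probability condition concrete, here is a small sketch for the two-sided serving game. The win probabilities are invented for illustration and the function name is hypothetical. It assumes the receiver guesses left or right, and that the equilibrium is properly mixed (neither player has a side they should always choose).

```python
def serve_equilibrium(p_ll, p_lr, p_rl, p_rr):
    """Mixed equilibrium of a 2x2 serve game.

    p_sr is the server's win probability when serving to side s while
    the receiver anticipates side r (l = left, r = right).
    Returns (server's prob. of serving left,
             receiver's prob. of anticipating left,
             server's win prob. serving left,
             server's win prob. serving right)."""
    d = p_ll - p_lr - p_rl + p_rr
    if d == 0:
        raise ValueError("no unique mixed equilibrium")
    serve_left = (p_rr - p_rl) / d   # makes the receiver indifferent
    guess_left = (p_rr - p_lr) / d   # makes the server indifferent
    if not (0 < serve_left < 1 and 0 < guess_left < 1):
        raise ValueError("equilibrium is not properly mixed")
    win_left = guess_left * p_ll + (1 - guess_left) * p_lr
    win_right = guess_left * p_rl + (1 - guess_left) * p_rr
    return serve_left, guess_left, win_left, win_right


# Made-up numbers: the server does worse when the receiver guesses correctly.
s, q, win_left, win_right = serve_equilibrium(0.5, 0.8, 0.7, 0.4)
print(s, q, win_left, win_right)
```

In this toy example the two win probabilities come out equal, which is exactly the property one would go looking for in real serving data.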

With such rich data for baseball, one can, I guess, do a similar analysis. In particular, one can test whether the pitcher is mixing up his pitches in an optimal manner, similar to a tennis server. If data for cricket becomes available (say, for T20), one can then work out whether the bowlers are behaving "optimally" in mixing up their deliveries. Of course, the analysis becomes more complicated because in cricket you can vary your length, line, flight, swing etc. Still, it might be fun.
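One simple way to check the "equal success rates" prediction for two options (two serve directions, two pitch types, two lengths) is a two-proportion z-test. This is only a sketch of the general idea, not necessarily the test used in the tennis or football studies, and the function name is hypothetical. It uses a normal approximation, so it needs reasonably large counts.

```python
from math import erfc, sqrt


def equal_success_rate_test(wins_a, n_a, wins_b, n_b):
    """Two-sided z-test of the hypothesis that options A and B succeed
    equally often. Returns (z statistic, p-value)."""
    rate_a = wins_a / n_a
    rate_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("success rate is 0 or 1 in both samples")
    z = (rate_a - rate_b) / se
    # Two-sided tail probability of a standard normal.
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value
```

A small p-value would suggest the player is not mixing optimally, since he could do better by shifting towards the more successful option. With many delivery types the same idea extends to a chi-square test across all of them.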

Just some thoughts...and really sorry for rambling on.

No worries on the long comment, Suresh.

I don't think that anyone's got around to doing game theory on baseball pitching yet, but certainly some people expect it to happen. Pitchf/x'er Mike Fast wrote a list of questions for future research with Pitchf/x, and one of his questions about pitching was whether or not pitchers follow game theory.

Thanks for telling us about the game theory studies in tennis and football. I don't know much game theory (or any at all, to be honest), but I'll look them up later.