Three implications of mathematics for the World Series

Sabermetrics provides useful advice for modern data science


Many of my friends who live in the American Midwest are captivated by the Major League Baseball World Series, not only for the drama of two teams with rich histories and many years of waiting for this opportunity, but also because it provides a welcome respite from news about our presidential election. While the Oakland Athletics, the team I root for, will be waiting a while for our next World Series appearance, I’ll share in the enthusiasm of Indians and Cubs fans for one of my favorite sports.

I remember, as a college student, being enthralled by the box scores every morning in the San Francisco Chronicle, watching the pitching staff’s ERA, and tracking the amazing batting average of Carney Lansford, the A’s third baseman at the time. Reading the box scores, and seeing how they changed from day to day, influenced the way I watched the game. I was thrilled when Michael Lewis published his book “Moneyball” in 2003, which focused on the emerging role of analytics and how it shaped the strategy of the very same Oakland A’s that I loved and rooted for.

For those who don’t remember the book, or its movie adaptation, “Moneyball” is about the Oakland A’s of the late 1990s and early 2000s: how they competed with American League rivals like the Yankees and Red Sox, whose player budgets were far larger, and how their general manager, Billy Beane, changed the fabric of recruiting and on-field strategy for many teams throughout the league. I vividly remember picking up the book at LaGuardia Airport before a long, long series of coast-to-coast flights to get back home one Friday evening. It was so good that I finished most of it before I landed in Portland early the next morning.

Many years after college, and not long before Lewis published “Moneyball”, I had the opportunity in 1997 to write a paper on the use of exploratory data analysis with baseball player data. That paper has since been referenced in multiple scientific and scholarly data analytics articles, which to me was a pleasant surprise. I think the appeal of that paper (http://www2.sas.com/proceedings/sugi24/Infovis/p160-24.pdf) was that I was tackling a problem that was familiar to many people, using some sophisticated tools in ways that were simple and non-mystifying.

In short, my paper focused on how a team could recruit players who performed well on the field but were significantly underpaid relative to peers with comparable performance. To my surprise, fielding statistics had relatively little to do with player salaries compared with the predictive contribution of career batting performance, single-season batting performance, and batting average. Even more important to me as an A’s fan, the model identified both Jose Canseco and Mark McGwire (members of the Oakland A’s 1988, 1989 and 1990 World Series teams, among other standout achievements) as among the younger players in that dataset who were attractive recruiting targets.
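The models from the paper aren’t reproduced here. As a minimal sketch of the underlying idea, assuming a single hypothetical predictor (batting average) and a handful of made-up players, one way to flag “underpaid” players is to fit salary against performance with ordinary least squares and rank players by how far their actual salary falls below the fitted value. The names `Player`, `fit_line`, and `most_underpaid` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    batting_avg: float  # hypothetical single predictor of salary
    salary: float       # annual salary in dollars


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def most_underpaid(players, top_n=2):
    """Names of the players whose actual salary falls furthest below
    the salary predicted from their performance."""
    slope, intercept = fit_line([p.batting_avg for p in players],
                                [p.salary for p in players])
    # Residual = actual - predicted; a negative residual means "underpaid".
    residuals = sorted(
        (p.salary - (slope * p.batting_avg + intercept), p.name) for p in players
    )
    return [name for _, name in residuals[:top_n]]


# Made-up numbers for illustration only, not data from the paper.
roster = [
    Player("Player A", 0.250, 500_000),
    Player("Player B", 0.300, 2_000_000),
    Player("Player C", 0.310, 600_000),
    Player("Player D", 0.270, 1_000_000),
    Player("Player E", 0.240, 400_000),
]
print(most_underpaid(roster))  # ['Player C', 'Player A']
```

A real analysis would use several predictors (career and single-season batting, fielding) and a proper multiple regression; the single-predictor version just keeps the residual idea easy to see.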

One of my favorite stories from “Moneyball” is Billy Beane’s realization that a team at bat that never makes an out will score an infinite number of runs. While that seems all too obvious on the surface, it contains some deep-rooted theory about batting and base-running strategy: on-base percentage contributes more to winning than player salaries at the time implied. This insight fundamentally influenced the A’s decisions about bunting, stealing bases, and recruiting batters who could slap singles versus those who swing for the fences. The rule is simple: get on base however you can, whether by hitting a single or drawing a walk, as long as you do it in a way that doesn’t disproportionately raise your risk of making an out. The team changed the way it played on the field to maximize run production, cutting back on batting and base-running strategies that carried a proportionally larger risk of making an out.
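To make “never gets out” concrete, here is a small sketch. The first function is the standard on-base percentage formula. The second rests on a deliberately simplified model I’m assuming for illustration: every plate appearance independently reaches base with probability p and otherwise records exactly one out, with no double plays or runners thrown out. Under that model, the expected number of batters who reach base before the third out is 3p / (1 − p), which grows without bound as p approaches 1. Function names are hypothetical, and baserunners are not the same as runs.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    return times_on_base / (at_bats + walks + hit_by_pitch + sacrifice_flies)


def expected_baserunners_per_inning(p, outs_per_inning=3):
    """Expected number of batters who reach base before the side is retired.

    Simplifying assumption: every plate appearance independently reaches base
    with probability p and otherwise records exactly one out (no double plays,
    no runners thrown out on the bases). Under that model the count is
    negative binomial with mean outs * p / (1 - p).
    """
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1); at p = 1 the inning never ends")
    return outs_per_inning * p / (1 - p)


# Hypothetical season line: 150 H, 70 BB, 5 HBP, 500 AB, 5 SF.
print(round(on_base_percentage(150, 70, 5, 500, 5), 3))  # 0.388
for p in (0.25, 0.5, 0.75, 0.9):
    print(p, expected_baserunners_per_inning(p))
```

Going from p = 0.5 to p = 0.75 triples the expected baserunners per inning (3 to 9), which is one way to see why avoiding outs pays off nonlinearly.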

One of the most compelling takeaways from my read of “Moneyball” was the cultural change required of the team at large, including the scouts who recruit new talent as well as the coaches and players on the field. The scouts in particular were firmly entrenched in the long-held belief that the fans and the data scientists “…don’t know what we know. They don’t have our experience and they don’t have our intuition.” In the movie, Billy Beane, portrayed by Brad Pitt, replied simply: “Adapt or die.” Beane gave his scouts a sobering assessment of their player-performance forecasts, one that contrasted starkly with the conventional wisdom as they saw it. The changes that baseball recruiting and on-field strategy demanded carry lessons for many enterprises, especially those grappling with the challenge of making more evidence-driven decisions about customer strategy.

In the book, Beane and Paul DePodesta, his assistant general manager, left me with another interesting insight: making the team more successful isn’t about using the greatest algorithm; it’s all about using the right data. The equations they used weren’t incredibly sophisticated: runs scored squared, divided by the sum of runs scored squared and runs allowed squared, is highly correlated with a team’s winning percentage. That’s an easy one! The challenge is measuring the things that matter, even if nobody else is measuring them. Counting how often a specific batter advances the runner without hitting a sacrifice fly or being put out at first has more impact than simply counting how often he advances the runner. Advancing the runner doesn’t matter if your out is the third of the inning, because then the side is retired and the runner is stranded.
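The squared-runs formula above is widely known as Bill James’s Pythagorean expectation. A minimal sketch, with the exponent left as a parameter since analysts sometimes tune it (2 matches the formula described here); the function name and the season totals are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: RS^e / (RS^e + RA^e).

    exponent=2 matches the squared-runs formula described above.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season: 800 runs scored, 700 runs allowed.
pct = pythagorean_win_pct(800, 700)
print(round(pct, 3))        # 0.566
print(round(pct * 162, 1))  # 91.8 expected wins over a 162-game season
```

Note that the formula only needs two season totals; the hard part, as the paragraph argues, is knowing which player-level events drive those totals.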

Finding the players who can perform in specific ways at specific points in the game really matters, as does the manager’s ability to measure those events. Yet in business, how often do we even know which measures shift the balance of performance in our favor? I’d argue that’s where the real game should be played.


Robin Way

The Founder and President of Corios, Robin’s professional passion lies in democratizing and demystifying the science of applied analytics. An established thought leader fueled by 30 years’ experience in the design, development, execution and improvement of applied analytics models, Robin welcomes every opportunity to move the analytics conversation forward.

Connect with him on LinkedIn, or reach out to Corios to get in touch.