Research: Following the Ball - Modeling the Juiced Ball in MLB

This article was written alongside Josh Calianos, Jeffrey Qiao, and Brian Tomasco at Dartmouth College

Abstract

Over the past few years, Major League Baseball has experienced a massive increase in home runs, with multiple records falling during that time period. Some of this can be attributed to an emphasis on higher launch angles and a focus on exit velocity, but the significant nature of this sudden power surge has led many to believe that something else may be at play. In particular, some have theorized that MLB has changed the baseball in order to make it fly farther, and that the increase in home runs can be attributed to this “juiced ball.” In this article, we quantitatively test this theory in two ways. First, we look at two separate difference-in-difference analyses to see how the home run rates have changed in different leagues over time after a change in the ball. Second, we look at the observed wOBA on fly balls over the past five seasons and compare it to the expected outcomes of those hits to see if a significant difference has appeared in recent years. Through this analysis, we find discontinuities in the data that cannot be explained by other theories, implying that a difference in the ball itself may be a part of the recent increase in home runs.

Introduction

Immediately after assuming office, MLB Commissioner Rob Manfred circulated a list of ideas to potentially create more offense in the upcoming 2015 season. One idea: “wrapping the ball to make it fly farther” (Fox Sports). Whether or not that came to pass, the three months following the 2015 All-Star Break brought the highest home run rate since the Steroid Era. Arthur and Lindbergh (2016) found that exit velocity on batted balls spiked in summer 2015 compared to the two previous summers, indicating a major change. They found a home run rate 30% higher in the second half of the season than their first-half model would have predicted.

Analysts, bloggers, and sabermetricians took notice and began to examine everything from the distance of fly balls to the composition of the ball itself. Nathan (2016) found “the principal factor accounting for the large increase in home runs in 2016 is likely due to exit speed [off the bat]”, but he stopped short of crediting the ball for this change. Noticing that exit speeds were increased for some launch angles (fly balls and pop-ups) but not others (line drives), he could not reach a strong explanation for the increased exit speed. Arthur (2017) found that batters were swinging with a greater uppercut than before. This is a systematic change in gameplay, purportedly in response to the increase in fastballs thrown by pitchers and increase in specialized defensive shifts.

Trying to differentiate the new swing patterns from the rising home run rate, analysts began focusing on the ball. Arthur and Dix (2018) x-rayed sets of balls from before and after the 2015 shift and found changes to the new balls’ cores. These manifested as a “more porous, less dense layer of rubber,” leading to a bouncier ball. They cited Nathan (2016), who calculated the increase in home runs with each additional foot of ball flight. By Nathan’s calculations, the estimated 8.6 additional feet of carry per fly ball would lead to 25% more home runs. Notably, the actual increase in home runs between 2014 and 2017 was 46%.

However, Nathan never reached the same ball-based conclusion for the change in offensive production as Arthur and Dix did. Nathan (2018), then working for the Commissioner’s office, expounded on his previous study and found no evidence of changes in the ball. His committee said, “We cannot find a single property that we can measure that would account for decreased drag” (Waldstein 2018). As media coverage of the ball shift increased, major media outlets like the New York Times began covering more than the Commissioner’s Study. In 2019, the Times quoted multiple veteran Major Leaguers who were convinced the ball was changed (Kepner).

The 2019 postseason saw a drastically reduced number of home runs. Arthur estimated the drag coefficients from MLB Statcast data and determined a “one in a million chance the balls [from the regular season and postseason] are the same” (via Rymer 2019).

Our research aims to clear up the disagreements between Arthur and Nathan. MLB still states that it has not changed the ball, and because its specifications for ball design are extremely loose, the balls remain within the range acceptable for gameplay. We aim to separate the swing changes from any potential ball changes, and we aim to model our findings using regressions, Poisson distributions, and Expected Weighted On-base Average (xwOBA). These methods allow us to show unexplained changes in home run rates between leagues and use tangible characteristics like exit velocity and launch angle to measure how unexpected the ball flights of the past few years have been.

Data and Research Design

We looked at two areas as part of our analysis. First, we examined the potential impact that the Major League ball has had on home run rates, both for MLB and its minor leagues. To do this, we performed two difference-in-difference regressions and drew two Poisson distributions. The first regression and distribution focused on home run rates in MLB and in AAA (the level directly below MLB) before and after rumors of a changed ball began in 2015. We use AAA as the control group here for two main reasons. First, since AAA is the last step before players make it to the majors, and players move between the two levels all the time, the level of talent, ballpark dimensions, competition, and playing style are generally similar between the two leagues. Second, AAA was using its own official baseball at the time, and that ball did not change over this period. We looked at data for 2014 and 2016 as our “before” and “after” cases. This data consists of the number of at-bats and home runs in each game for both seasons, acquired by scraping online MLB and AAA box scores.

For our second regression and distribution, we had a similar focus on home run rates, except this time we looked at data for AAA and AA (the level directly below AAA) over the 2018 and 2019 seasons. Importantly, there was a definitive change in the baseball during this time period. In 2019, AAA began using the MLB baseball for their games, while AA continued using their own ball. This gives us a more concrete natural experiment that can provide us with insight as to how the ball might be affecting offensive output.

Our second area of analysis concerns the observed and expected values of weighted on-base average (wOBA) on fly balls. wOBA measures a player’s effect on run production through their offensive output. It is calculated by first giving each outcome of an at-bat (walks, hits, home runs, and hit by pitches) a coefficient based on the number of runs that outcome is expected to produce. For example, singles have a coefficient of roughly .8, doubles sit around 1.25, and home runs are worth roughly 2, though the exact numbers change every year. These coefficients are then multiplied by their frequencies and divided by plate appearances.
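To make the calculation concrete, here is a minimal Python sketch of the wOBA formula as described above. The weights for singles, doubles, and home runs are the rough values quoted in the text; the weights for walks, hit by pitches, and triples are assumptions added only for illustration, and the season line is made up. Real weights are recomputed every year, and published wOBA formulas use a slightly adjusted denominator, so treat this as a sketch rather than the official calculation.

```python
# Illustrative linear weights. The article only gives rough values for singles,
# doubles, and home runs; the walk, hit-by-pitch, and triple weights below are
# assumptions chosen for this example, and real weights change every season.
WEIGHTS = {
    "walk": 0.7,
    "hit_by_pitch": 0.7,
    "single": 0.8,
    "double": 1.25,
    "triple": 1.6,
    "home_run": 2.0,
}


def woba(outcome_counts, plate_appearances):
    """Weighted on-base average: sum of weight * count, divided by plate appearances."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    weighted_total = sum(
        WEIGHTS[outcome] * count for outcome, count in outcome_counts.items()
    )
    return weighted_total / plate_appearances


# Hypothetical season line for one hitter (made-up counts, 600 plate appearances).
season = {
    "walk": 60,
    "hit_by_pitch": 5,
    "single": 100,
    "double": 30,
    "triple": 3,
    "home_run": 25,
}
print(round(woba(season, 600), 3))
```

Outs contribute nothing to the numerator, which is why only the positive outcomes appear in the weights table.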

When doing an analysis on wOBA, it is helpful to know how much of that value is explained by offensive skill and how much was due to other factors. Luckily, Statcast provides us with a tool to measure this. Expected wOBA, or xwOBA, shows what a player’s wOBA values are expected to be based on the quality of their batted balls alone. Using their advanced camera tracking system, Statcast’s algorithms take the launch angle, exit velocity, and hit direction of a batted ball, then estimate probabilities for each possible outcome and multiply by the coefficients to return the xwOBA of a given batted ball. Through this method, xwOBA effectively takes factors like defense and park factors out of the equation.

In this case, a wOBA differential analysis is perfect because we have an outside factor that may be affecting offensive performance and we want to see if controlling for it will lead to a gap between observed and expected outcomes. In this analysis, we looked specifically at fly balls, as these hits would naturally be the ones most affected by a “juiced ball”. For example, if the ball really is affecting outcomes, we would see many fly balls that should be outs (outcomes with low xwOBA values) instead go for doubles and home runs (outcomes with high wOBA values) because they now fly farther than expected. Using the R package baseballr, we scraped Statcast batted ball data from baseballsavant.com - MLB’s online Statcast database - to look at every fly ball in MLB from 2015-2019. From that data, we regressed the difference between wOBA and xwOBA against xwOBA for each year.

Results 

Difference-in-difference Analysis

Figures 1 and 2 below show how the total number of home runs changes across the leagues in question over the time period we are studying. Figure 1 focuses on the MLB and AAA scenario for 2014 and 2016, while Figure 2 shows the AAA and AA data for 2018 and 2019. In both cases, we see similar slopes for both leagues in the former year and diverging slopes starting at the beginning of the latter year. This divergence is largely the result of an increased slope for the experimental league, as the slope stays similar across years for the control league. The regression models below explore this behavior further.

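[Figures 1 and 2: Home runs over time by league. Figure 1: MLB and AAA, 2014 and 2016. Figure 2: AAA and AA, 2018 and 2019.]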

MLB and AAA Home Run Rates, 2014 and 2016

For our first regression, we looked at the difference in home run rates for 2014 and 2016 in MLB and AAA, before and after the MLB ball was first allegedly changed. Our regression model is as follows:

$$\text{HRper}_i = \beta_0 + \beta_1\,\text{MLB}_i + \beta_2\,\text{After}_i + \beta_3\,(\text{MLB}_i \times \text{After}_i) + u_i$$

Where HRper represents the expected number of home runs in game i, MLB is a dummy indicating league (MLB = 1, AAA = 0), After is a dummy indicating year (2016 = 1, 2014 = 0), and u is an error term. In a separate regression, we added the number of at-bats in a game as a control variable, as more at-bats in a game would yield more home run chances.
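As a rough illustration of how the interaction coefficient isolates the effect of interest, the sketch below estimates the four coefficients of the model above from game-level rows. It relies on the fact that a model with two dummies and their interaction is saturated, so the least-squares coefficients equal differences of the four group means. The function name `did_coefficients` and the toy rows are hypothetical, not the article's code or box-score data, and the at-bats control from the separate regression is not included.

```python
from statistics import mean


def did_coefficients(games):
    """Return (b0, b1, b2, b3) for HR = b0 + b1*treated + b2*after + b3*treated*after.

    `games` is an iterable of (treated, after, home_runs) tuples, where
    `treated` and `after` are 0/1 dummies. With two dummies and their
    interaction the model is saturated, so the OLS coefficients are exactly
    differences of the four cell means.
    """
    cells = {(t, a): [] for t in (0, 1) for a in (0, 1)}
    for treated, after, home_runs in games:
        cells[(treated, after)].append(home_runs)
    m = {key: mean(values) for key, values in cells.items()}

    b0 = m[(0, 0)]                      # control league, before
    b1 = m[(1, 0)] - m[(0, 0)]          # league gap before the change
    b2 = m[(0, 1)] - m[(0, 0)]          # time trend in the control league
    b3 = (m[(1, 1)] - m[(1, 0)]) - b2   # difference-in-difference estimate
    return b0, b1, b2, b3


# Toy data, not the article's box scores: (treated, after, home runs in game).
toy_games = [
    (0, 0, 1), (0, 0, 2),   # control, before: mean 1.5
    (0, 1, 2), (0, 1, 1),   # control, after:  mean 1.5
    (1, 0, 2), (1, 0, 1),   # treated, before: mean 1.5
    (1, 1, 3), (1, 1, 2),   # treated, after:  mean 2.5
]
print(did_coefficients(toy_games))
```

In the toy data the control league does not move and the treated league gains one home run per game, so the interaction term picks up the whole change.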

The results of this regression are in Table 1 below. From this regression, we see that the MLB home run rate in 2016 went up by 0.7 home runs per game compared to its previous rate. Meanwhile, the AAA rate stayed basically the same. However, when we control for the number of at-bats in each game, the AAA rate actually decreases by 0.27 home runs per game while the interacted coefficient stays at around 0.7.

[Table 1: Difference-in-difference regression results for MLB and AAA home run rates, 2014 and 2016]

AAA and AA Home Run Rates, 2018 and 2019

Our second regression focuses on how the home run rates changed before and after AAA changed its baseball in 2019. The regression model is the same as the one from above, except that the year dummy indicator was changed to 2019 and the league dummy indicator was changed to AAA. The result of this regression is shown in Table 2.

[Table 2: Difference-in-difference regression results for AAA and AA home run rates, 2018 and 2019]

In this data, we see a lot of the same results, except this time a bit more extreme. Here, we find that the home run rate goes up by about a full home run per game for AAA when playing with the new ball. The rate slightly increases for AA as well, but this goes away when we account for the number of at-bats per game.

When looking at the Poisson distributions of home runs per game in Figures 3, 4, 5, and 6, the effect of these rate changes becomes very clear. In our first case, while the average number of home runs per game in AAA stayed fairly steady at about 1.6, the MLB rate jumped from 1.7 to 2.3 per game. The second case shows the same trend, with AA staying at around 1.55 and AAA jumping from 1.75 to 2.75 home runs per game. Overall, the results of our difference-in-difference analysis are consistent with the Juiced Ball theory.

[Figures 3 and 4: Poisson distributions of home runs per game]
[Figures 5 and 6: Poisson distributions of home runs per game]
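To see what a shift in the per-game mean implies for individual games, the sketch below evaluates the Poisson probability mass function at the two MLB rates quoted above (1.7 and 2.3 home runs per game). It assumes home runs per game are well described by a Poisson distribution with those means; the helper names `poisson_pmf` and `prob_at_least` are hypothetical and this is not the code used to draw the figures.

```python
import math


def poisson_pmf(k, rate):
    """P(X = k) for a Poisson distribution with mean `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def prob_at_least(k_min, rate):
    """P(X >= k_min), computed as one minus the lower tail."""
    return 1.0 - sum(poisson_pmf(k, rate) for k in range(k_min))


# Per-game means quoted in the text for MLB.
for label, rate in [("MLB 2014", 1.7), ("MLB 2016", 2.3)]:
    print(label, "P(3+ HR in a game) =", round(prob_at_least(3, rate), 3))
```

Under this assumption, the chance of a game with three or more home runs rises from roughly 24% to roughly 40%, which is the kind of rightward shift the distributions display.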

Fly Ball wOBA Analysis

For the second part of our analysis, we looked at the relationship between xwOBA values and wOBA differentials and how it has changed since 2015. The trends in the wOBA differential are shown in Figure 7 below. In this chart, the differential for each fly ball is plotted on the y-axis and the xwOBA value associated with each fly ball is plotted on the x-axis. We would expect a slight dip in the middle of the trendline, which would be explained by a good number of predicted singles ending up as outs or predicted doubles ending up as singles. This kind of relationship is what we see in 2015 and 2016, the first two years where the ball allegedly changed. However, from 2017 onward, we see an unexpected and significant positive trend in the relationship with a peak in 2019. Instead of seeing predicted doubles becoming outs, we see many of them becoming triples or home runs instead. This behavior would likely be caused by the ball flying farther than expected, which is consistent with the Juiced Ball theory.

[Figure 7: wOBA differential (y-axis) against xwOBA (x-axis) for fly balls, by year, 2015-2019]

Another important thing to note about these charts is their parabolic shape, which comes largely as a result of xwOBA’s design. It is only measured from 0 to 2, with a value of 0 being an almost guaranteed out and a value of 2 being an almost guaranteed home run. This results in the differential anchoring toward zero on both ends of each plot. With this in mind, we ran a regression for each year using a quadratic model, which was designed as follows:

$$\text{Diff} = \beta_0 + \beta_1\,\text{xwOBA} + \beta_2\,\text{xwOBA}^2 + u$$

In this model, Diff is the expected differential of a fly ball, xwOBA is the xwOBA value of that fly ball, xwOBA^2 is the squared term, and u is an error term. When we run this model, the coefficient for xwOBA shows the initial increase in Diff for the first unit increase and the coefficient for the squared term shows how the slope changes with each unit increase of xwOBA. The data for this model is adjusted so that each unit is equal to a change in xwOBA value of 0.1. If xwOBA is performing as expected, either both coefficients should not be significantly different from zero, or the xwOBA coefficient should be slightly negative. However, if we really are seeing a difference in the ball that produces more offense, we would see the xwOBA coefficient be significantly positive and the squared term coefficient be significantly negative.
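The quadratic fit can be sketched in a few lines of Python. The example below solves the least-squares normal equations directly so that it needs no external libraries. The function name `fit_quadratic` is hypothetical, and the data are synthetic and noise-free, built to have the "juiced ball" signature described above (positive linear term, negative squared term, differential anchored at zero on both ends); they are not the Statcast data behind the article's results.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    # Build the normal equations (X'X) b = X'y for the design columns 1, x, x^2.
    s = [sum(x**p for x in xs) for p in range(5)]                  # sums of x^0 .. x^4
    t = [sum(y * x**p for x, y in zip(xs, ys)) for p in range(3)]  # sums of y * x^p
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]

    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(3):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


# Synthetic example: x is xwOBA in units of 0.1 (0 through 20 covers 0.0 to 2.0).
# The differential is zero at both ends and peaks in the middle.
xs = list(range(21))
ys = [0.04 * x - 0.002 * x**2 for x in xs]

b0, b1, b2 = fit_quadratic(xs, ys)
print("intercept:", round(b0, 6), "xwOBA:", round(b1, 6), "xwOBA^2:", round(b2, 6))
```

With real data, one such fit per season would give a coefficient pair for each year, and a full regression package would also supply the standard errors needed to judge significance.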

The results of these regressions are shown in Table 3 below. For the most part, they reflect what we see in Figure 7. The results for 2015 and 2016 show a negative slope at first, with a positive coefficient for the squared term. 2017 serves as a bit of a turning point, as the coefficients for 2018 and 2019 have statistically significant values in the opposite direction. 2019 is especially noticeable, as the initial slope is much higher than in any of the other years. Overall, we see a trend where the xwOBA coefficient is increasing over time, which is consistent with the Juiced Ball theory.

[Table 3: Quadratic regression results for the fly ball wOBA differential, by year, 2015-2019]

Potential Limitations and Confounds

Our analysis benefits from the size of our dataset, but it also has areas where the metrics may fall short. Our limitations mainly deal with data availability, and potential confounds are mostly in the realm of rule changes. There is also a potential omitted variable problem, because many components that were not included in our regression models could influence home run rates.

First, we do not have access to Statcast advanced metrics from before 2015. This could confound our results for two reasons. The first is that it does not allow us to establish a larger baseline of “normal” ball behavior from before the alleged changes in the ball. The second is that Statcast has completely changed the way teams view the game and their players in its short existence. For example, analytical teams like the Tampa Bay Rays have modified the way they analyze their players in spring training, moving away from metrics like batting average and toward exit velocity analyses. On the margins, this would favor hitters who generate higher exit velocities when MLB rosters are set. Hitters with outstanding exit velocity are more likely to hit home runs, and if more of these players are in MLB than prior trends would dictate, there would be an increase in the number of home runs per game in MLB. This effect, if distributed systematically, could contribute to the discrepancy in Figures 3-6. In addition, since MLB had access to Statcast data throughout the season, whereas AAA did not, there could have been changes in play style not reflected across the leagues. This difference is likely not strong, as all players within an organization would have worked with the technology during spring training and likely adjusted their play style to better fit the goals of their organization.

More interesting and more difficult to explain is that the difference between wOBA and xwOBA began to grow in 2017. This suggests an omitted variable and contradicts the conventional wisdom that the change in the ball occurred in 2015. One possible reason is a further change in the ball, which would be consistent with the generally rising home run totals since 2015.

There were also rule changes in MLB in 2015 that could lead to increased offensive production. For example, pitchers were no longer entitled to eight warm-up pitches as of 2015, which could have caused them to be unprepared for the first batters in a given half-inning.

Another potential confound is the change in defensive positioning. These changes are very hard to quantify; consequently, we were unable to incorporate fielding into our analyses. There have been major changes stemming from the use of player-tracking technology such as Statcast, mostly with the creation of personalized defensive shifts. This would cause more fly balls to turn into outs than before. Therefore, it likely does not explain the spike in home runs and is actually a moderating effect. Without this confound, our already-significant findings would likely have been stronger.

Conclusions and Discussion

The change in MLB offensive production between 2015 and 2019 is undeniable. Our regressions and Poisson models show that the home run rate changed in a significant way, our wOBA differentials show that fly balls are producing better outcomes than expected, and the difference-in-difference analyses pinpoint a change that could track with a livelier ball. From these, we conclude that the ball likely changed in 2015. However, one of our more surprising results comes from the comparison between xwOBA and wOBA. Because these statistics track together for 2015 and 2016, begin to separate in 2017, and become quite disparate in 2018 and 2019, we hypothesize that another change may have happened to the ball in more recent years. In other words, this may not be the work of just one change, but rather multiple changes made from 2015-2019. This theory would better explain the variation in home run rates between the regular season and postseason in 2019.

This is not certain, of course. For example, the continued growth in wOBA discrepancies on fly balls could just be related to inaccuracies in the Statcast algorithms that calculate xwOBA. As the aphorism goes, “when a metric becomes a goal, it loses its use as a metric.” xwOBA is not in this trap, but if MLB teams truly prioritize bat speed and exit velocity, xwOBA could track wOBA less closely over time. A further experiment building on our conclusions might look at the differences in offensive production between 2016 and 2018 in MLB compared to AAA, which would test our theory that the biggest ball change actually happened in 2017 while keeping the pre-2019 AAA control group.

For now though, we should enjoy the games ahead, as we are living in a golden age of home runs that we may never see again.

Works Cited:

Arthur, Rob. “The Fly Ball Revolution Is Hurting As Many Batters As It’s Helped.” FiveThirtyEight, 16 May 2017, https://fivethirtyeight.com/features/the-fly-ball-revolution-is-hurting-as-many-batters-as-its-helped/ 

Arthur, Rob, and Tim Dix. “We X-Rayed Some MLB Baseballs. Here’s What We Found.” FiveThirtyEight, 1 Mar. 2018, https://fivethirtyeight.com/features/juiced-baseballs/ 

Arthur, Rob, and Ben Lindbergh. “A Baseball Mystery: The Home Run Is Back, And No One Knows Why.” FiveThirtyEight, 30 Mar. 2016, https://fivethirtyeight.com/features/a-baseball-mystery-the-home-run-is-back-and-no-one-knows-why/ 

Kepner, Tyler. “ ‘Just Come Out and Say It’: Players Want Answers on the Changing Ball.” The New York Times, 9 July 2019. NYTimes.com, https://www.nytimes.com/2019/07/09/sports/baseball/mlb-baseballs-juiced.html 

Lindbergh, Ben, and Mitchel Lichtman. “The Juiced Ball Is Back.” The Ringer, 14 June 2017, https://www.theringer.com/2017/6/14/16044264/2017-mlb-home-run-spike-juiced-ball-testing-reveal-155cd 

Nathan, Alan. “Exit Speed and Home Runs.” The Hardball Times, 18 Jul. 2016. https://tht.fangraphs.com/exit-speed-and-home-runs/  

“New MLB Commissioner Rob Manfred Says He Is Open to Discussing Any Change to Improve Pace, Offense, Attraction of Baseball to Younger Fan, and That Is Just the Perspective Baseball Needs Right Now, Ken Rosenthal Says.” FOX Sports, 25 Jan. 2015, http://www.foxsports.com/mlb/just-a-bit-outside/story/mlb-commissioner-rob-manfred-recognizes-boring-needs-change-more-offense-adapt-quickly-012515 

Rymer, Zachary D. “A De-Juiced Ball Conspiracy Hangs over the MLB Postseason.” Bleacher Report, https://bleacherreport.com/articles/2858362-a-de-juiced-ball-conspiracy-hangs-over-the-mlb-postse

Waldstein, David. “M.L.B. Hired Scientists to Explain Why Home Runs Have Surged. They Couldn’t.” The New York Times, 24 May 2018. NYTimes.com, https://www.nytimes.com/2018/05/24/sports/major-league-baseball-study.html 
