Analysis: It’s Not About How You Start…Or Is It?
In the context of a normal MLB season, most people (and especially Nationals fans) will tell you that it’s not about how you start, it’s how you finish. And we have certainly seen our fair share of incredible comebacks over the years, such as the Miracle Mets of ’69, the iconic ’95 Mariners, the epic Game 162 scenario in 2011, and, of course, a certain team that started their season 19-31 last year. The excitement of the pennant chase and the ever-present hope of a comeback are among the best things about watching a baseball season, and they are why we tell so many stories about these teams decades after the fact.
However, all of those teams had 162 games to showcase their real talent and make up for their lackluster starts. What happens now that teams are facing a sixty-game season? If there is less time to catch up to the division leaders, will the start of the season be just as important as the end? How crucial are the first twenty games to a team’s success in the context of a sixty-game season?
To look into this, I took a deep dive into the records for each team after 20, 60, and 162 games over the past five seasons. I then analyzed these records in two different ways: the first uses a bit of statistical analysis, and the second looks at how playoff races were affected by the first twenty games.
First, let’s get the math out of the way. In order to get a sense of how important these first twenty games are, I did a regression analysis of the relationship between a team’s performance in the first twenty games and their winning percentage at 60 and 162 games. Before I could do this though, I had to address an issue with the twenty-game winning percentage. See, because only twenty games had been played, the winning percentage for each team can only vary in increments of five percentage points: .350, .400, .450, and so on. This makes a scatter plot of the data look less like a scatter plot and more like a row of vertical lines, which is a problem because if we can’t differentiate the performances of teams with the same record, this quirk will bias the results of the regression.
So, instead of using regular winning percentage over those games, I will be using a slightly altered variation known as Pythagorean winning percentage. This is a pretty common metric that was developed by Bill James, and it measures what a team’s winning percentage should be given how many runs they’ve scored and allowed. Using this metric helps us in two ways. First, it allows us to see the variation in performance between teams with the same record, which solves the problem I described earlier. Second, Pythagorean W-L% is much more predictive of future performance than regular winning percentage, meaning we should find a stronger overall relationship in both cases. The results of this analysis are shown in the scatter plots below, where the solid line represents perfect correlation and the dashed line represents the regression result:
I’m not going to lie, these graphs surprised me a bit. While there is a difference in the correlation, it is definitely not as large as I imagined it would be. An increase of one percentage point in a team’s 20-game Pythagorean winning percentage is associated with an increase of roughly half a percentage point in their 60-game winning percentage, compared to an increase of .41 percentage points in their final 162-game winning percentage. In other words, if a team had a Pythagorean winning percentage that was .100 higher than average at 20 games, we would expect their winning percentage to be about .050 higher than average after a full 60 games (which would come out to roughly a 33-27 season, since the average winning percentage is always .500). As one might imagine, this isn’t the 1:1 relationship that a perfect correlation would produce, and the correlation measures show the same thing. Just like the slope estimates, the r-squared (a measure of how much of the variation in one variable is explained by the other in a regression) for the sixty-game regression is slightly higher than for the 162-game regression, but still not particularly strong: .46 compared to .39.
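For anyone who wants to try this at home, here is a rough Python sketch of the two pieces of math involved: the Pythagorean winning percentage and a simple least-squares fit. It is not the exact code behind the graphs. The exponent of 2 (Bill James’s original formulation), the function names, and the sample runs and records are all assumptions made up for illustration; only the .50 slope and the 33-27 example come from the numbers above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x).

    The exponent of 2 is an assumption here; other values are also in common use.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus r-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


def expected_wins(pyth_20, slope, games=60, league_avg=0.500):
    """Turn a 20-game Pythagorean W% into expected wins using a fitted slope."""
    deviation = pyth_20 - league_avg
    return games * (league_avg + slope * deviation)


if __name__ == "__main__":
    # Made-up sample rows: (runs scored, runs allowed through 20 games, 60-game W%).
    sample = [
        (110, 80, 0.583),
        (95, 95, 0.500),
        (80, 105, 0.450),
        (100, 90, 0.533),
        (85, 100, 0.467),
    ]
    xs = [pythagorean_win_pct(rs, ra) for rs, ra, _ in sample]
    ys = [win_pct for _, _, win_pct in sample]
    slope, intercept, r_squared = fit_line(xs, ys)
    print(f"slope={slope:.2f} intercept={intercept:.2f} r^2={r_squared:.2f}")

    # The worked example from the text: .100 above average with a slope of .50.
    print(round(expected_wins(0.600, slope=0.5), 1))  # 33.0 wins, i.e. 33-27
```

Swapping in `slope=0.41` and `games=162` gives the full-season version of the same calculation.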
These numbers mainly tell us the same story: while having a good first twenty games is certainly related to having a good sixty-game record, the effect might not be dramatically stronger than it would be for a full season. In general, this makes sense. Even though it’s nowhere close to a normal season length, sixty games is still a good sample size, so we should expect most of the best teams to eventually rise to the top regardless of how each team starts the year. And this doesn’t even take into account things like injuries, trades, and front office moves that are generally out of the players’ hands but nevertheless affect team performance. So while having a great first twenty games is certainly desirable for any team, there’s no telling whether that luck will bleed into the other 40 games.
That being said, this analysis leaves out arguably the most important context: the playoff race. As such, one of the main questions we still need to answer is this: do teams that start the year on top tend to ride their momentum into October? To answer this, I went back into my data set of team records and followed the playoff picture throughout the year, sorting each division by record at 20, 60, and 162 games for the same five seasons. The goal here is to see how the leaders in each division and wild card spot change over time, and how the playoff teams at sixty games compare to those at the end of the season. The results are shown in the tables below:
In the second column, the teams in bold were in playoff position both twenty and sixty games into the season, while the teams with an asterisk would have had a shot at the playoffs in a sixty-game season but did not make the cut at the end of the year. In the third column, teams that are underlined would either have had to play their way into a playoff spot or miss the playoffs entirely had the season been sixty games that year. And as can be seen from these tables, there are quite a lot of bold letters, asterisks, and underlines to discuss here.
It is important to note that most of the teams in playoff position at sixty games were also in playoff contention at the twenty-game mark, but many of the division leaders at twenty games would not go on to win that division. In some cases, a team can be particularly good and take a stranglehold on a particular division (e.g. the 2019 Dodgers or 2016 Cubs). However, there are other cases where if the season were 60 games, a team could have sneaked into the playoffs on the coattails of a great start. For example, we could have seen the Diamondbacks win the NL West in 2018 and keep the Dodgers out of the postseason entirely. In addition, we could have seen the Mariners finally break their playoff drought that same year on the heels of their excellent first two months of the season. Generally, it seems like every season there are at least 2-3 teams in playoff contention at 60 games that are eliminated at 162 and vice versa, which indicates that there may be a lot of weird and exciting baseball to come in this shortened season.
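The comparison behind those tables can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: the teams, divisions, and win totals are invented, the function name is hypothetical, tiebreakers are ignored (the first team listed wins a tie), and the format is the one used over those five seasons, with three division winners and two wild cards per league.

```python
def playoff_field(standings, wild_cards=2):
    """Return the playoff teams for ONE league.

    standings: list of (team, division, wins) tuples.
    Division winners are picked first, then the best remaining records.
    Ties are not broken properly: the first team listed keeps the spot.
    """
    best_by_division = {}
    for team, division, wins in standings:
        leader = best_by_division.get(division)
        if leader is None or wins > leader[1]:
            best_by_division[division] = (team, wins)
    winners = {team for team, _ in best_by_division.values()}

    # Everyone who did not win a division competes for the wild cards.
    rest = sorted(
        (row for row in standings if row[0] not in winners),
        key=lambda row: row[2],
        reverse=True,
    )
    return winners | {team for team, _, _ in rest[:wild_cards]}


# Invented standings for a single league at two checkpoints.
at_60 = [
    ("A", "East", 38), ("B", "East", 33),
    ("C", "Central", 35), ("D", "Central", 31),
    ("E", "West", 36), ("F", "West", 34), ("G", "West", 30),
]
at_162 = [
    ("A", "East", 95), ("B", "East", 84),
    ("C", "Central", 90), ("D", "Central", 88),
    ("E", "West", 85), ("F", "West", 97), ("G", "West", 89),
]

field_60 = playoff_field(at_60)
field_162 = playoff_field(at_162)

# In at 60 games but out at 162 (the asterisk teams in the tables).
print(sorted(field_60 - field_162))   # ['B', 'E']
# Out at 60 games but in at 162 (part of the underlined group).
print(sorted(field_162 - field_60))   # ['D', 'G']
```

Running the same function on the 20-game standings and intersecting the result with `field_60` would give the teams shown in bold.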
So what should we make of all of this? As we see from the data as well as the tables above, having a good first twenty games may not be an automatic ticket to the playoffs in a short year, but it can give you a good chance of shocking the world. Sixty games still gives plenty of time for the cream to rise to the top and there are plenty of outside factors that could affect the last 40 games, but if a middling team can come out of the gates firing and play their cards right, they might be able to sneak in and play some October baseball. Of course, a lot is going to depend on how each team finishes their season. But for the first time in a long time, how those teams start their season may be just as important.
UPDATE: Since the time of writing, Ben Clemens of FanGraphs published a similar piece that does a better job of explaining the concepts of this piece in an analytical way. You can check out that article here.