Yesterday, I published an article about Memorial Day as it relates to the baseball standings. In sum, I wrote about the baseball adage that one should not check the standings until Memorial Day. Using data from 2010 to 2018, I looked at the correlation between Memorial Day winning percentage and end-of-season winning percentage and constructed a linear regression line to fit the data.
Within the piece, I used the regression equation to discuss full-season scenarios for the Twins and Nationals, two teams that have surprised this season, albeit for different reasons. The response to the article was interesting, and some readers asked me to look at full-season projections for all 30 teams based on the regression.
This sortable chart does exactly that:
|Team||Win Percentage||Projected Final||Projected Wins||Projected Losses|
Take these projections with a grain of salt; I’d recommend looking at our actual projected standings for a better estimate of the full-season results. These projections are based on a regression line that could account for only 57% of the variability in full-season results.
This means that these expected win totals could be off, and potentially by a lot. As I wrote in my article on Wednesday, we’ve seen teams outperform their expectation by as many as 112 points of win percentage (2012 Dodgers) or underperform it by as many as 129 points (2013 Astros). With this in mind, let’s call these two extremes our best- and worst-case scenarios, respectively. Now let’s put those best- and worst-case scenarios into a chart for all 30 teams:
|Team||Win Percentage||Projected Final||Best Case||Best Case Wins||Worst Case||Worst Case Wins|
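The best- and worst-case columns can be reproduced with a little arithmetic. The following is a minimal sketch, not the exact method behind the chart: it assumes a 162-game season, assumes wins are rounded to the nearest whole number, and uses a hypothetical team projected to finish at .500. The function name `scenario_wins` is made up for illustration.

```python
GAMES_IN_SEASON = 162

# Largest residuals cited in the article, in points of win percentage
# (1 point = .001): 2012 Dodgers (+112) and 2013 Astros (-129).
BEST_CASE_POINTS = 112
WORST_CASE_POINTS = -129


def scenario_wins(projected_pct, residual_points, games=GAMES_IN_SEASON):
    """Shift a projected win percentage by a residual and convert it to wins."""
    adjusted_pct = projected_pct + residual_points / 1000
    # Keep the result inside the range a real team could post.
    adjusted_pct = min(max(adjusted_pct, 0.0), 1.0)
    # Assumption: round to the nearest whole win.
    return round(adjusted_pct * games)


# Hypothetical team projected to finish at exactly .500 (not a real projection).
example_pct = 0.500
print(scenario_wins(example_pct, BEST_CASE_POINTS))   # best-case wins
print(scenario_wins(example_pct, WORST_CASE_POINTS))  # worst-case wins
```

For the hypothetical .500 team, the two extremes land roughly 39 wins apart, which is why the chart covers such a wide range.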
As you can probably see, this doesn’t tell us much. The Twins aren’t going to win 116, the Rockies aren’t going to win 96, and the Orioles won’t win 80. But things begin to make more sense if you consider these to be the absolute upper-bound win totals for most teams: something like the 99.6th percentile, since only one team out of the 270-team sample (0.4%) from our initial dataset outperformed its expectation by this much.
The figure for the Nationals should trouble any fans still holding out hope; if their 99.6th percentile projection is only 88 wins, I think we can safely begin to assume that 2019 is going to be a lost season in D.C. On the flip side, if you’re a Twins fan and you see that their 0.4th percentile projection is 77 wins, you have to be feeling pretty good.
Realistically, no team is going to play to these extremes. Let’s use the 25th and 75th percentiles instead, as those fall within a plausible range of outcomes. Based on the 270-team sample again, the 75th percentile residual is +29 points of win percentage and the 25th percentile residual is -31 points of win percentage. Let’s construct a third chart with these scenarios:
|Team||Win Percentage||Projected Final||75th Percentile||75th Wins||25th Percentile||25th Wins|
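For readers who want to see how percentile residuals like these could be computed and applied, here is a rough sketch. The residuals in `toy_residuals` are made up purely to show the mechanics (the real 270-team sample is not reproduced here), the `"inclusive"` quantile method is an assumption about how the percentiles were taken, and the .500 projection is again hypothetical.

```python
import statistics

GAMES_IN_SEASON = 162


def residual_quartiles(residual_points):
    """Return the (25th, 75th) percentile of a list of residuals."""
    q1, _median, q3 = statistics.quantiles(
        residual_points, n=4, method="inclusive"
    )
    return q1, q3


# Made-up residuals (actual minus projected, in points of win percentage).
toy_residuals = [-40, -20, 0, 20, 40]
toy_low, toy_high = residual_quartiles(toy_residuals)
print(toy_low, toy_high)

# Applying the article's residuals (-31 and +29 points) to a
# hypothetical team projected to finish at .500.
projected_pct = 0.500
wins_25th = round((projected_pct - 31 / 1000) * GAMES_IN_SEASON)
wins_75th = round((projected_pct + 29 / 1000) * GAMES_IN_SEASON)
print(wins_25th, wins_75th)
```

The middle half of outcomes for that hypothetical team spans about ten wins, a much tighter band than the best- and worst-case extremes.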
I still urge caution when looking at these charts; they are not adjusted for team strength (or even run differential), as they just use previous team data to estimate the full-season scenarios. But these results paint what appears to be a pretty decent picture of where the league stands today. The Twins continue to look like the favorites to win the AL Central, don’t they?
With that said, I am brought to a second question: if a team is in a playoff spot on Memorial Day, does it tend to hold on to that spot through the end of the season?
I went back to my sample of Memorial Day and final standings and looked at the results from 2012 through 2018, which covers every season played in the era of two Wild Cards. I found that of the 70 teams that made the playoffs in those seven years, 46 held a playoff spot on Memorial Day. That’s 66%.
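The arithmetic behind that share is short enough to check directly. This sketch assumes 10 playoff spots per season (three division winners and two Wild Cards in each league), which is what 70 playoff teams over seven seasons implies. Because there are 10 spots on Memorial Day and 10 at season's end, the same share applies in either direction.

```python
PLAYOFF_TEAMS_PER_SEASON = 10  # 3 division winners + 2 Wild Cards, per league
SEASONS = 7                    # 2012 through 2018

playoff_teams = PLAYOFF_TEAMS_PER_SEASON * SEASONS
held_spot_on_memorial_day = 46  # count reported in the article

retention_rate = held_spot_on_memorial_day / playoff_teams
expected_holdovers = retention_rate * PLAYOFF_TEAMS_PER_SEASON

print(f"{retention_rate:.0%}")       # share of playoff teams already in position
print(f"{expected_holdovers:.1f}")   # expected holdovers per season
```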
That would mean between six and seven of the 10 teams that are in a playoff spot as of Memorial Day will still be in one at the end of the season. We can probably begin to guess those teams. The Dodgers have a six-game lead in the NL West, the Twins have a seven-game lead in the AL Central, and the Astros have a seven-and-a-half-game lead in the AL West. Those three teams are more or less locks to make the playoffs, and our odds reflect that; those three teams all have greater than a 90% chance to continue into October. The Yankees do too, as they currently have a two-game lead in the AL East.
With those, we’re already at four of our six or seven teams. So while 66% might sound like a lot, much of it comes from teams with commanding divisional leads on Memorial Day, which tend not to fall out of the playoffs altogether. Bubble teams remain bubble teams, and they will continue to shuffle in the standings as the season goes on.
The last topic I want to discuss is the predictiveness of Memorial Day records. As one reader of the original piece kindly pointed out, a comparison of the Memorial Day winning percentage and a team’s final winning percentage is not a clean measure of predictive power. This is because a team’s final win percentage includes the games they played before Memorial Day, so there is double-counting involved. For my initial question, “Is Memorial Day the time to check the standings?” the double-counting works fine. I wasn’t looking to determine how predictive a team’s Memorial Day record actually is; I just wanted to figure out how well teams finished out after their early-season performance.
This distinction is important, and our Nationals example shows exactly why. The Nationals are currently 19-30 and in a nine-game hole in the NL East. Even if the Nationals finish their season by going 62-51, which represents a pretty solid .549 win percentage (89-win pace), they’d only finish the year 81-81. In that scenario, the Memorial Day record would do a poor job of predicting the Nationals’ rest-of-season record, but a better job of predicting their full-season record.
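The Nationals scenario can be worked through in a few lines, which also makes the double-counting visible: the full-season win percentage is just a games-weighted blend of the Memorial Day record and the rest-of-season record. This is only a sketch of the arithmetic; the 62-51 finish is the hypothetical from the paragraph above, and the helper name `win_pct` is made up.

```python
GAMES_IN_SEASON = 162


def win_pct(wins, losses):
    """Winning percentage for a win-loss record."""
    return wins / (wins + losses)


memorial_day = (19, 30)    # Nationals' record cited in the article
rest_of_season = (62, 51)  # hypothetical strong finish from the article

final = (memorial_day[0] + rest_of_season[0],
         memorial_day[1] + rest_of_season[1])

rest_pct = win_pct(*rest_of_season)
pace_wins = rest_pct * GAMES_IN_SEASON  # full-season pace implied by the finish

# The final percentage is a weighted average of the two stretches,
# so the Memorial Day games are baked into it.
md_games = sum(memorial_day)
rest_games = sum(rest_of_season)
blended_pct = (md_games * win_pct(*memorial_day)
               + rest_games * rest_pct) / (md_games + rest_games)

print(final, round(rest_pct, 3), round(pace_wins), round(blended_pct, 3))
```

Even with 113 of 162 games still to play, the 49 games already in the books drag an 89-win pace down to an even record.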
So, let’s take a look at the predictive power of a team’s Memorial Day record:
To be blunt, it’s not great. There’s a moderate correlation here, evidenced by our r-value. But our regression line can explain only about 25% of the variability in a team’s rest-of-season win percentage, so, as expected, a lot can still change over the remainder of the baseball season.
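To connect the "percent of variability explained" figures to an r-value: in a one-variable linear regression, the share of variance explained equals the square of the Pearson correlation. The sketch below shows that relationship and a plain-Python correlation function. The four data points are invented solely to exercise the function, and taking the positive square root assumes the regression lines slope upward.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented example: Memorial Day win pct vs. rest-of-season win pct.
toy_memorial_day = [0.400, 0.500, 0.600, 0.550]
toy_rest_of_season = [0.480, 0.470, 0.540, 0.520]
toy_r = pearson_r(toy_memorial_day, toy_rest_of_season)
print(round(toy_r, 2), round(toy_r ** 2, 2))

# Implied correlations from the article's explained-variance figures,
# assuming positive slopes.
r_rest_of_season = math.sqrt(0.25)  # rest-of-season regression
r_full_season = math.sqrt(0.57)     # full-season regression
print(r_rest_of_season, round(r_full_season, 3))
```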
What does this tell us, in combination with yesterday’s scatterplot, which showed a much stronger correlation between Memorial Day winning percentage and full-season winning percentage? Well, it tells us that teams can build themselves a large cushion (à la the Twins) by Memorial Day and ride that to full-season success. Conversely, it tells us that teams can be buried (à la the Nationals) by Memorial Day, and even with a rest-of-season turnaround, they probably still won’t be successful overall. But a team’s record on Memorial Day alone doesn’t necessarily tell us how they will play over the remaining games. That small distinction is extremely important when trying to answer my initial question. Yes, Memorial Day standings are meaningful, but no, they don’t do a great job of telling us how the teams will play over the remaining 110 or so games.