All of our championship predictors work in essentially the same way, which is described in general terms below.
My philosophy was simple: find out what a championship-winning team looks like statistically (in other words, identify the statistical criteria required to win a championship), and then develop a system to rate each team in the league against those criteria. My goal was to determine which team most closely resembles a championship winner using simple calculations that could be done easily in Excel.
Given that objective, let’s take the NBA predictor as an example. I needed to identify the stats that would inform the model. I started with offensive and defensive ratings (estimates of points scored and points allowed per 100 possessions, respectively) and net rating (net points per 100 possessions). Offensive and defensive ratings have become popular in the NBA because they put all teams on a level playing field and show which offenses or defenses are most efficient. Points per game can be inflated by playing faster (play fast = more possessions = more opportunities to score) or deflated by playing slower. The same idea applies to football, where a team that leans more heavily on running plays might have more time-consuming drives and therefore fewer drives in a game. So we use offensive and defensive efficiency ratings rather than raw points scored or allowed per game.
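To make the pace argument concrete, here is a minimal sketch of a per-100-possessions rating. The function name and the per-game numbers are hypothetical and chosen only for illustration; published ratings also depend on how possessions are estimated, which is not covered here.

```python
def rating_per_100(points: float, possessions: float) -> float:
    """Return points per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions


# Hypothetical per-game numbers (not real team data).
# The fast team scores more points per game but uses more possessions to do it.
fast_team = {"points": 115.0, "possessions": 105.0}
slow_team = {"points": 110.0, "possessions": 95.0}

fast_rating = rating_per_100(**fast_team)
slow_rating = rating_per_100(**slow_team)

# Net rating is offensive rating minus defensive rating.
# The 108.0 defensive rating is also a made-up value.
slow_net_rating = slow_rating - 108.0
```

In this made-up case the slower team has the better rating even though it scores fewer points per game, which is the distortion that efficiency ratings are meant to remove.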
Another consideration was that the league changes over time. For example, in the 2010-2011 season, the Denver Nuggets had the best offense in the league with an offensive rating of 112.7, and the league-average offensive rating was 107.4, per Cleaning The Glass. In the 2020-2021 season, that same 112.7 offensive rating would have ranked only 14th, and the league average was 112.9. So rather than use the actual ratings, we use each team's league-wide rank. How a team compares to the rest of the league matters more than its exact rating.
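Converting ratings to league-wide ranks is a simple sort. The sketch below assumes the ratings live in a dictionary keyed by team; the function name, team labels, and values are hypothetical.

```python
def rank_by_rating(ratings: dict[str, float], higher_is_better: bool = True) -> dict[str, int]:
    """Map each team to its league-wide rank (1 = best).

    Ties are broken by input order, which is a simplification.
    """
    ordered = sorted(ratings, key=ratings.get, reverse=higher_is_better)
    return {team: position for position, team in enumerate(ordered, start=1)}


# Hypothetical offensive ratings (higher is better).
offensive_ratings = {"Team A": 112.7, "Team B": 107.4, "Team C": 110.0}
offensive_ranks = rank_by_rating(offensive_ratings)

# Hypothetical defensive ratings (lower is better, since they measure points allowed).
defensive_ratings = {"Team A": 111.0, "Team B": 105.5, "Team C": 108.2}
defensive_ranks = rank_by_rating(defensive_ratings, higher_is_better=False)
```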
So to identify the criteria required to win a championship, I looked at the league-wide ranking in offensive, defensive, and net ratings for the last few teams that actually won. For example, looking at the last 5 NBA champions in terms of offensive rating and calculating the average gives the table below.
| Year | Team | Offensive Rating Rank |
|------|------|-----------------------|
| 2020 | Los Angeles Lakers | 11 |
| 2018 | Golden State Warriors | 2 |
| 2017 | Golden State Warriors | 1 |
So over the last 5 years, the average NBA champion had an offensive rating rank of 5.4, so you could say that the criterion for winning a championship is a top-5 offense. You could carry out that same exercise for each stat you want to include. And that’s exactly what I did! I chose a handful of stats and calculated the average of the last few champions’ rankings in each one. If the average ranking was high (top-10 or better), I would consider using that stat in the model. If it was low (outside the top 10), I would not include it in the model because it’s unlikely to actually be important.
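The averaging step can be sketched in a few lines. The function name is hypothetical, the ranks in the example are made-up placeholders rather than real champions' ranks, and the cutoff of 10 reflects the "top-10 or better" rule of thumb described above.

```python
from statistics import mean


def champion_criterion(champion_ranks: list[int], cutoff: float = 10.0) -> tuple[float, bool]:
    """Return the champions' average rank in a stat and whether the stat
    clears the cutoff for possible inclusion in the model."""
    average_rank = mean(champion_ranks)
    return average_rank, average_rank <= cutoff


# Placeholder ranks for five hypothetical champions in one stat.
average_rank, worth_considering = champion_criterion([3, 8, 1, 6, 4])

# A stat in which recent champions ranked poorly would be left out.
_, weak_stat_considered = champion_criterion([12, 15, 9])
```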
For the NBA model, knowing that three-point shooting is highly valued in the NBA today, I needed to find a three-point shooting stat to include. I originally used 3FG% and made threes but found that using both of those favored offense more heavily than I was comfortable with, so I settled on eFG%, calculated as (1.5 * made three-point shots + made two-point shots) / total shots attempted, and eFG% allowed (the same thing but for defense).
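As a quick check of the eFG% arithmetic, here is a small sketch. The function name and the shot totals are hypothetical.

```python
def effective_fg_pct(made_threes: int, made_twos: int, attempts: int) -> float:
    """eFG% = (1.5 * made threes + made twos) / total shots attempted."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return (1.5 * made_threes + made_twos) / attempts


# Hypothetical single-game totals: 12 threes and 28 twos on 88 attempts.
team_efg = effective_fg_pct(made_threes=12, made_twos=28, attempts=88)

# eFG% allowed uses the same formula with the opponent's shot totals.
opponent_efg = effective_fg_pct(made_threes=9, made_twos=30, attempts=90)
```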
So I established a criterion for each of offensive rating, defensive rating, net rating, eFG%, and defensive eFG%. Categories with a stricter criterion are weighted more heavily (e.g., a category whose criterion is a top-3 ranking is worth more points than a category whose criterion is a top-6 ranking). Teams are scored in each category, and the sum of a team’s scores across all categories is its overall “Championship Score.” Championship probabilities are based on these Championship Scores.
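The exact point values are not spelled out here, so the following is only one possible way to sketch the idea. It assumes three things that are not from the original model: a category's weight is 1 divided by its criterion rank (so stricter criteria are worth more), a team that misses the criterion gets partial credit that shrinks with its rank, and probabilities are each team's share of the total score. All names and numbers are hypothetical.

```python
def category_points(rank: int, criterion: int) -> float:
    """Points earned in one category (assumed scheme, not the original one)."""
    weight = 1.0 / criterion          # stricter criterion -> larger weight
    if rank <= criterion:
        return weight                 # criterion met: full points
    return weight * criterion / rank  # criterion missed: partial credit


def championship_score(ranks: dict[str, int], criteria: dict[str, int]) -> float:
    """Sum a team's points over every category in the model."""
    return sum(category_points(ranks[stat], criterion) for stat, criterion in criteria.items())


def championship_probabilities(scores: dict[str, float]) -> dict[str, float]:
    """Assumed conversion: each team's share of the league-wide total score."""
    total = sum(scores.values())
    return {team: score / total for team, score in scores.items()}


# Hypothetical criteria and league ranks for a two-team, two-stat example.
criteria = {"offensive_rating": 5, "defensive_rating": 3}
league_ranks = {
    "Team X": {"offensive_rating": 1, "defensive_rating": 1},
    "Team Y": {"offensive_rating": 10, "defensive_rating": 6},
}
scores = {team: championship_score(ranks, criteria) for team, ranks in league_ranks.items()}
probabilities = championship_probabilities(scores)
```

Swapping in a different partial-credit rule or a different score-to-probability conversion only requires changing `category_points` or `championship_probabilities`; the summation step stays the same.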
After developing the rating system, I also wanted the ability to predict a team’s record. So I charted each team’s end-of-season Championship Score against its actual winning percentage and fit the data to an equation, with the 2020-2021 NBA season as an example below.
Now there’s a simple method for rating each team and predicting its winning percentage using only simple calculations in Excel, no simulation or programming required.
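The form of the fitted equation is not stated here, so purely as an illustration, the sketch below assumes a straight-line (least-squares) fit, which a spreadsheet trendline can also produce. The function names and data points are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_wins(score: float, slope: float, intercept: float, games: int) -> float:
    """Turn a Championship Score into an expected win total."""
    win_pct = slope * score + intercept
    win_pct = min(max(win_pct, 0.0), 1.0)  # keep the prediction in [0, 1]
    return win_pct * games


# Hypothetical (Championship Score, winning percentage) pairs.
example_scores = [0.0, 1.0, 2.0]
example_win_pcts = [0.2, 0.4, 0.6]
slope, intercept = fit_line(example_scores, example_win_pcts)
expected_wins = predict_wins(1.5, slope, intercept, games=82)
```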
Data for every season beginning in 2012 inform the overall record prediction models, with data from each additional season added when available. Beginning in 2023, a rolling 10-year window will be used.
The same approach was applied to the MLB World Series, NFL Super Bowl, WNBA Finals, and March Madness / NCAA Men’s Basketball Tournament. Each model is informed by an offensive, defensive, and net “efficiency” (defined in sport-specific ways) and two additional sport-specific stats. In the future we’d like to do the same for NCAA Women’s Basketball, NCAA Football, and possibly more. These are in various stages of development and data acquisition.