I’m kind of obsessed with trying to figure out which all-in-one metrics best measure player productivity and goodness in the NBA. I’ve built a basic statistical plus-minus in the past, called DRE, which essentially functions as an updated version of John Hollinger’s GameScore, only with more accurate weights.
Recently I decided to build on the work Andrew Johnson did to create his Player Tracking Plus-Minus (PT-PM), back when SportVU was the NBA’s primary public-facing data provider. SportVU has since been replaced by Second Spectrum, and a lot has changed: more years of data have been released, some of the counting methods are different, and additional data has become available, such as individual shot defense that accounts for the difference between shot success and an opponent’s average percentage from the same spot on the court. In addition, thanks to Ryan Davis of NBAShotCharts.com, there is now a long-run, publicly available 5-year RAPM (as well as a Luck-Adjusted variant, which is what I used here) that the first 5 years of tracking data can be trained on to produce a more accurate statistical plus-minus.
I spent a lot of time tweaking and refining the models for offense and defense to maximize out-of-sample predictiveness. (Methodological note: I used the caret package in R, with the “glmnet” method and 10×10 repeated cross-validation, to arrive at these values.) I gave a lot of thought to which variables to include and which to drop, both because of obvious collinearity issues and because of overfitting concerns when an included variable just made no basketball sense.
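For readers unfamiliar with 10×10 repeated cross-validation, here is a minimal sketch of how the splits can be generated. It is written in plain Python rather than R, and the function name, the seed, and the 50-row example are all illustrative assumptions, not what caret does internally.

```python
import random

def repeated_kfold_indices(n_rows, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    for _ in range(repeats):
        order = list(range(n_rows))
        rng.shuffle(order)  # a fresh shuffle for each repeat
        for fold in range(k):
            test_idx = order[fold::k]  # every k-th shuffled row goes to this fold
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx

# Hypothetical example: 50 player-seasons, 10 folds repeated 10 times.
splits = list(repeated_kfold_indices(n_rows=50))
print(len(splits))
```

Each candidate model is fit on the training rows and scored on the held-out rows of every split, and the scores are averaged, which is what makes the comparison between models an out-of-sample one.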
For projecting Offensive LA-RAPM, an elastic net model proved best for maximizing out-of-sample prediction. For Defensive LA-RAPM, a LASSO model was best. For Tracking Plus-Minus Offense, the following variables and coefficients were selected:
Those variables listed are defined as follows:
- FG2M_100: 2 Point Field Goals Made per 100 possessions
- FG2A_100: 2 Point Field Goals Attempted per 100 possessions
- FG3M_100: 3 Point Field Goals Made per 100 possessions
- FG3A_100: 3 Point Field Goals Attempted per 100 possessions
- FTM_100: Free Throws Made per 100 possessions
- FTA_100: Free Throws Attempted per 100 possessions
- ADJ_ORB_PCT: Adjusted Offensive Rebounding Percent — the percentage of offensive rebounds per chance, excluding rebounds deferred to teammates
- AST_PTS_100: Points assisted on per 100 possessions
- TOV_100: Turnovers per 100 possessions
- DIST_OFF_TOP: Distance (in miles) traveled on offense / time of possession on offense (hat tip to Krishna Narsu for suggesting this variable)
- STL_100: Steals per 100 possessions
- MPG: Minutes per game
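Most of the inputs above are simple rate stats. As a rough sketch of the arithmetic, assuming you have raw totals and a possession count for each player (the function names and example numbers are hypothetical, and the unit for time of possession is whatever the tracking data reports):

```python
def per_100(total, possessions):
    """Convert a raw total into a rate per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * total / possessions

def dist_off_top(miles_on_offense, time_of_possession):
    """Miles traveled on offense divided by time of possession on offense."""
    if time_of_possession <= 0:
        raise ValueError("time_of_possession must be positive")
    return miles_on_offense / time_of_possession

# Hypothetical player: 450 made 2s over 5,000 possessions.
fg2m_100 = per_100(450, 5000)
print(fg2m_100)  # 9.0
```

A player who rarely holds the ball but covers a lot of ground ends up with a high DIST_OFF_TOP, which is why it works as a stand-in for off-ball movement.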
Looking at the variables and their relative values, the regression seems to match general basketball sense. Efficiency and volume (in scoring as well as passing) are highly prized. Adjusting for deferred rebounds, to get a picture of a player’s rebounding prowess when actually trying, helps better separate the best rebounders. Turnovers are bad. Off-ball movement (measured by proxy via the amount of distance traveled per time of possession) brings additional value. Steals create easy offense and serve as a positive athleticism proxy.
On the defensive side of the ball the r² on the resampling results was lower (.50 for offense v. .39 for defense), while the root mean squared error (RMSE) was actually very slightly smaller for predicting Defensive Luck-Adjusted RAPM (1.38 for offense v. 1.37 for defense). The variables and coefficients selected are:
The variables not already defined above are as follows:
- DREB_CONTEST_PCT: The percentage of defensive rebounds a player collects that are actually contested
- DRB_100: Defensive rebounds per 100 possessions
- LT6_2PTS_SVD_100: Points Saved per 100 possessions within 6 feet of the basket (calculated using the tracking data)
- GT6_2PTS_SVD_100: Points Saved per 100 possessions outside 6 feet, but still on 2 point shot attempts (calculated using the tracking data)
- DFG3_PTS_SVD_100: Points Saved per 100 possessions on 3 point shot attempts (calculated using the tracking data)
- OFD_100: Offensive fouls drawn per 100 possessions
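The post does not spell out the exact points-saved formula, so treat the following as one plausible reading rather than the actual calculation: compare the makes a defender allowed with the makes expected from shooters’ usual percentages on those shots, then scale by shot value and possessions. All names and numbers here are hypothetical.

```python
def points_saved_per_100(fga_defended, fgm_allowed, expected_fg_pct,
                         shot_value, possessions):
    """Assumed formula: (expected makes - allowed makes) * shot value, per 100 poss."""
    expected_makes = expected_fg_pct * fga_defended
    saved = (expected_makes - fgm_allowed) * shot_value
    return 100.0 * saved / possessions

def dreb_contest_pct(contested_drebs, total_drebs):
    """Share of a player's defensive rebounds that were contested."""
    if total_drebs == 0:
        return 0.0
    return contested_drebs / total_drebs

# Hypothetical rim protector: 400 shots defended within 6 feet, 220 makes
# allowed, shooters normally make 60% of those, over 4,000 possessions.
lt6_saved = points_saved_per_100(400, 220, 0.60, 2, 4000)
print(round(lt6_saved, 2))
```

A positive number means opponents shot worse than expected against that defender; a negative number means they shot better.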
We can see from these coefficients that offensive load (proxied by made 2 point shots, attempted 3 point shots, free throw attempts) generally carries with it a negative effect on defense, all else equal. In addition, shot defense seems to matter a great deal, as points saved from each area of the court mattered (though to slightly varying degrees). Steals and offensive fouls drawn pair to provide significant predictive value, which mirrors the work of others. Interestingly, when shot defense is accounted for, blocks are no longer needed to predict defensive impact. Finally, MPG remaining a predictor of defensive impact, even when controlling for these other variables, shows that coaches are able to provide us additional valuable information about which players are best at defense.
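For anyone who wants to reproduce the two resampling metrics quoted for the offensive and defensive models, here is a minimal sketch. It assumes r² is computed as the squared correlation between actual and predicted values, which, as far as I understand, is caret’s default; the example numbers are made up.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)

# Made-up held-out RAPM values and model predictions.
actual = [1.0, -0.5, 2.0, 0.0]
predicted = [0.5, 0.0, 1.5, 0.5]
print(rmse(actual, predicted))  # 0.5
print(r_squared(actual, predicted))
```

The two metrics answer different questions, which is how the defensive model can have a lower r² and a slightly smaller RMSE at the same time: defensive RAPM values are less spread out, so there is less variance to explain in the first place.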
After developing those weights, I applied a mean-regression method from Jacob Goldstein: I add 350 minutes of -1.7 points per 100 possessions of offensive impact and 450 minutes of -0.3 points per 100 possessions of defensive impact, which is one of the ways he mean-regresses his metric, Player Impact Plus-Minus (“PIPM”).
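In code, this kind of padding can be sketched as a minutes-weighted average of the player’s raw number and the prior. The weighted-average form is my assumption about how the padding is applied; only the 350/450 minutes and the -1.7/-0.3 values come from the description above.

```python
OFF_PRIOR_MINUTES, OFF_PRIOR_VALUE = 350, -1.7
DEF_PRIOR_MINUTES, DEF_PRIOR_VALUE = 450, -0.3

def regress_to_mean(raw_value, minutes, prior_value, prior_minutes):
    """Blend a raw per-100 rating with a prior, weighting each by minutes."""
    total_minutes = minutes + prior_minutes
    return (raw_value * minutes + prior_value * prior_minutes) / total_minutes

# Hypothetical player with a raw offensive rating of +3.0 in only 350 minutes:
# the prior gets equal weight, pulling him halfway toward -1.7.
padded_off = regress_to_mean(3.0, 350, OFF_PRIOR_VALUE, OFF_PRIOR_MINUTES)
print(round(padded_off, 2))  # 0.65
```

The more minutes a player logs, the less the padding matters: at 3,000 minutes the same +3.0 raw rating would only drop to about +2.5.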
After that, I calculated the league-wide difference between the possession-weighted Tracking Plus-Minus and 0, for both offense and defense, and then adjusted the numbers so that the league is zero-sum on each side of the ball.
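A minimal sketch of that adjustment, assuming it amounts to subtracting the possession-weighted league average from every player (the function name and the three-player league are hypothetical):

```python
def zero_sum_adjust(values, possessions):
    """Shift ratings so their possession-weighted league average is zero."""
    total_possessions = sum(possessions)
    league_avg = sum(v * p for v, p in zip(values, possessions)) / total_possessions
    return [v - league_avg for v in values]

# Hypothetical three-player league.
raw_off = [3.0, -1.0, 0.5]
poss = [1000, 3000, 2000]
adjusted_off = zero_sum_adjust(raw_off, poss)

# After the shift, the possession-weighted sum is zero up to rounding error.
weighted_sum = sum(v * p for v, p in zip(adjusted_off, poss))
print(abs(weighted_sum) < 1e-9)  # True
```

The same function would be run once on the offensive numbers and once on the defensive numbers.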
Now for what most of you have probably been waiting for, the results!
The whole 6 years of results (2013–18 is in sample while 2018–19 is out of sample) can be found here.
These results jibe pretty well with my own eye test, and the top 3 in MVP voting also made it into the top 3 of the metric, albeit in a slightly different order. I feel pretty good about the results overall.
Hope you enjoyed! I should have more to come on Tracking Plus-Minus, as I’ll be using it to predict win totals for this year’s NBA season before the season starts.