Developing An Updated Tracking Plus-Minus Metric

I’m kind of obsessed with trying to figure out which all-in-one metrics best measure player productivity and goodness in the NBA. I’ve built a basic statistical plus-minus in the past, called DRE, which essentially functions as an updated version of John Hollinger’s GameScore, only with more accurate weights.

Recently I decided to build on the work Andrew Johnson did to create his Player Tracking Plus-Minus (PT-PM) back when SportVU was the NBA’s primary public-facing tracking data provider. SportVU has since been replaced by Second Spectrum, more years of data have been released, some of the counting conventions have changed, and additional data assets are now available, like individual shot defense that accounts for the difference between shot success against a given defender and opponents’ average percentages from the same spots on the court. In addition, thanks to Ryan Davis of NBAShotCharts.com, there is now a long-run, publicly available five-year RAPM (as well as a Luck-Adjusted variant, which is what I utilized here) that the first five years of tracking data could be trained against to produce a more accurate statistical plus-minus.

I spent a lot of time tweaking and refining the offensive and defensive models to maximize out-of-sample predictiveness. (Methodological note: I utilized the caret package in R, using the “glmnet” method and 10×10 repeated cross-validation, to arrive at these values.) I gave a lot of thought to which variables to include and which to drop, both because of obvious collinearity issues and because of the risk of overfitting from including variables that made no basketball sense.
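For those curious what that setup looks like in practice, here is a minimal sketch of the cross-validated fit described above. The data frame and column names (train_df, off_larapm) are placeholders for illustration, not my actual dataset:

```r
library(caret)

# 10 folds, repeated 10 times, as described in the methodological note
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

# Offensive model: regress Offensive Luck-Adjusted RAPM on the tracking and
# box-score predictors, letting caret tune the elastic net mixing parameter
# (alpha) and penalty (lambda) over a default grid
off_fit <- train(
  off_larapm ~ FG2M_100 + FG2A_100 + FG3M_100 + FG3A_100 + FTM_100 + FTA_100 +
    ADJ_ORB_PCT + AST_PTS_100 + TOV_100 + DIST_OFF_TOP + STL_100 + MPG,
  data = train_df,
  method = "glmnet",
  trControl = ctrl,
  tuneLength = 10
)

# Coefficients at the best-performing penalty
coef(off_fit$finalModel, s = off_fit$bestTune$lambda)
```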

For projecting Offensive LA-RAPM, an elastic net model proved best for maximizing out-of-sample prediction; for Defensive LA-RAPM, a LASSO model was best. For Tracking Plus-Minus Offense, the following variables and coefficients were selected:

-5.45 + .673*FG2M_100 - .339*FG2A_100 + .976*FG3M_100 - .141*FG3A_100 + .630*FTM_100 - .163*FTA_100 + 1.799*ADJ_ORB_PCT + .162*AST_PTS_100 - .479*TOV_100 + 1.1*DIST_OFF_TOP + .459*STL_100 + .046*MPG

Those variables listed are defined as follows:

  • FG2M_100: 2-Point Field Goals Made per 100 possessions
  • FG2A_100: 2-Point Field Goals Attempted per 100 possessions
  • FG3M_100: 3-Point Field Goals Made per 100 possessions
  • FG3A_100: 3-Point Field Goals Attempted per 100 possessions
  • FTM_100: Free Throws Made per 100 possessions
  • FTA_100: Free Throws Attempted per 100 possessions
  • ADJ_ORB_PCT: Adjusted Offensive Rebounding Percent — the percentage of offensive rebounds per chance, excluding rebounds deferred to teammates
  • AST_PTS_100: Points assisted on per 100 possessions
  • TOV_100: Turnovers per 100 possessions
  • DIST_OFF_TOP: Distance (in miles) traveled on offense / time of possession on offense (hat tip to Krishna Narsu for suggesting this variable)
  • STL_100: Steals per 100 possessions
  • MPG: Minutes per game

Looking at the variables and their relative weights, the regression matches general basketball sense. Efficiency and volume (in scoring as well as passing) are highly prized. Adjusting for deferred rebounds, to capture a player’s rebounding prowess when he is actually trying, helps better separate the best rebounders. Turnovers are bad. Off-ball movement (proxied by distance traveled per unit of possession time) brings additional value. Steals create easy offense and serve as a positive athleticism proxy.
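To make the weights concrete, here is a small R helper (the function and input structure are mine, purely for illustration, and not part of the original model code) that applies the offensive coefficients above to a single player-season:

```r
# Hypothetical helper: applies the offensive Tracking Plus-Minus weights above
# to one player-season `p` (a list or one-row data frame) with fields named
# as in the variable list
offensive_tpm <- function(p) {
  -5.45 +
    0.673 * p$FG2M_100 - 0.339 * p$FG2A_100 +
    0.976 * p$FG3M_100 - 0.141 * p$FG3A_100 +
    0.630 * p$FTM_100  - 0.163 * p$FTA_100  +
    1.799 * p$ADJ_ORB_PCT +
    0.162 * p$AST_PTS_100 - 0.479 * p$TOV_100 +
    1.100 * p$DIST_OFF_TOP +
    0.459 * p$STL_100 + 0.046 * p$MPG
}
```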

On the defensive side of the ball, the R-squared on the resampling results was lower (.50 for offense vs. .39 for defense), while the root mean squared error (RMSE) was actually very slightly smaller when predicting Defensive Luck-Adjusted RAPM (1.38 for offense vs. 1.37 for defense). The variables and coefficients selected are:

-1.70 + .048*MPG - .17*FG2M_100 - .104*FG3A_100 - .046*FTA_100 - .034*TOV_100 + .294*DREB_CONTEST_PCT + .181*DRB_100 + .737*STL_100 + .689*LT6_2PTS_SVD_100 + .561*GT6_2PTS_SVD_100 + .607*DFG3_PTS_SVD_100 + .101*OFD_100

The variables not already defined above are as follows:

  • DREB_CONTEST_PCT: The percentage of defensive rebounds a player collects that are actually contested
  • DRB_100: Defensive rebounds per 100 possessions
  • LT6_2PTS_SVD_100: Points Saved per 100 possessions within 6 feet of the basket (calculated using the tracking data)
  • GT6_2PTS_SVD_100: Points Saved per 100 possessions outside 6 feet, but still on 2 point shot attempts (calculated using the tracking data)
  • DFG3_PTS_SVD_100: Points Saved per 100 possessions on 3 point shot attempts (calculated using the tracking data)
  • OFD_100: Offensive fouls drawn per 100 possessions

We can see from these coefficients that offensive load (proxied by made 2-point shots, attempted 3-point shots, and free throw attempts) generally carries a negative effect on defense, all else equal. In addition, shot defense seems to matter a great deal, as points saved from each area of the court contribute (though to slightly varying degrees). Steals and offensive fouls drawn pair to provide significant predictive value, which mirrors the work of others. Interestingly, once shot defense is accounted for, blocks are no longer needed to predict defensive impact. Finally, the fact that MPG remains a predictor of defensive impact, even when controlling for these other variables, shows that coaches provide additional valuable information about which players defend best.
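The defensive weights apply the same way. Below is a companion sketch (again, the helper names are mine), along with a total that assumes the overall metric is simply offense plus defense:

```r
# Hypothetical helper: applies the defensive Tracking Plus-Minus weights above
defensive_tpm <- function(p) {
  -1.70 +
    0.048 * p$MPG -
    0.170 * p$FG2M_100 - 0.104 * p$FG3A_100 - 0.046 * p$FTA_100 -
    0.034 * p$TOV_100 +
    0.294 * p$DREB_CONTEST_PCT + 0.181 * p$DRB_100 +
    0.737 * p$STL_100 +
    0.689 * p$LT6_2PTS_SVD_100 + 0.561 * p$GT6_2PTS_SVD_100 +
    0.607 * p$DFG3_PTS_SVD_100 + 0.101 * p$OFD_100
}

# Assumption: the overall metric is offense plus defense
total_tpm <- function(p) offensive_tpm(p) + defensive_tpm(p)
```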

After developing those weights, I applied a mean-regression method borrowed from Jacob Goldstein: adding 350 minutes of -1.7 points per 100 possessions of offensive impact and 450 minutes of -.3 points per 100 possessions of defensive impact, which is one of the ways he mean-regresses his own metric, Player Impact Plus-Minus (“PIPM”).
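Mechanically, that kind of padding blends each player’s observed rate with a fixed block of below-average “phantom” minutes. The sketch below is my reading of the method, not Goldstein’s actual implementation:

```r
# Blend an observed per-100 rate with `pad_minutes` of play at `pad_level`
pad_rate <- function(rate, minutes, pad_minutes, pad_level) {
  (rate * minutes + pad_level * pad_minutes) / (minutes + pad_minutes)
}

# 350 phantom minutes at -1.7 on offense, 450 at -0.3 on defense
off_regressed <- pad_rate(off_tpm, minutes_played, 350, -1.7)
def_regressed <- pad_rate(def_tpm, minutes_played, 450, -0.3)
```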

After that, I calculate how far the possession-weighted league average of Tracking Plus-Minus sits from zero, on both offense and defense, and then shift the numbers so that the league is zero-sum on each end.
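In code, that zero-sum step amounts to subtracting the possession-weighted league mean from every player on each end (variable names are illustrative):

```r
# Shift each component so the possession-weighted league average is exactly zero
off_final <- off_regressed - weighted.mean(off_regressed, w = possessions)
def_final <- def_regressed - weighted.mean(def_regressed, w = possessions)
```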

Now for what most of you have probably been waiting for, the results!

The whole 6 years of results (2013–18 is in sample while 2018–19 is out of sample) can be found here.

2018–19 Results

Top 20

These results jibe pretty well with my own eye test, and the top 3 in MVP voting also made the top 3 of the metric, albeit in a slightly different order. I feel pretty good about the results overall.

Hope you enjoyed! I should have more to come on Tracking Plus-Minus, as I’ll be utilizing it to predict win totals for this year’s NBA season before the season starts.

From the Archives: Is ESPN’s Real Plus-Minus For Real?

The statistic is great but imperfect.

This piece originally ran on the now defunct precursor to the Nylon Calculus, Hickory High (RIP). Thanks to the Internet Archive I was able to rescue it.


This week ESPN rolled out a “new” statistic in its NBA toolkit. They call it Real Plus-Minus, or RPM. RPM is, essentially, the latest version of a statistic called xRAPM, which has been made publicly available online by its developer, Jeremias Engelmann, for a few years. Now, xRAPM is short for “expected regularized adjusted plus-minus,” which, while very accurate, is just a ridiculous mouthful, so it’s understandable that ESPN would want a name that rolls off the tongue a bit more easily. Unfortunately, the use of REAL as the operative portion of the name indicates a certainty that belies the level of uncertainty that remains in the RPM framework. It is this inability or unwillingness to delve into the uncertainty in the numbers that is the biggest problem with ESPN’s rollout of RPM.

It’s understandable, given the effort ESPN surely had to go through to get these numbers under the ESPN brand, that they’d want to express their confidence in them. Still, these numbers are going to be used in more and more discussions about player value, and as such, it’s important that the underlying assumptions and framework for the metric are properly understood, so that they may be used in proper context. Here’s how ESPN introduced RPM to the general population:

What is real plus-minus

As the name suggests, real plus-minus shares a family resemblance with the +/- stat in the box score, which merely registers the net change in score (plus or minus) while each player is on the court.

RPM is inspired by the same underlying +/- logic: If a team outscores its opponents when a player is on the court, he’s probably doing something to help his team, whether or not he’s putting up big numbers.

But the familiar +/- stat has a serious flaw: Each player’s rating is heavily influenced by the play of his on-court teammates.

For example, in the basic +/- numbers, Thunder backup point guard Reggie Jackson is ranked 27th in the league. But he’s also spent the majority of his minutes playing alongside Kevin Durant, the league’s likely MVP. What we really want to know is how much of Jackson’s elite rating is attributable to his own play, and basic +/- simply can’t tell us.

But real plus-minus can.

(Emphasis mine). This is simply not true. The reason it’s not true is because it is quite literally impossible to totally attribute the impact of an individual player to the margin of victory in a basketball season. The best we can do–and what RPM actually does–is use math to come up with a best estimate of the value of each individual player. Again, ESPN has an incentive to go for the spectacular description, so this is hardly surprising, but it is too bad. RPM, and xRAPM before it, are incredibly powerful predictive tools and are probably the best estimate that presently exists for determining the all-in-one value of a given player, in their role on their team. It is, however, imperfect. It’s imperfect for perfectly reasonable reasons, but it’s not without its caveats, due to the methodology. The trouble is, ESPN has hidden parts of the methodology and described its assumptions insufficiently and incompletely. (Kevin Pelton did go on Zach Lowe’s Grantland podcast in order to explain how RPM works in somewhat more specific terms, but he didn’t get into all of the caveats I’ll get into below, and it’s not entirely clear why the explanation he gave was not part of the initial roll out. There’s also a good chance that many who read the initial RPM introduction did not also manage to listen to a podcast on a totally different, though affiliated website.)

It’s not that the method is proprietary or must stay hidden, either; anyone with the curiosity and free time can go Googling or diving into the APBR Metrics message board archives to find out just about everything that goes into RPM. Engelmann has been very open about his process from the beginning. There’s a lot of fancy math involved that goes over my head, but here are a couple of things I have gleaned from reading and paying attention:

RPM uses data from the prior year to reduce noise.

(Ed. note: After the first season it was used, ESPN stopped using prior year data in its calculation of RPM, which reduced its predictive ability but allowed them to more credibly use it for season-end awards discussions without being accused of using data from years before and muddying the waters.)

In any metric based around adjusted plus-minus, there is bound to be some level of noise or uncertainty. This is due to issues of collinearity and relatively small sample sizes. Collinearity, in this context, just means that there are often players who play together a lot, or who only ever sub in and out for one another, making it hard to disentangle their value in +/- from one another. Small sample size results from there being a relatively small number of lineups on each team to draw upon in sussing out the value of individual players. As a result, some means of reducing the noise is necessary to get the best possible estimates of player impact.
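Schematically, (regularized) adjusted plus-minus is a big regression of stint-level scoring margins on player indicator variables, with the regularization doing the noise reduction. The sketch below is a generic illustration of that setup, not Engelmann’s actual code, and the object names are placeholders:

```r
library(glmnet)

# X_stints: one row per stint, +1 for each home player on the floor, -1 for
# each away player; y_margin: that stint's scoring margin per 100 possessions.
# alpha = 0 gives ridge regression, which shrinks collinear, low-minute
# players toward zero instead of letting their estimates blow up.
fit  <- cv.glmnet(X_stints, y_margin, alpha = 0, weights = stint_possessions)
rapm <- coef(fit, s = "lambda.min")
```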

RPM utilizes just about every possible way to reduce noise there is. All of these methods for reducing noise introduce certain influences into the numbers, and it’s important to note them. The first of these is that RPM uses prior-year data to inform the regression upon which RPM rests. This helps in terms of overall predictiveness, but it also makes individual player ratings representative of their last two years rather than just the year that is allegedly being measured. This has important implications, given the way ESPN will likely use the data (particularly in their end-of-season awards discussions).

Currently, LeBron leads the league in RPM per 100 possessions and (barely) in Wins Above Replacement (WAR). This is partially a function of LeBron’s stellar year last year, when he hit his likely peak as a player. The guys at Talking Practice have created Individual Player Value (IPV), an all-in-one metric that is pretty similar to RPM in methodology and results, but they report numbers that are not informed by prior years. Under IPV, LeBron is third behind Kevin Durant and Stephen Curry. It’s clear that giving LeBron credit for last year gives him an upper hand under RPM, yet the articles using RPM as a tool thus far make no mention of the use of prior-year data. This should be mentioned whenever RPM is used to make individual player comparisons.

RPM, like xRAPM before it, uses a box-score based prior which actually makes up a large portion of the metric.

What this means is that when running the regression (i.e., a crazy big math equation with many variables that need to be solved for; in this case, the variables are the RPM values of the individual players), the equation is given information about where each player should likely shake out in terms of value, given what’s known about his box-score stats. The box-score prior RPM uses is based on a regression of box-score stats against season lineup data to best predict results. Basically, this means that players who put up great box-score numbers will be benefited under RPM, even if those numbers are somewhat hollow.
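One common way to fold a prior like that into a regularized regression, and my assumption about how this works mechanically rather than a description of Engelmann’s implementation, is to shrink players toward their box-score expectation instead of toward zero:

```r
# box_prior: each player's expected impact from the box-score regression.
# Subtract the prior's predicted stint margins, fit the ridge on the residual,
# then add the prior back, so shrinkage pulls players toward their prior
# rather than toward zero.
prior_margin <- as.vector(X_stints %*% box_prior)
resid_fit    <- cv.glmnet(X_stints, y_margin - prior_margin, alpha = 0,
                          weights = stint_possessions)
rpm_est <- box_prior + as.vector(coef(resid_fit, s = "lambda.min"))[-1]
```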

As an example, say you have a player like Carlos Boozer, who is a defensive rebounding machine but otherwise an awful defensive player. Under RPM, Carlos’s defensive value will be somewhat artificially propped up, because defensive rebounding, on average, serves as one proxy for defensive value. In terms of overall predictive accuracy and confidence in the numbers produced, the box-score based prior helps a lot, but on the margins there will be problems like our man Carlos. You can paint a similar picture for a player who gambles for steals a lot, or someone who blocks a lot of shots but has poor defensive discipline and blows rotations routinely. The same is true of a player who puts up numbers on offense while neglecting all of the unmeasured things on that end: the box-score prior will inflate his value relative to his Platonic “True Value.” The portion of RPM determined by regression against point differential while on and off the floor mitigates some of these ill effects, but it’s important to know these influences exist.

RPM contains a height-based prior which boosts the defensive ratings of all taller players.

This is a final situation where something is included in RPM which improves predictive accuracy, but which introduces a certain amount of bias into the numbers on the individual level. The reason for including the height based prior is simple: on average, big men tend to be much more impactful on the defensive side of the ball than smaller players. Although this is undoubtedly true on average, not all big men are good defenders, so adding the boost to all players who are big will necessarily inflate some undeserving players for the sake of greater overall average accuracy.

Given the box-score and the height-based priors, it’s again easy to think about Carlos Boozer, who takes frequent naps on defense, but is tall and snags many defensive rebounds. RPM is going to make Boozer look better on defense than he deserves. Unfortunately, these influences can have cascading effects. Due to the fact that the regression of RPM necessarily ties all players together in one big equation, if Boozer is getting more credit than he deserves, someone he plays with frequently is probably getting jobbed. (Yes, I’m mad about Joakim Noah getting screwed in RPM by Carlos Boozer, you guys).

These influences or biases, however you choose to term them, are not the end of the world, and the logic behind them is sound in terms of modeling the league, but RPM is not infallible, and ESPN would do well to more fully explain the assumptions underlying the model and their potential consequences. RPM is the best all-in-one estimate of player value (within a given role) in the public domain, and good on ESPN for bringing it to a wider audience. However, it’s not a perfect measure, and no one-number metric likely ever will be in a game as dynamic as basketball. It is a valuable tool to use alongside close attention to the games and the more standard box-score metrics.

* Multiple years are used in order to reduce the noise caused by the relatively small sample of 1 season’s worth of data.