The Case for Dennis Rodman, Part 2/4 (a)(ii)—Player Valuation and Unconventional Wisdom

In my last post in this series, I outlined and criticized the dominance of gross points (specifically, points per game) in the conventional wisdom about player value. Of course, serious observers have recognized this issue for ages, responding in a number of ways—the most widespread still being ad hoc (case by case) analysis. Not satisfied with this approach, many basketball statisticians have developed advanced “All in One” player valuation metrics that can be applied broadly.

In general, Dennis Rodman has not benefitted much from the wave of advanced “One Size Fits All” basketball statistics. Perhaps the most notorious example of this type of metric—easily the most widely disseminated advanced player valuation stat out there—is John Hollinger’s Player Efficiency Rating:

In addition to ranking Rodman as the 7th best player on the 1995-96 Bulls championship team, PER is weighted to make the league average exactly 15—meaning that, according to this stat, Rodman (career PER: 14.6) was actually a below-average player. While Rodman does significantly better in a few predictive stats (such as David Berri’s Wages of Wins) that value offensive rebounding very highly, I think those who subscribe to the Unconventional Wisdom generally accept one or both of the following: 1) that despite Rodman’s incredible rebounding prowess, he was still just a very good role-player, and likely provided less utility than those who were more well-rounded, or 2) that, even if Rodman was valuable, a large part of his contribution must have come from qualities that are not typically measurable with available data, such as defensive ability.

My next two posts in this series will put the lie to both of those propositions. In section (b) of Part 2, I will demonstrate Rodman’s overall per-game contributions—not only their extent and where he fits in the NBA’s historical hierarchy, but exactly where they come from. Specifically, contrary to both conventional and unconventional wisdom, I will show that his value doesn’t stem from quasi-mystical unmeasurables, but from exactly where we would expect: extra possessions stemming from extra rebounds. In Part 3, I will demonstrate (and put into perspective) the empirical value of those contributions to the bottom line: winning. These two posts are at the heart of The Case for Dennis Rodman, qua “case for Dennis Rodman.”

But first, in line with my broader agenda, I would like to examine where and why so many advanced statistics get this case wrong, particularly Hollinger’s Player Efficiency Rating. I will show how, rather than being a simple outlier, the Rodman data point is emblematic of major errors that are common in conventional unconventional sports analysis – both as a product of designs that disguise rather than replace the problems they were meant to address, and as a product of uncritically defending and promoting an approach that desperately needs reworking.

Player Efficiency Ratings

John Hollinger deserves much respect for bringing advanced basketball analysis to the masses. His Player Efficiency Ratings are available on ESPN.com under Hollinger Player Statistics, where he uses them as the basis for his Value Added (VA) and Expected Wins Added (EWA) stats, and he regularly features them in his writing (such as in this article projecting the Miami Heat’s 2010-11 record), as do other ESPN analysts. Basketball Reference includes PER in its “Advanced” statistical tables (present on every player and team page), and also uses it to compute player Value Above Average and Value Above Replacement (definitions here).

The formula for PER is extremely complicated, but its core idea is simple: combine everything in a player’s stat line by rewarding everything good (points, rebounds, assists, blocks, and steals) and punishing everything bad (missed shots, turnovers). The values of particular items are weighted by various league averages—as well as by Hollinger’s intuitions—and the overall result is then calculated on a per-minute basis, adjusted for league and team pace, and normalized on a scale averaging 15.
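The last three steps described above are a per-minute rate, a pace adjustment, and a rescaling so the league average is 15. They can be sketched roughly as follows. This is a simplified illustration, not Hollinger’s full formula. `weighted_total` is a hypothetical stand-in for his weighted sum of box-score items. The pace adjustment assumes the usual league-pace/team-pace ratio.

```python
def per_sketch(weighted_total, minutes, team_pace, league_pace, league_avg_aper):
    """Simplified PER pipeline (illustrative only, not the real formula).

    weighted_total:  hypothetical weighted sum of a player's box-score items
    minutes:         minutes played
    team_pace, league_pace: possessions per 48 minutes
    league_avg_aper: league-average pace-adjusted rate, used for rescaling
    """
    per_minute = weighted_total / minutes                    # per-minute rate
    pace_adjusted = per_minute * (league_pace / team_pace)   # scale down fast-paced teams
    return pace_adjusted * (15.0 / league_avg_aper)          # league average maps to 15


# A player producing exactly the league-average rate on a league-pace team rates 15.
print(per_sketch(30.0, 100.0, team_pace=95.0, league_pace=95.0, league_avg_aper=0.3))
```

Note that the final step only rescales. It cannot correct anything the weighted sum got wrong upstream.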

Undoubtedly, PER is deeply flawed. But sometimes apparent “flaws” aren’t really “flaws,” but merely design limitations. For example: PER doesn’t account for defense or “intangibles,” it is calculated without resort to play-by-play data that didn’t exist prior to the last few seasons, and it compares players equally, regardless of position or role. For the most part, I will refrain from criticizing these constraints, instead focusing on a few important ways that it fails or even undermines its own objectives.

Predictivity (and: Introducing Win Differential Analysis)

Though Hollinger uses PER in his “wins added” analysis, its complete lack of any empirical component suggests that it should not be taken seriously as a predictive measure. And indeed, empirical investigation reveals that it is simply not very good at predicting a player’s actual impact:

This bubble-graph is a product of a broader study I’ve been working on that correlates various player statistics with the difference in their team’s per-game performance with them in and out of the line-up. The study’s dataset includes all NBA games back to 1986, and this particular graph is based on the roughly 1,300 seasons in which a player who averaged 20+ minutes per game both missed and played at least 20 games. Win% differential is the difference in the player’s team’s winning percentage with and without him (for the correlation, each data point is weighted by the smaller of games missed or played; I will have much more to write about the nitty-gritty of this technique in separate posts).
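The win-differential setup can be sketched in a few lines. This is a minimal illustration assuming one record per player-season, not the study’s actual code. The dictionary keys (`mpg`, `played`, `missed`, `wins_played`, `wins_missed`) are hypothetical names. The filters and the min(games played, games missed) weighting follow the description above.

```python
import numpy as np


def win_pct_differential(wins_played, played, wins_missed, missed):
    """Team win% in games the player played minus team win% in games he missed."""
    return wins_played / played - wins_missed / missed


def weighted_corr(x, y, w):
    """Pearson correlation in which observation i carries weight w[i]."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()
    dx = x - np.sum(w * x)  # deviations from the weighted mean
    dy = y - np.sum(w * y)
    return np.sum(w * dx * dy) / np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))


def stat_vs_win_diff(seasons, stat, min_mpg=20, min_games=20):
    """Correlate one player stat with win% differential across player-seasons."""
    xs, ys, ws = [], [], []
    for s in seasons:
        # Keep 20+ mpg seasons with 20+ games both played and missed.
        if s["mpg"] < min_mpg or s["played"] < min_games or s["missed"] < min_games:
            continue
        xs.append(s[stat])
        ys.append(win_pct_differential(s["wins_played"], s["played"],
                                       s["wins_missed"], s["missed"]))
        ws.append(min(s["played"], s["missed"]))  # weight = smaller of the two samples
    return weighted_corr(xs, ys, ws)
```

Weighting by the smaller sample means a season with 60 games played and 22 missed counts for about as much as its weakest half supports.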

So PER appears to do poorly, but how does it compare to other valuation metrics?

SecFor (or “Secret Formula”) is the current iteration of an empirically-based “All in One” metric that I’m developing—but there is no shame in a speculative, purely a priori metric losing (even badly) as a predictor to the empirical cutting-edge.

However, as I admitted in the introduction to this series, my statistical interest in Dennis Rodman goes way back. One of the first spreadsheets I ever created was in the early 1990s, when Rodman still played for San Antonio. I knew Rodman was a sick rebounder, but rarely scored—so naturally I thought: “If only there were a formula that combined all of a player’s statistics into one number that would reflect his total contribution.” So I came up with this crude, speculative, purely a priori equation:

Points + Rebounds + 2*Assists + 1.5*Blocks + 2*Steals – 2*Turnovers.
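The formula above translates directly into code. The function name `prabs` is my own, borrowed from the name the next paragraph gives the metric. The two stat lines in the usage example are hypothetical. They only show how a low-scoring rebounder fares against a scorer.

```python
def prabs(points, rebounds, assists, blocks, steals, turnovers):
    """The early a priori formula, transcribed as written above (per-game inputs)."""
    return (points + rebounds + 2 * assists + 1.5 * blocks
            + 2 * steals - 2 * turnovers)


# Hypothetical per-game lines, for illustration only.
scorer = prabs(points=25, rebounds=5, assists=3, blocks=0, steals=1, turnovers=3)
rebounder = prabs(points=6, rebounds=17, assists=2, blocks=1, steals=1, turnovers=2)
print(scorer, rebounder)
```

Every item enters linearly with a fixed, hand-picked weight. That is the same basic design PER elaborates on.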

Unfortunately, this metric (which I called “PRABS”) failed to shed much light on the Rodman problem, so I shelved it. PER shares the same intention and core technique, albeit with many additional layers of complexity. For all of this refinement, however, Hollinger has somehow managed to make a bad metric even worse: PER gets beaten by my OG PRABS by nearly as much as it beats points per game—the Flat Earth of basketball valuation metrics. So how did this happen?

Minutes

The trend in much of basketball analysis is to rate players by their per-minute or per-possession contributions. This approach does produce interesting and useful information, which may be especially valuable to a coach who is deciding whom to give more minutes, or to a GM who is trying to evaluate which bench player to sign in free agency.

But a player’s contribution to winning is necessarily going to be a function of how much extra “win” he is able to get you per minute and the number of minutes you are able to get from him. Let’s turn again to win differential:

For this graph, I set up a regression using each of the major rate stats, plus minutes played (TS%=true shooting percentage, or one half of average points per shot, including free throws and 3 pointers). If you don’t know what a “normalized coefficient” is, just think of it as a stat for comparing the relative importance of regression elements that come in different shapes and sizes. The sample is the same as above: it only includes players who average 20+ minutes per game.
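The normalized coefficients can be approximated by standardizing every variable before running ordinary least squares. This minimal sketch assumes that “normalized coefficient” means a standardized (beta) coefficient. The 0.44 free-throw weight in TS% is the common Basketball-Reference convention, not something stated above.

```python
import numpy as np


def true_shooting(points, fga, fta):
    """TS%: points per shooting attempt, halved; FTA weighted by 0.44 (common convention)."""
    return points / (2 * (fga + 0.44 * fta))


def standardized_coefficients(X, y):
    """OLS coefficients after z-scoring every predictor column and the target.

    Each coefficient is the change in y (in standard deviations) per one
    standard-deviation change in that predictor, holding the others fixed,
    which puts predictors measured in different units on a common footing.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    coefs, *_ = np.linalg.lstsq(Xz, yz, rcond=None)  # no intercept needed after centering
    return coefs
```

In use, `X` would hold one row per player-season, with columns such as TS%, rebound rate, and minutes per game. `y` would be the win% differential.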

Unsurprisingly, “minutes per game” is more predictive than any individual rate statistic, including true shooting. Simply multiplying PER by minutes played significantly improves its predictive power, managing to pull it into a dead-heat with PRABS (which obviously wasn’t minute-adjusted to begin with).
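Here is a small illustration of why volume matters, using two hypothetical players whose names and numbers are invented. Ranking by PER alone puts the 20-PER part-timer first. Ranking by PER times minutes per game, the simple adjustment described above, flips the order.

```python
def rank(players, score):
    """Return player names sorted from highest to lowest score."""
    return sorted(players, key=lambda name: score(*players[name]), reverse=True)


# Hypothetical (PER, minutes per game) pairs, for illustration only.
players = {"A": (20.0, 20.0), "B": (16.0, 36.0)}

by_per = rank(players, lambda per, mpg: per)              # rate only
by_per_x_min = rank(players, lambda per, mpg: per * mpg)  # rate times volume
print(by_per, by_per_x_min)
```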

I’m hesitant to be too critical of the “per minute” design decision, since it is clearly an intentional element that allows PER to be used for bench or rotational player valuation, but ultimately I think this comes down to telos: So long as PER pretends to be an arbiter of player value—and Hollinger himself relies on it to make actual predictions about team performance—minutes are simply too important to ignore. If you want a way to evaluate part-time players and how they might contribute IF they could take on larger roles, then it is easy enough to create a second metric tailored to that end.

Here’s a similar example from baseball that confounds me: Rate stats are fine for evaluating position players, because nearly all of them are able to give you an entire game if you want—but when it comes to pitching, how often someone can play and the number of innings they can give you are of paramount importance. E.g., at least for starting pitchers, it seems to me that ERA is backwards: rather than calculating runs allowed per inning, why not focus on runs denied per game? Using a benchmark of 4.5 runs per nine innings (half a run per inning), it would be extremely easy to calculate: Innings Pitched/2 – Earned Runs. So, if a pitcher gets you 7 innings and allows 2 earned runs, their “Earned Runs Denied” (ERD) for the game would be 3.5 – 2 = 1.5. I have no pretensions of being a sabermetrician, and I’m sure this kind of stat (and much better) is common in that community, but I see no reason why this kind of statistic isn’t mainstream.
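The ERD arithmetic is easy to sketch. Both helper names here are hypothetical. The benchmark is a parameter, so the default of 4.5 runs per nine innings reproduces the IP/2 shortcut. The conversion helper handles the usual box-score convention where “6.2” innings means six innings plus two outs.

```python
def innings_from_box_score(ip):
    """Convert box-score notation (6.2 = 6 innings and 2 outs) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the decimal digit counts outs, not tenths
    return whole + outs / 3


def earned_runs_denied(innings, earned_runs, benchmark_era=4.5):
    """Runs saved versus a pitcher allowing `benchmark_era` earned runs per 9 innings."""
    return innings * benchmark_era / 9 - earned_runs


print(earned_runs_denied(7, 2))                             # the example from the text
print(earned_runs_denied(innings_from_box_score(6.2), 1))   # 6 2/3 innings, 1 earned run
```

Unlike ERA, this rewards a starter for going deep: the same run rate spread over more innings produces a larger ERD.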

More broadly, I think this minutes SNAFU is reflective of an otherwise reasonable trend in the sports analytical community—to evaluate everything in terms of rates and quality instead of quantity—that is often taken too far. In reality, both may be useful, and the optimal balance in a particular situation is an empirical question that deserves investigation in its own right.

PER Rewards Shooting (and Punishes Not Shooting)

As described by David Berri, PER is well-known to reward inefficient shooting:

“Hollinger argues that each two point field goal made is worth about 1.65 points. A three point field goal made is worth 2.65 points. A missed field goal, though, costs a team 0.72 points. Given these values, with a bit of math we can show that a player will break even on his two point field goal attempts if he hits on 30.4% of these shots. On three pointers the break-even point is 21.4%. If a player exceeds these thresholds, and virtually every NBA player does so with respect to two-point shots, the more he shoots the higher his value in PERs. So a player can be an inefficient scorer and simply inflate his value by taking a large number of shots.”
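Berri’s break-even figures follow from setting the expected value of a shot attempt to zero. This minimal sketch takes the point values quoted above as given.

```python
def break_even_fg_pct(made_value, miss_cost):
    """Shooting percentage at which a shot's expected rating contribution is zero.

    Solves p * made_value - (1 - p) * miss_cost = 0 for p.
    """
    return miss_cost / (made_value + miss_cost)


two_pt = break_even_fg_pct(made_value=1.65, miss_cost=0.72)
three_pt = break_even_fg_pct(made_value=2.65, miss_cost=0.72)
print(f"2PT break-even: {two_pt:.1%}, 3PT break-even: {three_pt:.1%}")
# prints: 2PT break-even: 30.4%, 3PT break-even: 21.4%
```

Any shooter above these rates raises his rating with every additional attempt. Since nearly every NBA player clears 30.4% on twos, more shots means a higher PER.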

The consequences of this should be properly understood: Since this feature of PER applies to every shot taken, it is not only the inefficient players who inflate their stats. PER gives a boost to everyone for every shot: Bad players who take bad shots can look merely mediocre, mediocre players who take mediocre shots can look like good players, and good players who take good shots can look like stars. For Dennis Rodman’s case—as someone who took very few shots, good or bad—the necessary converse of this is even more significant: since PER is a comparative statistic (even directly adjusted by league averages), players who don’t take a lot of shots are punished.

Structurally, PER favors shooting—but to what extent? To get a sense of it, let’s plot PER against usage rate:

Note: Data includes all player seasons since 1986. Usage % is the percentage of team possessions that end with a shot, free throw attempt, or turnover by the player in question. For most practical purposes, it measures how frequently the player shoots the ball.

That R-squared value corresponds to a correlation of .628, which might seem high for a quantity that, in a true efficiency metric, should only appear in the denominator (shot attempts are what efficiency is measured against). Of course, correlations are tricky, and there are a number of reasons why this relationship could be so strong. For example, the most efficient shooters might take the most shots. Let’s see:

Actually, that trend-line makes the relationship look stronger than it is: that R-squared value corresponds to a correlation of only .11 (even weaker than I would have guessed).
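The R-squared values on the trend-lines and the correlations quoted in the text are related simply. For a one-variable linear trend-line, the correlation is the square root of R-squared, carrying the sign of the slope. Squaring the two correlations shows how different the relationships are. Usage accounts for about 39% of the variance in PER, while the usage-efficiency relationship accounts for about 1%.

```python
import math


def correlation_from_r_squared(r_squared, negative=False):
    """For a single-predictor linear fit, |r| = sqrt(R^2); the sign comes from the slope."""
    r = math.sqrt(r_squared)
    return -r if negative else r


for r in (0.628, 0.11):
    print(f"r = {r:.3f}  ->  R^2 = {r * r:.3f} ({r * r:.0%} of variance explained)")
```

This equivalence holds only for a simple trend-line with one predictor. In a multiple regression, R-squared no longer maps onto any single correlation.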

I should note one caveat: The mostly flat relationship between usage and shooting may be skewed, in part, by the fact that better shooters are often required to take worse shots, not just more shots—particularly if they are the shooter of last resort. A player who manages to make a mediocre shot out of a bad situation can increase his team’s chances of winning, just as a player who takes a marginally good shot when a slam dunk is available may be hurting his team’s chances. Presently, no well-known shooting metrics account for this (though I am working on it), but to be perfectly clear for the purposes of this post: neither does PER. The strong correlation between usage rate and PER is unrelated to this effect. There is nothing in its structure to suggest this is an intended factor, and there is nothing in its (poor) empirical performance to suggest it is even unintentionally addressed. In other words, PER doesn’t account for complex shooting dynamics either in theory or in practice.

Duplicability and Linearity

PER strongly rewards broad mediocrity, and thus punishes its absence. In reality, not every point that a player scores means their team will score one more point, just as not every rebound grabbed means that their team will get one more possession. Conversely—and especially pertinent to Dennis Rodman—not every point that a player doesn’t score actually costs his team a point. What a player gets credit for in his stat line doesn’t necessarily correspond with his actual contribution, because there is always a chance that the good things he played a part in would have happened anyway. This leads to a whole set of issues that I typically file under the term “duplicability.”

A related (but sometimes confused) effect that has been studied extensively by very good basketball analysts is the problem of “diminishing returns” – which can be easily illustrated like this: if you put a team together with 5 players who normally score 25 points each, it doesn’t mean that your team will suddenly start scoring 125 points a game. Conversely—and again pertinent to Rodman—if your team has 5 players who normally score 20 points each and you replace one of them with somebody who normally scores only 10, your team will not suddenly start scoring only 90. Only one player can take a shot at a time, and what matters is whether the player’s lack of scoring hurts his team’s offense or not. The extent of this effect can be measured individually for different basketball statistics, and, indeed, studies have shown wide disparities.

As I will discuss at length in Part 2(c), despite hardly ever scoring, differential stats show that Rodman didn’t hurt his teams’ offenses at all: even after accounting for extra possessions that Rodman’s teams gained from offensive rebounds, his effect on offensive efficiency was statistically insignificant. In this case (as with Randy Moss), we are fortunate that Rodman had such a tumultuous career: as a result, he missed a significant number of games in a season several times with several different teams—this makes for good indirect data. But, for this post’s purposes, the burning question is: Is there any direct way to tell how likely a player’s statistical contributions were to have actually converted into team results?

This is an extremely difficult and intricate problem (though I am working on it), but it is easy enough to prove at least one way that a metric like PER gets it wrong: it treats all of the different components of player contribution linearly. In other words, one more point is worth one more point, whether it is the 15th point that a player scores or the 25th, and one more rebound is worth one more rebound, whether it is the 8th or the 18th. While this equivalency makes designing an all-in-one equation much easier (at least for now, my Secret Formula metric is also linear), it is ultimately just another empirically testable assumption.

I have theorized that one reason Rodman’s PER stats are so low compared to his differential stats is that PER punishes his lack of mediocre scoring, while failing to reward the extremeness of his rebounding. This is based on the hypothesis that certain extreme statistics would be less “duplicable” than mediocre ones. As a result, the difference between a player getting 18 rebounds per game vs. getting 16 per game could be much greater than the difference between them getting 8 vs. getting 6. Or, in other words, the marginal value of rebounds would (hypothetically) be increasing.
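Here is the hypothesis made concrete. Under a linear valuation, going from 16 to 18 rebounds is worth exactly as much as going from 6 to 8. Squaring is one simple, hypothetical way to build in increasing marginal value. Under a squared valuation, the first jump is worth more than twice the second: 68 versus 28.

```python
def marginal_gain(value, low, high):
    """How much a valuation function rises when a stat moves from low to high."""
    return value(high) - value(low)


def linear(rebounds):
    return rebounds  # every rebound is worth the same


def squared(rebounds):
    return rebounds ** 2  # hypothetical form with increasing marginal value


for low, high in [(6, 8), (16, 18)]:
    print(f"{low}->{high}: linear +{marginal_gain(linear, low, high)}, "
          f"squared +{marginal_gain(squared, low, high)}")
```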

Using win percentage differentials, this is a testable theory. Just as we can correlate an individual player’s statistics to the win differentials of his team, we can also correlate hypothetical statistics the same way. So say we want to test a metric like rebounds, except one that has increasing marginal value built in: a simple way to approximate that effect is to raise the stat to a power greater than 1, such as by using rebounds squared. If we need even more increasing marginal value, we can try rebounds cubed, etc. And if our metric has several different components (like PER), we can do the same for the individual parts: the beauty is that, at the end of the day, we can test—empirically—which metrics work and which don’t.

For those who don’t immediately grasp the math involved, I’ll go into a little detail: A linear relationship is really just a power relationship with an exponent of 1. So let’s consider a toy metric, “PR,” which is calculated as follows: Points + Rebounds. This is a linear equation (exponent = 1) that could be rewritten as follows: (Points)^1 + (Rebounds)^1. However, if, as above, we thought that both points and rebounds should have increasing marginal values, we might want to try a metric (call it “PRsq”) that sums the squares of points and rebounds, as follows: (Points)^2 + (Rebounds)^2. And so on. Here’s an example table demonstrating the increase in marginal value:
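A table of this kind can be generated directly. This sketch holds points fixed at a hypothetical 10 and steps rebounds upward by 4. It prints PR and PRsq along with the gain from each step. The PR gain stays constant, while the PRsq gain grows at every step.

```python
def power_metric(components, exponent):
    """Sum of each component raised to the same exponent: 1 gives PR, 2 gives PRsq."""
    return sum(c ** exponent for c in components)


points = 10  # hypothetical, held fixed
print(f"{'REB':>4} {'PR':>4} {'+PR':>4} {'PRsq':>5} {'+PRsq':>6}")
previous = None
for rebounds in range(4, 21, 4):
    pr = power_metric((points, rebounds), 1)
    pr_sq = power_metric((points, rebounds), 2)
    # Gain relative to the previous row (blank on the first row).
    gains = ("", "") if previous is None else (pr - previous[0], pr_sq - previous[1])
    print(f"{rebounds:>4} {pr:>4} {gains[0]:>4} {pr_sq:>5} {gains[1]:>6}")
    previous = (pr, pr_sq)
```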

The fact that each different metric leads to vastly different magnitudes of value is irrelevant: for predictive purposes, the total value for each component will be normalized — the relative value is what matters (just as “number of pennies” and “number of quarters” are equally predictive of how much money you have in your pocket). So applying this concept to an even wider range of exponents for several relevant individual player statistics, we can empirically examine just how “exponential” each statistic really is:

For this graph, I looked at each of the major rate metrics (plus points per game) individually. So, for each player-season in my sample (1986 onward), I calculated points, points squared, points cubed, and so on up to points to the 10th power, and then correlated each of these with that player’s win percentage differential. From those calculations, we can find roughly how much the marginal value for each metric increases, based on which exponent produces the best correlation: the smaller the exponent at the peak of the curve, the more linear the metric is; the higher it is, the more “exponential” (i.e., extreme values are that much more important). When I ran this computation, the relative shape of each curve fit my intuitions, but the magnitudes surprised me: many of the metrics turned out to be even more exponential than I would have guessed.
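The exponent sweep itself is a short loop. This minimal sketch assumes per-season arrays of one stat, the matching win% differentials, and optional weights such as the smaller of games played and missed. Choosing the peak by absolute correlation is my assumption about how negative stats like turnovers would be handled. Pearson correlation is unchanged when a variable is multiplied by a positive constant, so the raw magnitude of stat**k does not matter. That is the pennies-and-quarters point above.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Pearson correlation in which observation i carries weight w[i]."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()
    dx, dy = x - np.sum(w * x), y - np.sum(w * y)
    return np.sum(w * dx * dy) / np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))


def exponent_sweep(stat, win_diff, weights=None, exponents=range(1, 11)):
    """Correlate stat**k with win% differential for each k; return the best k and all scores.

    'Best' is taken here as the largest absolute correlation (an assumption),
    so a harmful stat is judged by the strength of its relationship.
    """
    stat = np.asarray(stat, dtype=float)
    if weights is None:
        weights = np.ones_like(stat)
    scores = {k: abs(weighted_corr(stat ** k, win_diff, weights)) for k in exponents}
    return max(scores, key=scores.get), scores
```

As a sanity check, feed in a win differential constructed to be exactly the stat cubed. The sweep should then peak at an exponent of 3.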

As I know this may be confusing to many of my readers, I need to be absolutely clear: the shape of each curve has nothing to do with the actual importance of each metric. It only tells us how sensitive that particular metric is to very large values. E.g., the fact that Blocks and Assists peak on the left and sharply decline doesn’t make them more or less important than any of the others; it simply means that having 1 block in your stat line instead of 0 is relatively just as valuable as having 5 blocks instead of 4. At the other extreme, turnovers peak somewhere off the chart, suggesting that turnover rates matter most when they are extremely high.

For now, I’m not trying to draw a conclusive picture about exactly what exponents would make for an ideal all-in-one equation (polynomial regressions are very, very tricky, though I may wade into those difficulties more in future blog posts). But at a minimum, I think the data strongly supports my hypothesis: that many stats—especially rebounds—are exponential predictors. Thus, I mean this less as a criticism of PER than as an explanation of why it undervalues players like Dennis Rodman.

Gross, and Points

In subsection (i), I concluded that “gross points” as a metric for player valuation had two main flaws: gross, and points. Superficially, PER responds to both of these flaws directly: it attempts to correct the “gross” problem both by punishing bad shots, and by adjusting for pace and minutes. It attacks the “points” problem by adding rebounds, assists, blocks, steals, and turnovers. The problem is, these “solutions” don’t match up particularly well with the problems “gross” and “points” present.

The problem with the “grossness” of points certainly wasn’t minutes (note: for historical comparisons, pace adjustments are probably necessary, but the jury is still out on the wisdom of doing the same on a team-by-team basis within a season). The main problem with “gross” was shooting efficiency: If someone takes a bunch of shots, they will eventually score a lot of points. But scoring points is just another thing that players do that may or may not help their teams win. PER attempted to account for this by punishing missed shots, but didn’t go far enough. The original problem with “gross” persists: As discussed above, taking shots helps your rating, whether they are good shots or not.

As for “points”: in addition to any problems created by having arbitrary (non-empirical) and linear coefficients, the strong bias towards shooting causes PER to undermine its key innovation—the incorporation of non-point components. This “bias” can be represented visually:

Note: This data comes from a regression of PER on the rate stats corresponding to each of its components.

This pie chart is based on a linear regression including rate stats for each of PER’s components. Strictly, what it tells us is the relative value of each factor to predicting PER if each of the other factors were known. Thus, the “usage” section of this pie represents the advantage gained by taking more shots—even if all your other rate stats were fixed. Or, in other words, pure bias (note that the number of shots a player takes is almost as predictive as his shooting ability).

For fun, let’s compare that pie to the exact same regression run on Points Per Game rather than PER:

Note: These would not be the best variables to select if you were actually trying to predict a player’s Points Per Game. Note also that “Usage” in these charts is NOT like “Other”—while other variables may affect PPG, and/or may affect the items in this regression, they are not represented in these charts.

Interestingly, Points Per Game was already somewhat predictable by shooting ability, turnovers, defensive rebounding, and assists. While I hesitate to draw conclusions from the aesthetic comparison, we can guess why perhaps PER doesn’t beat PPG as significantly as we might expect: it appears to share much of the same DNA. (My wilder and more ambitious suspicion is that these similarities reflect the strength of our broader pro-points bias: even when designing an All-in-One statistic, Hollinger’s linear, non-empirical, a priori coefficients still mostly reflect the conventional wisdom about the importance of many of the factors, as seen in the way that they relate directly to points per game.)

I could make a similar pie-chart for Win% differential, but I think it might give the wrong impression: these aren’t even close to the best set of variables to use for that purpose. Suffice it to say that it would look very, very different (for an imperfect picture of how much so, you can compare to the values in the Relative Importance chart above).

Conclusions

The deeper irony with PER is not just that it could theoretically be better, but that it adds many levels of complexity to the problem it purports to address, ultimately failing in strikingly similar ways. It has been dressed up around the edges with various adjustments for team and league pace, incorporation of league averages to weight rebounds and value of possession, etc. This is, to coin a phrase, like putting lipstick on a pig. The energy that Hollinger has spent on dressing up his model could have been better spent rethinking the core of it.

In my estimation, this pattern persists among many extremely smart people who generate innovative models and ideas: once a model is created, they spend most of their time—entire careers even—on, in descending order: 1) defending it, 2) applying it to new situations, and 3) tweaking it. This happens in just about every field: hard and soft sciences, economics, history, philosophy, even literature. Give me an academic who creates an interesting and meaningful model, and then immediately devotes their best efforts to tearing it apart! In all my education, I have had perhaps two professors who embraced this approach, and I would rank both among my very favorites.

This post and the last were admittedly relatively light on Rodman-specific analysis, but that will change with a vengeance in the next two. Stay tuned.

Update (5/13/11): Commenter “Yariv” correctly points out that an “exponential” curve is technically one in the form y^x (such as 2^x, 3^x, etc), where the increasing marginal value I’m referring to in the “Linearity” section above is about terms in the form x^y (e.g., x^2, x^3, etc), or monomial terms with an exponent not equal to 1. I apologize for any confusion, and I’ll rewrite the section when I have time.

The Case for Dennis Rodman, Part 2/4 (a)(i)—Player Valuation and Conventional Wisdom

Dennis Rodman is a – perhaps the – classic hard case for serious basketball valuation analysis. The more you study him, the more you are forced to engage in meta-analysis: that is, examining the advantages and limitations of the various tools in the collective analytical repertoire. Indeed, it’s even more than a hard case, it’s an extremely important one: it is just these conspicuously difficult situations where reliable analytical insight could be most useful, yet depending on which metric you choose, Rodman is either a below-average NBA player or one of the greatest of all time. Moreover, while Rodman may be an “extreme” of sorts, this isn’t Newtonian Physics: the problems with player valuation modeling that his case helps reveal – in both conventional and unconventional forms – apply very broadly.

This section will use Dennis Rodman as a case study for my broader critique of both conventional and unconventional player valuation methods. Sub-section (i) introduces my criticism and deals with conventional wisdom, and sub-section (ii) deals with unconventional wisdom and beyond. Section (b) will then examine how valuable Rodman was specifically, and why. Background here, here, here, here, and here.

First – A Quick Meta-Critique:

Why is it that so many sports-fans pooh-pooh advanced statistical analysis, yet, when making their own arguments, spout nothing but statistics?

[So-and-so] scored 25 points per game last season, solidifying their position in the NBA elite.
[Random QB] had ten 3000-yard passing seasons, he is sooo underrated.
[Player x]’s batting average is down 50 points, [team y] should trade him while they still can.

Indeed, the vast majority of people are virtually incapable of making sports arguments that aren’t stats-based in one way or another. Whether he knows it or not, Joe Average is constantly learning and refining his preferred models, which he then applies to various problems, for a variety of purposes — not entirely unlike Joe Academic. Yet chances are he remains skeptical of the crazy-talk he hears from the so-called “statistical experts” — and there is truth to this skepticism: a typical “fan” model is extremely flexible, takes many more variables from much more diverse data into account, and ultimately employs a very powerful neural network to arrive at its conclusions. Conversely, the “advanced” models are generally rigid, naïve, over-reaching, hubristic, prove much less than their creators believe, and claim even more. Models are to academics like screenplays are to Hollywood waiters: everyone has one, everyone thinks theirs is the best, and most of them are garbage. The broad reliability of “common sense” over time has earned it the benefit of the doubt, despite its high susceptibility to bias and its abundance of easily-provable errors.

The key is this: While finding and demonstrating such error is easy enough, successfully doing so should not – as it so often does – lead one (or even many) to presume that it qualifies them to replace that wisdom, in its entirety, with their own.

I believe something like this happened in the basketball analytic community: reacting to the manifest error in conventional player valuation, the statisticians have failed to recognize the main problem – one which I will show actually limits their usefulness – and instead have developed an “unconventional” wisdom that ultimately makes many of the same mistakes.

Conventional Wisdom – Points, Points, Points:

The standard line among sports writers and commentators today is that Dennis Rodman’s accomplishments “on the court” would easily be sufficient to land him in the Hall of Fame, but that his antics “off the court” may give the voters pause. This may itself be true, but it is only half the story: If, in addition to his other accomplishments, Rodman had scored 15 points a game, I don’t think we’d be having this discussion, or really even close to having this discussion (note: this would be true whether or not those 15 points actually helped his teams in any way). This is because the Hall of Fame reflects the long-standing conventional wisdom about player valuation: that points (especially per game) are the most important measure of a player’s (per game) contribution.

Whether most people would explicitly endorse this proposition or not, it is still reflected in systematic bias. The story goes something like this: People watch games to see the players do cool things, like throw a ball from a long distance through a tiny hoop, and experience pleasure when it happens. Thus, because pleasure is good, they begin to believe that those players must be the best players, which is then reinforced by media coverage that focuses on point totals, best dunks and plays of the night, scoring streaks, scoring records, etc. This emphasis makes them think these must also be the most important players, and when they learn about statistics, that’s where they devote their attention. Everyone knows about Kobe’s 81 points in a game, but how many people know about Scott Skiles’s 30 assists? or Charles Oakley’s 35 rebounds? or Rodman’s 18 offensive boards? or Shaq’s 15 blocks? Many fans even know that Mark Price is the all-time leader in free throw percentage, or that Steve Kerr is the all-time leader in 3-point percentage, but most have never even heard of rebound percentage, much less assist percentage or block percentage. And, yes, for those who vote for the Hall of Fame, it is also reflected in their choices. Thus, before dealing with any fallout from his off-court “antics,” the much bigger hurdle to Dennis Rodman’s induction looks like this:

This list is the bottom-10 per-game scorers (of players inducted within 25 years of their retirement). If Rodman were inducted, he would be the single lowest point-scorer in HoF history. And looking at the bigger picture, it may even be worse than that. Here’s a visual of all 89 Hall of Famers with stats (regardless of induction time), sorted from most points to fewest:

So not only would he be the lowest point scorer, he would actually have significantly fewer points than a (linear) trend-line would predict the lowest point scorer to have (and most of the smaller bars just to the left of Rodman were Veterans Committee selections). Thus, if historical trends reflect the current mood of the HoF electorate, resistance is to be expected.

The flip-side, of course, is the following:

Note: this graphic only contains the players for whom this stat is available, though, as I demonstrated previously, there is no reason to believe that earlier players were any better.
Clearly, my first thought when looking at this data was, “Who the hell is this guy with a TRB% of only 3.4?” That’s only 1 out of every *30* rebounds!^* The league average is (obviously) 1 out of 10. Muggsy Bogues — the shortest player in the history of the NBA (5’3”) — managed to pull in 5.1%, about 1 out of every 20. On the other side, of course, Rodman would pace the field by a wide margin – wider, even, than the gap between Jordan/Chamberlain and the field for scoring (above). Of course, the Hall of Fame traditionally doesn’t care that much about rebounding percentages:

So, of eligible players, 24 of the top 25 leaders in points per game are presently in the Hall (including the top 19 overall), while only 9 of the top 25 leaders in total rebound percentage can say the same. This would be perfectly rational if, say, PPG were way, way more important to winning than TRB%. But this seems unlikely to me, for at least two reasons: 1) As a rate stat, TRB% shouldn’t be affected significantly by game or team pace, as PPG is; and 2) TRB% has consequences on both offense and defense, whereas PPG is silent about the number of points the player/team has given up. To examine this question, I set up a basic correlation of team stats to team winning percentage for the set of every team season since the introduction of the 3-point shot. Lo and behold, it’s not really close:

Yes, correlation does not equal causation, and team scoring and rebounding are not the same as individual scoring and rebounding. This test isn’t meant to prove conclusively that rebounding is more important than scoring, or even gross scoring — though, at the very least, I do think it strongly undermines the necessity of the opposite: that is, the assumption that excellence in gross point-scoring is indisputably more significant than other statistical accomplishments.
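For readers who want to run this kind of team-level check themselves, here is a minimal Python sketch of the correlation step. It assumes each team season is a tuple of (team PPG, team TRB%, winning percentage); the five rows are made-up placeholders, not the data set behind the chart, so the printed coefficients only illustrate the mechanics.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical team seasons: (team PPG, team TRB%, win%). Placeholder values only.
team_seasons = [
    (101.2, 51.2, 0.610),
    (98.7, 48.9, 0.402),
    (104.5, 50.5, 0.549),
    (96.3, 52.1, 0.585),
    (99.9, 47.8, 0.329),
]
ppg = [row[0] for row in team_seasons]
trb_pct = [row[1] for row in team_seasons]
win_pct = [row[2] for row in team_seasons]

print("r(PPG, win%): ", round(pearson_r(ppg, win_pct), 3))
print("r(TRB%, win%):", round(pearson_r(trb_pct, win_pct), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; the hand-rolled version just makes the arithmetic visible.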
Though I don’t presently have the data to confirm, I would hypothesize (or, less charitably, guess) that individual TRB% probably has a more causative effect on team TRB% than individual PPG does on team PPG [see addendum] (note, to avoid any possible misunderstanding, I mean this only w/r/t PPG, not points-per-possession, or anything having to do with shooting percentages, true or otherwise). Even with the proper data, this could be a fairly difficult hypothesis to test, since it can be hard to tell (directly) whether a player scoring a lot of points causes his team to score a lot of points, or vice versa. However, that hypothesis seems to be at least partially supported by studies that others have conducted on rebound rates – especially on the offensive side (where Rodman obviously excelled).

The conventional wisdom regarding the importance of gross points is demonstrably flawed on at least two counts: gross, and points. In sub-section (ii), I will look at how the analytical community attempted to deal with these problems, as well as at how they repeated them.
^*(It’s Tiny Archibald)

Addendum (4/20/11):

I posted this as a Graph of the Day a while back, and thought I should add it here:

More info in the original post, but the upshot is that my hypothesis that “individual TRB% probably has a more causative effect on team TRB% than individual PPG does on team PPG” appears to be confirmed (the key word is “differential”).

Graph of the Day: Rodman, Visualized—An Outlier in Motion

(Just press play.)

The Case for Dennis Rodman, Part 1/4 (c)—Rodman v. Ancient History

One of the great false myths in basketball lore is that Wilt Chamberlain and Bill Russell were Rebounding Gods who will never be equaled, and that dominant rebounders like Dennis Rodman should count their blessings that they got to play in an era without those two deities on the court. This myth is so pervasive that it is almost universally referenced as a devastating caveat whenever sports commentators and columnists discuss Rodman’s rebounding prowess. In this section, I will attempt to put that caveat to death forever.

The less informed version of the “Chamberlain/Russell Caveat” (CRC for short) typically goes something like this: “Rodman led the league in rebounding 7 times, making him the greatest rebounder of his era, even though his numbers come nowhere near those of Chamberlain and Russell.” It is true that, barring some dramatic change in the way the game is played, Chamberlain’s record of 27.2 rebounds per game, set in the 1960-61 season, will stand forever. This is because, due to the fast pace and terrible shooting, the typical game in 1960-61 featured an average of 147 rebounding opportunities. During Rodman’s 7-year reign as NBA rebounding champion (from 1991-92 through 1997-98), the typical game featured just 84 rebounding opportunities. Without further inquiry, this difference alone means that Chamberlain’s record 27.2 rpg would roughly translate to 15.4 in Rodman’s era – over a full rebound less than Rodman’s ~16.7 rpg average over that span.
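The pace adjustment here is a single proportion. A short Python check using only the rounded figures quoted above: with those inputs the result lands near 15.5 rather than 15.4, a difference that leaves Chamberlain more than a full rebound behind Rodman's ~16.7 either way.

```python
def pace_adjust(rpg, opportunities_then, opportunities_now):
    """Rescale a per-game rebounding average by the ratio of rebounding opportunities per game."""
    return rpg * opportunities_now / opportunities_then


# Figures quoted in the paragraph above (rebounding opportunities per game).
wilt_record_rpg = 27.2
opportunities_1960_61 = 147
opportunities_rodman_era = 84
rodman_avg_rpg = 16.7  # approximate average, 1991-92 through 1997-98

wilt_in_rodman_era = pace_adjust(wilt_record_rpg, opportunities_1960_61, opportunities_rodman_era)
print(f"Wilt's 27.2 rpg, pace-adjusted: {wilt_in_rodman_era:.2f}")
print(f"Rodman's edge: {rodman_avg_rpg - wilt_in_rodman_era:.2f} rpg")
```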

The slightly more informed (though equally wrong) version of the CRC is a plea of ignorance, like so: “Rodman has the top 7 rebounding percentages since the NBA started to keep the necessary statistics in 1970. Unfortunately, there is no game-by-game or individual opponent data prior to this, so it is impossible to tell whether Rodman was as good as Russell or Chamberlain” (this point also comes in many degrees of snarky, like, “I’ll bet Bill and Wilt would have something to say about that!!!”). We may not have the necessary data to calculate Russell and Chamberlain’s rebounding rates, either directly or indirectly. But, as I will demonstrate, there are quite simple and extremely accurate ways to estimate these figures within very tight ranges (which happen to come nowhere close to Dennis Rodman).

Before getting into rebounding percentages, however, let’s start with another way of comparing overall rebounding performance: Team Rebound Shares. Simply put, this metric is the percentage of team rebounds that were gotten by the player in question. This can be done for whole seasons, or it can be approximated over smaller periods, such as per-game or per-minute, even if you don’t have game-by-game data. For example, to roughly calculate the stat on a per-game basis, you can simply take a player’s total share of rebounds (their total rebounds/team’s total rebounds), and divide by the percentage of games they played (player gms/team gms). I’ve done this for all of Rodman, Russell and Chamberlain’s seasons, and organized the results as follows:

As we can see, Rodman does reasonably well in this metric, still holding the top 4 seasons and having a better average through 7. This itself is impressive, considering Rodman averaged about 35 minutes per game and Wilt frequently averaged close to 48.
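The per-game approximation of Team Rebound Share described just before the chart fits in one small function. A Python sketch; the function name and the sample totals are hypothetical.

```python
def team_rebound_share_per_game(player_reb, team_reb, player_games, team_games):
    """Approximate per-game share of team rebounds from season totals.

    Season share (player rebounds / team rebounds) divided by the fraction
    of the team's games the player appeared in.
    """
    season_share = player_reb / team_reb
    games_fraction = player_games / team_games
    return season_share / games_fraction


# Hypothetical season: 1,000 of the team's 4,000 rebounds while playing 41 of 82 games.
print(team_rebound_share_per_game(1000, 4000, 41, 82))
```

The approximation implicitly assumes the team's rebounding volume is about the same in the games the player missed as in the games he played.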

I should note, in Chamberlain’s favor, that one of the problems I have with PER and its relatives is that they don’t give enough credit for being able to contribute extra minutes, as Wilt obviously could. However, since here I’m interested more in each player’s rebounding ability than in their overall value, I will use the same equation as above (substituting minutes for games, then dividing by 5, since a single player can log at most one-fifth of his team’s minutes) to break the team rebounding shares down by minute:

This is obviously where Rodman separates himself from the field, even pulling in >50% of his team’s rebounds in 3 different seasons. Of course, this only tells us what it tells us, and we’re looking for something else: Total Rebounding percentage. Thus, the question naturally arises: how predictive of TRB% are “minute-based team rebound shares”?
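The per-minute version is the same calculation with minutes in place of games, plus the division by 5. A Python sketch, with hypothetical names and sample totals:

```python
def team_rebound_share_per_minute(player_reb, team_reb, player_min, team_min):
    """Per-minute share of team rebounds.

    team_min counts all five floor spots (5 x 48 x games, plus overtime), so a
    player's minute fraction can be at most 1/5; the final division by 5
    rescales things so that a player who never left the floor gets exactly
    his season share of team rebounds.
    """
    return (player_reb / team_reb) / (player_min / team_min) / 5


games = 82
team_minutes = 5 * 48 * games
# Hypothetical: 1,000 of 4,000 team rebounds in 35 minutes per game.
print(round(team_rebound_share_per_minute(1000, 4000, 35 * games, team_minutes), 3))
```

This is why a 35-minute player like Rodman can show a per-minute share well above his raw season share, while a 48-minute player's two numbers coincide.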

In order to answer this question, I created a slightly larger data-set, by compiling relevant full-season statistics from the careers of Dennis Rodman, Dwight Howard, Tim Duncan, David Robinson, and Hakeem Olajuwon (60 seasons overall). I picked these names to represent top-level rebounders in a variety of different situations (and though these are somewhat arbitrary, this analysis doesn’t require a large sample). I then calculated TRS by minute for each season and divided by 2 — roughly corresponding to the player’s share against 10 players instead of 5. Thus, all combined, my predictive variable is determined as follows:

$PV = \frac{\text{Player Rebounds} / \text{Team Rebounds}}{\text{Player Minutes} / \text{Team Minutes}} / 10$

Note that this formula may have flaws as an independent metric, but if it’s predictive enough of the metric we really care about — Total Rebound % — those no longer matter. To that end, I ran a linear regression in Excel comparing this new variable to the actual values for TRB%, with the following output:

If you don’t know how to read this, don’t sweat it. The “R Square” of .98 pretty much means that our variable is almost perfectly predictive of TRB%. The two numbers under “Coefficients” tell us the formula we should use to make predictions based on our variable:

$\text{Predicted TRB\%} = 1.08983 \cdot PV - 0.01154$
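The post fits this line with Excel's regression tool. For anyone reproducing that step elsewhere, here is a plain-Python sketch of the same single-predictor least-squares fit, returning the slope, intercept, and R Square that Excel reports for one variable. The sample pairs are placeholders, not the 60-season data set.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared), the same three quantities Excel's
    regression output reports for a single predictor.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Placeholder (PV, actual TRB%) pairs; substitute real season data to refit.
pv = [0.14, 0.17, 0.19, 0.22, 0.25]
trb = [0.141, 0.172, 0.196, 0.229, 0.262]
slope, intercept, r2 = fit_line(pv, trb)
print(f"TRB% = {slope:.5f} * PV {intercept:+.5f}  (R^2 = {r2:.3f})")
```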

Putting the two equations together, we have a model that predicts a player’s rebound percentage based on 4 inputs:

$\text{TRB\%} = 1.08983 \cdot \frac{\text{Player Rebounds} / \text{Team Rebounds}}{\text{Player Minutes} / \text{Team Minutes}} / 10 - 0.01154$
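Putting the fitted coefficients into code gives a four-input predictor. A Python sketch; it assumes TRB% comes out on a 0-1 scale (0.20 = 20%), which is what the scale of PV and the small intercept imply, and the function names are hypothetical.

```python
SLOPE = 1.08983
INTERCEPT = -0.01154


def predictor_variable(player_reb, team_reb, player_min, team_min):
    """PV: share of team rebounds divided by share of team minutes, then by 10."""
    return (player_reb / team_reb) / (player_min / team_min) / 10


def predicted_trb_pct(player_reb, team_reb, player_min, team_min):
    """Predicted total rebound percentage on a 0-1 scale (0.20 means 20%)."""
    pv = predictor_variable(player_reb, team_reb, player_min, team_min)
    return SLOPE * pv + INTERCEPT


# Hypothetical season: 25% of team rebounds while playing 1/8 of team minutes (PV = 0.2).
print(f"{predicted_trb_pct(1000, 4000, 1250, 10000):.2%}")
```

Because the four inputs are plain season totals, the same function applies to pre-1970 seasons where TRB% itself was never recorded.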

Now again, if you’re familiar with regression output, you can probably already see that this model is extremely accurate. But to demonstrate that fact, I’ve created two graphs that compare the predicted values with actual values, first for Dennis Rodman alone:

And then for the full sample:

So, the model seems solid. The next step, obviously, is to calculate the predicted total rebound percentages for each of Wilt Chamberlain’s and Bill Russell’s seasons. I then selected the top 7 seasons for each of the three players and put them on one graph (Chamberlain and Russell’s estimates vs. Rodman’s actuals):

It’s not even close. It’s so not close, in fact, that our model could be way off and it still wouldn’t be close. For the next two graphs, I’ve added error bars to the estimation lines that are equal to the single worst prediction from our entire sample (which was a 1.21% error, or 6.4% of the underlying number): [I should add a technical note, that the actual expected error should be slightly higher when applied to “outside” situations, since the coefficients for this model were “extracted” from the same data that I tested the model on. Fortunately, that degree of precision is not necessary for our purposes here.] First Rodman vs. Chamberlain:

Then Rodman vs. Russell:

In other words, if the model were as inaccurate in Russell and Chamberlain’s favor as it was for the worst data point in our data set, they would still be crushed. In fact, over these top 7 seasons, Rodman beats R&C by an average of 7.2%, so if the model understated their actual TRB% every season by 5 times as much as the largest single-season understatement in our sample, Rodman would still be ahead [edit: I’ve just noticed that Pro Basketball Reference has a TRB% listed for each of Chamberlain’s last 3 seasons. FWIW, this model under-predicts one by about 1%, over-predicts one by about 1%, and gets the third almost on the money (off by .1%)].
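The margin argument is simple arithmetic. A quick Python check using the two figures quoted above (the 1.21-point worst miss and Rodman's 7.2-point average edge), including how many worst-case misses in R&C's favor it would take to close the gap:

```python
worst_miss = 1.21   # largest single-season prediction error in the sample, TRB% points
rodman_edge = 7.2   # Rodman's average top-7 advantage over Russell/Chamberlain, TRB% points

for multiple in (1, 3, 5):
    still_ahead = multiple * worst_miss < rodman_edge
    print(f"{multiple}x worst miss = {multiple * worst_miss:.2f} points; Rodman still ahead: {still_ahead}")

# Number of worst-case misses, all in R&C's favor, needed to erase the gap.
print(f"break-even multiple: {rodman_edge / worst_miss:.2f}")
```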

To stick one last dagger in CRC’s heart, I should note that this model predicts that Chamberlain’s best TRB% season would have been around 20.16%, which would rank 67th on the all-time list. Russell’s best of 20.08% would rank 72nd. Arbitrarily giving them 2 percentage points for the benefit of the doubt, their best seasons would still rank 22nd and 24th respectively.

The Case for Dennis Rodman, Part 1/4 (b)—Defying the Laws of Nature

In this post I will be continuing my analysis of just how dominant Dennis Rodman’s rebounding was. Subsequently, section (c) will cover my analysis of Wilt Chamberlain and Bill Russell, and Part 2 of the series will begin the process of evaluating Rodman’s worth overall.

For today’s analysis, I will be examining a particularly remarkable aspect of Rodman’s rebounding: his ability to dominate the boards on both ends of the court. I believe this at least partially gets at a common anti-Rodman argument: that his rebounding statistics should be discounted because he concentrated on rebounding to the exclusion of all else. This position was publicly articulated by Charles Barkley back when they were both still playing, with Charles claiming that he could also get 18+ rebounds every night if he wanted to. Now that may be true, and it’s possible that Rodman would have been an even better player if he had been more well-rounded, but one thing I am fairly certain of is that Barkley could not have gotten as many rebounds as Rodman the same way that Rodman did.

The key point here is that, normally, you can be a great offensive rebounder, or you can be a great defensive rebounder, but it’s very hard to be both. Unless you’re Dennis Rodman:

To prepare the data for this graph, I took the top 1000 rebounding seasons by total rebounding percentage (the gold-standard of rebounding statistics, as discussed in section (a)), and ranked them 1-1000 for both offensive (ORB%) and defensive (DRB%) rates. I then scored each season by the higher (larger number) ranking of the two. E.g., if a particular season scored a 25, that would mean that it ranks in the top 25 all-time for offensive rebounding percentage and in the top 25 all-time for defensive rebounding percentage (I should note that many players who didn’t make the top 1000 seasons overall would still make the top 1000 for one of the two components, so to be specific, these are the top 1000 ORB% and DRB% seasons of the top 1000 TRB% seasons).

This score doesn’t necessarily tell us who the best rebounder was, or even who was the most balanced, but it should tell us who was the strongest in the weakest half of their game (just as you might rank the off-hand of boxers or arm wrestlers). Fortunately, however, Rodman doesn’t leave much room for doubt: his 1994-1995 season is #1 all-time on both sides. He has 5 seasons that are dual top-15, while no other NBA player has even a single season that ranks dual top-30. The graph thus shows how far down you have to go to find any player with n number of seasons at or below that ranking: Rodman has 6 seasons register on the (jokingly titled) “Ambicourtedness” scale before any other player has 1, and 8 seasons before any player has 2 (for the record, Charles Barkley’s best rating is 215).
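The scoring rule can be sketched in a few lines of Python. It assumes the input lists already hold the ORB% and DRB% of the top 1000 TRB% seasons, as described above; ties are broken by input order, which the post doesn't specify, and all function names are hypothetical.

```python
def rank_desc(values):
    """Rank values 1..n with the largest value ranked 1 (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def ambicourtedness(orb_pcts, drb_pcts):
    """Score each season by the worse (numerically larger) of its ORB% and DRB% ranks."""
    orb_ranks = rank_desc(orb_pcts)
    drb_ranks = rank_desc(drb_pcts)
    return [max(o, d) for o, d in zip(orb_ranks, drb_ranks)]


def seasons_at_or_better(scores, threshold):
    """Count seasons that rank dual top-`threshold` (score <= threshold)."""
    return sum(1 for score in scores if score <= threshold)


# Three invented seasons: ORB% ranks are [1, 3, 2], DRB% ranks are [2, 1, 3].
scores = ambicourtedness([15.0, 10.0, 12.0], [30.0, 35.0, 25.0])
print(scores, seasons_at_or_better(scores, 2))
```

Taking the maximum of the two ranks is what makes the score a measure of the weaker half of a player's rebounding game.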

This outcome is fairly impressive alone, and it tells us that Rodman was amazingly good at both ORB and DRB – and that this is rare — but it doesn’t tell us anything about the relationship between the two. For example, if Rodman just got twice as many rebounds as any normal player, we would expect him to lead lists like this regardless of how he did it. Thus, if you believe the hypothesis that Rodman could have dramatically increased his rebounding performance just by focusing intently on rebounds, this result might not be unexpected to you.

The problem, though, is that there are both competitive and physical limitations to how much someone can really excel at both simultaneously, not the least of which is that offensive and defensive rebounds literally take place on opposite sides of the floor, and not everyone gets up and set for every possession. Thus, if someone wanted to cheat toward getting more rebounds on the offensive end, it would likely come, at least in some small part, at the expense of rebounds on the defensive end. Similarly, if someone’s playing style favors one, it probably (at least slightly) disfavors the other. Whether or not that particular factor is in play, at the very least you should expect a fairly strong regression to the mean: if a player is excellent at one or the other, you should expect them to be less excellent at the other, simply because the two are not perfectly correlated. To examine this empirically, I’ve put all 1000 top TRB% seasons on a scatterplot comparing offensive and defensive rebound rates:

Clearly there is a small negative correlation, as evidenced by the negative coefficient in the regression line. Note that technically, this shouldn’t be a linear relationship overall – if we graphed every pair in history from 0,0 to D,R, my graph’s trendline would be parallel to the tangent of that curve as it approaches Dennis Rodman. But what’s even more stunning is the following:

Rodman is in fact not only an outlier, he is such a ridiculously absurd alien-invader outlier that when you take him out of the equation, the equation changes drastically: The negative slope of the regression line nearly doubles in Rodman’s absence. In case you’ve forgotten, let me remind you that Rodman only accounts for 12 data points in this 1000 point sample: If that doesn’t make your jaw drop, I don’t know what will! For whatever reason, Rodman seems to be supernaturally impervious to the trade-off between offensive and defensive rebounding. Indeed, if we look at the same graph with only Rodman’s data points, we see that, for him, there is actually an extremely steep, upward sloping relationship between the two variables:

In layman’s terms, what this means is that Rodman comes in varieties of Good, Better, and Best — which is how we would expect this type of chart to look if there were no trade-off at all. Yet clearly the chart above proves that such a tradeoff exists! Dennis Rodman almost literally defies the laws of nature (or at least the laws of probability).
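The with-and-without comparison is just two least-squares fits. A minimal Python sketch, assuming each season is a (player, ORB%, DRB%) tuple with ORB% on the x-axis (the post doesn't say which axis is which); the four seasons in the usage lines are invented to show how a single outlying player can move a trendline, not real data.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def slope_with_and_without(seasons, player):
    """Return (slope over all seasons, slope with `player`'s seasons removed).

    seasons: list of (player_name, orb_pct, drb_pct) tuples.
    """
    everyone = [(orb, drb) for _, orb, drb in seasons]
    others = [(orb, drb) for name, orb, drb in seasons if name != player]
    return ols_slope(everyone), ols_slope(others)


# Invented seasons: three players on a trade-off line plus one outlier, "R".
toy = [("A", 10.0, 30.0), ("B", 12.0, 28.0), ("C", 14.0, 26.0), ("R", 18.0, 32.0)]
print(slope_with_and_without(toy, "R"))
```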

The ultimate point contra Barkley, et al., is that if Rodman “cheated” toward getting more rebounds all the time, we might expect that his chart would be higher than everyone else’s, but we wouldn’t have any particular reason to expect it to slope in the opposite direction. Now, this is slightly more plausible if he was “cheating” on the offensive side of the floor while maintaining a more balanced game on the defensive side, and there are any number of other logical speculations to be made about how he did it. But to some extent this transcends the normal “shift in degree” v. “shift in kind” paradigm: what we have here is a major shift in degree of a shift in kind, and we don’t have to understand it perfectly to know that it is otherworldly. At the very least, I feel confident in saying that if Charles Barkley or anyone else really believes they could replicate Rodman’s results simply by changing their playing styles, they are extremely naive.

Addendum (4/20/11):

Commenter AudacityOfHoops asks:

I don’t know if this is covered in later post (working my way through the series – excellent so far), or whether you’ll even find the comment since it’s 8 months late, but … did you create that same last chart, but for other players? Intuitively, it seems like individual players could each come in Good/Better/Best models, with positive slopes, but that when combined together the whole data set could have a negative slope.

I actually addressed this in an update post (not in the Rodman series) a while back:

A friend privately asked me what other NBA stars’ Offensive v. Defensive rebound % graphs looked like, suggesting that, while there may be a tradeoff overall, that doesn’t necessarily mean that the particular lack of tradeoff that Rodman shows is rare. This is a very good question, so I looked at similar graphs for virtually every player who had 5 or more seasons in the “Ambicourtedness Top 1000.” There are other players who have positively sloping trend-lines, though none that come close to Rodman’s. I put together a quick graph to compare Rodman to a number of other big name players who were either great rebounders (e.g., Moses Malone), perceived-great rebounders (e.g., Karl Malone, Dwight Howard), or Charles Barkley:

By my accounting, Moses Malone is almost certainly the 2nd-best rebounder of all time, and he does show a healthy dose of “ambicourtedness.” Yet note that the slope of his trendline is .717, meaning the difference between him and Rodman’s 2.346 is almost exactly twice the difference between him and the -.102 league average (1.629 v .819).
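A quick arithmetic check of the trendline comparison, using the three slopes quoted above:

```python
rodman_slope, moses_slope, league_slope = 2.346, 0.717, -0.102

gap_to_rodman = rodman_slope - moses_slope   # how far Moses trails Rodman
gap_to_league = moses_slope - league_slope   # how far Moses sits above the league
print(f"{gap_to_rodman:.3f} vs {gap_to_league:.3f}, ratio {gap_to_rodman / gap_to_league:.2f}")
```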

The Case for Dennis Rodman, Part 1/4 (a)—Rodman v. Jordan

For reasons which should become obvious shortly, I’ve split Part 1 of this series into sub-parts. This section will focus on rating Rodman’s accomplishments as a rebounder (in painstaking detail), while the next section(s) will deal with the counterarguments I mentioned in my original outline.

For the uninitiated, the main stat I will be using for this analysis is “rebound rate,” or “rebound percentage,” which represents the percentage of available rebounds that the player grabbed while he was on the floor. Obviously, because there are 10 players on the floor for any given rebound, the league average is 10%. The defensive team typically grabs 70-75% of rebounds overall, meaning the average rates for offensive and defensive rebounds are approximately 5% and 15% respectively. This stat is a much better indicator of rebounding skill than rebounds per game, which is highly sensitive to factors like minutes played, possessions per game, and team shooting and shooting defense. Unlike many other “advanced” stats out there, it also makes perfect sense intuitively (indeed, I think the only thing stopping it from going completely mainstream is that the presently available data can technically only provide highly accurate “estimates” for this stat. When historical play-by-play data becomes more widespread, I predict this will become a much more popular metric).
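Because play-by-play data isn't available for most seasons, rebound percentage is usually estimated from box-score totals. A Python sketch of one widely used estimate, in the form given in Basketball-Reference's glossary (an outside source, not this post, so treat the exact formula as an assumption here): the player's rebounds divided by the rebounds available while he was on the floor, where "available" is approximated by prorating team-plus-opponent rebounds by his share of game time.

```python
def trb_pct_estimate(player_trb, player_mp, team_trb, opp_trb, team_mp):
    """Box-score estimate of total rebound percentage, on a 0-100 scale.

    team_mp counts all five floor spots, so team_mp / 5 is game time. The
    player's share of game time (player_mp / (team_mp / 5)) prorates the
    total rebounds in the team's games down to those available while he played.
    """
    return 100 * player_trb * (team_mp / 5) / (player_mp * (team_trb + opp_trb))


# Hypothetical iron man: every minute of an 82-game season, 700 of 7,000 total rebounds.
team_mp = 5 * 48 * 82
print(trb_pct_estimate(700, 48 * 82, 3600, 3400, team_mp))
```

The offensive and defensive versions follow the same pattern, pairing the player's offensive rebounds with team offensive plus opponent defensive rebounds, and vice versa. The proration step is exactly why the post calls these "estimates".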

Dennis Rodman has dominated this stat like few players have dominated any stat. For overall rebound % by season, not only does he hold the career record, he led the league 8 times, and holds the top 7 spots on the all-time list (red bars are Rodman):

^{Note this chart only goes back as far as the NBA/ABA merger in 1976, but going back further makes no difference for the purposes of this argument. As I will explain in my discussion of the “Wilt Chamberlain and Bill Russell Were Rebounding Gods” myth, the rebounding rates for the best rebounders tend to get worse as you go back in time, especially before Moses Malone.}
As visually impressive as that chart may seem, it is only the beginning of the story. Obviously we can see that the Rodman-era tower is the tallest in the skyline, but our frame of reference is still arbitrary: e.g., if the bottom of the chart started at 19 instead of 15, his numbers would look even more impressive. So one thing we can do to eliminate bias is put the average in the middle, and count percentage points above or below, like so:

With this we get a better visual sense of the relative greatness of each season. But we’re still left with percentage points as our unit of measurement, which is also arbitrary: e.g., how much better is “6% better”? To answer this question, in addition to the average, we need to calculate the standard deviation of the sample (if you’re not normally comfortable working with standard deviations, just think of them as standardized units of measurement that can be used to compare stats of different types, such as shooting percentages against points per game). Then we re-do the graph using standard deviations above or below the mean, like so:

^{Note this graph is actually exactly the same shape as the one above, it’s just compressed to fit on a scale from –3 to +8 for easy comparison with subsequent graphs. The SD for this graph is 2.35%.}
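Converting each season's rate into standard deviations above or below the mean takes one line per value. A Python sketch using the standard library's `statistics` module; the post doesn't say whether it used the sample or population standard deviation, so the sample version here is an assumption, and the league-leader values are hypothetical.

```python
import statistics


def z_scores(values):
    """Standard deviations above (+) or below (-) the mean of the same sample."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; statistics.pstdev gives the population SD
    return [(v - mean) / sd for v in values]


# Hypothetical league-leading TRB% values, one per season.
leaders = [17.8, 18.4, 19.1, 20.3, 23.5, 29.7]
print([round(z, 2) for z in z_scores(leaders)])
```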
There is one further, major, problem with our graph: As strange as it may sound, Dennis Rodman’s own stats are skewing the data in a way that biases the comparison against him. Specifically, with the mean and standard deviation set where they are, Rodman is being compared to himself as well as to others. E.g., notice that most of the blue bars in the graph are below the average line: this is because the average includes Rodman. For most purposes, this bias doesn’t matter much, but Rodman is so dominant that he raises the league average by over a percent, and he is such an outlier that he alone nearly doubles the standard deviation. Thus, for the remaining graphs targeting individual players, I’ve calculated the average and standard deviations for the samples from the other players only:

^{Note that a negative number in this graph is not exactly a bad thing: that person still led the league in rebounding % that year. The SD for this graph is 1.22%.}
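Excluding the player being evaluated from his own baseline only changes which values feed the mean and SD. A Python sketch, assuming each entry is a (league leader, value) pair; the names and numbers are placeholders, and the sample standard deviation is again an assumption.

```python
import statistics


def z_vs_field(leaders, player):
    """Z-score every season against a baseline built from the other players only.

    leaders: list of (player_name, value) pairs, one per season.
    """
    field = [value for name, value in leaders if name != player]
    mean = statistics.mean(field)
    sd = statistics.stdev(field)  # needs at least two seasons from other players
    return [(name, (value - mean) / sd) for name, value in leaders]


seasons = [("X", 17.0), ("Y", 18.0), ("Z", 19.0), ("R", 26.0)]
for name, z in z_vs_field(seasons, "R"):
    print(name, round(z, 2))
```

With "R" left in the baseline, the mean rises and the SD roughly quadruples in this toy case, which is the self-comparison bias the paragraph describes.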
But not all rebounding is created equal: Despite the fact that they get lumped together in both conventional rebounding averages and in player efficiency ratings, offensive rebounding is worth considerably more than defensive rebounding. From a team perspective, there is not much difference (although not necessarily *no* difference – I suspect, though I haven’t yet proved, that possessions beginning with offensive rebounds have higher expected values than those beginning with defensive rebounds), but from an individual perspective, the difference is huge. This is because of what I call “duplicability”: simply put, if you failed to get a defensive rebound, there’s a good chance that your team would have gotten it anyway. Conversely, if you failed to get an offensive rebound, the chances of your team having gotten it anyway are fairly small. This effect can be very crudely approximated by taking the league averages for offensive and defensive rebounding, multiplying by .8, and subtracting from 1. The .8 comes from there being 4 other players on your team, and the subtraction from 1 gives you the value added for each rebound: The league averages are typically around 25% and 75%, so, very crudely, you should expect your team to get around 20% of the offensive and 60% of the defensive rebounds that you don’t. Thus, each offensive rebound is adding about .8 rebounds to your team’s total, and each defensive rebound is adding about .4. There are various factors that can affect the exact values one way or the other, but on balance I think it is fair to assume that offensive rebounds are about twice as valuable overall.
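The crude duplicability estimate works out as follows. A Python sketch using the paragraph's round numbers (25% and 75% league-average team rates, 4 of 5 players being teammates); it reproduces the 0.8 and 0.4 figures and the roughly 2:1 ratio, and shows the offensive-weighted rate that ratio motivates. Function names are hypothetical.

```python
TEAMMATE_SHARE = 4 / 5  # four teammates out of five players on the floor


def rebounds_added(league_team_rate):
    """Crude rebounds added per rebound grabbed, given the league-average
    team rate for that type of rebound (offensive ~0.25, defensive ~0.75)."""
    return 1 - TEAMMATE_SHARE * league_team_rate


def adjusted_reb_pct(orb_pct, drb_pct):
    """Offensive-weighted rebound rate, (2 * ORB% + DRB%) / 3, reflecting the ~2:1 ratio."""
    return (2 * orb_pct + drb_pct) / 3


orb_value = rebounds_added(0.25)  # teammates would have had ~20% of these anyway
drb_value = rebounds_added(0.75)  # teammates would have had ~60% of these anyway
print(f"ORB adds {orb_value:.2f}, DRB adds {drb_value:.2f}, ratio {orb_value / drb_value:.1f}")
print(f"league-average player, adjusted: {adjusted_reb_pct(5.0, 15.0):.2f}")
```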

To that end, I calculated an adjusted rebounding % for every player since 1976 using the formula (2ORB% + DRB%)/3, and then ran it through all of the same steps as above:

Mindblowing, really. But before putting this graph in context, a quick mathematical aside: If these outcomes were normally distributed, a 6 standard deviation event like Rodman’s 1994-1995 season would theoretically happen only about once every billion seasons. But because each data point on this chart actually represents a maximum of a large sample of (mostly) normally distributed seasonal rebounding rates, they should instead be governed by the Gumbel distribution for extreme values: this leads to a much more manageable expected frequency of approximately once every 400 years (of course, that pertains to the odds of someone like Rodman coming along in the first place; now that we’ve had Rodman, the odds of another one showing up are substantially higher). In reality, there are so many variables at play from era to era, season to season, or even team to team, that a probability model probably doesn’t tell us as much as we would like (also, though standard deviations converge fairly quickly, the sample size is relatively modest).
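The "once every billion seasons" figure is the one-sided normal tail beyond 6 standard deviations, which Python's standard library can compute directly through the complementary error function. The Gumbel-based 400-year figure depends on how many seasonal rates feed each maximum, which the post doesn't state, so this sketch stops at the normal-tail step.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p = normal_upper_tail(6)
print(f"P(Z > 6) = {p:.2e}, or about 1 in {1 / p:,.0f}")
```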

Rather than asking how abstractly probable or improbable Rodman’s accomplishments were, it may be easier to get a sense of his rebounding skill by comparing this result to results of the same process for other statistics. To start with, note that weighting the offensive rebounding more heavily cuts both ways for Rodman: after the adjustment, he only holds the top 6 spots in NBA history, rather than the top 7. On the other hand, he led the league in this category 10 times instead of 8, which is perfect for comparing him to another NBA player who led a major statistical category 10 times — Michael Jordan:

^{Red bars are Jordan. Mean and standard deviation are calculated from 1976, excluding MJ, as with Rodman above.}

As you can see, the data suggests that Rodman was a better rebounder than Jordan was a scorer. Of course, points per game isn’t a rate stat, and probably isn’t as reliable as rebounding %, but that cuts in Rodman’s favor. Points per game should be more susceptible to varying circumstances that lead to extreme values. Compare, say, to a much more stable stat, Hollinger’s player efficiency rating:

Actually, it is hard to find any significant stat where someone has dominated as thoroughly as Rodman. One of the closest I could find is John Stockton and the extremely obscure “Assist %” stat:

^{Red bars are Stockton, mean and SD are calculated from the rest.}

Stockton amazingly led the league in this category 15 times, though he didn’t dominate individual seasons to the extent that Rodman did. This stat is also somewhat difficult to “detangle” (another term/concept I will use frequently on this blog), since assists always involve more than one player. Regardless, though, this graph is the main reason John Stockton is (rightfully) in the Hall of Fame today. Hmm…

The Case for Dennis Rodman, Part 0/4—Outline

[Note: Forgive the anachronisms, but since this page is still the landing-spot for a lot of new readers, I’ve added some links to the subsequent articles into this post. There is also a much more comprehensive outline of the series, complete with a table of relevant points and a selection of charts and graphs available in The Case for Dennis Rodman: Guide.]

If you’ve ever talked to me about sports, you probably know that one of my pet issues (or “causes,” as my wife calls them) is proving the greatness of Dennis Rodman. I admit that since I first saw Rodman play — and compete, and rebound, and win championships — I have been fascinated. Until recently, however, I thought of him as the ultimate outlier: someone who seemed to have unprecedented abilities in some areas, and unprecedented lack of interest in others. He won, for sure, but he also played for the best teams in the league. His game was so unique — yet so enigmatic — that despite the general feeling that there was something remarkable going on there, opinions about his ultimate worth as a basketball player varied immensely — as they still do today. In this four-part series, I will attempt to end the argument.

While there may be room for reasonable disagreement about his character, his sportsmanship, or how and whether to honor his accomplishments, my research and analysis have led me to believe — beyond a reasonable doubt — that Rodman is one of the most undervalued players in NBA history. From an analytical perspective, leaving him off of the Hall of Fame nominee list this past year was truly a crime against reason. But what makes this issue particularly interesting to me is that it cuts “across party lines”: the conventional wisdom and the unconventional wisdom both get it very wrong. Thus, by examining the case of Dennis Rodman, not only will I attempt to solve a long-standing sports mystery, but I will attempt to illustrate a few flaws in the modern basketball-analytics movement.

In this post I will outline the major prongs of my argument. But first, I would like to list the frequently-heard arguments I will *not* be addressing:

“Rodman won 5 NBA titles! Anyone who is a starter on 5 NBA champions deserves to be in the Hall of Fame!” [As an intrinsic matter, I really don’t care that he won 5 NBA championships, except inasmuch as I’d like to know how much he actually contributed. I.e., is he more like Robert Horry, or more like Tim Duncan?]
“Rodman led the league in rebounding *7 times*: Anyone who leads the league in a major statistical category that many times deserves to be in the Hall of Fame!” [This is completely arbitrary. Rodman’s rebounding prowess is indeed an important factor in this inquiry, but “leading the league” in some statistical category has no intrinsic value, except inasmuch as it actually contributed to winning games.]
“Rodman was a great defender! He could effectively defend Michael Jordan and Shaquille O’Neal in their primes! Who else could do that?” [Actually, I love this argument as a rhetorical matter, but unfortunately I think defensive skill is still too subjective to be quantified directly. Of course, all of his skills, or lack thereof, are still relevant to the bottom line.]
“Rodman was such an amazing rebounder, despite being only 6 foot 7!” [Who cares how tall he was, seriously?]

Rather, in the subsequent parts of this series, these are the arguments I will be making:

Rodman was a better rebounder than you think: Rodman’s ability as a rebounder is substantially underrated. Rodman was a freak and is unquestionably, by a wide margin, the greatest rebounder in NBA history. In this section I will use a number of statistical metrics to demonstrate this point (preview factoid: Kevin Garnett’s career rebounding percentage is lower than Dennis Rodman’s career *offensive* rebounding percentage). I will also specifically rebut two common counterarguments: 1) that Rodman “hung out around the basket” and only got so many rebounds because he focused on them exclusively [he didn’t], and 2) that Bill Russell and Wilt Chamberlain were better rebounders [they weren’t].
Rodman’s rebounding was more valuable than you think: The value of Rodman’s rebounding ability is substantially underrated, even (or especially) by modern efficiency metrics that do not accurately reward the marginal value of extra rebounds. Conversely, the cost of his lack of scoring ability is vastly overrated, even (or especially) by modern efficiency metrics that overstate the marginal cost of not scoring.
Rodman was a bigger winner than you think: By examining Rodman’s +/- with respect to wins and losses (i.e., comparing his teams’ winning percentages with him in the lineup vs. without him), I will show that the outcomes suggest he had elite-level value. Contrary to common misunderstanding, this actually becomes *more* impressive after adjusting for the fact that he played on very good teams to begin with.
Rodman belongs in the Hall of Fame [or not]: [Note: this section didn’t go as planned. Rodman was actually selected for the HoF before I finished the series, so Part 4 is devoted to slightly more speculative arguments about Rodman’s true value.] Having wrapped up the main quantitative prongs, I will proceed to audit the various arguments for and against Rodman’s induction into the Hall of Fame. I believe that both sides of the debate are rationalizable; that is, there exist reasonable sets of preferences that would justify either outcome. Ultimately, however, I will argue that the most commonly articulated preferences, when combined with a proper understanding of the available empirical evidence, should compel one to support Rodman’s induction. To be fair, I will also examine which sets of preferences could rationally compel you to the opposite conclusion.
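As a rough illustration of the with/without comparison described in the third argument, here is a minimal Python sketch of the raw calculation: split a team’s games by whether the player appeared, then compare winning percentages. The `Game` record, its fields, and the toy schedule are hypothetical, and the sketch computes only the unadjusted differential, not the adjustment for team strength that the argument relies on.

```python
from dataclasses import dataclass


@dataclass
class Game:
    """One team game (hypothetical record format)."""
    won: bool     # did the team win this game?
    played: bool  # did the player in question appear?


def win_pct(games: list[Game]) -> float:
    """Fraction of games won; NaN if there are no games."""
    if not games:
        return float("nan")
    return sum(g.won for g in games) / len(games)


def with_without(games: list[Game]) -> tuple[float, float, float]:
    """Return (win% with player, win% without player, raw difference)."""
    with_player = [g for g in games if g.played]
    without_player = [g for g in games if not g.played]
    pct_with = win_pct(with_player)
    pct_without = win_pct(without_player)
    return pct_with, pct_without, pct_with - pct_without


# Toy schedule (made-up numbers): 3-1 with the player, 1-1 without him.
season = [
    Game(won=True, played=True),
    Game(won=True, played=True),
    Game(won=False, played=True),
    Game(won=True, played=True),
    Game(won=True, played=False),
    Game(won=False, played=False),
]
print(with_without(season))  # (0.75, 0.5, 0.25)
```

Real “without” samples can be small, so a raw differential like this one is noisy and should be read with caution before any further adjustment.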

Stay tuned….