While I’m admittedly a little sad that this blog won’t be coming back any time soon, this should obviously be great news for people who enjoy my work: Backed by ESPN/FiveThirtyEight data and resources, it will be better and there will be more of it. My responsibilities at FiveThirtyEight will be similar to what I’d been doing here already: conducting original research, writing articles, and blogging. Except full time. And paid.

(Yeah, it’s basically my dream job.)

Of course, for many of you reading this, this is probably your first time visiting this site. In which case: welcome! For a primer on who the hell I am, you might want to read the “about Ben” and “about this blog” pages, or you can skip those and just read some of my articles. My best known work is undoubtedly The Case For Dennis Rodman, which is incredibly long—

—but has a guide, which can be found here. And in case you’ve heard rumors, yes, it speculates that Rodman—in a very specific way—may have been more valuable than Michael Jordan.

However, if I had to pick just a handful of articles to best represent my ideas and interests, it might look something like this:

Quantum Randy Moss—An Introduction to Entanglement

The Aesthetic Case Against 18 Games

The Case for Dennis Rodman, Part 4/4(a): All-Hall?

Bayes’ Theorem, Small Samples, and WTF is Up With NBA Finals Markets?

A Defense of Sudden-Death Playoffs in Baseball

Last week I came across this ESPN article (citing this Forbes article) about how Bill Belichick is the highest-paid coach in American sports:

Bill Belichick tops the list for the second year in a row following the retirement of Phil Jackson, the only coach to have ever made an eight-figure salary. Belichick is believed to make $7.5 million per year. Doc Rivers is the highest-paid NBA coach at $7 million.

Congrats to Belichick for a worthy accomplishment! Though I still think it probably under-states his actual value, at least relative to NFL players. As I tweeted:

Alternate headline: Bill Belichick Still Woefully Underpaid m.espn.go.com/general/blogs/…

— Benjamin Morris (@skepticalsports) May 23, 2012

Of course, coaches’ salaries are different from players': they aren’t constrained by the salary cap, nor are they boosted by the mandatory revenue-sharing in the players’ collective bargaining agreement. Yet, for comparison, this season Belichick will make a bit more than a third of what Peyton Manning will in Denver. As I’ve said before, I think Belichick and Manning have been (almost indisputably) the most powerful forces in the modern NFL (maybe ever). Here’s the key visual from my earlier post, updated to include last season (press play):

*The x axis is wins in season n, y axis is wins in season n+1.*

Naturally, Belichick has benefited from having Tom Brady on his team. However, Brady makes about twice as much as Belichick does, and I think you would be hard-pressed to argue that he’s *twice* as valuable (and I think top QBs are probably *underpaid* relative to their value anyway).

But being high on Bill Belichick is about more than just his results. He is well-loved in the analytical community, particularly for some of his high-profile 4th down and other in-game tactical decisions. But I think those flashy calls are merely a symptom of his broader commitment to making intelligent win-maximizing decisions—a commitment that is probably even more evident in the decisions he has made and strategies he has pursued in his role as the Patriots’ General Manager.

But rather than sorting through everything Belichick has done that I like, I want to take a quick look at one recent adjustment that really impressed me: the Patriots’ out-of-character machinations in the 2012 draft.

One of the unheralded elements to the Patriots’ success—perhaps rivaling Tom Brady himself in actual importance—is their penchant for stock-piling draft-picks in the “sweet spot” of the NFL draft (late 1st to mid-2nd round), where picks have the most surplus value. Once again, here’s the killer graph from the famous Massey-Thaler study on the topic:

In the 11 drafts since Belichick took over, the Patriots have made 17 picks between numbers 20 and 50 overall, the most in the NFL (the next-most is SF with 15, league average is obv 11). To illustrate how unusual their draft strategy has been, here’s a plot of their 2nd round draft position vs. their total wins over the same period:

Despite New England having the highest win percentage (not to mention most Super Bowl wins and appearances) over the period, there are 15 teams with lower average draft positions in the 2nd round. For comparison, they have the 2nd lowest average draft position in the 1st round and 7th lowest in the third.

Of course, the new collective bargaining agreement includes a rookie salary scale. Without going into all the details (in part because they’re extremely complicated and not entirely public), the key points are that it keeps total rookie compensation relatively stable while flattening the scale at the top, reducing guaranteed money, and shortening the maximum number of years for each deal.

These changes should all theoretically flatten out the “value curve” above. Here’s a rough sketch of what the changes seem to be attempting:

Since the original study was published, the dollar values have gone up and the top end has gotten more skewed. I adjusted the Y-axis to reflect the new top, but didn’t adjust the curve itself, so it should actually be somewhat steeper than it appears. I tried to make the new curves as conceptually accurate as I could, but they’re not empirical and should be considered more of an “artist’s rendition” of what I think the NFL is aiming for.

With a couple of years of data, this should be a very interesting issue to revisit. But, for now, I think it’s unlikely that the curve will actually be flattened very much. If I had to guess, I think it may end up “dual-peaked”: By far the greatest drop in guaranteed money will be for top QB prospects taken with the first few picks. These players already provide the most value, and are the main reason the original M/T performance graph inclines so steeply on the left. Additionally, they provide an opportunity for *continued surplus value* beyond the length of the initial contract. This should make the top of the draft extremely attractive, at least in years with top QB prospects.

On the other hand, I think the bulk of the effect on the rest of the surplus-value curve will be to shift it to the left. My reasons for thinking this are much more complicated, and include my belief that the original Massey/Thaler study has problems with its valuation model, but the extremely short version is that I have reason to believe that people systematically overvalue upper/middle 1st round picks.

Since I’ve been following the Patriots’ 2nd-round-oriented drafting strategy for years now, naturally my first thoughts after seeing the details of the new deal went to how this could kill their edge. Here’s a question I tweeted at the Sloan conference:

For Football panel: Is new CBA going to hurt the Patriots, who built a dynasty partly by fleecing the league w 2nd round draft picks? #SSAC

— Benjamin Morris (@skepticalsports) March 3, 2012

Actually, my concern about the Patriots’ drafting strategy was two-fold:

- The Patriots’ favorite place to draft could obviously lose its comparative value under the new system. If they left their strategy as-is, it could lead to their picking sub-optimally. At the very least, it should eliminate their exploitation opportunity.
- Though a secondary issue for this post, at some point taking an extreme bang-for-your-buck approach to player value can run into diminishing returns and cause stagnation. Since you can only have so many players on your roster or on the field at a time, your ability to hoard and exploit “cheap” talent is constrained. This is a particularly big concern for teams that are already pretty good, especially if they already have good “value” players in a lot of positions: At some point, you need players who are less cheap but higher quality, even if their value per dollar is lower than the alternative.

Of course, if you followed the draft, you know that the Patriots, entering the draft with far fewer picks than usual, still *traded up* in the 1st round, twice.

Taken out of context, these moves seem extremely out of character for the Patriots. Yet the moves are perfectly consistent with an approach that understands and attacks my concerns: Making fewer, higher-quality picks is essentially the correct solution, and if the value-curve has indeed shifted toward earlier picks as I expect it has, the new epicenter of the Patriots’ draft activity may be directly on top of the new sweet spot.

The entire affair reminds me of an old piece of poker wisdom that goes something like this: In a mixed game with one truly expert poker player and a bunch of completely outclassed amateurs, the expert’s biggest edge wouldn’t come in the poker variant with which he has the most expertise, but in some ridiculous spontaneous variant with tons of complicated made-up rules.

I forget where I first read the concept, but I know it has been addressed in various ways by many authors, ranging from Mike Caro to David Sklansky. I believe it was the latter (though please correct me if I’m wrong) who specifically suggested a Stud variant some of us remember fondly from childhood:

Several different games played only in low-stakes home games are called Baseball, and generally involve many wild cards (often 3s and 9s), paying the pot for wild cards, being dealt an extra upcard upon receiving a 4, and many other ad-hoc rules (for example, the appearance of the queen of spades is called a “rainout” and ends the hand, or that either red 7 dealt face-up is a rainout, but if one player has both red 7s in the hole, that outranks everything, even a 5 of a kind). These same rules can be applied to no peek, in which case the game is called “night baseball”.

The main ideas are that A) the expert would be able to adapt to the new rules much more quickly, and B) all those complicated rules make it much more likely that he would be able to find profitable exploitations (for Baseball in particular, there’s the added virtue of having several betting rounds per hand).

It will take a while to see how this plays out, and of course the abnormal outcome could just be a circumstances-driven coincidence rather than an explicit shift in the Patriots’ approach. But if my intuitions about the situation are right, Belichick may deserve extra credit for making deft adjustments in a changing landscape, much as you would expect from the Baseball-playing shark.

In the meantime, I went through the 2010 play-by-play dataset and kluged a proxy stat from the *actual* clock, reflecting the number of seconds passed since a team took possession. Here’s a chart summarizing the number and outcomes of possessions of various lengths:

The orange X’s represent the number of league-wide possessions in which the first shot took place at the indicated time. The red diamonds represent the average number of points scored on those possessions (including from any subsequent shots following an offensive rebound, etc).

We should expect there to be a constant trade-off at any given time between taking a shot “now” and waiting for a better one to open up: the deeper you get into a possession, the more your shot standards should drop. And, indeed, this is reflected in the graph by the downward-sloping curve.

For now, I’m just throwing this out there. Though it represents a very basic idea, it is difficult to overstate its importance:

- Accounting for the clock can help evaluate players where standard efficiency ratings break down. Most simply, you can take the results of each shot and compare them to the expected value of a shot taken under the same amount of time-pressure. E.g., if someone averages .9 points per attempt with only a couple of seconds left, you can spot value where normal efficiency calculations wouldn’t.
- Actually, I’ve calculated just such preliminary “value-added” shooting for the entire league (with pretty interesting results), but I’d like to see more accurate data before posting or basing any substantial analysis on it. Among other problems, I think the right side of the curve is overly *generous*, as it includes possessions where it took a while to get the clock started (a process that is, unfortunately, highly variable), or where time was added and the cause wasn’t scored (also disappointingly common).
- Examining this information can tell you some things about the league generally: For example, it’s interesting to me that there’s a noticeable dip right around where the most shots actually take place (14 to 16 seconds in). Though speculative, I suspect that this is when players are most likely to settle for mediocre 2 point jumpers. Similarly, but a bit more difficultly, you can compare the actual curve with a derived curve to examine whether NBA players, on the whole, seem to wait too long (or not long enough) to pull the trigger.
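To make the “value-added” idea from the first bullet concrete, here is a minimal sketch: bucket first shots by seconds elapsed in the possession, take the league-average points in each bucket as the baseline, and credit each shooter with the difference between what he scored and that baseline. The tuple layout, the function names, and the toy numbers are all assumptions made up for illustration; this is not the actual dataset or the calculation described above.

```python
from collections import defaultdict

def expected_points_by_second(shots):
    """League-average points for shots taken at each elapsed second.

    `shots` is an iterable of (player, seconds_elapsed, points) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    totals = defaultdict(lambda: [0.0, 0])  # second -> [points, attempts]
    for _, sec, pts in shots:
        totals[sec][0] += pts
        totals[sec][1] += 1
    return {sec: pts / n for sec, (pts, n) in totals.items()}

def value_added(shots):
    """Per player: sum of (actual points - league baseline at the same second)."""
    baseline = expected_points_by_second(shots)
    added = defaultdict(float)
    for player, sec, pts in shots:
        added[player] += pts - baseline[sec]
    return dict(added)

# Toy data, invented for illustration only.
toy_shots = [
    ("A", 5, 2), ("B", 5, 0),                                # early shots
    ("A", 22, 2), ("B", 22, 0), ("B", 22, 0), ("C", 22, 0),  # late shots
]

if __name__ == "__main__":
    print(value_added(toy_shots))
```

In the toy data, player A gets more credit for the late make than for the early one, because the league baseline at 22 seconds is lower. A real version would want smoothing across neighboring seconds and some handling of the clock-start problems noted above.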

With better data, the possibilities would open up further (even more so when combined with other play-by-play information, like shot type, position, defense, etc). For example, you could look at the curve for individual players and impute whether they should be more or less aggressive with their shot selection.

So, yeah, if any of you can direct me to a dataset that has what I want, please let me know.

First I attended the Football Analytics panel despite finding it disappointing last year, and, alas, it wasn’t any better. Eric Mangini must be the only former NFL coach willing to attend, b/c they keep bringing him back:

Just sat down for Football Analytics and I’m already bleh. In some ways, Mangini is worse than Brian Burke, b/c he acts like he cares. #SSAC

— Benjamin Morris (@skepticalsports) March 3, 2012

Overall, I spent more time on day 2 going to niche panels and research paper presentations, and talking to people.

The last, in particular, was great. For example, I had a fun conversation with Henry Abbott about Kobe Bryant’s lack of “clutch.” This is one of Abbott’s pet issues, and I admit he makes a good case, particularly that the Lakers are net losers in “clutch” situations (yes, relative to other teams), even over the periods where they have been dominant otherwise.

Kobe is kind of a pivotal case in analytics, I think. First, I’m a big believer in “Count the Rings, Son” analysis: That is, leading a team to multiple championships is really hard, and only really great players do it. I also think he stands at a kind of nexus, in that stats like PER give spray shooters like him an unfair advantage, but more finely tuned advanced metrics probably over-punish the same. Part of the burden of Kobe’s role is that he *has* to take a lot of bad shots—the relevant question is how good he is at his job.

Abbott also mentioned that he liked one of my tweets, but didn’t know if he could retweet the non-family-friendly “WTF”:

Looking over the agenda, I don’t see “American Idol Analytics” anywhere. WTF? Competitive Singing is America’s 2nd favorite sport!#SSAC

— Benjamin Morris (@skepticalsports) March 2, 2012

I also had a fun conversation with Neil Paine of Basketball Reference. He seemed like a very smart guy, but this may be attributable to the fact that we seemed to be on the same page about so many things. Additionally, we discussed a very fun hypo: How far back in time would you have to go for the Charlotte Bobcats to be the odds-on favorites to win the NBA Championship?

As for the “sideshow” panels, they’re generally more fruitful and interesting than the ESPN-moderated super-panels, but they offer fewer easy targets for blog-griping. If you’re really interested in what went down, there is a ton of info at the SSAC website. The agenda can be found here. Information on the speakers is here. And, most importantly, videos of the various panels can be found here.

*Featuring Dean Oliver, Bill James, and others.*

This was a somewhat interesting, though I think slightly off-target, panel. They spent a lot of time talking about new data and metrics, pooh-poohing things like RBI (and even OPS), and heralding the brave new world of play-by-play and video tracking, etc. But too much of this was about moving to a different granularity of data, rather than about what can be improved at the current one. Or, in other words:

Solving box score problems w/ PBP or video data is fundamentally not “rebooting the box score.” What should be in box score but isn’t? #ssac

— Benjamin Morris (@skepticalsports) March 3, 2012

James acquitted himself a bit on this subject, arguing that boatloads of new data isn’t useful if it isn’t boiled down into useful metrics. But a more general way of looking at this is: If we were starting over from scratch, with a box-score-sized space to report a statistical game summary, and a similar degree of game-scoring resources, what kinds of things would we want to include (or not) that are different from what we have now? I can think of a few:

- In basketball, it’s archaic that free-throws aren’t broken down into bonus free throws and shot-replacing free throws.
- In football, I’d like to see passing stats by down and distance, or at least in a few key categories like 3rd and long.
- In baseball, I’d like to see “runs relative to par” for pitchers (though this can be computed easily enough from existing box scores).

In this panel, Dean Oliver took the opportunity to plug ESPN’s bizarre proprietary Total Quarterback Rating. They actually had another panel devoted just to this topic, but I didn’t go, so I’ll put a couple of thoughts here.

First, I don’t understand why ESPN is pushing this as a proprietary stat. Sure, no-one knows how to calculate regular *old-fashioned* quarterback ratings, but there’s a certain comfort in at least knowing it’s a real thing. It’s a bit like Terms of Service agreements, which people regularly sign without reading: at least you know the terms are out there, so *someone* actually cares enough to read them, and presumably they would raise a stink if you had to sign away your soul.

As for what we do know, I may write more on this come football season, but I have a couple of problems:

One, I hate the “clutch effect.” TQBR makes a special adjustment to value clutch performance even *more* than its generic contribution to winning. If anything, clutch situations in football are so bizarre that they should count *less*. In fact, when I’ve done NFL analysis, I’ve often just cut the 4th quarter entirely, and I’ve found I get better results. That may sound crazy, but it’s a bit like how some very advanced Soccer analysts have cut goal-scoring from their models, instead just focusing on how well a player advances the ball toward his goal: even if the former *matters more*, its unreliability may make it *less useful*.

Dean Oliver: You can criticize QBR, but nothing better to replace it. Hm. Try QBR minus the distorting clutch adjustment. #SSAC

— Benjamin Morris (@skepticalsports) March 3, 2012

Two, I’m disappointed in the way they “assign credit” for play outcomes:

Division of credit is the next step. Dividing credit among teammates is one of the most difficult but important aspects of sports. Teammates rely upon each other and, as the cliché goes, a team might not be the sum of its parts. By dividing credit, we are forcing the parts to sum up to the team, understanding the limitations but knowing that it is the best way statistically for the rating.

I’m personally very interested in this topic (and have discussed it with various ESPN analytics guys since long before TQBR was released). This is basically an attempt to address the entanglement problem that permeates football statistics. ESPN’s published explanation is pretty cryptic, and it didn’t seem clear to me whether they were profiling individual players and situations or had created credit-distribution algorithms league-wide.

At the conference, I had a chance to talk with their analytics guy who designed this part of the metric (his name escapes me), and I confirmed that they modeled credit distribution for the entire league and are applying it in a blanket way. Technically, I guess this is a step in the right direction, but it’s purely a reduction of noise and doesn’t address the real issue. What I’d really like to see is like a recursive model that imputes how much credit various players deserve broadly, then uses those numbers to re-assign credit for particular outcomes (rinse and repeat).
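For what it’s worth, here is a toy sketch of the kind of “rinse and repeat” loop I have in mind. It is purely my own illustration, not ESPN’s method and not a finished model: start with every player rated equally, split each play’s value among the participants in proportion to their current ratings, re-rate each player by his average credit per play, and repeat. The function name, the data layout, and the restriction to non-negative play values are all assumptions.

```python
def iterate_credit(plays, players, rounds=50, floor=1e-6):
    """Alternate between splitting play value by rating and re-rating players.

    `plays` is a list of (participants, value) pairs with non-negative values
    (a hypothetical layout for this sketch).
    """
    rating = {p: 1.0 for p in players}  # start with everyone equal
    for _ in range(rounds):
        credit = {p: 0.0 for p in players}
        count = {p: 0 for p in players}
        for participants, value in plays:
            total = sum(rating[p] for p in participants)
            for p in participants:
                # Each participant's share is proportional to his current rating.
                credit[p] += value * rating[p] / total
                count[p] += 1
        # New rating: average credit per play, floored so shares stay defined.
        rating = {
            p: max(credit[p] / count[p], floor) if count[p] else floor
            for p in players
        }
    return rating

# Toy data, invented for illustration only.
toy_players = ["QB", "WR1", "WR2"]
toy_plays = [(("QB", "WR1"), 10.0), (("QB", "WR2"), 4.0)]

if __name__ == "__main__":
    print(iterate_credit(toy_plays, toy_players))
```

With so little data, the answer depends heavily on the starting ratings, which is really just the entanglement problem showing up again. The point is only to show the shape of the loop.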

*Rajiv Maheswaran, and other nerds*.

This presentation was so awesome that I offered them a hedge bet for the “Best Research Paper” award. That is, I would bet on them at even money, so that if they lost, at least they would receive a consolation prize. They declined. And won. Their findings are too numerous and interesting to list, so you should really check it out for yourself.

Obviously my work on the Dennis Rodman mystery makes me particularly interested in their theories of why certain players get more rebounds than others, as I tweeted in this insta-hypothesis:

So, upshot: Dennis Rodman’s incredible value could have come from him simply stepping into open spaces rather than following the ball. #SSAC

— Benjamin Morris (@skepticalsports) March 3, 2012

Following the presentation, I got the chance to talk with Rajiv for quite a while, which was amazing. Obviously they don’t have any data on Dennis Rodman directly, but Rajiv was also interested in him and had watched a lot of Rodman video. Though anecdotal, he did say that his observations somewhat confirmed the theory that a big part of Rodman’s rebounding advantage seemed to come from handling space very well:

- Even when away from the basket, Rodman typically moved to the *open space* immediately following a shot. This is a bit different from how people often think about rebounding as aggressively *attacking the ball* (or as being able to near-psychically predict where the ball is going to come down).
- Also rather than simply attacking the board directly, Rodman’s first inclination was to insert himself between the nearest opponent and the basket. In theory, this might slightly decrease the chances of getting the ball when it heads in toward his previous position, but would make up for it by dramatically increasing his chances of getting the ball when it went toward the other guy.
- Though a little less purely strategical, Rajiv also thought that Rodman was just incredibly good at #2. That is, he was just exceptionally good at jockeying for position.

To some extent, I guess this is just rebounding fundamentals, but I still think it’s very interesting to think about the indirect probabilistic side of the rebounding game.

Quick tangent: At one point, I thought Neil Paine summed me up pretty well as a “contrarian to the contrarians.” Of course, I don’t think I’m contrary for the sake of contrariness, or that I’m a negative person (I don’t know how many times I’ve explained to my wife that just because I hated a movie doesn’t mean I didn’t enjoy it!), it’s just that my mind is naturally inclined toward considering the limitations of whatever is put in front of it. Sometimes that means criticizing the status quo, and sometimes that means criticizing its critics.

So, with that in mind, I thought Bill James’s showing at the conference was pretty disappointing, particularly his interview with Bill Simmons.

I have a lot of respect for James. I read his Historical Baseball Abstract and enjoyed it considerably more than Moneyball. He has a very intuitive and logical mind. He doesn’t say a bunch of shit that’s not true, and he sees beyond the obvious. In Saturday’s “Rebooting the Box-score” panel, he made an observation that having 3 of 5 people on the panel named John implied that the panel was [likely] older than the rest of the room. This got a nice laugh from the attendees, but I don’t think he was kidding. And whether he was or not, he still gets 10 kudos from me for making the closest thing to a Bayesian argument I heard all weekend. And I dutifully snuck in for a pic with him:

James was somewhat ahead of his time, and perhaps he’s still one of the better sports analytic minds out there, but in this interview we didn’t really get to hear him analyze anything, you know, *sportsy*. This interview was all about Bill James and his bio and how awesome he was and how great he is and how hard it was for him to get recognized and how much he has changed the game and how, without him, the world would be a cold, dark place where ignorance reigned and nobody had ever heard of “win maximization.”

Bill Simmons going this route in a podcast interview doesn’t surprise me: his audience is obviously much broader than the geeks in the room, and Simmons knows his audience’s expectations better than anyone. What got to me was James’s willingness to play along, and everyone else’s willingness to eat it up. Here’s an example of both, from the conference’s official Twitter account:

Quote of the day RT @SloanSportsConf: “this conference is a culmination of 30 years of my work” — Bill James #SSAC

— MIT Sports Conf. (@SloanSportsConf) March 3, 2012

Perhaps it’s because I never really liked baseball, and I didn’t really know anyone did any of this stuff until recently, but I’m pretty certain that Bill James had virtually zero impact on my own development as a sports data-cruncher. When I made my first PRABS-style basketball formula in the early 1990’s (which was absolutely terrible, but is still more predictive than PER), I had no idea that any sports stats other than the box score even existed. By the time I first heard the word “sabermetrics,” I was deep into my own research, and didn’t bother really looking into it deeply until maybe a few months ago.

Which is not to say I had no guidance or inspiration. For me, a big epiphanous turning point in my approach to the analysis of games did take place—after I read David Sklansky’s Theory of Poker. While ToP itself was published in 1994, Sklansky’s similar offerings date back to the 70s, so I don’t think any broader causal pictures are possible.

More broadly, I think the claim that sports analytics wouldn’t have developed without Bill James is preposterous. Especially if, as I assume we do, we firmly believe we’re right. This isn’t like L. Ron Hubbard and Incident II: being for sports analytics isn’t like having faith in a person or his religion. It simply means trying to think more rigorously about sports, and using all of the available analytical techniques we can to gain an advantage. Eventually, those who embrace the right will win out, as we’ve seen begin to happen in sports, and as has already happened in nearly every other discipline.

Indeed, by his own admission, James liked to stir controversy, piss people off, and talk down to the old guard whenever possible. As far as we know, he may have set the cause of sports analytics back, either by alienating the people who could have helped it gain acceptance, or by setting an arrogant and confrontational tone for his disciples (e.g., the uplifting “don’t feel the need to explain yourself” message in Moneyball). I’m not saying that this is the case or even a likely possibility, I’m just trying to illustrate that giving someone credit for all that follows—even a pioneer like James—is a dicey game that I’d rather not participate in, and that he definitely shouldn’t.

On a more technical note, one of his oft-quoted and re-tweeted pearls of wisdom goes as follows:

Bill James on whether we’ve exhausted all baseball advanced stats: “We’ve only taken a bucket of knowledge from a sea of ignorance.” #ssac

— Gill Alexander (@beatingthebook) March 2, 2012

Sounds great, right? I mean, not really, I don’t get the metaphor: if the sea is full of ignorance, why are you collecting water from it with a bucket rather than some kind of filtration system? But more importantly, his argument in defense of this claim is amazingly weak. When Simmons asked what kinds of things he’s talking about, he repeatedly emphasized that we have no idea whether a college sophomore will turn out to be a great Major League pitcher. True, but, um, *we never will*. There are too many variables, the inputs and outputs are too far apart in time, and the contexts are too different. This isn’t the sea of ignorance, it’s a sea of unknowns.

Which gets at one of my big complaints about stats-types generally. A lot of people seem to think that stats are all about making exciting discoveries and answering questions that were previously unanswerable. Yes, sometimes you get lucky and uncover some relationship that leads to a killer new strategy or to some game-altering new dynamic. But most of the time, you’ll find static. A good statistical thinker doesn’t try to reject the static, but tries to understand it: Figuring out what you can’t know is just as important as figuring out what you can know.

On Twitter I used this analogy:

I also don’t know whether this coin will come up heads or tails, but that doesn’t mean I have a poor understanding of coin-flipping. #SSAC

— Benjamin Morris (@skepticalsports) March 2, 2012

Success comes with knowing more true things and fewer false things than the other guy.

As in basketball, teams with championship-winning experience outperform their regular-season records in the playoffs, especially if they make it to the Super Bowl.

So, a bit like my 5-by-5 model, I wanted to come up with a simple metric for picking the Super Bowl winner. Unlike its NBA cousin, however, this method only applies to the championship game, not to the entire playoffs. The main question is, how much better does a team with more Super Bowl winning experience do than its opponent?

I feel bad about my text/graphs ratio this week, so I thought I’d tell this story in pictures. Before testing the question, we need to pick the best time period. So, for what number of years does the metric “pick the team with the most Super Bowl wins” most often pick the ultimate winner:

This was a little surprising to me already: I thought for sure the best *n* would be a small number, but it turns out to be 6.
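The search over *n* is simple enough to sketch. Below is a rough version, assuming a list of games and a year-to-champion lookup; the data layout, function names, and the three made-up teams are mine, for illustration only, and the toy results have nothing to do with the real Super Bowl numbers. Games where both teams have the same count in the window are skipped, since the metric has nothing to say about them.

```python
def wins_in_window(team, year, champions, n):
    """Count titles `team` won in the n seasons before `year`."""
    return sum(1 for y in range(year - n, year) if champions.get(y) == team)

def window_accuracy(games, champions, n):
    """Hit rate of 'pick the team with more titles in the previous n years'.

    Returns (correct, applicable); games where the counts tie are skipped.
    """
    correct = applicable = 0
    for year, team_a, team_b, winner in games:
        a = wins_in_window(team_a, year, champions, n)
        b = wins_in_window(team_b, year, champions, n)
        if a == b:
            continue
        applicable += 1
        pick = team_a if a > b else team_b
        correct += (pick == winner)
    return correct, applicable

def best_window(games, champions, max_n=10):
    """Return (best n, hit rate by n) over windows that apply at least once."""
    rates = {}
    for n in range(1, max_n + 1):
        correct, applicable = window_accuracy(games, champions, n)
        if applicable:
            rates[n] = correct / applicable
    return max(rates, key=rates.get), rates

# Toy data, invented for illustration only: (year, team_a, team_b, winner).
toy_games = [
    (2001, "X", "Y", "X"),
    (2002, "Y", "Z", "Z"),
    (2003, "X", "Z", "X"),
    (2004, "Z", "Y", "Z"),
    (2005, "X", "Y", "X"),
]
toy_champions = {year: winner for year, _, _, winner in toy_games}

if __name__ == "__main__":
    print(best_window(toy_games, toy_champions, max_n=3))
```

One design note: picking the window that maximizes the hit rate on the same games you then evaluate is an obvious overfitting risk, so the hit rate for the chosen *n* should be read with that in mind.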

Counting 2012, there have been 26 Super Bowls where one team has won more championships in the previous 6 years than the other. Of those games, the team with the greater number has won 20, or 77% of the time—including the Giants. [True story: I was going to publish something on this research *before* this year’s Super Bowl, but, knowing that it predicted a New York win against the heavily favored Patriots, I chickened out.]

Of course, I’m sure most of you are just itching to pounce right now: Clearly the team with the most recent Super Bowl wins is usually going to be better, right? So clearly this must be confounding this result. So let’s compare it to the predictive accuracy of SRS (Simple Rating System, aka “Margin of Victory adjusted for Strength of Schedule”):

Looking at all 46 Super Bowls, the team with the higher SRS has won 26, or 57%. In Super Bowls where neither team had more Super Bowl wins in the previous 6 years, SRS performs *slightly* better, correctly picking 12/20 (60%). But the real story is in the games where both had something to say: When SRS and L6 (the last-6-years championship count) agreed, the team they both picked won 11/14 (79%). But when SRS and L6 *disagreed* (in other words, where one team had a higher SRS, but the other had more Super Bowl wins in the previous 6 years), the team with the paper qualifications lost to the team with the championship experience 9 of 12 times (75%).

Now, your next thought might be that the years when L6 trumped SRS were probably the years when the teams were very close. But you’d be wrong:

The average SRS difference in the 9 years where the L6 team won is actually higher than in the 3 years when it lost!

So how much does L6 add overall? Well, let’s first create a simple method, a bit like 5-by-5:

- If one team has more Super Bowl wins in the previous 6 years, pick them.
- Otherwise, pick the team with the best SRS.

Following this method, you would correctly pick 32 of the 46 Super Bowls (70%), for a 10% improvement overall, despite step 1 only even applying in about half of the games (also, note that if you just picked randomly in the 20 Super Bowls where L6 doesn’t apply, you would still be expected to get 30 right overall).
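For anyone who wants to check the arithmetic, here is a minimal sketch of the two-step rule and the tallies quoted above. The function name, the dictionary keys, and the example teams are made up for illustration; only the game counts come from the text.

```python
def pick_winner(team_a, team_b):
    """Two-step rule: L6 first, SRS as the fallback.

    Each team is a dict with hypothetical keys 'name', 'l6_wins'
    (Super Bowl wins in the previous 6 years) and 'srs'.
    """
    if team_a["l6_wins"] != team_b["l6_wins"]:
        # Step 1: one team has more recent championships, so pick it.
        return max(team_a, team_b, key=lambda t: t["l6_wins"])["name"]
    # Step 2: otherwise fall back to the better SRS.
    return max(team_a, team_b, key=lambda t: t["srs"])["name"]


# Tallies quoted in the text.
l6_games, l6_correct = 26, 20              # games where L6 applies
srs_only_games, srs_only_correct = 20, 12  # games where it does not

total_games = l6_games + srs_only_games
method_correct = l6_correct + srs_only_correct
method_accuracy = method_correct / total_games

# Coin-flipping the 20 games where L6 is silent.
coin_flip_expected = l6_correct + srs_only_games / 2

# Made-up example: the lower-SRS team with the recent title gets the pick.
example_pick = pick_winner(
    {"name": "Team A", "l6_wins": 1, "srs": 1.5},
    {"name": "Team B", "l6_wins": 0, "srs": 9.0},
)
```

Here `method_correct` and `coin_flip_expected` reproduce the 32 and 30 mentioned above.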

Finally, to try to quantify the difference in predictive value between the two measures, I plugged them both into a logistic regression:

As you can see, L6 is much more predictive, though the 95% confidence intervals do overlap. (Though I should also note, this last chart is based on the regression I ran *prior* to this year’s game, which ended up being another victory for the championship experience side.)

A lot has happened in my life since then: I finished my Rodman series, won the ESPN Stat Geek Smackdown (which, though I am obviously happy to have won, is not really that big a deal—all told, the scope of the competition is about the same as picking a week’s worth of NFL games), my wife and I had a baby, and, oh yeah, I learned a ton about the breadth, depth, and nature of the sports analytics community.

For the most part, I used Twitter as sort of my de facto notebook for the conference. Thus, I’m sorry if I’m missing a bunch of lengthier quotes and/or if I repeat a bunch of things you already saw in my live coverage, but I will try to explain a few things in a bit more detail.

For the most part, I’ll keep the recap chronological. I’ve split this into two parts: Part 1 covers Friday, up to but not including the Bill Simmons/Bill James interview. Part 2 covers that interview and all of Saturday.

From the pregame tweets, John Hollinger observed that 28 NBA teams sent representatives (that we know of) this year. I also noticed that the New England Revolution sent 2 people, while the New England Patriots sent none, so I’m not sure that number of official representatives reliably indicates much.

The conference started with some bland opening remarks by Dean David Schmittlein. Tangent: I feel like political-speak (thank everybody and say nothing) seems to get more and more widespread every year. I blame it on fear of the internet. E.g., in this intro segment, somebody made yet another boring joke about how there were no women present (personally, I thought there were significantly more than last year), and was followed shortly thereafter by a female speaker, understandably creating a tiny bit of awkwardness. If that person had been more important (like, if I could remember his name to slam him), I doubt he would have made that joke, or any other joke. He would have just thanked everyone and said nothing.

*Featuring Gary Bettman (NHL), Rob Manfred (MLB), Adam Silver (NBA), Steve Tisch (NYG) and Michael Wilbon moderating.*

This panel really didn’t have much of a theme; it was mostly Wilbon creatively folding a bunch of predictable questions into arbitrary league issues. E.g.: “What do you think about Jeremy Lin?!? (And, you know, overseas expansion blah blah.)”

I don’t get the massive cultural significance of Jeremy Lin, personally. I mean, he’s not the first ethnically Chinese player to have NBA success (though he is perhaps the first short one). The discussion of China, however, was interesting for other reasons. Adam Silver claimed that Basketball is already more popular in China than soccer, with over 300 million Chinese people playing it. Those numbers, if true, are pretty mind-boggling.

Finally, there was a whole part about labor negotiations that was pretty well summed up by this tweet:

Opening panel summary: league execs are very smart and have done a great job with labor negotiations according to league execs. #ssac

— Jeremy Schmidt (@Bucksketball) March 2, 2012

*Featuring Brian Burke, Peter Chiarelli, Mike Milbury and others.*

The panel started with Peter Chiarelli being asked how the world champion Boston Bruins use analytics, and in an ominous sign, he rambled on for a while about how, when it comes to scouting, they’ve learned that weight is probably more important than height.

Overall, it was a bit like any scene from the Moneyball war room, with Michael Schuckers (the only pro-stats guy) playing the part of Jonah Hill, but without Brad Pitt to protect him.

When I think of Brian Burke, I usually think of Advanced NFL Stats, but apparently there’s one in Hockey as well. Burke is GM/President of the Toronto Maple Leafs. At one point he was railing about how teams that use analytics have never won anything, which confused me since I haven’t seen Toronto hoisting any Stanley Cups recently, but apparently he did win a championship with the Mighty Ducks in 2007, so he clearly speaks with absolute authority.

This guy was a walking, talking quote machine for the old school. I didn’t take note of all the hilarious and/or nonsensical things he said, but for some examples, try searching Twitter for “#SSAC Brian Burke.” To give a sense of how extreme he was, someone tweeted this quote at me, and I have no idea if he actually said it or if this guy was kidding.

@skepticalsports ‘Hockey is played with a stick and a puck, not your little calculator and your spreadsheets, Poindexter.’ – Brian Burke

— Brian Woodburn (@MustRockTheRed) March 2, 2012

In other words, Burke was literally too over the top to effectively parody.

On the other hand, in the discussion of concussions, I thought Burke had sort of a folksy realism that seemed pretty accurate to me. I think his general point is right, if a bit insensitive: If we really changed hockey so much as to eliminate concussions entirely, it would be a whole different sport (which he also claimed no one would watch, an assertion which is more debatable imo). At the end of the day, I think professional sports mess people up, including in the head. But, of course, we can’t ignore the problem, so we have to keep proceeding toward some nebulous goal.

Mike Milbury, presently a card-carrying member of the media, seemed to mostly embrace the alarmist media narrative, though he did raise at least one decent point about how the increase in concussions—which most people are attributing to an increase in diagnoses—may relate to recent rules changes that have sped up the game.

But for all that, the part that frustrated me the most was when Michael Schuckers, the legitimate hockey statistician at the table, was finally given the opportunity to talk. 90% of the things that came out of his mouth were various snarky ways of asserting that face-offs don’t matter. I mean, I assume he’s 100% right, but he just had no clue how to talk to these guys. Find common ground: you both care about scoring goals, defending goals, and winning. Good face-off skills get you the puck more often in the right situations. The question is how many extra possessions you get, and how valuable those possessions are. And finally, what’s the actual decision in question?

*Featuring Scott Boras, Scott Boras, Scott Boras, some other guys, Scott Boras, and, oh yeah, Bill James.*

In stark contrast to the Hockey panel, the Baseball guys pretty much bent over backwards to embrace analytics as much as possible. As I tweeted at the time:

Watching Hockey Analytics panel before Baseball Analytics panel is like watching Wheel of Fortune before Jeopardy. #SSAC #oldjoke

— Benjamin Morris (@skepticalsports) March 2, 2012

Scott Boras seems to like hearing Scott Boras talk. Which is not so bad, because Scott Boras actually did seem pretty smart and well informed: Among other things, Scott Boras apparently has a secret internal analytics team. To what end, I’m not entirely sure, since Scott Boras also seemed to say that most GM’s overvalue players relative to what Scott Boras’s people tell Scott Boras.

At this point, my mind wandered:

Fantasizing about Belichick being on one of these panels, but answering every question “I just try to give us the best chance to win.” #SSAC

— Benjamin Morris (@skepticalsports) March 2, 2012

How awesome would that be, right?

Anyway, in between Scott Boras’s insights, someone asked this Bill James guy about his vision for the future of baseball analytics, and he gave two answers:

- Evaluating players from a variety of contexts other than the minor leagues (like college ball, overseas, Cubans, etc).
- Analytics will expand to look at the needs of the entire enterprise, not just individual players or teams.

Meh, I’m a bit underwhelmed. He talked a bit about #1 in his one-on-one with Bill Simmons, so I’ll look at that a bit more in my review of that discussion. As for #2, I think he’s just way way off: The business side of sports is already doing tons of sophisticated analytics—almost certainly way more than the competition side—because, you know, it’s business.

E.g., in the first panel, there was a fair amount of discussion of how the NBA used “sophisticated modeling” for many different lockout-related analyses (I didn’t catch the Ticketing Analytics panel, but from its reputation, and from related discussions on other panels, it sounds like that discipline has some of the nerdiest analysis of all).

Scott Boras let Bill James talk about a few other things as well: E.g., James is not a fan of new draft regulations, analogizing them to government regulations that “any economist would agree” inevitably lead to market distortions and bursting bubbles. While I can’t say I entirely disagree, I’m going to go out on a limb and guess that his political leanings are probably a bit Libertarian?

*Featuring Jeff Van Gundy, Mike Zarren, John Hollinger, and Dean Oliver (filling in for Mark Cuban).*

If every one of these panels was Mark Cuban + foil, it would be just about the most awesome weekend ever (though you might not learn the most about analytics). So I was excited about this one, which, unfortunately, Cuban missed. Filling in on zero/short notice was Dean Oliver. Overall, here’s Nathan Walker’s take:

Basketball Panel Summary: “Too many variables. Too much noise. Stats are hard.” #ssac

— Nathan Walker (@bbstats) March 2, 2012

This panel actually had some pretty interesting discussions, but they flew by pretty fast and often followed predictable patterns, something like this:

- Hollinger says something pro-stats, though likely way out of his depth.
- Zarren brags about how they’re already doing that and more on the Celtics.
- Oliver says something smart and nuanced that attempts to get at the underlying issues and difficulties.
- Jeff Van Gundy uses forceful pronouncements and “common sense” to dismiss his strawman version of what the others have been saying.

E.g.:

“Michael Jordan was pretty good. That’s revolutionary.” <- Van Gundy to Oliver (but kind of took Dean out of context). #SSAC

— Benjamin Morris (@skepticalsports) March 2, 2012

Zarren talked about how there is practically more data these days than they know what to do with. This seems true and I think it has interesting implications. I’ll discuss it a little more in Part 2 re: the “Rebooting the Box Score” talk.

There was also an interesting discussion of trades, and whether they’re more a result of information asymmetry (in other words, teams trying to fleece each other), or more a result of efficient trade opportunities (in other words, teams trying to help each other). Though it really shouldn’t matter—you trade when you think it will help you, whether it helps your trade partner is mostly irrelevant—Oliver endorsed the latter. He makes the point that, with such a broad universe of trade possibilities, looking for mutually beneficial situations is the easiest way to find actionable deals. Fair enough.

*Featuring coaching superstars Jeff Van Gundy, Eric Mangini, and Bill Simmons. Moderated by Daryl Morey.*

OK, can I make the obvious point that Simmons and Morey apparently accidentally switched role cards? As a result, this talk featured a lot of Simmons attacking coaches and Van Gundy defending them. I honestly didn’t remember Mangini was on this panel until looking back at the book (which is saying something, b/c Mangini usually makes my blood boil).

There was almost nothing on, say, *how to evaluate coaches* by analyzing how well their various decisions comported with the tenets of win maximization. There *was* a lengthy (and almost entirely non-analytical) discussion of that all-important question of whether an NBA coach should foul or not up by 3 with little time left. Fouling probably has a tiny edge, but I think it’s too close and too infrequent to be very interesting (though obviously not as rare, it reminds me a bit of the impassioned debates you used to see on Poker forums about whether you should fast-play or slow-play flopped quads in limit hold’em).

There was what I thought was a funny moment when Bill Simmons was complaining about how teams seem to recycle mediocre older coaches rather than try out young, fresh talent. But when challenged by Van Gundy, Simmons drew a blank and couldn’t think of anyone. So, Bill, this is for you. Here’s a table of NBA coaches who have coached at least 1000 games for at least 3 different teams, while winning fewer than 60% of their games and without winning any championships:

[Table not found]

Note that I’m not necessarily agreeing with Simmons: Winning championships in the NBA is hard, especially if your team lacks uber-stars (you know, Michael Jordan, Magic Johnson, Dennis Rodman, et al).

Honestly, I got a little carried away with my detailed analysis/screed on Bill James, and I may have to do a little revising. So due to some other pressing writing commitments, you can probably expect Part 2 to come out this Saturday (Friday at the earliest).

This anomaly certainly captures the imagination, and I’ve received multiple requests for comment. E.g., a friend from my old poker game emails:

Just heard that the Packers have given up more yards than they’ve gained and was wondering how to explain this. Obviously the Packers’ defense is going to be underrated by Yards Per Game metrics since they get big leads and score quickly yada yada, but I don’t see how this has anything to do with the fact they’re being outgained. I assume they get better starting field position by a significant amount relative to their opponents so they can have more scoring drives than their opponents while still giving up more yards than they gain, but is that backed up by the stats?

Last week Advanced NFL Stats posted a link to this article from Smart Football looking into the issue in a bit more depth. That author does a good job examining what this stat means, and whether or not it implies that Green Bay isn’t as good as they seem (he more or less concludes that it doesn’t).

But that doesn’t really answer the question of how the anomaly is even possible, much less how or why it came to be. With that in mind, I set out to solve the problem. Unfortunately, after having looked at the issue from a number of angles, and having let it marinate in my head for a week, I simply haven’t found an answer that I find satisfying. But, what the hell, one of my resolutions is to pull the trigger on this sort of thing, so I figure I should post what I’ve got.

The first thing to do when you come across something that seems “crazy on its face” is to investigate how crazy it *actually is* (frequently the best explanation for something unusual is that it needs no explanation). In this case, however, I think the Packers’ yardage anomaly is, indeed, “pretty crazy.” Not otherworldly crazy, but, say, on a scale of 1 to “Kurt Warner being the 2000 MVP,” it’s at least a 6.

First, I was surprised to discover that just last year, the New England Patriots *also* had the league’s best record (14-2), and *also* managed to lose the yardage battle. But despite such a recent example of a similar anomaly, it is still statistically pretty extreme. Here’s a plot of more or less every NFL team season from 1936 through the present, excluding seasons where the relevant stats weren’t available or were too incomplete to be useful (N=1647):

The green diamond is the Packers’ net yardage vs. Win%, and the yellow triangle is their net yardage vs. Margin of Victory (net points). While not exactly Rodman-esque outliers, these do turn out to be very historically unusual:

Using the trendline equation on the graph above (plus basic algebra), we can use a team’s season Win percentage to calculate their expected yardage differential. With that prediction in hand, we can compare how much each team over or under-performed its “expectation”:

Both the 2011 Packers and the 2010 Patriots are in the top 5 all-time, and I should note that the 1939 New York Giants disparity is slightly overstated, because I excluded tie games entirely (ties cause problems elsewhere b/c of perfect correlation with MOV).
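As a rough illustration of the “trendline plus basic algebra” step, here is a minimal sketch. It assumes the trendline is an ordinary least-squares line with Win% on the vertical axis, which may not match the actual chart, and it runs on synthetic stand-in data because the real dataset isn’t reproduced here. All names and numbers in the block are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the team-season data (NOT the real N=1647 set).
yard_diff = rng.normal(0.0, 60.0, size=1000)          # net yards per game
win_pct = np.clip(0.5 + 0.003 * yard_diff + rng.normal(0.0, 0.1, size=1000), 0.0, 1.0)

# Trendline: win_pct = slope * yard_diff + intercept
slope, intercept = np.polyfit(yard_diff, win_pct, deg=1)


def expected_yard_diff(team_win_pct):
    """Invert the trendline to get the yardage differential it implies."""
    return (team_win_pct - intercept) / slope


def shortfall(team_win_pct, team_yard_diff):
    """Yards per game by which a team fell short of its expectation."""
    return expected_yard_diff(team_win_pct) - team_yard_diff


# Shortfall for every synthetic team-season, then a z-score for one
# hypothetical 15-1 team that was slightly outgained.
all_shortfalls = expected_yard_diff(win_pct) - yard_diff
team_z = (shortfall(15 / 16, -6.0) - all_shortfalls.mean()) / all_shortfalls.std()
```

Sorting `all_shortfalls` would give the kind of over/under-performance ranking described above; the specific values here mean nothing beyond the synthetic setup.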

Toward the conclusion of that Smart Football article, the author notes that Green Bay’s Margin of Victory isn’t as strong as their overall record, noting that the Packers’ “Pythagorean Record” (expectation computed from points scored and points allowed) is more like 11-5 or 12-4 than 15-1 (note that getting from extremely high Win % to very high MOV is incidental: 15-win teams are *usually* 11 or 12 win teams that have experienced good fortune). Green Bay’s MOV of 12.5 is a bit lower than the historical average for 15-1 teams (13.8) but don’t let this mislead you: the disparity between the yardage differential that we would expect based on Green Bay’s MOV and their actual result (using a linear projection, as above) is every bit as extreme as what we saw from Win %:

And here, in histogram form:

So, while not the most unusual thing to ever happen in sports, this anomaly is certainly unusual enough to look into.
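For reference, the Pythagorean expectation mentioned above can be sketched in a few lines. Two assumptions to flag: the exponent of 2.37 is a commonly used value for the NFL, not something taken from the Smart Football article, and the point totals (560 scored, 359 allowed) are figures I am supplying for the 2011 Packers rather than numbers quoted in this post, though they are consistent with an MOV of about 12.5 over 16 games.

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins from points scored and allowed (Pythagorean expectation).

    The exponent is an assumption; 2.37 is a value often used for the NFL.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return games * pf / (pf + pa)


# Assumed 2011 Packers point totals (not quoted in the article).
packers_expected_wins = pythagorean_wins(560, 359)
```

Under those assumptions the result lands between 11 and 12 wins, in line with the “11-5 or 12-4” description.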

For the record, the Packers’ MOV -> yard diff error is 3.23 standard deviations above the mean, while the Win% -> yard diff is 3.28. But since MOV correlates more strongly with the target stat (note an average error of only 125 yards instead of 170), a similar degree of abnormality leaves it as the more stable and useful metric to look at.

Thus, the problem can be framed as follows: The 2011 Packers fell around 2000 yards (the 125.7 above * 16 games) short of their expected yardage differential. Where did that 2000 yard gap come from?

Before getting started, I should note that, out of necessity, some of these “explanations” are more descriptive than actually explanatory, and even the ones that seem plausible and significant are hopelessly mixed up with one another. At the end of the day, I think the question of “What happened?” is addressable, though still somewhat unclear. The question of “Why did it happen?” remains largely a mystery: The most substantial claim that I’m willing to make with any confidence is that none of the obvious possibilities are sufficient explanations by themselves.

While I’m somewhat disappointed with this outcome, it makes sense in a kind of Fermi Paradox, “Why Aren’t They Here Yet?” kind of way. *I.e.*, if any of the straightforward explanations (e.g., that their stats were skewed by turnovers or “garbage time” distortions) could actually create an anomaly of this magnitude, we’d expect it to have happened more often.

And indeed, the data is actually consistent with a number of different factors (granted, with significant overlap) being present at once.

As suggested in the email above, one theoretical explanation for the anomaly could be the Packers’ presumably superior field position advantage. I.e., with their offense facing comparatively shorter fields than their opponents, they could have literally had fewer yards available to gain. This is an interesting idea, but it turns out to be kind of a bust.

The Packers did enjoy a reciprocal field position advantage of about 5 yards. But, unfortunately, there doesn’t seem to be a noticeable relationship between average starting field position and average yards gained per drive (which would have to be true *ex ante* for this “explanation” to have any meaning):

^{Note: Data is from the Football Outsiders drive stats.}

This graph plots both offenses and defenses from 2011. I didn’t look at more historical data, but it’s not really necessary: Even if a larger dataset revealed a statistically significant relationship, the large error rate (which converges quickly) means that it couldn’t alter expectation in an individual case by more than a fraction of a yard or so per possession. Since Green Bay only traded 175ish possessions this season, it couldn’t even make a dent in our 2000 missing yards (again, that’s if it existed at all).

On the other hand, one thing in the F.O. drive stats that almost certainly IS a factor, is that the Packers had a net of 10 fewer possessions this season than their opponents. As Green Bay averaged 39.5 yards per possession, this difference alone could account for around 400 yards, or about 20% of what we’re looking for.

Moreover, 5 of those 10 possessions come from a disparity in “zero yard touchdowns,” or net touchdowns scored by their defense and special teams: The Packers scored 7 of these (5 from turnovers, 2 from returns) while only allowing 2 (one fumble recovery and one punt return). Such scores widen a team’s MOV without affecting their total yardage gap.

[Warning: this next point is a bit abstract, so feel free to skip to the end.] Logically, however, this doesn’t quite get us where we want to go. The relevant question is “What would the yardage differential have been if the Packers had the same number of possessions as their opponents?” Some percentage of our 10 counterfactual drives would result in touchdowns regardless. Now, the Packers scored touchdowns on 37% of their actual drives, but scored touchdowns on *at least* 50% of their counterfactual drives (the ones that we can *actually* account for via the “zero yard touchdown” differential). Since touchdown drives are, on average, *longer* than non-touchdown drives, this means that the ~400 yards that can be attributed to the possession gap is at least somewhat understated.

When considering this issue, probably the first thing that springs to mind is that the Packers have won a lot of games easily. It seems highly plausible that, having rushed out to so many big leads, the Packers must have played a huge amount of “garbage time,” in which their defense could have given up a lot of “meaningless” yards that had no real consequence other than to confound statisticians.

The proportion of yards on each side of the ball that came after Packers games got out of hand should be empirically checkable—but, unfortunately, I haven’t added 2011 Play-by-Play data to my database yet. That’s okay, though, because there are other ways—perhaps even more interesting ways—to attack the problem.

In fact, it’s pretty much right up my alley: Essentially, what we are looking for here is yet another permutation of “Reverse Clutch” (first discussed in my Rodman series, elaborated in “Tim Tebow and the Taxonomy of Clutch”). Playing soft in garbage time is a great way for a team to “underperform” in statistical proxies for true strength. In football, there are even a number of sound tactical and strategic reasons why you should explicitly sacrifice yards in order to maximize your chances of winning. For example, if you have a late lead, you should be more willing to soften up your defense of non-sideline runs and short passes—even if it means giving up more yards on average than a conventional defense would—since those types of plays hasten the end of the game. And the converse is true on offense: With a late lead, you want to run plays that avoid turnovers and keep the clock moving, even if it means you’ll be more predictable and easier to defend.

So how might we expect this scenario to play out statistically? Recall, by definition, “clutch” and “reverse clutch” look the same in a stat sheet. So what kind of stats—or relationships between stats—normally indicate “clutchness”? As it turns out, Brian Burke at Advanced NFL Stats has two metrics pretty much at the core of everything he does: Expected Points Added, and Win Percentage Added. The first of these (EPA) takes the down and distance before and after each play and uses historical empirical data to model how much that result *normally* affects a team’s point differential. WPA adds time and score to the equation, and attempts to model the impact each play has on the team’s chances of winning.

A team with “clutch” results—whether by design or by chance—might be expected to perform better in WPA (which ultimately just adds up to their number of wins) than in EPA (which basically measures generic efficiency).

For most aspects of the game, the relationship between these two is strong enough to make such comparisons possible. Here are plots of this comparison for each of the 4 major categories (2011 NFL, Green Bay in green), starting with passing offense (note that the comparison is technically between wins added *overall* and expected points *per play*):

Obviously there’s nothing strikingly abnormal about Green Bay’s results in these graphs, but there are small deviations that are perfectly *consistent* with the garbage time/reverse clutch theory. For the passing game (offense and defense), Green Bay seems to hew pretty close to expectation. But in the rushing game they do have small but noticeable disparities on both sides of the ball. Note that in the scenario I described where a team intentionally trades efficiency for win potential, we would expect the difference to be most acute in the running game (which would be under-defended on defense and overused on offense).

Specifically: Green Bay’s offensive running game has a WPA of 1.1, despite having an EPA per play of zero (which corresponds to a WPA of .25). On defense, the Packers’ EPA/p is .07, which should correspond to an expected WPA of 1.0, while their actual result is .59.

Clearly, both of these effects are small, considering there isn’t a perfect correlation. But before dismissing them entirely, I should note that we don’t immediately know how much of the variation in the graphs above is due to *variance* for *a given team* and how much is due to variation *between* teams. Moreover, even without knowing the balance, because both variance and variation contribute to the “entropy” of the observed relationship between EPA/p and WPA, the *actual* relationship between the two is likely to be stronger than these graphs would make it seem.

The other potential problem is that this comparison is between wins and points, while the broader question is comparing points to yards. But there’s one other statistical angle that helps bridge the two, while supporting the speculated scenario to boot: Green Bay gained 3.9 yards per attempt on offense, and allowed 4.7 yards per attempt on defense, while the league average is 4.3 yards per attempt. So, at least in terms of raw yardage, Green Bay performed “below average” in the running game by about .4 yards/attempt on each side of the ball. Yet, the combined WPA for the Packers running game **is positive!** Their net rushing WPA is +.5, despite having an

So, if we thought this wasn’t a statistical artifact, there would be two obvious possible explanations: 1) That Green Bay has a sub-par running game that has happened to be very effective in important spots, or 2) that Green Bay actually has an average (or better) running game that has *appeared ineffective* (especially as measured by yards gained/allowed) in *less* important spots. Q.E.D.

For the sake of this analysis, let’s assume that the observed difference for Green Bay here really is a product of strategic adjustments stemming from (or at least related to) their winning ways. How much of our 2000 yard disparity could it account for?

So let’s try a crazy, wildly speculative, back-of-the-envelope calculation: Give Green Bay and its opponents the same number of rushing attempts that they had this season, but with both sides gaining an *average* number of yards per attempt. The Packers had 395 attempts and their opponents had 383, so at .4 yards each, the yardage differential would swing by 311 yards. So again, interesting and plausibly significant, but doesn’t even come close to explaining our anomaly on its own.
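The back-of-the-envelope swing can be reproduced directly from the figures quoted above (yards per attempt and attempt counts). The variable names are just for illustration.

```python
LEAGUE_AVG_YPA = 4.3                   # league-average rushing yards per attempt

packers_ypa, opponents_ypa = 3.9, 4.7  # what Green Bay gained / allowed
packers_attempts, opponent_attempts = 395, 383

# Yards Green Bay's offense would add at the league-average rate.
offense_swing = (LEAGUE_AVG_YPA - packers_ypa) * packers_attempts
# Yards Green Bay's defense would save at the league-average rate.
defense_swing = (opponents_ypa - LEAGUE_AVG_YPA) * opponent_attempts

total_swing = offense_swing + defense_swing
share_of_gap = total_swing / 2000      # fraction of the ~2000 missing yards
```

`total_swing` works out to the roughly 311 yards mentioned above, or about 16% of the gap.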

One of the more notable features of the Packers season is their incredible +22 turnover margin. How they managed that and whether it was simply variance or something more meaningful could be its own issue. But in this context, given the +22, how helpful is that *as an explanation* for the yardage disparity? Turnovers affect scores and outcomes a ton, but are relatively neutral w/r/t yards, so surely this margin is relevant. But exactly how much does it neutralize the problem?

Here, again, we can look at the historical data. To predict yardage differential based on MOV *and* turnover differential, we can set up an extremely basic linear regression:

The R-Square value of .725 means that this model is pretty accurate (MOV alone achieved around .66). Both variables are extremely significant (from p value, or absolute value of t-stat). Based on these coefficients, the resulting predictive equation is

YardsDiff = 7.84*MOV – 23.3*TOdiff/gm
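To make the equation concrete, here is a small sketch that plugs in Green Bay’s numbers from this post (MOV of 12.5 and a +22 turnover margin over 16 games). It assumes the equation is used exactly as printed, with no intercept term and with everything measured per game; the function name is hypothetical.

```python
def predicted_yards_diff(mov, to_diff_per_game):
    """Predicted net yards per game from the printed coefficients.

    Assumes no intercept and per-game inputs, as the equation is written.
    """
    return 7.84 * mov - 23.3 * to_diff_per_game


GAMES = 16
packers_prediction = predicted_yards_diff(mov=12.5, to_diff_per_game=22 / GAMES)

# For comparison: the same MOV with a neutral turnover margin.
neutral_turnover_prediction = predicted_yards_diff(mov=12.5, to_diff_per_game=0.0)
```

The turnover term lowers the expected yardage differential by about 32 yards per game here, which is the direction needed to shrink the anomaly.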

Running the dataset through the same process as above (comparing predictions with actual results and calculating the total error), here’s how the new rankings turn out:

In other words, if we account for turnovers in our predictions, the expected/actual yardage discrepancy drops from ~125 to ~70 yards per game. This obv makes the results somewhat less extreme, though still pretty significant: 11th of 1647. Or, in histogram form:

So what’s the bottom line? At 69.5 yards per game, the total “missing” yardage drops to around 1100. Therefore, inasmuch as we accept it as an “explanation,” Green Bay’s turnover differential seems to account for about 900 yards.
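The season-level numbers follow from scaling the per-game errors quoted in this post by 16 games. A quick check, with variable names chosen for illustration:

```python
GAMES = 16

error_mov_only = 125.7        # yards/game shortfall, predicting from MOV alone
error_with_turnovers = 69.5   # yards/game shortfall, MOV plus turnover margin

missing_before = error_mov_only * GAMES        # the "around 2000" figure
missing_after = error_with_turnovers * GAMES   # the "around 1100" figure
explained_by_turnovers = missing_before - missing_after
```

The difference comes to just under 900 yards, matching the rounded figure above.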

It’s probably obvious, but important enough to say anyway, that there is *extensive overlap* between this “explanation” and our others above: E.g., the interception differential contributes to the possession differential, and is exacerbated by garbage time strategy, which causes the EPA/WPA differential, etc.

Finally, I have to address a potential cause of this anomaly that I would almost rather not: The elusive “Bend But Don’t Break” defense. It’s a bit like the Dark Matter of this scenario: I can prove it exists, and estimate about how much is there, but that doesn’t mean I have any idea what it is or where it comes from, and it’s almost certainly not as sexy as people think it is.

Typically, “Bend But Don’t Break” is the description that NFL analysts use for bad defenses that get lucky. As a logical and empirical matter, they mostly don’t make sense: Pretty much every team in history (save, possibly, the 2007 New England Patriots) has a steeply inclined expected points by field position curve. See, e.g., the “Drive Results” chart in this post. Any time you “bend” enough to give up first downs, you’re giving up expected points. In other words, barring special circumstances, there is simply no way to trade significant yards for a decreased chance of scoring.

Of course, you *can* have defenses that are stronger at defending various parts of the field, or certain down/distance combinations, which could have the net effect of allowing fewer points than you would expect based on yards allowed, but that’s not some magical defensive rope-a-dope strategy, it’s just being better at some things than others.

But for whatever reason, on a drive-by-drive basis, did the Green Bay defense “bend” more than it “broke”? In other words, did they give up fewer points than expected?

And the answer is “yes.” Which should be unsurprising, since it’s basically a minor variant of the original problem. In other words, it begs the question.

In fact, with everything that we’ve looked at so far, this is pretty much all that is left: if there weren’t a significant “Bend But Don’t Break” effect observable, the yardage anomaly would be literally impossible.

And, in fact, this observation “accounts” for about 650 yards, which, combined with everything else we’ve looked at (and assuming a modest amount of overlap), puts us in the ballpark of our initial 2000 yard discrepancy.

Some of the things that seem speculative above *must* be true, because there has to be an accounting: even if it’s completely random, dumb luck with no special properties and no elements of design, there still has to be an avenue for the anomaly to manifest.

So, given that some speculation is necessary, the best I can do is offer a sort of “death by a thousand cuts” explanation. If we take the yardage explained by turnovers, the “dark matter” yards of “bend but don’t break”, and then roughly half of our speculated consequences of the fewer drives/zero yard TD’s and the “Garbage Time” reverse-clutch effect (to account for overlap), you actually end up with around 2100 yards, with a breakdown like so:

So why cut drives and reverse clutch in half instead of the others? Mostly just to be conservative. We have to account for overlap somewhere, and I’d rather leave more in the unknown than in the known.

At the end of the day, the stars definitely had to align for this anomaly to happen: Any one of the contributing factors may have been slightly unusual, but combine them and you get something rare.

This may be a bit of a surprise coming from a statistically-oriented self-professed skeptic, but I’m a complete believer in “clutch.” In this case, my skepticism is aimed more at those who deny clutch out of hand: The principle that “**Clutch does not exist**” is treated as something of a sacred tenet by many adherents of the Unconventional Wisdom.

On the other hand, my belief in Clutch doesn’t necessarily mean I believe in mystical athletic superpowers. Rather, I think the “clutch” effect—that is, scenarios where the performance of some teams/players genuinely improves when game outcomes are in the balance—is perfectly rational and empirically supported. Indeed, the simple fact that winning is a statistically significant predictive variable on top of points scored and points allowed—demonstrably true for each of the 3 major American sports—is very nearly proof enough.

The differences between my views and those of clutch-deniers are sometimes more semantic and sometimes more empirical. In its broadest sense, I would describe “clutch” as a property inherent in players/teams/coaches who systematically perform better than normal in more important situations. From there, I see two major factors that divide clutch into a number of different types: 1) Whether or not the difference is a product of the individual or team’s own skill, and 2) whether their performance in these important spots is abnormally good relative to *their* performance (in less important spots), whether it is good relative to the *typical* performance in those spots, or both. In the following chart, I’ve listed the most common types of Clutch that I can think of, a couple of examples of each, and how I think they break down w/r/t those factors:

Here are a few thoughts on each:

I first discussed the concept of “reverse clutch” in this post in my Dennis Rodman series. Put simply, it’s a situation where someone has clutch-like performance by virtue of playing *badly* in *less* important situations.

While I don’t think this is a particularly common phenomenon, it may be relevant to the Tebow discussion. During Sunday’s Broncos/Pats game, I tweeted that at least one commentator seemed to be flirting with the idea that maybe Tebow would be better off throwing more interceptions. Noting that, for all of Tebow’s statistical shortcomings, his interception rate is ridiculously low, and then noting that Tebow’s “ugly” passes generally err on the *ultra-cautious* side, the commentator seemed poised to put the two together—if just for a moment—before his partner steered him back to the mass media-approved narrative.

If you’re not willing to take the risks that sometimes lead to interceptions, you may also have a harder time completing passes, throwing touchdowns, and doing all those things that quarterbacks normally do to win games. And, for the most part, we know that Tebow is almost religiously (pun intended) committed to avoiding turnovers. However, in situations where your team is trailing in the 4th quarter, you may have no choice but to let loose and take those risks. Thus, it is *possible* that a Tim Tebow who takes risks more optimally is actually a significantly better quarterback than the Q1-Q3 version we’ve seen so far this season, and the 4th quarter pressure situations he has faced have simply brought that out of him.

That may sound farfetched, and I certainly wouldn’t bet my life on it, but it also wouldn’t be unprecedented. Though perhaps a less extreme example, early in his career Ben Roethlisberger played on a Pittsburgh team that relied mostly on its defense, and was almost painfully conservative in the passing game. He won a ton, but with superficially unimpressive stats, a fairly low interception rate, and loads of “clutch” performances. His rookie season he passed for only 187 yards a game, yet had SIX 4th quarter comebacks. Obviously, he eventually became regarded as an elite QB, with statistics to match.

A lot of professional athletes are *not* clutch, or, more specifically, are anti-clutch. See, e.g., professional kickers. They succumb under pressure, just as any non-professionals might. While most professionals probably have a much greater capacity for handling pressure situations than amateurs, there are still significant relative imbalances between them. The athletes who do NOT choke under pressure are thus, by comparison, clutch.

Some athletes may be more “mentally tough” than others. I love Roger Federer, and think he is among the top two tennis players of all time (Bjorn Borg being the other), and in many ways I even think he is under-appreciated despite all of his accolades. Yet, he has a pretty crap record in the closest matches, especially late in majors: lifetime, he is 4-7 in 5 set matches in the Quarterfinals or later, including a 2-4 record in his last 6. For comparison, Nadal is 4-1 in similar situations (2-1 against Federer), and Borg won 5-setters at an 86% clip.

Extremely small sample, sure. But compared to Federer’s normal expectation on a set by set basis over the time-frame (even against tougher competition), the binomial probability of him losing that much without significantly diminished 5th set performance is extremely low:

Thus, as a Bayesian matter, it’s likely that a portion of Rafael Nadal’s apparent “clutchness” can be attributed to Roger Federer.
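The kind of binomial check described above can be sketched in a few lines. This is only an illustration: the 4-7 record comes from the text, but the win probabilities tried below are assumed placeholders rather than Federer’s measured set-by-set expectation, and treating each 5-setter as one independent trial is a simplification.

```python
from math import comb

def prob_at_most(wins, matches, p_win):
    """P(X <= wins) for X ~ Binomial(matches, p_win)."""
    return sum(
        comb(matches, k) * p_win**k * (1 - p_win) ** (matches - k)
        for k in range(wins + 1)
    )

# Record from the text: 4 wins in 11 late-round five-set matches.
WINS, MATCHES = 4, 11

# ASSUMPTION: illustrative win probabilities, not measured values.
for p_win in (0.50, 0.60, 0.70):
    tail = prob_at_most(WINS, MATCHES, p_win)
    print(f"p = {p_win:.2f}: P(at most {WINS} wins in {MATCHES}) = {tail:.4f}")
```

The higher the assumed baseline, the smaller the tail probability, which is the sense in which a 4-7 record becomes hard to explain without a drop-off in deciding sets.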

In the finale to my Rodman series, I discussed a fictional player named “Bjordson,” who is my amalgamation of Michael Jordan, Larry Bird, and Magic Johnson, and I noted that this player has a slightly higher Win % differential than Rodman.

Now, I could do a whole separate post (if not a whole separate series) on the issue, but it’s interesting that Bjordson also has an extremely high X-Factor: that is, the average difference between their actual Win % differential and the Win % differential that would be predicted by their Margin of Victory differential is, like Rodman’s, around 10% (around 22.5% vs. 12.5%). [Note: Though the X-Factors are similar, this is subjectively a bit less surprising than Rodman having such a high W% diff., mostly because I started with W% diff. this time, so some regression to the mean was expected, while in Rodman’s case I started with MOV, so a massively higher W% was a shocker. But regardless, both results are abnormally high.]

Now, I’m sure that the vast majority of sports fans presented with this fact would probably just shrug and accept that Jordan, Bird and Johnson must have all been uber-clutch, but I doubt it. Systematically performing super-humanly better than you are normally capable of is extremely difficult, but systematically performing worse than you are normally capable of is pretty easy. Rodman’s high X-Factor was relatively easy to understand (as Reverse Clutch), but these are a little trickier.

Call it speculation, but I suspect that a major reason for this apparent clutchiness is that being a super-duper-star has its privileges. E.g.:

In other words, ref bias may help super-stars win even more than their super-skills would dictate.

I put Tim Tebow in the chart above as perhaps having a bit of “reputational clutch” as well, though not because of officiating. Mostly it just seemed that, over the last few weeks, the Tebow media frenzy led to an environment where practically everyone on the field was going out of their minds—one way or the other—any time a game got close late.

Numbers 4 and 5 in the chart above are pretty closely related. The main distinction is that #4 can be role-based and doesn’t necessarily imply any particular advantage. In fact, you could have a relatively poor player overall who, by virtue of their specific skillset, becomes significantly more valuable in endgame situations. E.g., closing pitchers in baseball: someone with a comparatively high ERA might still be a good “closing” option if they throw a high percentage of strikeouts (it doesn’t matter how many home runs you normally give up if a single or even a pop-up will lose the game).

Straddling 4 and 5 is one of the most notorious “clutch” athletes of all time: Reggie Miller. Many years ago, I read an article that examined Reggie’s career and determined that he *wasn’t* clutch because he hit a relatively *normal* percentage of 3 point shots in clutch situations. I didn’t even think about it at the time, but I wish I could find the article now, because, if true, it almost certainly proves exactly the opposite of what the authors intended.

The amazing thing about Miller is that his jump shot was *so* ugly. My theory is that the sheer bizarreness of his shooting motion made his shot extremely hard to defend (think Hideo Nomo in his rookie year). While this didn’t necessarily make him a great shooter under *normal* circumstances, he could suddenly become extremely valuable in any situations where there is no time to set up a shot and heavy perimeter defense is a given. Being able to hit ANY shots under those conditions is a “clutch” skill.

Though other types of skills can fit into this branch of the tree, I think endgame tactics is the area where teams, coaches, and players are most likely to have disparate impacts, thus leading to significant advantages w/r/t winning. The simple fact is that endgames are very different from the rest of games, and require a whole different mindset. Meanwhile, leagues select for people with a wide variety of skills, leaving some much better at end-game tactics than others.

Win expectation supplants point expectation. If you’re behind, you have to take more risks, and if you’re ahead, you have to avoid risks—even at the cost of expected value. If you’re a QB, you need to consider the whole range of outcomes of a play more than just the average outcome or the typical outcome. If you’re a QB who is losing, you need to throw pride out the window and throw interceptions! There is clock management, knowing when to stay in bounds and when to go down. As a baseball manager, you may face your most difficult pitching decisions, and as a pitcher, you may have to make unusual pitch decisions. A batter may have to adjust his style to the situation, and a pitcher needs to anticipate those adjustments. Etc., etc., ad infinitum. They may not be as flashy as Reggie Miller 3-ball, but these little things add up, and are probably the most significant source of Clutchness in sports.

I listed this separately (rather than as an example of 4 or 5) just because I think it’s not as simple and neat as it seems.

While conditioning and fitness are important in every sport, and they tend to be more important later in games, they’re almost too pervasive to be “clutch” as I described it above. The fact that most major team sports have more or less uniform game lengths means that conditioning issues should manifest similarly basically *every night*, and should therefore be reflected in most conventional statistics (like minutes played, margin of victory, etc), not just in those directly related to winning.

Ultimately, I think conditioning has the greatest impact on “clutchness” in Tennis, where it is often the deciding factor in close matches.

And finally, we get to the Holy Grail of Clutch. This is probably what most “skeptics” are thinking of when they deny the existence of Clutch, though I think that such denials—even with this more limited scope—are generally overstated. If such a quality exists, it is obviously going to be extremely rare, so the various statistical studies that fail to find it prove very little.

The most likely example in mainstream sports would seem to be pre-scandal Tiger Woods. In his prime, he had an advantage over the field in nearly every aspect of the game, but golf is a fairly high variance sport, and his scoring average was still only a point or two lower than the competition. Yet his Sunday prowess is well documented: He has gone 48-4 in PGA tournaments when entering the final round with at least a share of the lead, including an 11-1 record with *only* *a share* of the lead. Also, to go a bit more esoteric, Woods has successfully defended a title 22 times. So, considering he has 71 career wins, and at least 22 of them had to be first timers, that means his title defense record is closer to 40-45%, depending on how often he won titles many times in a row. Compare this to his overall win-rate of 27%, and the idea that he was able to elevate his game when it mattered to him the most is even more plausible.

Of course, I still contend that the most clutch thing I have ever seen is Packattack’s final jump onto the .1 wire in his legendary A11 run. Tim Tebow, eat your heart out!

For now, the big news is that Major League Baseball is finally going to have realignment, which will most likely lead to an extra playoff team, and a one game Wild Card series between the non–division winners. I’m not normally one who tries to comment on current events in sports (though, out of pure frustration, I almost fired up WordPress today just to take shots at Tim Tebow—even with nothing original to say), but this issue has sort of a counter-intuitive angle to it that motivated me to dig a bit deeper.

Conventional wisdom on the one game playoff is pretty much that it’s, well, **super** crazy. E.g., here’s Jayson Stark’s take at ESPN:

But now that the alternative to finishing first is a ONE-GAME playoff? Heck, you’d rather have an appendectomy than walk that tightrope. Wouldn’t you?

Though I think he actually *likes* the idea, precisely *because* of the loco factor:

So a one-game, October Madness survivor game is what we’re going to get. You should set your DVRs for that insanity right now.

In the meantime, we all know what the potential downside is to this format. Having your entire season come down to one game isn’t fair. Period.

I wouldn’t be too sure about that. What is fair? As I’ve noted, MLB playoffs are basically a crapshoot anyway. In my view, any move that MLB can make toward having the more accomplished team *win more often* is a positive step. And, as crazy as it sounds, that is likely exactly what a one game playoff will do.

The reason is simple: home field advantage. While smaller than in other sports, the home team in baseball still wins around 55% of the time, and more games means a smaller *percentage* of your series games played at home. While longer series eventually lead to better teams winning more often, the margins in baseball are so small that it takes a significant edge for a team to prefer to play ANY road games:

^{Note: I calculated these probabilities using my favorite binom.dist function in Excel. Specifically, where the number of games needed to win a series is k, this is the sum from x=0 to x=k of the p(winning x home games) times p(winning at least k-x road games).}

So assuming each team is about as good as their records (which, regardless of the accuracy of the assumption, is how *they deserve* to be treated), a team needs about a 5.75% generic advantage (around 9-10 games) to prefer even a *seven* game series to a single home game.
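The calculation described in the note can be sketched in Python instead of Excel. The series formula follows the note (sum over home wins, times the chance of enough road wins); the rest is assumption. In particular, `game_probs` is a hypothetical mapping that turns a gap in season winning percentage into a neutral-site edge of half that gap, then shifts it by the 5% home-field effect, and the original spreadsheet may have done this differently.

```python
from math import comb

HOME_EDGE = 0.05      # home team wins ~55% between equal teams (from the text)
SEASON_GAMES = 162

def binom_pmf(x, n, p):
    """P(exactly x wins in n games)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_at_least(x, n, p):
    """P(at least x wins in n games); 1 if x <= 0, 0 if x > n."""
    if x <= 0:
        return 1.0
    return sum(binom_pmf(i, n, p) for i in range(x, n + 1))

def series_win_prob(k, home_games, road_games, p_home, p_road):
    """P(winning at least k games), summing over the number of home wins."""
    return sum(
        binom_pmf(x, home_games, p_home)
        * binom_at_least(k - x, road_games, p_road)
        for x in range(home_games + 1)
    )

def games_to_gap(games):
    """Convert a gap in the standings into a winning-percentage gap."""
    return games / SEASON_GAMES

def game_probs(record_gap):
    # ASSUMPTION: neutral-site win probability is 0.5 plus half the record gap.
    neutral = 0.5 + record_gap / 2
    return neutral + HOME_EDGE, neutral - HOME_EDGE

for games in (0, 6, 12):
    p_home, p_road = game_probs(games_to_gap(games))
    one = series_win_prob(1, 1, 0, p_home, p_road)    # single home game
    five = series_win_prob(3, 3, 2, p_home, p_road)   # best of 5, 3 at home
    seven = series_win_prob(4, 4, 3, p_home, p_road)  # best of 7, 4 at home
    print(f"{games:>2}-game gap: 1 game {one:.3f}, "
          f"5 games {five:.3f}, 7 games {seven:.3f}")
```

Varying the gap shows where, under these assumptions, the longer series overtakes the single home game for the better team.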

But what about the incredible injustice that could occur when a *really* good team is forced to play some scrub? E.g., Stark continues:

It’s a lock that one of these years, a 98-win wild-card team is going to lose to an 86-win wild-card team. And that will really, really seem like a miscarriage of baseball justice. You’ll need a Richter Scale handy to listen to talk radio if that happens.

But you know what the answer to those complaints will be?

“You should have finished first. Then you wouldn’t have gotten yourself into that mess.”

Stark posits a 12 game edge between two wild card teams, and indeed, this *could* lead to a *slightly* worse spot for the better team than a longer series. 12 games corresponds to a 7.4% generic advantage, which means a 7-game series would improve the team’s chances by about 1% (oh, the humanity!). But the alternative almost certainly wouldn’t be seven games anyway, considering the first round of the playoffs is already only five. At that length, the “miscarriage of baseball justice” would be about 0.1% (and vs. 3 games, sudden death is still preferable).

If anything, consider the implications of the massive gap on the *left* side of the graph above: If anyone is getting screwed by the new setup, it’s not the team with the better record, it’s a *better team* with a *worse* record, who won’t get as good a chance to demonstrate their *actual* superiority (though that team’s chances are still around 50% better than they would have been under the current system). And those are the teams that really did “[get themselves] into that mess.”

Also, the scenario Stark posits is *extremely* unlikely: basically, the difference between 4th and 5th place is never 12 games. For comparison, this season the difference between the *best* record in the NL and the Wild Card Loser was only 13 games, and in the AL it was only seven. Over the past ten seasons, each Wild Card team and their 5th place finisher were separated by an average of 3.5 games (about 2.2%):

Note that no cases over this span even rise above the seven game “injustice line” of 5.75%, much less to the nightmare scenario of 7.4% that Stark invokes. The standard deviation is about 1.5%, and that’s *with* the present imbalance of teams (note that the AL is pretty consistently higher than the NL, as should be expected)—after realignment, this plot should tighten even further.

Indeed, considering the typically small margins between contenders in baseball, on average, this “insane” sudden death series may end up being the fairest round of the playoffs.

^{Note: Data points are QB’s in the Super Bowl era who were drafted #1 overall and started at least half of their team’s games as rookies (excluding Matthew Stafford and Sam Bradford for lack of ripeness). Peyton Manning and Jim Plunkett each threw 4.9% interceptions and won one Super Bowl, so I slightly adjusted their numbers to make them both visible, though the R-squared value of .7287 is accurate to the original (a linear trend actually performs slightly better—with an R-squared of .7411—but I prefer the logarithmic one aesthetically).}

Notice the relationship is almost *perfectly* ironic: Excluding Steve Bartkowski (5.9%), no QB with a lower interception percentage has won more Super Bowls than *any* QB with a higher one. Overall (including Steve B.), the seven QB’s with the highest rates have 12 Super Bowl rings, or an average of 1.7 per (and obv the remaining six have none). And it’s not just Super Bowls: those seven also have 36 career Pro Bowl selections between them (average of 5.1), to just seven for the remainder (average of 1.2).

As for significance, obviously the sample is tiny, but it’s large enough that it would be an astounding statistical artifact if there were actually *nothing* behind it (though I should note that the symmetricality of the result would be remarkable even *with* an adequate explanation for its “ironic” nature). I have some broader ideas about the underlying dynamics and implications at play, but I’ll wait to examine those in a more robust context. Besides, rank speculation is fun, so here are a few possible factors that spring to mind:

1. **Potential for selection effect**: Most rookie QB’s who throw a lot of interceptions get benched. Teams may be more likely to let their QB continue playing when they have more confidence in his abilities—and presumably such confidence correlates (at least to some degree) with actually *having* greater abilities.
2. **The San Antonio gambit**: Famously, David Robinson missed most of the ’96-97 NBA season with back and foot injuries, allowing the Spurs to bomb their way into getting Tim Duncan, sending the most coveted draft pick in many years to a team that, when healthy, was already somewhat of a contender (also preventing a drool-worthy Iverson/Duncan duo in Philadelphia). Similarly, if a quality QB prospect bombs out in his rookie campaign—for whatever reason, including just “running bad”—his team may get all of the structural and competitive advantages of a true bottom-feeder (such as higher draft position), despite actually having 1/3 of a quality team (*i.e.*, a good quarterback) in place.
3. **Gunslingers are just better**: This is my favorite possible explanation, natch. There are a lot of variations, but the most basic idea goes like this: While ultimately a good QB on a good team will end up having lower interception rates, interceptions are not necessarily bad. Much like going for it on 4th down, often the best win-maximizing choice that a QB can make is to “gamble”—that is, to risk turning the ball over when the reward is appropriate. This can be play-dependent (like deep passes with high upsides and low downsides), or situation-dependent (like when you’re way behind and need to give yourself the chance to get lucky to have a chance to win). *E.g.*: In defense of Brett Favre—who, in crunch time, could basically be counted on to deliver you either a win or multiple “ugly” INT’s—I’ve quipped: If a QB loses a game *without* throwing 4 interceptions, he probably isn’t trying hard enough. And, of course, this latter scenario should come up *a lot* for the crappy teams that just drafted #1 overall: *I.e.*, when your rookie QB is going 4-12 and *isn’t* throwing 20 interceptions, he’s probably doing something wrong.

[*Edit (9/24/2011) to add*: Considering David Meyer’s comment below, I thought I should make clear that, while my interests and tastes lie with #3 above, I don’t mean to suggest that I endorse it as the most likely or most significant factor contributing to this particular phenomenon (or even the broader one regarding predictivity of rookie INT%). While I do find it meaningful and relevant that this result is consistent with and supportive of some of my wilder thoughts about interceptions, risk-taking, and quarterbacking, overall I think that macroscopic factors are more likely to be the driving force in this instance.]

For the record, here are the 13 QB’s and their relevant stats:


So it comes down to this: With Rodman securely in the Hall of Fame, and his positive impact conclusively demonstrated by the most skeptical standards of proof I can muster, what more is there to say? Repeatedly, my research on Rodman has led to unexpectedly extreme discoveries: Rodman was not just a great rebounder, but the greatest of all time—bar none. And despite playing mostly for championship contenders, his differential impact on winning was still the greatest measured of any player with data even remotely as reliable as his. The least generous interpretation of the evidence still places Rodman’s value well within the realm of the league’s elite, and in Part 4(a) I explored some compelling reasons why the more generous interpretation may be the most plausible.

Yet even that more generous position has its limitations. Though the pool of players I compared with Rodman was broadly representative of the NBA talent pool on the whole, it lacked a few of the all-time greats—in particular, the consensus greatest: Michael Jordan. Due to that conspicuous absence, as well as to the considerable uncertainty of a process that is better suited to proving broad value than providing precise individual ratings, I have repeatedly reminded my readers that, even though Rodman kept topping these lists and metrics, I did *NOT* mean to suggest that Rodman was actually greater than the greatest of them all. In this final post of this series, I will consider the opposite position: that there is a plausible argument (with evidence to back it up) that Rodman’s astounding win differentials—even taken completely at face value—may still understate his true value by a potentially game-changing margin.

First off, this argument was supposed to be an afterthought. Just a week ago—when I thought I could have it out the next morning—it was a few paragraphs of amusing speculation. But, as often seems to be the case with Dennis Rodman-related research, my digging uncovered a bit more than I expected.

The main idea has its roots in a conversation I had (over bruschetta) with a friend last summer. This friend is not a huge sports fan, nor even a huge stats geek, but he has an extremely sharp analytical mind, and loves, *loves* to tear apart arguments—and I mean that literally: He has a Ph.D. in Rhetoric. In law school, he was the guy who annoyed everyone by challenging almost everything the profs ever said—and though I wouldn’t say he was usually right, I would say he was usually onto something.

That night, I was explaining my then-brand new “Case for Dennis Rodman” project, which he was naturally delighted to dissect and criticize. After painstakingly laying out most of The Case—of course having to defend and explain many propositions that I had been taking for granted and needing to come up with new examples and explanations on the fly, just to avoid sounding like an idiot (seriously, talking to this guy can be intense)—I decided to try out this rhetorical flourish that made a lot of sense to me intuitively, but which had never really worked for anyone previously:

“Let me put it this way: Rodman was by far the best *third-best* player in NBA History.”

As I explained, “third best” in this case is sort of a term of art, not referring to quality, but to a player’s role on his team. *I.e.*, not the player a team is built around (1st best), or even the supporting player in a “dynamic duo” (like HOF 2nd-besters Scottie Pippen or John Stockton), but the guy who does the dirty work, who mostly gets mentioned in contexts like, “Oh yeah, who else was on that [championship] team? Oh that’s right, Dennis Rodman.”

“Ah, so how valuable is the best third-best player?”

At the time, I hadn’t completely worked out all of the win percentage differentials and other fancy stats that I would later on, but I had done enough to have a decent sense of it:

“Well, it’s tough to say when it’s hard to even define ‘third-best’ player, but [blah blah, ramble ramble, inarticulate nonsense] I guess I’d say he easily had 1st-best player value, which [blah blah, something about diminishing returns, blah blah] . . . which makes him the best 3rd-best player by a wide margin”.

“How wide?”

“Well, it’s not like he’s as valuable as Michael Jordan, but he’s the best 3rd-best player by a wider margin than Jordan was the best 1st-best player.”

“So you’re saying he was better than Michael Jordan.”

“No, I’m not saying that. Michael Jordan was clearly better.”

“OK, take a team with Michael Jordan and Dennis Rodman on it. Which would hurt them more, replacing Michael Jordan with the next-best primary scoring option in NBA history, or replacing Rodman with the next-best defender/rebounder in NBA history?”

“I’m not sure, but probably Rodman.”

“So you’re saying a team should dump Michael Jordan before it should dump Dennis Rodman?”

“Well, I don’t know for sure, I’m not sure exactly how valuable other defender-rebounders are, but regardless, it would be weird to base the whole argument on who happens to be the 2nd-best player. I mean, what if there were two Michael Jordans, would that make him the least valuable starter on an All-Time team?”

“Well OK, how common are primary scoring options that are in Jordan’s league value-wise?”

“There are none, I’m pretty sure he has the most value.”

“BALLPARK.”

“I dunno, there are probably between 0 and 2 in the league at any given time.”

“And how common are defender/rebounder/dirty workers that are in Rodman’s league value-wise?”

“There are none.”

“BALLPARK.”

“There are none. Ballpark.”

“So, basically, if a team had Michael Jordan and Dennis Rodman on it, and they could replace either with some random player ‘in the ballpark’ of the next-best player for their role, they should dump Jordan before they dump Rodman?”

“Maybe. Um. Yeah, probably.”

“And I assume that this holds for anyone other than Jordan?”

“I guess.”

“So say you’re head-to-head with me and we’re drafting NBA All-Time teams, you win the toss, you have first pick, who do you take?”

“I don’t know, good question.”

“No, it’s an easy question. The answer is: YOU TAKE RODMAN. You just said so.”

“Wait, I didn’t say that.”

“O.K., fine, I get the first pick. I’ll take Rodman. . . Because YOU JUST TOLD ME TO.”

“I don’t know, I’d have to think about it. It’s possible.”

Up to this point, I confess, I’ve had to reconstruct the conversation to some extent, but these last two lines are about as close to verbatim as my memory ever gets:

“So there you go, Dennis Rodman is the single most valuable player in NBA History. There’s your argument.”

“Dude, I’m not going to make that argument. I’d be crucified. Maybe, like, in the last post. When anyone still reading has already made up their mind about me.”

And that’s it. Simple enough, at first, but I’ve thought about this question a lot between last summer and last night, and it still confounds me: Could being the best “3rd-best” player in NBA history *actually* make Rodman the *best* player in NBA history? For starters, what does “3rd-best” even mean? The argument is a semantic nightmare in its own right, and an even worse nightmare to formalize well enough to investigate. So before going there, let’s take a step back:

At the time of that conversation, I hadn’t yet done my league-wide study of differential statistics, so I didn’t know that Rodman would end up having the highest I could find. In fact, I pretty much assumed (as common sense would dictate) that most star-caliber #1 players with a sufficient sample size would rank higher: after all, they have a greater number of responsibilities, they handle the ball more often, and should thus have many more opportunities for their reciprocal advantage over other players to accumulate. Similarly, if a featured player can’t play—potentially the centerpiece of his team, with an entire offense designed around him and a roster built to supplement him—you would think it would leave a gaping hole (at least in the short-run) that would be reflected heavily in his differentials. Thus, I assumed that Rodman probably wouldn’t even “stat out” as the best Power Forward in the field, making this argument even harder to sell. But as the results revealed, it turns out feature players are replaceable after all, and Rodman does just fine on his own. However, there are a couple of caveats to this outcome:

First, without much larger sample sizes, I wouldn’t say that game-by-game win differentials are precise enough to settle disputes between players of similar value. For example, the standard deviation for Rodman’s 22% adjusted win differential is still 5% (putting him less than a full standard deviation above some of the competition). This is fine for concluding that he was extremely valuable, but it certainly isn’t extreme enough to outright prove the seemingly farfetched proposition that he was actually the most valuable player overall. The more unlikely you believe that proposition to be, the less you should find this evidence compelling—this is a completely rational application of Bayes’ Theorem—and I’m sure most of you, *ex ante*, find the proposition very very unlikely. Thus, to make any kind of argument for Rodman’s superiority that anyone but the biggest Rodman devotees would find compelling, we clearly need more than win differentials.
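To make the Bayes’ Theorem point concrete, here is a minimal sketch of how the same evidence moves different priors. The likelihood ratio is a made-up number chosen only for illustration; nothing here is derived from Rodman’s actual differentials.

```python
def posterior(prior, likelihood_ratio):
    """P(H | evidence), given prior P(H) and LR = P(E | H) / P(E | not H)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# ASSUMPTION: hypothetical strength of the evidence, for illustration only.
LIKELIHOOD_RATIO = 3.0

for prior in (0.50, 0.10, 0.01):
    print(f"prior {prior:.2f} -> posterior {posterior(prior, LIKELIHOOD_RATIO):.3f}")
```

A reader who starts out very skeptical stays fairly skeptical after moderately strong evidence, which is why more than win differentials is needed.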

Second, it really is a shame that a number of the very best players didn’t qualify for the study—particularly the ultimate Big Three: Michael Jordan, Magic Johnson, and Larry Bird (who, in maybe my favorite stat ever, never had a losing month in his entire career). As these three are generally considered to be in a league of their own, I got the idea: if we treated them as one player, would their combined sample be big enough to make an adequate comparison? Well, I had to make a slight exception to my standard filters to allow Magic Johnson’s 1987 season into the mix, but here are the results:

Adjusted Win percentage differential is Rodman’s most dominant value stat, and here, finally, Herr Bjordson edges him. Plus this may not fully represent these players’ true strength: the two qualifying Jordan seasons are from his abrupt return in 1994 and his first year with the Wizards in 2001, and both of Bird’s qualifying seasons are from the last two of his career, when his play may have been hampered by a chronic back injury. Of course, just about any more-conventional player valuation system would rank these players above (or way above) Rodman, and even my own proprietary direct “all-in-one” metric puts these three in their own tier with a reasonable amount of daylight between them and the next pack (which includes Rodman) below. So despite having a stronger starting position in this race than I would have originally imagined, I think it’s fair to say that Rodman is still starting with a considerable disadvantage.

So let’s assume that at least a few players offer more direct value than Dennis Rodman. But building a Champion involves more than putting together a bunch of valuable players: to maximize your chances of success, you must efficiently allocate a variety of scarce resources, to obtain as much realized value as possible, through a massively complicated set of internal and external constraints.

For example, league rules may affect how much money you can spend and how many players you can carry on your roster. Game rules dictate that you only have so many players on the floor at any given time, and thus only have so many minutes to distribute. Strategic realities require that certain roles and responsibilities be filled: normally, this means you must have a balance of talented players who play different positions—but more broadly, if you hope to be successful, your team must have the ability to score, to defend, to rebound, to run set plays, to make smart tactical maneuvers, and to do whatever else goes into winning. All of these little things that your team has to do can also be thought of as a limited resource: in the course of a game, you have a certain number of things to be done, such as taking shots, going after loose balls, setting up screens, contesting rebounds, etc. Maybe there are 500 of these things, maybe 1000, who knows, but there are only so many to go around—and just as with any other scarce resource, the better teams will be the ones that squeeze the most value out of each opportunity.

Obviously, some players are better at some things than others, and may contribute more in some areas than others—but there will always be trade-offs. No matter how good you are, you will always occupy a slot on the roster and a spot on the floor, every shot you take or every rebound you get means that someone else can’t take that shot or get that rebound, and every dollar your team spends on you is a dollar they can’t spend on someone else. Thus, there are two sides to a player’s contribution: how much surplus value he provides, and how much of his team’s scarce resources he consumes.

The key is this: While most of the direct value a player provides is observable, either directly (through box scores, efficiency ratings, etc.) or indirectly (Adjusted +/-, Win Differentials), many of his *costs* are concealed.

Two players may provide seemingly identical value, but at different costs. In very limited contexts this can be extremely clear: though it took a while to catch on, by now all basketball analysts realize that scoring 25 points per game on 20 shots is better than scoring 30 points a game on 40 shots. But in broader contexts, it can be much trickier. For example, with a large enough sample size, Win Differentials should catch almost anything: everything good that a player does will increase his team’s chances of winning when he’s on the floor, and everything bad that he does will decrease his team’s chances of losing when he’s not. Shooting efficiency, defense, average minutes played, psychological impact, hustle, toughness, intimidation—no matter how abstract the skill, it should still be reflected in the aggregate.

No matter how hard the particular skill (or weakness) is to identify or understand, if its consequences would eventually impact a player’s win differentials, (for these purposes) its effects are *visible*.

But there are other sources of value (or lack thereof) which won’t impact a player’s win differentials—these I will call “invisible.” Some are obvious, and some are more subtle:

“Return on Investment” is the prototypical example of invisible value, particularly in a salary-cap environment, where every dollar you spend on one player is a dollar you can’t spend on another. No matter how good a player is, if you give up more to get him than you get from him in return, your team suffers. Similarly, if you can sign a player for much less than he is worth, he may help your team more than other (or even better) players who would cost more money.

This value is generally “invisible” because the benefit that the player provides will only be realized when he plays, but the cost (in terms of limiting salary resources) will affect his team whether he is in the lineup or not. And Dennis Rodman was basically always underpaid (likely because the value of his unique skillset wasn’t fully appreciated at the time):

Note: For a fair comparison, this graph (and the similar one below) includes only the 8 qualifying Shaq seasons from before he began to decline.

Aside from the obvious, there are actually a couple of interesting things going on in this graph that I’ll return to later. But I don’t really consider this a primary candidate for the “invisible value” that Rodman would need to jump ahead of Jordan, primarily for two reasons:

First, return on investment isn’t quite as important in the NBA as it is in some other sports: For example, in the NFL, with 1) so many players on each team, 2) a relatively hard salary cap (when it’s in place, anyway), and 3) no maximum player salaries, ROI is perhaps the single most important consideration for the vast majority of personnel decisions. For this reason, great NFL teams can be built on the backs of many underpaid good-but-not-great players (see my extended discussion of fiscal strategy in major sports here).

Second, as a subjective matter, when we judge a player’s quality, we don’t typically consider factors that are external to their actual athletic attributes. For example, a great NFL quarterback could objectively hurt his team if he is paid too much, but we still consider him great. When we ask “who’s the best point guard in the NBA,” we don’t say, “IDK, how much more does Chris Paul get paid than Jason Kidd?” Note this is basically a social preference: It’s conceivable that in some economically-obsessed culture, this sort of thing really would be the primary metric for player evaluation. But personally, and for the purposes of my argument, I prefer our more traditional values on this one.

In the “perfect timing” department, a commenter “Siddy Hall” recently raised a hypothetical very similar to my friend’s:

You get 8 people in a room, all posing as GM’s. We’re allowed to select 5 players each from the entire history of the NBA. Then we’ll have a tournament. At PF, I would grab Rodman. And I’m confident that I’d win because he’s on my team. He’d dominate the glass and harass and shutdown a superstar. I think he’s the finest PF to ever play the game.

Of course, you need to surround him with some scorers, but when is that ever a problem?

The commenter only *explicitly* goes so far as to say that Rodman would be the most valuable power forward. Yet he says he is “confident” that he would win, with the only caveat being that his team gets other scorers (which is a certainty). So, he thinks Rodman is the best PF by a wide enough margin that his team would be a favorite against the team that got Michael Jordan. Let me play the role of my friend above: whether he means to or not, he’s basically saying that Rodman is more valuable than Jordan.

In this example, “position” is the scarce resource. Just as a player can be valuable *for* the amount of money the team must spend on him, he can also be valuable *for his position*. But this value can be visible, invisible, or both.

This is probably easiest to illustrate in the NFL, where positions and responsibilities are extremely rigid. An example I used in response to the commenter is that an NFL kicker who could get you 2 extra wins per season could be incredibly valuable. These two extra wins obviously have *visible* value: By definition, this is a player for whom we would expect to observe a 2 game per season win differential. But there’s another, very important way in which this player’s value would be much greater. As I said in response to the commenter, a +2 kicker could even be more valuable than a +4 quarterback.

In order to play the 2 win kicker, the only cost is your kicker slot, which could probably only get you a fraction of a win even if you had one of the best in the league on your team (relevant background note: kickers normally don’t contribute much, particularly since bad kickers likely influence their teams to make better tactical decisions, and vice-versa). But to play a 4-win quarterback, the cost is your quarterback slot. While the average QB and the average kicker are both worth approximately 0 games, good quarterbacks are often worth much more, and good kickers are worth very little.

Put most simply, because there are no other +2 kickers, that kicker could get 2 wins for virtually ANY team. The +4 QB would only provide 2 wins for teams who would be unable to acquire a +2 quarterback by other means. Or you can think about it conversely: Team A signs the kicker, and Team B signs the QB. For the moment, Team B might appear better, but the most value they will ever be able to get out of their QB/Kicker tandem is +4 games plus epsilon. Team A, on the other hand, can get more value out of their QB/kicker combo than Team B simply by signing any QB worth +2 or greater, who are relatively common.
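The Team A versus Team B comparison can be sketched in a few lines of Python. The 0.25-win "epsilon" for the best ordinary kicker and the list of quarterbacks Team A might sign are my own placeholder numbers, chosen only to make the arithmetic concrete.

```python
# Hypothetical: wins contributed by the best *ordinary* kicker in the league.
KICKER_EPSILON = 0.25


def tandem_wins(qb_wins, kicker_wins):
    """Combined wins above average from a team's QB and kicker slots."""
    return qb_wins + kicker_wins


# Team B has the +4 QB, so its ceiling is +4 plus epsilon.
team_b = tandem_wins(qb_wins=4.0, kicker_wins=KICKER_EPSILON)

# Team A has the one-of-a-kind +2 kicker and shops for a QB.
for available_qb in (2.0, 2.5, 3.0):
    team_a = tandem_wins(qb_wins=available_qb, kicker_wins=2.0)
    leader = "Team A" if team_a > team_b else "Team B"
    print(f"Team A with a +{available_qb} QB: {team_a:+.2f} vs {team_b:+.2f} -> {leader}")
```

Under these assumptions Team A pulls ahead as soon as its quarterback is worth more than +2 plus that same epsilon, while Team B has no comparable way to improve its kicker slot.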

Why does this matter? Well, in professional sports, we care about one thing more than any other: championships. Teams that win championships do so by having the best roster with the most value. Players like our special kicker provide unique avenues to surplus value that even other great players can’t.

To generalize a bit, you could say that value vs. a replacement player is generally visible, as it will be represented in win differentials no matter who you play for. But a player’s value relative to the entire distribution of players at his position can lead to substantial invisible benefits, as it can substantially improve his team’s ability to build a championship contender.

Unfortunately, in basketball, such distinctions are much more nebulous. Sure, there are “positions,” but the spot where you line up on the floor is very different from the role you play. E.g., your primary scoring responsibilities can come from any position. And even then “roles” are dynamic and loosely defined (if at all)—some roles that are crucial to certain teams don’t even exist on others. Plus, teams win in different ways: you can do it by having 5 options on offense with 5 guys that can do everything (OK, this doesn’t happen very often, but the Pistons did it in 03-04), or you can be highly specialized and try to exploit the comparative advantages between your players (this seems to be the more popular model of late).

Rodman was a specialist. He played on teams that, for the most part, didn’t ask him to do more than what he was best at—and that probably helped him fully leverage his talents. But the truly amazing part is how much of a consistent impact he could have, on such a variety of different teams, and with seemingly *so few* responsibilities.

So let’s posit a particular type of invisible value and call it “I-Factor,” with the following elements:

- It improves your team’s chances of building a championship contender.
- It wouldn’t be reflected in your game-to-game win differential.
- It stems from some athletic or competitive skill or attribute.

In the dialogue above, I suggested that Rodman had an inordinate positive impact for a “3rd-best” player, and my friend suggested (insisted really) that this alone should vault him above great but more *ordinary* “1st-best” players, even if they had significantly more observable impact. Putting these two statements together, we have an examinable hypothesis: That Dennis Rodman’s value relative to his role constituted a very large “I-Factor.”

Because the value we’re looking for is (by definition) invisible, its existence is ridiculously hard—if not impossible—to prove empirically (which is why this argument is the dessert instead of the main course of this series).

However, there could be certain signs and indicators we can look for that would make the proposition more likely: specifically, things that would seem unusual or unlikely if the hypothesis were false, but which could be explainable either as causes or effects of the hypothesis being true.

Since the hypothesis posits both an effect (very large I-Factor), and a cause (unusually high value for his role), we should primarily be on the lookout for two things: 1) any interesting or unusual patterns that could be explainable as a consequence of Rodman having a large I-Factor, and 2) any interesting or unusual anomalies that could help indicate that Rodman had an excessive amount of value for his role.

To lighten the mood a bit, let’s start this section off with a riddle:

Q. What do you get for the team that has everything?

A. Dennis Rodman.

Our hypothetical Rodman I-Factor is much like that of our hypothetical super-kicker in the NFL example above. The reason that kicker was even more valuable than the 2 wins per season he could get you is that he could get those 2 wins for anyone. Normally, if you have a bunch of good players and you add more good players, the whole is less than the sum of its parts. In the sports analytics community, this is generally referred to as “diminishing returns.” An extremely simple example goes like this: Having a great quarterback on your team is great. Having a second great quarterback is maybe mildly convenient. Having a third great quarterback is a complete waste of space. But if you’re the only kicker in the league who is worth anywhere near 2 wins, your returns will basically *never* be diminished. In basketball, roles and responsibilities aren’t nearly as wed to positions as they are in football, but the principle is the same. There is only one ball, and there are only so many responsibilities: If the source of one player’s value overlaps the source of another’s, they will *both* have less impact. Thus, if Rodman’s hypothetical I-Factor were real, one thing we might expect to find is a similar lack of diminishing returns—in other words, an unusual degree of consistency.

And indeed, Rodman’s impact was remarkably consistent. His adjusted win differential held at between 17% and 23% for 4 different teams, all of whom were championship contenders to one extent or another. Obviously the Bulls and Pistons each won multiple championships. The two years that Rodman spent with the pre-Tim-Duncan-era Spurs, they won 55 and 62 games respectively (the latter led the league that season, though the Spurs were eliminated by eventual-champion Houston in the Western Conference Finals). In 1999, Rodman spent roughly half of the strike-shortened season on the Lakers; in that time the Lakers went 17-6, matching San Antonio’s league-leading winning percentage. But, in a move that was somewhat controversial with the Lakers players at the time, Rodman was released before the playoffs began, and the Lakers fell in the 2nd round—to the eventual-champion Spurs.

But consistency should only be evidence of invisible value if it is *unusual*—that is, if it exists where we wouldn’t expect it to. So let’s look at Rodman’s consistency from a couple of different angles:

The following graph is similar to my ROI graph above, except instead of mapping the player’s salary to his win differential, I’m mapping the *rest of the team’s* salary to his win differential:

Note: Though obviously it’s only one data point and doesn’t mean anything, I find it amusing that the one time Shaq played for a team that had a full salary-cap’s worth of players without him, his win differential dropped to the floor.

So, basically, whether Rodman’s teams were broke or flush, his impact remained fairly constant. This is consistent with unusually low diminishing returns.

A potential objection I’ve actually heard a couple of times is that perhaps Rodman was able to have the impact he did because the circumstances he played in were particularly well-suited to never duplicating his skill-set: E.g., both Detroit and Chicago lacked dominant big men. Indeed, it’s plausible that part of his value came from providing the defense/rebounding of a dominant center, maximally leveraging his skill-set, and freeing up his teams to go with smaller, more versatile, and more offense-minded players at other positions (which could help explain why he had a greater impact on offensive efficiency than on defensive efficiency). However, all of this value would be *visible*. Moreover, the assumption that Rodman only played in these situations is false. Not only did Rodman play on very different teams with very different playing styles, he actually played on teams with *every possible combination* of featured players (or “1st and 2nd-best” players, if you prefer):

As we saw above, Rodman’s impact on all 4 teams was roughly the same. This too is consistent with an unusual lack of diminishing returns.

As I’ve said earlier, “role” can be very hard to define in the NBA relative to other sports. But to find meaningful evidence that Rodman provided an inordinate amount of value for his role, we don’t necessarily need to solve this intractable problem: we can instead look for “partial” or “imperfect” proxies. If some plausibly related proxy were to provide an unusual enough result, its actual relationship to the posited scenario could be self-reinforced—that is, the most likely explanation for the extremely unlikely result could be that it IS related to our hypothesis AND that our hypothesis is true.

So one scarce resource that is plausibly related to role is “usage.” Usage Rate is the percentage of team possessions that a player “uses” by taking a shot or committing a turnover. Shooters obviously have higher usage rates than defender/rebounders, and usage generally has little correlation with impact. But let’s take a look at a scatter-plot of qualifying players from my initial differential study (limited to just those who have positive raw win differentials):

The red dot is obviously Dennis Rodman. Bonus points to anyone who said “Holy Crap” in their heads when they saw this graph: Rodman has both the highest win differential and the lowest Usage Rate, once again taking up residence in Outlier Land.

Let’s look at it another way: Treating *possessions* as the scarce resource, we might be interested in how much win differential we get for every possession that a player *uses:*

Let me say this in case any of you forgot to think it this time:

**“Holy Crap!”**

Yes, the red dot is Dennis Rodman. Oh, if you didn’t see it, don’t follow the blue line, it won’t help.

This chart isn’t doctored, manipulated, or tailored in any way to produce that result, and it includes all qualifying players with positive win differentials. If you’re interested, the Standard Deviation on the non-Rodman players in the pool is .19. Yes, that’s right, Dennis Rodman is nearly 4.5 standard deviations above the NEXT HIGHEST player. Hopefully, you see the picture of what could be going on here emerging: If value per possession is any kind of proxy (even an imperfect one) for value relative to role, it goes a long way toward explaining how Rodman was able to have such incredible impacts on so many teams with so many different characteristics.
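For anyone who wants to run the same kind of check on their own numbers, here is a rough Python sketch of the calculation. The five players and their figures are made up (they are not the study’s data), and dividing win differential by Usage Rate is just one simple reading of “win differential per possession used.” The last step measures the gap the way I describe it above: distance to the next-highest player, in units of the standard deviation of everyone else.

```python
from statistics import stdev

# Made-up (win differential %, usage rate %) pairs, NOT the study's data.
players = {
    "Player A": (12.0, 28.0),
    "Player B": (9.0, 22.0),
    "Player C": (15.0, 30.0),
    "Player D": (6.0, 18.0),
    "Low-usage outlier": (20.0, 10.0),
}

# Win differential earned per unit of usage (assumed definition).
value_per_usage = {name: wd / usg for name, (wd, usg) in players.items()}

outlier = "Low-usage outlier"
others = [v for name, v in value_per_usage.items() if name != outlier]

# Sample standard deviation of the pool, excluding the outlier.
pool_sd = stdev(others)

# How far the outlier sits above the next-highest player, in pool SDs.
gap_in_sds = (value_per_usage[outlier] - max(others)) / pool_sd

print(f"pool SD: {pool_sd:.3f}")
print(f"gap over next-highest player: {gap_in_sds:.1f} SDs")
```

Leaving the outlier out of the standard deviation matters: including him would inflate the spread and make his own gap look smaller.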

The irony here is that the very aspect of Rodman’s game that frequently causes people to discount his value (“oh, he only does one thing”) may be exactly the quality that makes him a strong contender for first pick on the all-time NBA playground.

Though the evidence is entirely circumstantial, I find the hypothesis very plausible, which in itself should be shocking. While I may not be ready to conclude that, yes, in fact, Rodman would actually be a more valuable asset to a potential championship contender than Michael freaking Jordan, I don’t think the opposite view is any stronger: That is, when you call that position crazy, conjectural, speculative, or naïve—as some of you inevitably will—I am fairly confident that, in light of the evidence, the default position is really no less so.

In fact, even if this hypothesis isn’t exactly true, I don’t think the next-most-likely explanation is that it’s completely false, and these outlandish outcomes were just some freakishly bizarre coincidence—it would be more likely that there is some alternate explanation that may be even *more* meaningful. Indeed, on some level, some of the freakish statistical results associated with Rodman are *so extreme* that it actually makes me doubt that the best explanation could actually stem from his athletic abilities. That is, he’s just a guy, how could he be so unusually good in such an unusual way? Maybe it actually IS more likely that the groupthink mentality of NBA coaches and execs accidentally DID leave a giant exploitable loophole in conventional NBA strategy; a loophole that Rodman fortuitously stumbled upon by having such a strong aversion to doing any of the things that he wasn’t the best at. If that is the case, however, the implications of this series could be even more severe than I intended.

Despite having spent time in law school, I’m not a lawyer. Indeed, one of the reasons I chose not to be one is because I get icky at the thought of picking sides first, and building arguments later.

In this case, I had strong intuitions about Rodman based on a variety of beliefs I had been developing about basketball value, combined with a number of seemingly-related statistical anomalies in Rodman’s record. Though I am naturally happy that my research has backed up those intuitions—even beyond my wildest expectations—I felt prepared for it to go the other way. But, of course, no matter how hard we try, we are all susceptible to bias.

Moreover, inevitably, certain non-material choices (style, structure, editorial, etc.) have to be made which emphasize the side of the argument that you are trying to defend. This too makes me slightly queasy, though I recognize it as a necessary evil in the discipline of rhetoric. My point is this: though I am definitely presenting a “case,” and it often appears one-sided, I have tried to conduct my research as neutrally as possible. If there is any area where you think I’ve failed in this regard, please don’t hesitate to let me know. I am willing to correct myself, beef up my research, or present compelling opposing arguments alongside my own; and though I’ve published this series in blog form, I consider this Case to be an ongoing project.

If you have any other questions, suggestions, or concerns, please bring them up in the comments (preferably) or email me and I will do my best to address them.

Finally, I would like to thank Nate Meyvis, Leo Wolpert, Brandon Wall, James Stuart, Dana Powers, and Aaron Nathan for the invaluable help they provided me by analyzing, criticizing, and/or ridiculing my ideas throughout this process. I’d also like to thank Jeff Bennett for putting me on this path, Scott Carder for helping me stay sane, and of course my wife Emilia for her constant encouragement.

- Making the finalists this year after failing to make the semi-finalists last year made it more likely that last year’s snub really *was* more about eligibility concerns than general antipathy or lack of respect toward him as a player.
- The list of co-finalists was very favorable. First, Reggie Miller *not* making the list was a boon, as he could have taken the “best player” spot, and Rodman would have lacked the goodwill to make it as one of the “overdue”—without Reggie, Rodman was clearly the most accomplished name in the field. Second, Chris Mullin being available to take the “overdue” spot was the proverbial “spoonful of sugar” that allowed the bad medicine of Rodman’s selection to go down.

Congrats also to Artis Gilmore and Arvydas Sabonis. In my historical research, Gilmore’s name has repeatedly popped up as an excellent player, both by conventional measures (11-time All-Star, 1xABA Champion, 1xABA MVP, led league in FG% 7 times), and advanced statistical ones (NBA career leader in True Shooting %, ABA career leader in Win Shares and Win Shares/48, and a great all-around rebounder). It was actually only a few months ago that I first discovered—to my shock—that he was NOT in the Hall [*Note to self:* cancel plans for “The Case for Artis Gilmore”]. Sabonis was an excellent international player with a 20+ year career that included leading the U.S.S.R. to an Olympic gold medal and winning 8 European POY awards. I remember following him closely when he finally came to the NBA, and during his too-brief stint, he was one of the great per-minute contributors in the league (though obviously I’m not a fan of the stat, his PER over his first 5 seasons—which were from age 31-35—was 21.7, which would place him around 30th in NBA history). Though his sample size was too small to qualify for my study, his adjusted win percentage differential over his NBA career was a very respectable 9.95%, despite only averaging 24 minutes per game.

I was hesitant to publish Part 4 of this series before knowing whether Rodman made the Hall or not, as obviously the results shape the appropriate scope for my final arguments. So by necessity, this section has changed dramatically from what I initially intended. But I am glad I waited, as this gives me the opportunity to push the envelope of the analysis a *little* bit: Rather than simply wrapping up the argument for Rodman’s Hall-of-Fame candidacy, I’m going to consider some more ambitious ideas. Specifically, I will articulate two plausible arguments that Rodman may have been *even more* valuable than my analysis so far has suggested. The first of these is below, and the second—which is the most ambitious, and possibly the most shocking—will be published Monday morning in the final post of this series.

I am aware that I’ve picked up a few readers since joining “the world’s finest quantitative analysts of basketball” in ESPN’s TrueHoop Stat Geek Smackdown. If you’re new, the main things you need to know about this series are that it’s 1) extremely long (sprawling over 13 sections in 4 parts, plus a Graph of the Day), 2) ridiculously (almost comically) detailed, and 3) only partly about Dennis Rodman. It’s also a convenient vehicle for me to present some of my original research and criticism about basketball analysis.

Obviously, the series includes a lot of superficially complicated statistics, though if you’re willing to plow through it all, I try to highlight the upshots as much as possible. But there *is* a lot going on, so to help new and old readers alike, I have a newly-updated “Rodman Series Guide,” which includes a broken down list of articles, a sampling of some of the most important graphs and visuals, and as of now, a giant new table summarizing the entire series by post, including the main points on both sides of the analysis. It’s too long to embed here, but it looks kind of like this:

As I’ve said repeatedly, this blog isn’t just called “Skeptical” Sports because the name was available: When it comes to sports analysis—from the mundane to the cutting edge—I’m a skeptic. People make interesting observations, perform detailed research, and make largely compelling arguments—which is all valuable. The problems begin when they then start believing too strongly in their results: they defend and “develop” their ideas and positions with an air of certainty far beyond what is objectively, empirically, or logically justified.

With that said, and being *completely* honest, I think The Case For Dennis Rodman is practically overkill. As a skeptic, I try to keep my ideas in their proper context: There are plausible hypotheses, speculative ideas, interim explanations requiring additional investigation, claims supported by varying degrees of analytical research, propositions that have been confirmed by multiple independent approaches, and the things I believe so thoroughly that I’m willing to write 13-part series to prove them. That Rodman was a great rebounder, that he was an extremely valuable player, even that he was easily Hall-of-Fame caliber—these propositions all fall into that latter category: they require a certain amount of thoughtful digging, but beyond that they practically prove themselves.

Yet, surely, there must be a whole realm of informed analysis to be done that is probative and compelling but which might fall short of the rigorous standards of “true knowledge.” As a skeptic, there are very few things I would bet *my life* on, but as a gambler—even a skeptical one—there are a much greater number of things I would bet *my money* on. So as my final act in this production, I’d like to present a couple of interesting arguments for Rodman’s greatness that are both a bit more extreme and a bit more speculative than those that have come before. Fortunately, I don’t think it makes them any less important, or any less captivating:

There are two things from my analysis that should be abundantly clear: 1) Dennis Rodman was a great rebounder, and 2) Dennis Rodman had a great impact on winning. Not only do I feel I’ve proven both of these to be true beyond any doubt, I think I’ve proven them to be *even more true* than I could have possibly guessed. Which is to say, Rodman’s rebounding ability and his impact on winning both turned out to be *significantly greater* than I imagined at the outset.

Yet, however incredible those outcomes may have been, they are far from the most shocking. Rather, there are two specific discoveries that blew my mind. They are both important causal elements in my final analysis, but neither has any readily-apparent explanation behind them:

1). The nature-defying lack of tradeoff between Rodman’s offensive and defensive rebounding, as exemplified in this graph from Part 1(b):

The initial post, of course, has a more detailed explanation, including a recently-added comparison with some other great rebounders.

And:

2). The fact that Rodman’s observed win percentage differential is so massively higher than what would be predicted by his already extremely high “Margin of Victory” differential. After adjusting for sample-size, the extremeness of his “X-Factor” was demonstrated in this histogram from Part 3(b):

And this graph actually doesn’t even account for the fact that extremely high predicted differentials should be *far more likely* to over-predict rather than under-predict a player’s actual differential. For those unfamiliar with “regression to the mean,” a quick illustration: Imagine 4 well-known sluggers who hit 25, 35, 45, and 55 home runs in a particular season—who is most likely to have the largest *increase* in their HR total the following season? If you said the guy who hit 55, you are probably one of Barry Bonds’ attorneys.
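The slugger illustration can be made concrete with a toy Python model. The year-to-year correlation of 0.6 and the mean of 40 home runs are assumptions I picked for the example, not measured values; the model simply shrinks each observed total toward the mean by that correlation.

```python
# Hypothetical inputs: how strongly one season predicts the next, and the
# average these sluggers are assumed to regress toward.
YEAR_TO_YEAR_R = 0.6
GROUP_MEAN = 40.0


def expected_next_season(observed):
    """Shrink an observed total toward the group mean by the correlation."""
    return GROUP_MEAN + YEAR_TO_YEAR_R * (observed - GROUP_MEAN)


for hr in (25, 35, 45, 55):
    change = expected_next_season(hr) - hr
    print(f"{hr} HR -> expected change {change:+.1f}")
```

Under these assumptions the 25-homer hitter has the largest expected increase and the 55-homer hitter the largest expected decrease, which is the whole point of the riddle.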

If you’ve ever watched *House*, you’re probably aware that a key tenet in diagnostics is: When you see two extremely rare and/or unlikely symptoms in the same patient, *even if they seem completely unrelated, they are probably related*. Either one is somehow causing the other, or something else causes both, etc. This is a medical application of Occam’s razor, which is frequently misstated as a rule that “the simplest explanation is normally the best one.” There are many permutations of the razor, but here is an earlier and more accurate form:

*pluralitas non est ponenda sine necessitate*

or

*plurality should not be posited without necessity*

In practical terms, it means that we should favor the explanation with the fewest *number of assumptions*. In the medical case, you can either make two assumptions: 1) that the patient has incredibly rare condition *x,* and 2) that he has incredibly rare condition *y*; or one: that there is some condition *z* that you haven’t thought of yet (incredibly rare or not), from which *x* and *y* both follow.

Rodman’s case is no different: We have two incredibly bizarre phenomena that practically beg for a unifying explanation, or at least an explanation that would make them seem less bizarre. In Part 3(b), I briefly speculated about such a theory: that Rodman may actually have been “Reverse Clutch,” which is to say, he may have played *worse* in situations where the game was *not* on the line. Specifically, the theory goes, to pad his rebounding statistics, Rodman may have played sub-optimally in meaningless situations *by going after extra rebounds*. If he picked his “selfish” spots judiciously (that is, when the outcome of the game really wouldn’t be affected), this would have little to no effect on his Win % differential: he should win the same amount as if he had played optimally throughout. But his team’s dominating wins would be slightly less dominant, and their bad losses would be slightly worse, which *would* have a negative effect on his MOV differential.

Occam’s razor itself can be seen as a special case in the broader realm of Bayesian inference. But leaving the math and epistemology aside, the practical point is this: An explanation which itself seems like an incredible longshot may indeed become likely, or even a virtual certainty, if it is the simplest way of explaining other phenomena.

Before getting into the ins and outs of whether this could be a plausible explanation (or at least explanatory factor) for our two freakish anomalies, we should check to see whether the theory is consistent with available data.

After thinking about how this hypothesis could be tested, I decided to look at the relationship between offensive rebounds per minute and Margin of Victory in “blowout” games (which I’ve arbitrarily set at 10+ points). I chose offensive rebounds for a very specific reason: unlike defensive rebounds, they have virtually no correlation with winning. While getting an offensive rebound is obviously better than not getting one, their strong correlation with *missed shots* cancels this out almost perfectly. So let’s look at a simple scatter-plot for the entire league first. Here are all team games since 1986:

The red line is a trendline through just the blowout losses, and the blue is a trendline just through the blowout wins. They are hard to make out because they are virtually flat: there does not seem to be a meaningful relationship between offensive rebounds per minute and total margin for either team in blowout situations. Now let’s look at exactly the same graph, but just for Dennis Rodman:

We can clearly see upwardly sloping trendlines both in victory and defeat. In other words, in blowout games, the bigger the blowout, the more offensive rebounds Rodman grabbed per minute, *regardless of which side his team was on*. [Perhaps the “X” is for “X-Factor”?] This result is both consistent with and supportive of the theory: The wider the finishing margin, the more likely a game is to feature significant amounts of “junk” or “meaningless” time. And the wider the finishing margin, the more rebounds Rodman grabbed.
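For anyone who wants to try this kind of comparison themselves, the trendline test above can be sketched in a few lines of Python. This is only a rough illustration, not the code behind the graphs: the function names, the `(margin, rebounds per minute)` pair layout, and the toy numbers are all my own assumptions, and the trendlines are plain least-squares fits run separately on blowout wins and blowout losses.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def blowout_slopes(games, cutoff=10):
    """Fit separate trendlines for blowout wins and blowout losses.

    `games` is a list of (final_margin, off_reb_per_minute) pairs, where a
    positive margin is a win. Both slopes are measured against the *size*
    of the blowout, so two positive slopes correspond to the "X" shape.
    """
    wins = [(m, r) for m, r in games if m >= cutoff]
    losses = [(-m, r) for m, r in games if m <= -cutoff]
    win_slope = ols_slope(*zip(*wins))
    loss_slope = ols_slope(*zip(*losses))
    return win_slope, loss_slope


# Made-up numbers purely to exercise the function (not real game data).
toy_games = [(12, 0.10), (20, 0.14), (30, 0.18),
             (-11, 0.09), (-19, 0.13), (-28, 0.16), (3, 0.20)]
win_slope, loss_slope = blowout_slopes(toy_games)
```

Under this setup, a league-wide sample would be expected to give slopes near zero on both sides, while a "reverse clutch" rebounder would show positive slopes on both.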

For comparison, let’s look at another great offensive rebounder and Hall of Famer, Moses Malone (from 1986 on):

Obviously, these trendlines aren’t increasing in either direction (in fact, they are both *slightly* decreasing). They are both much flatter, and lack the prominent “X” shape that Rodman’s had: Moses Malone doesn’t appear to demonstrate any propensity to become more rebound-happy in “non-crunch” time.

While these sample-sizes are too small to prove anything *on their own*, if this proposed theory is one of the more plausible causes for our two *provably* extreme and independent effects, the data being consistent with and moderately supportive of the hypothesis could easily be enough to lend it considerable credence.

Trying to draw potential conclusions about causal factors based on indirect (Bayesian) analysis like this is always tricky. First, you need to establish that your proposed theory, *if true*, really *would be* a plausible explanation (or significant contributing explanation) for your phenomena. And second, it has to be the *best explanation* possible. Meaning, your conclusion that *z* is the likely cause of *x* and *y* is entirely contingent on there not being a *q* that would be just as good of an explanation or better. In the real world (outside the safety of contrived exam questions), this second condition is much harder to satisfy: Even the most detailed studies, which include the most exhaustive considerations of conceivable factors, can be (and often are) scuttled by the tiniest failure of imagination. So on that front, I don’t have much to say except that I’ve thought about potential causes of these anomalies for a while now and haven’t thought of anything better, yet. [I won’t go into all of the details here, but since someone will probably bring it up if I don’t, I should note that I’ve looked into the theory that Rodman’s lack of tradeoff could be explained by his abnormally low usage rate (which I’ll discuss a bit more in section b). Though this sounded promising on its face, the data didn’t back it up: the relationship between usage and rebounding tradeoff for both the league and Rodman himself is negligible.] Of course, I would love to hear any other suggestions.

On the question of whether *this* explanation is plausible, I think so. As I’ve said before, professional basketball is a game of very small margins: 3 points is the difference between a Ray Allen jumper dropping in or rolling out, yet 3 points per game is the difference between the 2011 Boston Celtics and the 2011 Houston Rockets, and 3 points of MOV differential is what separates Kobe Bryant from Delonte West. I don’t think it would take that much intentional misconduct to shave several points off of a player’s MOV. And if you so desired, playing in a lot of lopsided games—as Rodman did—would give you ample opportunity to do so without meaningful consequences in the W/L columns.

But could it help explain the trade-off anomaly? Though it relies on considerably more speculative reasoning, I find the theory highly plausible in this context as well:

First, I imagine that the tradeoff between offensive and defensive rebounding rates exists as the consequence of very basic strategic choices: Imagine you’re a ball-boy and all you have to do is run out on the court and grab rebounds as efficiently as possible: you’d consistently run to the spot where you think the ball is most likely to go, wait there until the ball hits the rim, and then run wherever it actually goes. But an actual player has many competing factors: First, long before the shot is taken, he has to execute the offensive game-plan to maximize his team’s chances of scoring. Then once the play is no longer in his control, he has to position himself to maximize all potential outcomes: cheating too much toward the basket might leave you out of position when the defense gets the ball and open up a lane for a quick break, or prevent you from getting back into your half-court defensive position in time. Similar concerns exist on defense: being out of position could obviously open up a shot for your opponents, but could also hurt your team’s chances on the break, or leave you out of position for an effective transition after a made basket.

We don’t need to get into every gory detail of offensive and defensive strategy, but the main thing to notice is this: basketball is an extremely dynamic game, and the decisions you make have consequences all over the court. And it’s not just rebound positioning, obviously, it’s everything: every movement you make in one direction or another affects where you can be two seconds later, which affects what kind of plans can be developed for 5 seconds later, and so on. When you have a million little tradeoffs like this, the most efficient way to organize them is to group those that work together and complement each other, such as those that lean toward one side of the court or the other.

Of course, if you decided to eschew overall efficiency and go for every rebound regardless of the consequences, you would eliminate the constraining effect of, well, *playing good basketball*.

Indeed, this has always been the primary *theoretical* possibility for how Rodman managed to get so many rebounds on both ends of the floor and it’s by far the most common argument against his incredible rebounding stats: “Rodman didn’t care about anything else, all he did was go for rebounds.” But that argument has a devastating response: Rodman’s freakish rebounding statistics *couldn’t possibly* be the product of bad play, *because he played so damn good.* As I put it in Part 2/4(b):

This is where “The Case for Dennis Rodman Was a Great Rebounder” and “The Case for Dennis Rodman” join paths: Showing that Rodman got a lot of rebounds without also showing that this significantly improved his teams proves neither that he was a great player nor that he was a great rebounder.

But there’s an opening here that I didn’t (openly) consider at the time: Perhaps Rodman *did* sub-optimally inflate his rebounding statistics—at least in non-critical situations—but played *even more damn good* than we imagined, and could thus actually *get away with it*.

Finally, I should be precise about what’s at stake here: From his margin of victory differential alone, Rodman should rank right around the 98th percentile among full-time players. Combining his MOV differential and his Win % differential (using the normal predictive method) moves him up to the mid-99th percentile. And relying on his Win % differential alone would put him at approximately the 99.98th percentile. Generally, about 5% of full-time players make the Hall of Fame, meaning this would put him in the 99th percentile *of Hall of Famers*: In other words, he would deserve to be a shoo-in to make the Hall of Fame of the Hall of Fame. [If you’re interested in the statistical significance side, Rodman’s W% diff alone would put his p-value at <.0001, more than 3 times lower than Shaq’s.]
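To make the “Hall of Fame of the Hall of Fame” arithmetic explicit, here is a small sketch. It takes the two figures quoted above (a 99.98th-percentile rank among full-time players and a roughly 5% Hall of Fame rate) at face value and assumes, purely for simplicity, that Hall of Famers are exactly the top 5% by this measure. The function name is my own.

```python
def percentile_within_top_group(overall_percentile, group_share):
    """Convert a percentile among all players into a percentile within the
    top `group_share` fraction of players (e.g. 0.05 for the top 5%)."""
    frac_above = 1 - overall_percentile / 100      # share of all players ranked higher
    frac_above_in_group = frac_above / group_share  # same players, as a share of the group
    return 100 * (1 - frac_above_in_group)


hof_percentile = percentile_within_top_group(99.98, 0.05)
print(round(hof_percentile, 1))  # 99.6
```

In other words, being in the top 0.02% overall means being in roughly the top 0.4% of a group that makes up the top 5%.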

Indeed, if this theory—currently one of the only explanations for two independent virtually impossible statistical events, consistent with and supported by a cursory look at the data—is only *partially* true, and Rodman was at least *partially* a “selfish” player who cared more about padding his own rebounding stats than running up the score, not only would it imply that he was better than we thought, but that he should actually be *All-Hall*.

**Monday: Was Dennis Rodman Better Than Michael Jordan?**

The rules are simple: each “expert” calls the winner of each series and the number of games (e.g., Spurs in 6)—5 points are awarded for each correct winner, with an additional 2 points for getting the length as well.

Most of the first round matchups have heavy favorites, so there isn’t too much disagreement on the panel about outcomes. But while researching my picks on Thursday night, I had some interesting findings that seemed a bit at odds with a lot of the others’ comments. So rather than going into the nitty-gritty of each series, I thought I’d summarize a few of these broader instances of divergence. Beware, a lot of this is preliminary stuff. I do think it is all on pretty solid footing, but there is much more to be done:

**1. Form is overrated**

At one point or another, nearly every expert quoted in this article cites a team’s recent good or bad performance as evidence that the team may be better or worse than their overall record would indicate. I’ve been interested in this question for a long time, and have looked at it from many different angles. Ultimately, I’ve concluded that there is no special correlation between late-season performance and playoff success. In fact, the opposite is far more likely.

To examine this issue, I took the last 20 years of regular and post-season data, and broke the seasons down into 20 game quarters. I excluded the last 2 games of each season, which is mathematically more convenient and reduces a lot of tactical distortion (I also excluded games from the 1998-99 strike-shortened season). I then ran a number of regressions comparing regular and post-season performances of playoff teams. There are a lot of different ways to design this regression (should the regression be run on a game-by-game or series-by-series basis? etc.), but literally no permutation I could think of offered any significant support for the conventional approach of favoring recent results. For example, here are the results of a linear regression from wins by quarter-season to playoff series won (taller bars mean more predictive):

Aesthetically pleasing, no? As to why the later part of the season performs so poorly in these tests, it has been suggested that resting players and various other strategic incentives not to maximize winning may be the cause. That is almost certainly true to some extent, but I suspect it also has to do with the playoff structure itself: because of the drawn-out schedule, unvarying opponents, and high stakes, teams are better rested, better prepared, and more psychologically focused—not unlike they are at the *beginning* of each season.
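For anyone who wants to replicate the quarter-season breakdown described above, the bookkeeping might look something like this. It is only a sketch of the data preparation (not the regression itself), and the function name and the list-of-booleans input format are assumptions of mine.

```python
def quarter_season_wins(results, quarter_len=20, drop_last=2):
    """Count wins in each consecutive block of `quarter_len` games.

    `results` is a season's game results in order (True = win). The final
    `drop_last` games are discarded first, so an 82-game season yields
    four 20-game quarters.
    """
    kept = results[:len(results) - drop_last]
    return [sum(kept[i:i + quarter_len])
            for i in range(0, len(kept), quarter_len)]


# Hypothetical season: wins every game of the first half, loses the rest.
season = [True] * 41 + [False] * 41
print(quarter_season_wins(season))  # [20, 20, 1, 0]
```

Each team-season then contributes four win counts, which can be used as the predictors in a regression onto playoff series won.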

**2. Winning is underrated**

I’ve discussed this previously with respect to randomly-selected regular season games: Stat geeks pay too little attention to winning and focus too much on MOV, SRS, offensive/defensive efficiency, and other snazzy derivatives of the same basic quality. A better way to predict outcomes is to use a combination of winning *and* winning margins, with the latter being weighted slightly more heavily. Interestingly, however, the playoffs completely turn this situation on its head: The difference in regular-season winning percentage between two teams is *much* more predictive of individual playoff game outcomes than the difference between their margins of victory. This holds true for both home and away games (separately), as well as for series outcomes as a whole. For example, here is a bar graph comparing the predictive power of a number of “candidate” variables when combined in a logistic regression (taller bars are more predictive):

This comes from the same 20 year sample, with the regression to 1st round playoff series outcomes. I’ll talk about “PYS” and “pace” a little more below. The key variable here is “SRSdiff” (SRS, or Simple Rating System, is a popular variant on Margin of Victory that accounts for opponent strength). Note that it is close to zero.

So why the difference between the post-season and regular season? As I’ve said before, I think there’s a demonstrable and quantifiable skill to winning that is completely separate from scoring and allowing points. Intuitively, it makes sense to me that this skill would translate into the playoff environment just fine.

**3. Playoff experience matters, at least on the road**

Anecdotally, it always seemed to me that a lot of teams go deep in the playoffs one year, have mediocre regular seasons the next, only to go deep in the playoffs again anyway. To test this, I created a new variable I haphazardly named “PYS,” or “Previous Year’s Series.” Basically, you get one point for each playoff series you appeared in, plus one for winning it all. So a team that misses the playoffs gets a 0, and the NBA champion gets a 5. I then calculated the difference between the two teams for each playoff series (the same as with each of the other variables), and tried including and testing it in several more regression models.
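The scoring rule above is simple enough to write down directly. A minimal sketch, with function names and the `(series, won_title)` tuple format being my own choices:

```python
def pys(series_appeared_in, won_title):
    """Previous Year's Series score: one point per playoff series a team
    appeared in last season, plus one bonus point for winning the title."""
    return series_appeared_in + (1 if won_title else 0)


def pys_diff(team_a, team_b):
    """PYS disparity between two teams, each given as (series, won_title)."""
    return pys(*team_a) - pys(*team_b)


print(pys(0, False))                    # 0: missed the playoffs
print(pys(4, True))                     # 5: NBA champion
print(pys_diff((4, True), (1, False)))  # 4: champion vs. first-round exit
```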

What I found is that “PYS” is a useful predictive stat (beyond the team’s winning percentage), but mostly only for away games: that is, teams who won more in the previous year’s playoffs tend to do better in playoff road games than their current season’s winning percentage (and other stats) would normally indicate. Here’s a side-by-side comparison (though note this comes from a regression that includes a few other variables as well):

**4. Modeling 1st round series outcomes:**

The beauty of analyzing first round series is that you can avoid the complications and pitfalls of calculating or simulating a bunch of results. In this case, I chose to do a logistic regression directly onto chances of *winning the series*. After much fiddling, I found that the most accurate and reliable model is relatively simple (which often turns out to be the case). It uses only 3 variables: W% disparity, PYS disparity, and pace disparity, with W% being by far the most important. For the serious math nerds out there, here is the Excel equation:

`=1/(1+exp(-(-0.365+0.235*[pys]+13.235*[W-L%diff]+7.967E-02*[Pacediff])))`
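For those who don’t live in Excel, here is the same equation as a Python function. The coefficients are copied straight from the formula above; the function and argument names are mine, and I’m assuming the W-L% disparity is entered as a fraction (e.g. 0.10 rather than 10). The formula itself doesn’t say which team’s perspective the disparities are taken from, so treat the orientation as an open assumption too.

```python
import math


def series_win_probability(pys_diff, win_pct_diff, pace_diff):
    """Logistic model for winning a first-round series, using the
    coefficients from the Excel formula above."""
    z = (-0.365
         + 0.235 * pys_diff
         + 13.235 * win_pct_diff
         + 7.967e-02 * pace_diff)
    return 1 / (1 + math.exp(-z))


# With no disparity in any variable, only the intercept is left.
print(round(series_win_probability(0, 0, 0), 2))  # 0.41
```

Because the W% coefficient is so large relative to the others, even a modest difference in winning percentage moves the output far more than a one-point difference in PYS or pace.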

**5. Pace is underrated**

It continues to baffle me how so many sports statisticians go out of their way to purge “pace” from their models (e.g., Hollinger’s team efficiency stats are all “per 100 possessions”).

I’ve said this before and I’ll say it again: your chances of winning are *necessarily* a function of your reciprocal advantage per trade of possession AND the number of such trades you fit into each game. Indeed, pace turns out to be one of only two major variables that are still statistically significant to predicting both game and series outcomes *even after* accounting for win percentage.
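To illustrate why the number of trades matters, here is a toy calculation. It is not the model behind any of the regressions in this post: it simply assumes each trade of possessions adds an independent net score with a fixed mean (your edge) and standard deviation, and uses a normal approximation for the final margin. The edge and spread values below are hypothetical.

```python
from statistics import NormalDist


def win_probability(edge_per_trade, sd_per_trade, trades):
    """Normal approximation to P(final margin > 0) when each trade of
    possessions adds an independent net score with the given mean and SD."""
    mean_margin = edge_per_trade * trades
    sd_margin = sd_per_trade * trades ** 0.5
    return 1 - NormalDist(mean_margin, sd_margin).cdf(0)


# Same hypothetical per-trade edge, two different paces.
slow = win_probability(edge_per_trade=0.03, sd_per_trade=1.5, trades=85)
fast = win_probability(edge_per_trade=0.03, sd_per_trade=1.5, trades=100)
```

Under these assumptions the expected margin grows with the number of trades while the noise only grows with its square root, so `fast` comes out higher than `slow` for any team with a positive edge.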

**6. Series lengths favor Margin of Victory**

To me, this was probably the most interesting bit of info I’ve unearthed in any of my research in a long time. We should expect series lengths (e.g. 5 games, 6 games, etc.) to be mostly a function of how big of a favorite the favorite is. All the research points to Win % as being the stronger predictive metric in the playoffs, but as it turns out, MOV is actually the *best* predictor of series *length* (the numbers for this chart came from 3 separate regressions that used each of these metrics independently):

Why this is, so far I can only speculate: MOV rewards *dominance* in victories, while Win% rewards ability to win by whatever means. So, in theory, it’s kind of surprisingly unsurprising that MOV would be better at predicting the likelihood of domination while W% would be better at predicting bottom-line results. But the crazy and fascinating part is that the independent skill of winning actually appears to transfer from *individual games* to *complete series*.

In any case, the best regression to series lengths that I could design used margin of victory disparity, PYS disparity, and one final element:

**7. Lots of 3 point attempts = longer series**

When either team shoots a lot of 3 pointers, the series takes longer on average, but ultimately with the teams winning at about the same rate. One of the benefits of doing regressions onto the entire series outcome instead of to individual game outcomes is that sometimes you can find these relationships that literally couldn’t exist if all the games were truly independent variables. The extra volatility of the 3-point shot ought to be *black-boxed* into their winning percentage (I discuss what I call “black-boxing” in a lengthy tangent here), but it’s not. The implications are ephemeral, but interesting: I think it might be indirect evidence that shooters really *can* run hot and cold for extended periods of time.

**8. In a 7 game series, always pick 5 or 6 games:**

From a purely statistical standpoint, 5 and 6 games are the only two reasonable choices: even for the closest and/or most lopsided matches, the range of expected outcomes is simply not large enough to justify predicting sweeps or game 7’s. The 72-win Bulls 1st round match had an expected series length of around 4.6, which is *almost* small enough to round down, but that’s as close as any series in 20 years has gotten. Of course, the strategy in the Stat Geek Smackdown might be a little different: being slightly contrarian can be the correct move for maximizing your chances of winning.

*Update (5/1/11):* An emailer correctly points out that, for Smackdown purposes (where you get no credit for being close), the *mean* series length isn’t what matters, but the *mode*. E.g., in the 72-win Bulls case, the expected series length could be 4.6 with 4 still being the most likely outcome.

It was sloppy of me to equate the result of the length-predicting model with the most likely outcome. But that is actually *not* the model I used to determine that 5 and 6 games are better picks than 4 and 7; it was strictly intended to identify the most influential variables. Rather, my research into most likely series lengths (which is still new and constantly changing) is based on a larger empirical investigation and some (comparatively) advanced simulations that attempt to correctly account for underlying error rates and for teams that are likely to be stronger on the road or in elimination games, etc. I may post some of these results when I’ve refined them a bit more, but I stand by my claims that 4 game picks are for suckers (even in the most lopsided series) and that the C.W. against picking home teams in 6 is misguided.
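The emailer’s mean-versus-mode point is easy to see in the simplest possible model. The sketch below assumes every game is independent and that the favorite wins each one with the same probability, which is exactly the kind of simplification the simulations mentioned above try to move beyond, so read it as an illustration of the distinction rather than as a forecast. The 0.85 per-game probability is a hypothetical value.

```python
from math import comb


def series_length_distribution(p, wins_needed=4):
    """Probability of each possible series length, assuming independent
    games in which the favorite wins each one with probability p."""
    q = 1 - p
    dist = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The series winner takes game n plus (wins_needed - 1) of the first n - 1.
        ways = comb(n - 1, wins_needed - 1)
        losses = n - wins_needed
        dist[n] = ways * (p ** wins_needed * q ** losses
                          + q ** wins_needed * p ** losses)
    return dist


dist = series_length_distribution(0.85)
mean_length = sum(n * prob for n, prob in dist.items())
mode_length = max(dist, key=dist.get)
print(round(mean_length, 2), mode_length)  # 4.68 4
```

So under these toy assumptions, a series can have an expected length well above 4.5 while a sweep remains the single most likely result.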

Here are the panels and presentations that I attended, along with some of my thoughts:

*Featuring Malcolm Gladwell (Author of Outliers), Jeff Van Gundy (ESPN), and others I didn’t recognize.*

In this talk, Gladwell rehashed his absurdly popular maxim about how it takes 10,000 hours to master anything, and then made a bunch of absurd claims about talent. (Players with talent are at a disadvantage! Nobody wants to hire Supreme Court clerks! Etc.) The most re-tweeted item to come out of Day 1 *by far* was his highly speculative assertion that “a lot of what we call talent is the desire to practice.”

While this makes for a great motivational poster, IMO his argument in this area is tautological at best, and highly deceptive at worst. Some people have the gift of extreme talent, and some people have the gift of incredible work ethic. The streets of the earth are littered with the corpses of people who had one and not the other. Unsurprisingly, the most successful people tend to have both. To illustrate, here’s a random sample of 10,000 “people” with independent normally distributed work ethic and talent (each with a mean of 0, standard deviation of 1):

The blue dots (left axis) are simply Hard Work plotted against Talent. The red dots (right axis) are Hard Work plotted against the *sum* of Hard Work and Talent—call it “total awesome factor” or “success” or whatever. Now let’s try a little Bayes’ Theorem intuition check: You randomly select a person and they have an awesome factor of +5. What are the odds that they have a work ethic of better than 2 standard deviations above the mean? High? Does this prove that all of the successful people are just hard workers in disguise?

Hint: No. And this illustration is conservative: This sample is only 10,000 strong: increase to 10 billion, and the biggest outliers will even more uniformly be extremely hard workers (and they will all be extremely talented as well). Moreover, this “model” for greatness is just a sum of the two variables, when in reality it is probably closer to a product, which would lead to even greater disparities. E.g.: I imagine total greatness achieved might be something like great stuff produced per minute worked (a function of talent) times total minutes worked (a function of willpower, determination, fortitude, blah blah, etc).
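If you’d like to check the intuition yourself, here is a rough sketch of the same setup. The simulation mirrors the description above (10,000 people, independent standard-normal work ethic and talent, success as their sum); the seed, the top-100 cutoff, and the function names are arbitrary choices of mine. The conditional probability uses the standard result that, for two independent standard normals, either one given their sum *s* is normal with mean *s*/2 and variance 1/2.

```python
import random
from statistics import NormalDist


def prob_component_above(threshold, total):
    """P(W > threshold | W + T = total) for independent standard normal W, T.

    Conditional on the sum, W is normal with mean total/2 and variance 1/2.
    By symmetry the same number applies to T.
    """
    conditional = NormalDist(mu=total / 2, sigma=0.5 ** 0.5)
    return 1 - conditional.cdf(threshold)


def simulate_people(n=10_000, seed=0):
    """Draw n (work_ethic, talent) pairs, each independent standard normal."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


people = simulate_people()
# The 100 most "successful" people by the sum of the two traits.
top = sorted(people, key=lambda p: p[0] + p[1], reverse=True)[:100]
mean_work = sum(w for w, _ in top) / len(top)
mean_talent = sum(t for _, t in top) / len(top)

p_work = prob_component_above(2, 5)
print(round(p_work, 2))  # 0.76
```

The odds that a +5 person has a work ethic better than 2 standard deviations above the mean are indeed high, but the identical calculation gives the identical answer for talent, which is the whole point: the most successful people in the sample are well above average on both.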

The general problem with Gladwell I think is that his emphatic de-emphasis of talent (which has no evidence backing it up) cheapens his much stronger underlying observation that for any individual to fully maximize their potential takes the accumulation of a massive amount of hard work—and this is true for people *regardless* of what their full potential may be. Of course, this could just be a shrewd marketing ploy on his part: you probably sell more books by selling the hope of greatness rather than the hope of being an *upper-level* mid-manager (especially since you don’t have to worry about that hope going unfulfilled for at least 10 years).

*Featuring John Brenkus (ESPN Sports Science guy), Will Carroll (Sports Illustrated), and others. Moderated by Peter Keating (ESPN).*

From what I stayed for, this panel was pretty disappointing. There are two burning “analytical” questions about injuries that I would love to know the answers to:

- Is there really such a thing as “injury-prone” (i.e., how strongly do injuries correlate with more injuries)? And,
- How much do injuries affect a player’s future performance?

But rather than addressing these issues, the panel seemed to focus more on concussions and how bad they are, and whether pre-season games are worth it, etc. (including an incredible claim by one panelist that 70% of all NFL injuries occur within the first two weeks of minicamp, which I am not even close to believing without seeing the data for myself).

*By Tobias J. Moskowitz.*

This talk got a lot of buzz all day, but by the time I showed up it was almost over and observers were 3-deep out the door. The Cliffs Notes version: Biased refs. Unfortunately I missed most of the details, but apparently it relied heavily on European soccer data. I will definitely go back into my materials to look this one over, but it seems like a plausible and obviously very meaningful hypothesis.

*By Sandy Weill.*

This talk was an extremely rich look at a number of different things that can be examined with the most cutting-edge data out there. I think it may have literally been an infomercial for STATS Inc. (please correct me if I’m wrong), but it was very impressive regardless. The biggest disappointment to me is that their new system tracks approximately **1 million** data points ***per game*** (yes, that’s double-bold), yet STILL doesn’t tell us how much time was left on the shot clock when each shot was taken. Argh!

Highlights included:

- Number of defenders in the vicinity is much more important than proximity of defender(s).
- Shots immediately following a pass are more successful, even when controlling for distance and defensive positioning.
- The presenter did a long, detailed hypothesis-test about whether there was arbitrage opportunity for players to “step back” and take worse shots against softer defenses, but concluded that there wasn’t.
- Maybe the most interesting part to me was the ultra-nerdy discussion of how they cleaned the new data-set by matching up inconsistencies with the old data-set.

Overall, the guy admitted that he hadn’t really made many ostensibly exciting or counter-intuitive discoveries, but I think that’s probably a better result on balance. While it may not be as sexy, I think a system that confirms a bunch of prior beliefs is more likely to be right when it finally finds something wrong than one that finds fault everywhere it looks.

*Featuring Eric Mangini, Aaron Schatz, and others. Moderated by Gary Belsky (ESPN the Magazine).*

I was warned that this panel is usually a snore, and it was. Aaron seemed like the only one who knew what he was talking about, and even he was being modest and charitable. Some of the low-lights include:

- A lengthy discussion of “subjectivity” in football statistics, as if this were a meaningful problem. Um, data doesn’t have to be perfect to be useful. This is a clear case of “taboo”: “subjective” is a taboo word that people think is supposed to be bad w/r/t data, and thus they reflexively react negatively whenever they see “subjective” and “data” in the same paragraph. Then they say the data is “flawed” and go back to making decisions the old way, which is to say, *based on their subjective intuitions*.
- When asked about why NFL teams don’t go for it on 4th downs, Eric Mangini rambled on about how it may look like you should go for it based on league averages, but in the circumstances of a particular game there are a lot of other factors to consider, like your kicker might be tired or there might be too much wind or something. In addition to everything else that’s wrong about this answer, can I just note the obvious point that both of those things would seem to *favor* going for it instead of kicking?
- Somebody gave us this brilliant introduction to game theory: If teams started going for it on 4th down, defenses might adjust to try to stop them from converting on 4th down! While this might be relevant for fake punts and fake field goals, I’m pretty sure that in most cases defenses do that already (relatedly, moratorium on “Schrodinger’s Cat” metaphors, please).

Anyway, I didn’t stay to see if it got any better.

*Featuring Jeff Ma (of Bringing Down the House fame), Michael Konik, and others. Moderated by Chad Millman (ESPN).*

I only caught a few minutes of this one. As a former professional gambler, I suspected that I would find this discussion tiresome, and I was right.

- Jeff Ma claimed that the main problem with sports gambling as a profitable enterprise is the psychological strain of potentially losing for “weeks” at a time (hmm… as opposed to, say, that other game that people play for money called *the stock market*)?
- Everyone on the panel seemed to be in agreement that the sports-betting markets are heading for near-perfect efficiency, which I guess is probably true. But I’m not sure it follows, as everyone up there seemed to think it does, that combined with the rake, this will *necessarily* make sports-betting an unprofitable endeavor. This is pure (and fresh) speculation, but couldn’t the rake also provide a buffer *against* market perfection? E.g., say the line based on casual money would be 7% off for a particular game, and the rake at the book is effectively 5%. We would expect the “smart” money to pour in immediately, but then stop once the line drops to within that 5%, where correcting the line is no longer profitable. This is functionally very similar to the rake being 0%, where the money would keep pouring in until the line was perfect. Either way, you only have to be able to beat the *smart money* by a small percentage in order for it to be profitable. The rake makes profitable opportunities more *rare* (as they will only exist when the casual money would be off by more than it), but if the smart money is rational, it shouldn’t make those opportunities any less exploitable. In fact, the relative scarcity *could* decrease the amount of smart money in the market overall, actually making it less efficient w/r/t individual bets (I’m not saying this scenario is accurate, just that it’s possible).
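The rake-as-buffer idea from the second point can be put in a couple of lines. This is just the speculative scenario above restated as code, under the assumption that smart money is perfectly rational and keeps betting only while the mispricing exceeds the rake. The function names are mine.

```python
def residual_mispricing(casual_error, rake):
    """How far off the line stays once rational smart money has finished
    betting: it corrects the line only while the error exceeds the rake."""
    return min(casual_error, rake)


def first_bettor_edge(casual_error, rake):
    """Edge (net of the rake) available before any smart money comes in."""
    return max(casual_error - rake, 0.0)


print(residual_mispricing(0.07, 0.05))  # 0.05
print(residual_mispricing(0.07, 0.0))   # 0.0
```

With a 5% rake the line settles 5% away from “perfect” rather than at it, while the opportunity itself only exists in the first place because the casual money was off by more than the rake.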

Anyway, as had been the case all day, the paper presentations were getting all the buzz, so I went to another of those:

*By Phillip Z. Maymin, Allan Maymin, and Eugene Shen.*

In this one, the authors really did take a counter-intuitive stand (at least relative to the intuitions of the analytical community), by claiming that the default NBA strategy of pulling starters with Quarter+1 fouls is actually a good thing. The main reasoning is this: while no one seems to disagree that pulling starters in foul trouble reduces the number of minutes they are able to contribute, there is actually strong empirical evidence that starters in foul trouble *play much worse*, making fewer minutes at full strength more valuable.

With apologies to anyone who was in the room, I very inarticulately tried to ask the author whether this didn’t in fact suggest that the problem was with players reacting to the possibility of ejection sub-optimally, rather than with the strategy of keeping them in itself. The author noted that playing worse to avoid ejection may itself be optimal, and we went around in circles a bit from there.

After wasting everyone’s time, I did get a chance to talk with one of the authors at length about this afterward, and we made much more headway. Upon reflection, I’m increasingly convinced that my point was correct. The issue breaks down like this:

- No one disputes that pulling your starters hurts your team by failing to maximize their number of minutes played. So for our purposes, it’s fair to assume that’s true.
- The main justification given for the default player-pulling strategy is that late-game minutes are more important than early-game minutes. While undoubtedly true to some (small) extent, the conventional wisdom almost certainly overstates this effect. In any case, the authors don’t appear to rely on this difference in forming their conclusions (obv I could be mistaken), so again, for purposes of analyzing their results, it’s fair to assume that all minutes are more or less equal.
- Now, for logical purposes, assume that you could command your players to play exactly the same as usual regardless of their foul situation.
- Consider the strategy of maximizing your best player’s minutes combined with the commandment to play the same regardless of number of fouls.
- Value-added per minute would be the same as normal, yet their average minutes would increase. Thus, this strategy weakly dominates the pulling strategy.

At this point, the author objected that players playing worse than normal to avoid being ejected may actually be playing *closer to optimal* than if they ignored their foul situation—which is to say, seemingly playing worse may actually be better for their team to some degree.

But this objection misses its mark: if the strategy of not pulling/instructing is better than the strategy of pulling, any further adjustments that the players make *toward* optimality should make this strategy *even better*.

Thus, working within the authors’ framework, and combined with the authors’ findings, I think it follows that players are probably making suboptimal adjustments to foul trouble (whether this is a result of bad instructions or inability to follow good instructions doesn’t matter).

*Featuring Mark Cuban, John Hollinger, Kevin Pritchard, and Mike Zarren. Moderated by Marc Stein.*

This was the “main event” of the day, playing to a completely packed main hall. All-in-all, it was thoroughly entertaining and interesting, if not especially provocative or informative. As someone tweeted, Mark Cuban appears to be genetically engineered for panel discussions, and he was wearing a T-shirt (I believe the only one in the room) reading “talk nerdy to me.”

Here are some interesting things that came up:

- Cuban repeatedly said that the main problem with analytics is coaching: Every coach thinks they can coach a guy up, but you don’t know how a player will actually respond, etc. This seems like just another variable to me, but interesting to think about.
- Lots of interesting stuff on trades between Cuban, Pritchard, and Zarren. Cuban thinks all that action right before the deadline is arbitrary, but I think that’s incorrect as a game-theoretical matter: the deadline adds credibility to walk-away threats. Zarren seemed charmingly shaken-up by the Perk trade. Very interesting to me: Cuban & Pritchard both said that teams deal with each other differently based on their reputations as analytics-oriented or not. At first I thought they meant that the more analytical teams were shunned by the more old-fashioned, but then it also sounded like maybe they were saying more analytical teams are wary of each other.
- 2/3 of teams have analytics operations of some sort. I haven’t decided yet whether that sounds low or high.
- There was a modest discussion of the Heat losing close games and how much of a sample you need to start taking such trends seriously. I believe the answer is probably smaller than most analytical types think: As I’ve discussed before, it can be shown that winning is a skill beyond a random walk of points scored and allowed. I assume that a lot of this comes from superior execution in close games (although this could be looked at in more detail). Bayes’ Theorem does the rest.
- One of the audience questions was about why NBA players are such bad free throw shooters. Pritchard gave a series of lengthy answers having to do with pressure, but I think those miss the main point. The more you are required to do aside from free throws, the more variables are going to go into your selection to play in the NBA—thus, the more likely it is that your strengths will lie in other areas. NBA players are the most talented free throw shooters in the world *for people with their skillsets*. This is almost true by definition for the present, but I suspect it is probably true historically as well. If we asked how good of a free throw shooter every player in the league is relative to the history of players who do *everything else* as well as they do, I suspect that modern players are the best free throw shooters ever. E.g., Dwight Howard may seem like a clanker, but he may be the best free throw shooter among people who play center as well as he does in NBA history.

*Featuring Del Harris, Mike Leach, Eric Mangini, and Steve Pagliuca. Moderated by Howard Beck (New York Times).*

I thought this panel was awful. Del Harris rambled on and on about stuff that apparently a lot of people found entertaining—maybe I was just burnt out. He did apparently use the term “statted out” repeatedly, which I love. As in: “That player statted out as a second round draft pick, but we thought he was better than that” (not an actual example).

Mangini again offered us a taste of his extensive wisdom, such as: when deciding whether to go for it on 4th down, teams should take distance and field position into account.

I can’t remember anything Leach said, but it wasn’t much better.

Overall, it seemed like a bunch of the old guard lamenting rather than celebrating the onset of sports analytics. They literally all guffawed over the idea that things were so much easier back before they had all these stats to confuse them.

*Featuring Mark Cuban, Mike Carey (actual NFL ref), Jon Wertheim (Sports Illustrated), and Phil Birnbaum. Moderated by Bill Simmons.*

This panel also featured Cuban, who was on his best behavior despite Bill Simmons’s repeated attempts to get him fined. Cuban offered many knowing smiles before declining to answer pointed questions, much to the audience’s amusement. A couple of notes:

- Bill Simmons was a great moderator. He introduced a number of provocative topics and asked relevant, very challenging, questions and follow-ups. He was such a great moderator, in fact, that he basically demonstrated the major flaw with all of the other panels that I saw: lack of confrontation! E.g., Simmons literally tried to start a labor war between the NFL and its refs, but Carey wouldn’t bite. He also asked Carey what his biggest mistake was as a ref, and Carey responded “probably agreeing to do this panel.”
- Shortly after Simmons cracked one of his many (nearly identical) jokes about how few women were in attendance, a woman asked a great question that I thought was very interesting and didn’t get an adequate response from the panel (paraphrasing): If the regular season is all about jockeying for home court advantage in the playoffs, and the main source of home court advantage is officiating bias, are we really sure that we want unbiased officials? Obv this seemed to be a natural offshoot of the earlier paper about where HFA comes from, but it was met mostly with platitudes about fairness. I’m not so sure: I mean, even more broadly, is home field advantage really something we’re willing to sacrifice? Not just in the playoffs, but from game to game: paying fans love it when home teams win! Of course, you can’t codify *un*fairness, so perhaps the optimal solution is actually the status quo: tolerate biased refs but act like you don’t.

Overall, a great day for fans of sports analysis.

The challenge here is this: My preferred method for rating the usefulness and reliability of various statistics is to see how accurate they are at predicting win differentials. But, now, the statistic I would like to test *actually is* win differential. The problem, of course, is that a player’s win differential is always going to be *exactly identical* to *his* win differential. If you’re familiar with the halting problem or Gödel’s incompleteness theorem, you probably know that this isn’t likely to be directly solvable: that is, I probably can’t design a metric for evaluating metrics that is capable of evaluating itself.

To work around this, our first step must be to independently assess the reliability of win predictions that are based on our inputs. As in sections (b) and (c), we should be able to do this on a team-by-team basis and adapt the results for player-by-player use. Specifically, what we need to know is the error distribution for the outcome-predicting equation—but this raises its own problems.

Normally, to get an error distribution of a predictive model, you just run the model a bunch of times and then measure the predicted results versus the actual results (calculating your average error, standard deviation, correlation, whatever). But, because my regression was to individual games, the error distribution gets “black-boxed” into the single-game win probability.

[A brief tangent: “Black box” is a term I use to refer to situations where the variance of your input elements gets sucked into the win percentage of a single outcome. *E.g.*, in the NFL, when a coach must decide whether to punt or go for it on 4th down late in a game, his decision one way or the other may be described as “cautious” or “risky” or “gambling” or “conservative.” But these descriptions are utterly vapid: *with respect to winning*, there is no such thing as a play that is more or less “risky” than any other—there are only plays that improve your chances of winning and plays that hurt them. One play may *seem* like a bigger “gamble,” because there is a larger immediate disparity between its possible outcomes, but a 60% chance of winning is a 60% chance of winning. Whether your chances come from superficially “risky” plays or superficially “cautious” ones, outside the “black box” of the game, they are equally volatile.]

For our purposes, what this means is that we need to choose something else to predict: specifically, something that will have an accurate and measurable error distribution. Thus, instead of using data from 81 games to predict the probability of winning one game, I decided to use data from 41 season-games to predict a team’s winning percentage in its *other* 41 games.

To do this, I split every team season since 1986 in half randomly, 10 times each, leading to a dataset of 6000ish randomly-generated half-season pairs. I then ran a logistic regression from each half to the other, using team winning percentage and team margin of victory as the input variables and games won as the output variable. I then measured the distribution of *those outcomes*, which gives us a baseline standard deviation for our predicted wins metric for a 41 game sample.
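The splitting step can be sketched roughly as follows. This is only an illustration in Python, not the code actually used for this series: representing a season as a list of point margins, the function names, and the fixed random seed are all assumptions of mine.

```python
import random


def split_season(game_margins, n_splits=10, seed=0):
    """Randomly split one team-season into two halves, n_splits times.

    game_margins: one point margin per game (positive means a win).
    Returns a list of (first_half, second_half) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pairs = []
    for _ in range(n_splits):
        shuffled = list(game_margins)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        pairs.append((shuffled[:half], shuffled[half:]))
    return pairs


def summarize_half(half):
    """Return (winning percentage, average margin of victory) for one half."""
    wins = sum(1 for margin in half if margin > 0)
    return wins / len(half), sum(half) / len(half)
```

Each pair then supplies the predictors (Win% and MOV from one half) and the outcome (games won in the other half) for the regression.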

Next, as I discussed briefly in section (b), we can adapt the distribution to other sample sizes, so long as everything is distributed normally (which, at every point in the way so far, it has been). This is a feature of the normal distribution: it is easy to predict the error distribution of larger and smaller datasets—your standard deviation will be directly proportional to the square-root of the ratio of the new sample size to the original sample size.

Since I measured the original standard deviations in games, I converted each player’s “Qualifying Minutes” into “Qualifying Games” by dividing by 36. So the sample-size-adjusted standard deviation is calculated like this:

=[41GmStDev]*SQRT([PlQualGames]/41)

Since the metrics we’re testing are all in percentages, we then divide the new standard deviation by the size of the sample, like so:

=([41GmStDev]*SQRT([PlQualGames]/41))/[PlQualGames]

This gives us a standard deviation for actual vs. predicted winning percentages for any sample size. Whew!
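For anyone who would rather not use a spreadsheet, the same two-step calculation can be written as a small Python function. The function and argument names are mine, and the 41-game standard deviation passed in the example is a placeholder value, not the figure measured for this series.

```python
import math


def adjusted_win_pct_sd(sd_41_games, qualifying_minutes, minutes_per_game=36):
    """Sample-size-adjusted standard deviation, expressed as a winning percentage.

    sd_41_games: baseline standard deviation (in games) for a 41-game sample.
    qualifying_minutes: the player's qualifying minutes.
    """
    # Convert minutes into "Qualifying Games" by dividing by 36.
    qualifying_games = qualifying_minutes / minutes_per_game
    # Scale the baseline by the square root of the sample-size ratio.
    sd_in_games = sd_41_games * math.sqrt(qualifying_games / 41)
    # Divide by the sample size to move from games to percentages.
    return sd_in_games / qualifying_games


# Placeholder inputs only: 4.0 games of baseline error, 164 qualifying games.
example_sd = adjusted_win_pct_sd(4.0, 164 * 36)
```

Note that quadrupling the sample only halves the percentage error, which is why players with small samples get "punished" so heavily.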

The good news is: now that we can generate standard deviations for each player’s win differentials, this allows us to calculate *p*-values for each metric, which allows us to finally address the big questions head on: How likely is it that this player’s performance was due to chance? Or, put another way: How much evidence is there that this player had a significant impact on winning?
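As a minimal sketch of that last step, a differential and its standard deviation can be turned into a z-score and a p-value like this. I am assuming normally distributed errors and a one-sided test (how likely is a differential at least this large if the true impact were zero); the function name is hypothetical.

```python
import math


def one_sided_p_value(win_differential, standard_deviation):
    """Return (z, p) for a differential, assuming a true impact of zero."""
    z = win_differential / standard_deviation
    # Upper-tail probability of a standard normal, computed with the
    # complementary error function from the standard library.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p
```

A differential of zero gives a p-value of 0.5, and a differential about 1.96 standard deviations above zero gives a p-value of roughly 0.025.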

The better news is: since our standard deviations are adjusted for sample size, we can greatly increase the size of the comparison pool, because players with smaller samples are “punished” accordingly. Thus, I dropped the 3-season requirement and the total minutes requirement entirely. The only remaining filters are that the player missed at least 15 games for each season in which a differential is computed, and that the player averaged at least 15 minutes per game played in those seasons. The new dataset now includes 1539 players.

Normally I don’t weight individual qualifying seasons when computing career differentials for qualifying players, because the weights are an *evidentiary* matter rather than an *impact* matter: when it comes to estimating a player’s *impact*, conceptually I think a player’s effect on team performance should be averaged across circumstances equally. But *this* comparison isn’t about whose stats indicate the most skill, but whose stats make for the best *evidence* of positive contribution. Thus, I’ve weighted each season (by the smaller of games missed or played) before making the relevant calculations.

So without further ado, here are Dennis Rodman’s statistical significance scores for the 4 versions of Win % differential, as well as where he ranks against the other players in our comparison pool:

Note: I’ve posted a complete table of z scores and p values for all 1539 players on the site. Note also that due to the weighting, some of the individual differential stats will be slightly different from their previous values.

You should be careful to understand the difference between this table of *p*-values and ranks vs. similar ones from earlier sections. In those tables, the *p*-value was determined *by* Rodman’s relative position in the pool, so the *p*-value and rank basically represented the same thing. In this case, the *p*-value is based on the expected error in the results. Specifically, it is the answer to the question “If Dennis Rodman actually had zero impact, how likely would it be for him to have posted these differentials over a sample of this size?” The “rank” is then where his answer ranks among the answers to the same question for the other 1538 players. Depending on your favorite flavor of win differential, Rodman ranks anywhere from 1st to 8th. His average rank among those is 3.5, which is 2nd only to Shaquille O’Neal (whose differentials are smaller but whose sample is much larger).

Of course, my preference is for the combined/adjusted stat. So here is my final histogram:

^{Note: N=1539.}

Now, to be completely clear, as I addressed in Part 3(a) and 2(b), so that I don’t get flamed (or stabbed, poisoned, shot, beaten, shot again, mutilated, drowned, and burned—metaphorically): Yes, actually I *AM* saying that, when it comes to *empirical evidence based on win differentials*, Rodman *IS* superior to Michael Jordan. This doesn’t mean he was the better player: for that, we can speculate, watch the tape, or analyze other sources of statistical evidence all day long. But for *this* source of information, in the final reckoning, win differentials provide *more evidence* of Dennis Rodman’s value than they do of Michael Jordan’s.

The best news is: That’s it. This is game, set, and match. If the 5 championships, the ridiculous rebounding stats, the deconstructed margin of victory, etc., aren’t enough to convince you, this should be: Looking at Win% and MOV differentials over the past 25 years, when we examine which players have the strongest, most reliable evidence that they were substantial contributors to their teams’ ability to win more basketball games, Dennis Rodman is among the tiny handful of players at the very very top.

Long ago, the analytic community recognized this fact, and has moved *en masse* to MOV (and its ilk) as the main element in their predictive statistics. John Hollinger, for example, uses margin exclusively in his team power ratings—completely ignoring winning percentage—and these ratings are subsequently used for his playoff prediction odds, etc. Note, Hollinger’s model has a lot of baffling components, like heavily weighting a team’s performance in their last 10 games (or later in their last 25% of games), when there is no statistical evidence that L10 is any more predictive than first 10 (or any other 10). But the choice to ignore winning percentage is of particular interest, especially as it is indicative of an almost uniform tendency among analysts to substitute MOV-style stats for winning percentage entirely.

This is both logically and empirically a mistake. As your sample size grows, winning percentage becomes more and more valuable. The reason for this is simple: Winning percentage is perfectly accurate—that is, it perfectly reflects what it is that we want to know—but has extremely high variance, while MOV is an imperfect *proxy*, whose usefulness stems primarily from its much *lower* variance. As sample sizes increase, the variance for MOV decreases towards 0 (which happens relatively quickly), but the gap between what it measures and what we want to know will persist in perpetuity. Thus, after a certain point, the “error” in MOV remains effectively constant, while the “error” in winning percentage continuously decreases. To get a simple intuitive sense of this, imagine the extremes: after 5 games, clearly you will have more faith in a team that has won 2 but has a MOV of +10 over a team that has won 3 but has a MOV of +1. But now imagine 1000 games with the same MOV’s and winning percentages: one team has won 400 and the other has won 600. If you had to place money on one of the two teams to win their next game, you would be a fool to favor the first. But beyond the intuitive point, this is essentially an empirical matter: with sufficient data, we should be able to establish the relative importance of each for any given sample-size.
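As a rough illustration of how quickly the noise in winning percentage shrinks, the sketch below uses the textbook binomial standard error. Treating games as independent with a fixed win probability is my simplifying assumption here, not something the regression in this post relies on, and the function name is hypothetical.

```python
import math


def win_pct_standard_error(true_win_prob, games):
    """Standard error of an observed winning percentage over `games` games,
    assuming independent games with a fixed win probability."""
    return math.sqrt(true_win_prob * (1 - true_win_prob) / games)


se_after_5 = win_pct_standard_error(0.5, 5)        # about 0.22
se_after_1000 = win_pct_standard_error(0.5, 1000)  # about 0.016
```

Under that assumption, a 2-3 record tells you almost nothing, while 400 wins versus 600 wins over 1000 games is a gap many standard errors wide.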

So for this post, I’ve employed the same method that I used in section (b) to create our MOV-> Win% formula (logistic regression for all 55,000+ team games since 1986), except this time I included both Win % and MOV (over the team’s *other* 81 games) as the predictive variables. Here, first, are the coefficients and corresponding *p*-values (probability that the variable *is not* significant):

It is thus empirically incontrovertible that, even with an 81-game predictive sample, both MOV and Win% are statistically significant predictive factors. Also, for those who don’t eat logistic regression outputs for breakfast, I should be perfectly clear what this means: It doesn’t just mean that both W% and MOV are good at predicting W%—this is trivially true—it means that, even when you have one, using *the other as well* will make your predictions substantially better. To be specific, here is the formula that you would use to predict a team’s winning percentage based on these two variables:

Note: Again, *e* is Euler’s number, or ~2.72. *wp* is the variable for winning % over the other 81 games, and *mv* is the variable for Margin of Victory over the other 81 games.

And again, for your home-viewing enjoyment, here is the corresponding Excel formula:

=1/(1+EXP(-(1.43*[W%]+.081*[MOV]-.721)))

Finally, in order to visualize the relative importance of each variable, we can look at their standardized coefficients (shown here with 95% confidence bars):

^{Note: Standardized coefficients, again, are basically a unit of measurement for comparing the importance of things that come in different shapes and sizes.}

For an 81-game sample (which is about as large of a consistent sample as you can get in the NBA), Win% is about 60% as important as MOV when it comes to predicting outcomes. At the risk of sounding redundant, I need to make this extremely clear again: this does NOT mean that Win% is 60% as good at predicting outcomes as margin of victory (actually, it’s more like 98% as good at that)—it means that, when making your ideal prediction, which incorporates both variables, Win % gets 60% as much *weight* as MOV (as an aside, I should also note that the importance of MOV drops virtually to zero when it comes to predicting playoff outcomes, largely—though not entirely—because of home court advantage).

This may not sound like much, but I think it’s a pretty significant result: At the end of the day, this proves that there IS a skill to winning games independent of the rates at which you score and allow points. This is a non-obvious outcome that is almost entirely dismissed by the analytical community. If NBA games were a random walk based on possession-to-possession reciprocal advantages, this would not be the case at all.

Now, note that this is formally the same as the scenario discussed in section (b): We want to predict winning percentages, but using MOV alone leaves a certain amount of error. What this regression proves is that this error can be reduced by incorporating win percentage into our predictions as well. So consider this proof-positive that X-factors *are* predictively valuable. Since the predictive power of Win% and MOV should be equivalent no matter their source, we can now use this regression to make more accurate predictions about each player’s true impact.

Adapting this equation for individual player use is simple enough, though slightly different from before: Before entering the player’s Win% differential, we have to convert it into a raw win percentage, by adding .5. So, for example, if a player’s W% differential were 21.6%, we would enter 71.6%. Then, when a number comes out the other side, we can convert it back into a predicted differential by subtracting .5, etc.
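Here is the same conversion as a short Python sketch. The coefficients are the ones from the Excel formula in this post; the function names are mine, and I am assuming the player’s MOV differential is entered as-is while only the Win% differential gets the .5 shift.

```python
import math


def predict_win_pct(win_pct, mov):
    """Combined Win%/MOV logistic formula (inputs cover the other 81 games)."""
    return 1 / (1 + math.exp(-(1.43 * win_pct + 0.081 * mov - 0.721)))


def predict_player_differential(win_pct_diff, mov_diff):
    """Adapt the team formula to a player's differentials."""
    raw_win_pct = win_pct_diff + 0.5             # convert differential to raw Win%
    predicted = predict_win_pct(raw_win_pct, mov_diff)
    return predicted - 0.5                       # convert back to a differential


# Inputs quoted in this series: a 21.6% Win% differential and a 3.78 MOV differential.
example = predict_player_differential(0.216, 3.78)
```

With those inputs the result comes out at roughly 0.148, i.e., a predicted differential of about 14.8%.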

Using this method, Rodman’s predicted win differential comes out to 14.8%. Here is the new histogram:

^{Note: N is still 470.}

This histogram is also weighted by the sample size for each player (meaning that a player with 100 games worth of qualifying minutes counts as 100 identical examples in a much larger dataset, etc.). I did this to get the most accurate distribution numbers to compute *P* values (which, in this case, work much like a percentile) for individual players. Here is a summary of the major factors for Dennis Rodman:

For comparison, I’ve also listed the percentage of eligible players that match the qualifying thresholds of my dataset (minus the games missed) who are in the Hall of Fame. Specifically, that is, those players who retired in 2004 or earlier and who have at least 3 seasons since 1986 with at least 15 games played in which they averaged at least 15 minutes per game. This gives us a list of 462 players, of which 23 are presently IN the Hall. The difference in average skill between that set of players and the differential set is minimal, and the reddish box on the histogram above surrounds the top 5% of predicted Win% differentials in our main data.

While we’re at it, let’s check in on the list of “select” players we first saw in section (a) and how they rank in this metric, as well as in some of the others I’ve discussed:

For fun, I’ve put average rank and rank of ranks (for raw W% diff, adjusted W% diff, MOV-based regression, raw W%/MOV-based regression, raw X-Factor, adjusted X-Factor, and adjusted W%/MOV-based regression) on the far right. I’ve also uploaded the complete win differential table for all 470 players to the site, including all of the actual values for these metrics and more. No matter which flavor of metric you prefer (and I believe the highlighted one to be the best), Rodman is solidly in Hall of Fame territory.

Finally, I’m not saying that the Hall of Fame does or must pick players based on their ability to contribute to their team’s winning percentages. But *if* they did, and *if* these numbers were accurate, Rodman would deserve a position with room to spare. Thus, naturally, one burning question remains: how much can we trust these numbers (and Dennis Rodman’s in particular)? This is what I will address in section (d) tomorrow.

There are several different methods for converting MOV into expected win-rates. For this series, I took the 55,000+ regular-season team games played since 1986 and compared their outcomes to the team’s Margin of Victory over the *other* 81 games of the season. I then ran this data through a logistic regression (a method for predicting things that come in percentages) with MOV as the predictor variable. Here is the resulting formula:

^{Note: e is Euler’s number, or ~2.72. mv is the variable for margin of victory.}

This will return a probability between 0 and 1, corresponding to the odds of winning the predicted game. If you want to try it out for yourself, the Excel formula is:

=1 / (1 + EXP(-(-0.0039+0.1272*[MOV])))

So, for example, if a team’s point differential (MOV) over 81 games is 3.78 points per game, their odds of winning their 82nd game would be 61.7%.

Of course, we can use this same formula to predict a player’s win% differential based on *his* MOV differential. If, based on his MOV contribution alone, a player’s team would be expected to win 61.7% of the time, then his predicted win% differential is what his contribution would be *above average*, in this case 11.7% (this is one reason why, for comparison purposes, I prefer to use adjusted win differentials, as discussed in Part 3(a)).
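If you would rather check the arithmetic outside of Excel, the MOV-only formula translates directly into Python. The coefficients come from the formula above; the function names are my own.

```python
import math


def mov_to_win_prob(mov):
    """Probability of winning a game given MOV over the other 81 games."""
    return 1 / (1 + math.exp(-(-0.0039 + 0.1272 * mov)))


def predicted_win_diff(mov_diff):
    """Predicted win% differential: expected win rate above an average (50%) team."""
    return mov_to_win_prob(mov_diff) - 0.5


team_prob = mov_to_win_prob(3.78)        # about 0.617
player_diff = predicted_win_diff(3.78)   # about 0.117
```

A MOV of 3.78 gives roughly 61.7%, and therefore a predicted differential of roughly 11.7%, matching the figures quoted here.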

As discussed in Part 2(b) of this series (“With or Without Worm”), Dennis Rodman’s MOV differential was 3.78 points, which was tops among players with at least a season’s worth of qualifying data, corresponding to the aforementioned win differential of 11.7%. Yet this under-predicts his *actual* win percentage differential by 9.9%. This could be the result of a miscalibrated prediction formula, but as you can see in the following histogram, the mean for win differential minus predicted win differential for our 470 qualifying player dataset is actually slightly *below* zero at –0.7%:

Rodman has the 2nd highest overall, which is even more crazy considering that he had one of the highest MOV’s (and the highest of anyone with anywhere close to his sample size) to begin with. Note how much of an outlier he is in this scatterplot (red dot is Rodman):

I call this difference the “X-Factor.” For my purposes, “X” stands for “unknown”: That is, it is the amount of a player’s win differential that isn’t explained by the most common method for predicting win percentages. For any particular player, it may represent an actual skill for winning above and beyond a player’s ability to contribute to his team’s margin of victory (in section (c), I will go about proving that such a skill exists), or it may simply be a result of normal variance. But considering that Rodman’s sample size is significantly larger than the average in our dataset, the chances of it being “error” should be much smaller. Consider the following:

Again, Rodman is a significant outlier: no one with more than 2500 qualifying minutes breaks 7.5%. Rodman’s combination of large sample with large Margin of Victory differential with large X-Factor is remarkable. To visualize this, I’ve put together a 3-D scatter plot of all 3 variables:

It can be hard to see where a point stands in space in a 2-D image, but I’ve added a surface grid to try to help guide you: the red point on top of the red mountain is Dennis Rodman.

To get a useful measure of how extreme this is, we can approximate a sample-size adjustment by comparing the number of qualifying minutes for each player to the average for the dataset, and then adjusting the standard deviation for that player accordingly (proportional to the square root of the ratio, a method which I’ll discuss in more detail in section (d)). After doing this, I can re-make the same histogram as above with the sample-adjusted numbers:

No man is an island. Except, apparently, for Dennis Rodman. Note that he is about 4 standard deviations above the mean (and observe how the normal distribution line has actually *blended with the axis* below his data point).

Naturally, of course, this raises the question:

**Where does Rodman’s X-Factor come from?**

Strictly speaking, what I’m calling “X-Factor” is just the prediction error of this model with respect to players. Some of that error is random and some of it is systematic. In section (c), I will prove that it’s *not* entirely random, though where it comes from for any *individual* player, I can only speculate.

Margin of Victory treats all contributions to a game’s point spread equally, whether they came at the tail end of a blowout, or in the final seconds of a squeaker. One thing that could contribute to a high X-factor is “clutch”ness. A “clutch” shooter (like a Robert Horry), for example, might be an average or even slightly below-average player for most of the time he is on the floor, but an extremely valuable one near the end of games that could go either way. The net effect from the non-close games would be small for both metrics, but the effect of winning close games would be much higher on Win% than MOV. Of course, “clutch”ness doesn’t have to be limited to shooters: e.g., if one of a particular player’s skill advantages over the competition is that he makes better tactical decisions near the end of close games (like knowing when to intentionally foul, etc.), that would reflect much more strongly in his W% than in his MOV.

Also, a player who contributes significantly whenever he is on the floor but is frequently taken out of non-close games as a precaution against fatigue or injury may have a Win % that accurately reflects his impact, but a significantly understated MOV. E.g., in the Boston Celtics “Big 3” championship season, Kevin Garnett was rested constantly—a fact that probably killed his chances of being that season’s MVP—yet the Celtics won by far the most games in the league. In this case, the player is “clutch” just by virtue of being on the floor more in clutch spots.

The converse possibility also exists: A player could be “reverse clutch,” meaning that he plays *worse* when the game is *NOT* on the line. This would ultimately have the same statistical effect as if he played *better* in crunch time. And indeed, based on completely non-rigorous and anecdotal speculation, I think this is a possible factor in Rodman’s case. During his time in Chicago, I definitely recall him doing a number of silly things in the 4th quarter of blowout games (like launching up ridiculous 3-pointers) when it didn’t matter—and in a game of small margins, these things add up.

Finally, though it cuts a small amount against the absurdity of Rodman’s rebounding statistics, I would be derelict as an analyst not to mention the possibility that Rodman may have played sub-optimally in non-close games in order to pad his rebounding numbers. The net effect, of course, would be that his rebounding statistics could be slightly overstated, while his value (which is already quite prodigious) could be substantially understated. To be completely honest, with his rebounding percentages and his X-Factor both being such extreme outliers, I have to think that at least some relationship existing between the two is likely.

If you’re emotionally attached to the freak-alien-rebounder hypothesis, this might seem to be a bad result for you. But if you’re interested in Rodman’s true value to the teams he played for, you should understand that, if this theory is accurate, it could put Rodman’s true impact on winning into the stratosphere. That is, this possibility gives no fuel to Rodman’s potential critics: the worst cases on either side of the spectrum are that Rodman was the sickest rebounder with a great impact on his teams, or that he was a great rebounder with the sickest impact.

In the next section, I will be examining the relative reliability and importance of Margin of Victory vs. Win % generally, across the entire league. In my “endgame” analysis, this is the balance of factors that I will use. But the league patterns do not necessarily apply in all situations: In some cases, a player’s X-factor may be all luck, in some cases it may be all skill, and in most it is probably a mixture of both. So, for example, if my speculation about Rodman’s X-Factor were true, my final analysis of Rodman’s value could be greatly understated.

Second, I apologize for the delay in getting this section out. I’m reminded of the words of the always brilliant Detective Columbo:

I worry. I mean, little things bother me. I’m a worrier. I mean, little insignificant details – I lose my appetite. I can’t eat. My wife, she says to me, “You know, you can really be a pain.”

Of course, as Columbo understood, the “insignificant” details that nag at you are usually anything but. Since Part 3 of this series should be the last to include heavily-quantitative analysis—and because it is so important to understanding Rodman’s true value—I really tried to tie up all the loose ends (even those that might at first seem to be redundant or obvious).

As a result, what began as a simple observation grew into something painfully detailed and extremely long (even by my standards)—but well worth it. So, once again, I’ve decided to break it down into 4 sections—however, each of these will be relatively short, and I’ll be posting them back-to-back each morning from now through Saturday. Here is the Cliff’s Notes version:

- Rodman had an observably great impact on his teams’ winning percentages.
- This impact was much greater than his already great impact on Margin of Victory would have predicted.
- Contrary to certain wisdom in the analytical community, Margin of Victory and Win% are both valuable indicators predictively, and combining Rodman’s differentials in both puts him deep in Hall of Fame territory.
- Rodman’s differentials are statistically significant at one of the highest levels in NBA history.

Now, on with the show:

One of the most common doubts I hear about Dennis Rodman’s value stems from the belief that his personal successes—5 NBA championships, freakish rebounding statistics, etc.—were probably largely a result of his having played for superior teams. For example, his prodigious rebounding may have been something he was “allowed” to do because he played for good offensive teams and (as the argument goes) had few other offensive responsibilities.

In its weaker form, I think this argument is plausible but irrelevant: Perhaps Rodman would not have been able to put up the numbers that he did if he were “required” to do more on offense. But the implication that this diminishes his value is absurd—it would be like saying that Cy Young wasn’t a particularly valuable baseball player because he couldn’t have put up such a great ERA if he were “required” to hit every night.

The stronger form, however, suggests that Rodman’s anomalous rebounding statistics probably weren’t due to any particularly anomalous talent or contribution, but were merely (or at least mostly) a byproduct of his fortunate circumstances.

If this were true, however, one of the following things would necessarily have to follow:

- His rebounding must not have contributed much value to his teams, or
- The situations he played in must have been uniquely favorable to leveraging value from a designated rebounder, or
- The choice to use a designated rebounder on an offensively strong team must have been an extremely successful exploitative strategy.

The third, I technically cannot disprove: It is theoretically possible that Rodman’s refusal to take a lot of shots on offense unintentionally caused his teams to stumble upon an amazing exploitative strategy that no one had discovered before and that no one has duplicated since (though, if that were the case, he still might deserve some credit for forcing their hands).

But 1 and 2 simply aren’t supported by the data: As I will show, Rodman had wildly positive impacts on 4 different teams that had little in common, except of course for being solid winners with Rodman in the lineup.

As I’ve discussed previously, a player’s differential statistics are simply the difference in their team’s performance in the games they played versus the games they missed. One very important differential stat we might be interested in is winning percentage.

To look at Rodman’s numbers in this area, I used exactly the same process that I described in Part 2(b) to look at his other differentials. However, for comparison purposes, I’ve greatly expanded the pool of players by dropping the qualifying minutes requirement from 3000 to 1000. This grows the pool from 164 players to 470.

Why expand? Honestly, because Rodman’s extreme win % differential allows it. I think the more stringent filters produce a list that is more reliable from top to bottom—but in this case, I am mostly interested in (literally) the top. There are some players on the list with barely 1/3 of a season’s worth of qualifying playing time to back up their numbers—which should produce extreme volatility—yet still no one is able to overtake Rodman.

Here is Rodman’s raw win differential, along with those of a number of select players (including a few whose styles are often compared to Rodman’s, some Hall of Famers, some future first-ballot Hall of Fame selections, and Rodman’s 2011 Hall of Fame co-finalists Chris Mullin and Maurice Cheeks):

I will put up a table of the entire list of 470 players—including win differentials and a number of other metrics that I will discuss throughout the rest of Part 3—along with section (c) on Friday.

Amazingly, this number may not even reflect Rodman’s true impact, because he generally played for extremely good teams, where it is not only harder to contribute, but where a given impact will have less of an effect on win percentage (for example, if your team normally wins 90% of its games, it is clearly impossible to have a win% differential above 10%). To account for this, I’ve also created “adjusted” win% differentials, which attempt to normalize a player’s percentage increase/decrease to what it would be on a .500 team.

This adjustment is done somewhat crudely, by measuring how far the player gets you toward 100% (for positive impacts) or toward 0% (for negative). E.g., if someone plays for a team that normally wins 70%, and they win 85% with him in the lineup, that is 50% of the way to 100%. Thus, as 50% of the way from 50% to 100% is 75%, that player’s adjusted differential is 25% (as opposed to their raw value of 15%).
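To make the arithmetic concrete, here is a minimal Python sketch of the adjustment as described above. The function name `adjusted_win_diff` and the reading of the team's "normal" win rate as the baseline are assumptions for illustration, not the code behind the tables.

```python
def adjusted_win_diff(base_win_pct, with_win_pct):
    """Rescale a raw win% differential to what it would be on a .500 team.

    base_win_pct: the team's normal win rate (assumed baseline), as a fraction.
    with_win_pct: the team's win rate with the player in the lineup.
    """
    raw = with_win_pct - base_win_pct
    if raw >= 0:
        room = 1.0 - base_win_pct  # distance from the baseline up to 100%
    else:
        room = base_win_pct        # distance from the baseline down to 0%
    if room == 0:
        return 0.0                 # no room to move in that direction
    fraction_of_the_way = raw / room
    # A .500 team has 50 percentage points of room in either direction.
    return 0.5 * fraction_of_the_way


# The example from the text: a 70% team that wins 85% with the player.
example = adjusted_win_diff(0.70, 0.85)
print(round(example, 4))
```

The example from the text comes out to an adjusted differential of 25%, against a raw value of 15%. A negative case works the same way in the other direction: a 60% team that drops to 45% has moved a quarter of the way to 0%, so its adjusted differential is -12.5%.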

A few notes about this method: While I prefer the adjusted numbers for this situation, they have their drawbacks. They are most accurate when dealing with consistently good or bad teams, over multiple seasons, and with bigger sample sizes. They are less accurate with smaller sample sizes, in individual seasons, and with uncertain team quality. This is because regression to the mean can become an interfering factor. When looking at individual seasons in a void, it is relatively easy to account for both effects, which I do for my league-wide win differential analysis. But when aggregating independent seasons that have a common related element—such as the same team or player—you basically have to pick your poison (of course, there may be some way to deal with this issue that I just don’t know or haven’t thought of yet). I will tend to use the adjusted numbers for this analysis, but though they *are* slightly more favorable to Rodman, either metric leads to the same bottom line. In any case, the tables I will be posting include both metrics (as well as other options).

Dennis Rodman’s adjusted numbers boost his win differential to 21.6%, widening the margin between him and 2nd place. I know I will be flamed if I don’t add that (just as I noted in part 2(b)) I am *not* claiming that Rodman was actually the best player in the last 25 years. This is a volatile statistic, and Rodman merely happening to have the best win differential among the group of 470 qualifying players does *not* mean he was actually the best player overall, or even that he was the best player in the group. That said, we should not dismiss the extremeness of the result either:

I will be using a number of (eerily similar) histograms through the rest of Part 3 as well. If you’re not familiar, histograms are one of the simplest and most useful graphical representations of single-variable data (yet, inexplicably, they aren’t built into Excel): each bar represents the number of data points of the designated value. If the variable is continuous (as it is in this case), each bar is basically a “container” that tells you how many data points fit in between the left and right values of the bar (technically it tells you the “density” of points near the center of the container, but those are effectively the same in most circumstances). Their main purpose is to eyeball how the variable is distributed—in this case, as you can see it is distributed normally.

The red line is an overlay of the normal distribution of the sample, which has a mean of –0.5% and standard deviation of 6.3%. This puts Rodman just over 3.5 standard deviations above the mean, a value that should occur about once in every 4000 instances—and he does this based on a standard deviation that is derived from a pool that includes the statistics of many players that have as little as 1/4th as much relevant data as he has.
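The standard-deviation claim can be checked with a few lines of Python. This is only a sketch: it assumes the sample really is normal, uses the mean, standard deviation, and 21.6% figure quoted above, and the helper names are invented for the example.

```python
import math


def z_score(value, mean, sd):
    """Number of standard deviations that value sits above the mean."""
    return (value - mean) / sd


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Figures quoted in the text, in percentage points.
z = z_score(21.6, mean=-0.5, sd=6.3)
p = upper_tail_probability(z)
print(f"z = {z:.2f}, about 1 in {1 / p:,.0f}")
```

Under those assumptions the z-score lands just above 3.5 and the one-sided tail probability is on the order of one in four thousand, in line with the figure quoted above.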

Moreover, as I will discuss in section (b) tomorrow, his win % differential is not only extreme relative to the rest of the NBA, it is even extreme relative to himself—and this has important implications in its own right.

Clearly, some of the predictions worked out better than others. E.g., Kansas City did manage to win their division (which I never would have guessed), but Dallas and San Francisco continued to make mockeries of their past selves. We did come dangerously close to a Jets/Packers Super Bowl, but in the end, SkyNet turned out to be more John Edwards than Nostradamus.

From a prediction-tracking standpoint, the real story of this season was the stunning about-face performance of Football Outsiders, who dominated the regular season basically from start to finish:

^{ Note: Average and Median Errors reflect the difference between projected and actual wins. Correlations are between projected and actual win percentages. }

Not only did they almost completely flip the results from 2009, but their stellar 2010 results (combined with the below-average outing of my model) actually pushed their last 3 seasons slightly ahead of the neural network overall. This improvement also puts Koko (.25 * previous season’s wins + 6) far in FO’s rearview, providing further evidence that Koko’s 2009 success was a blip.
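Koko's rule is simple enough to write down directly. The Python sketch below implements that formula along with average and median errors of the kind described in the table note. Treating "error" as the absolute difference in wins is an assumption, and the function names and sample win totals are made up for illustration.

```python
from statistics import mean, median


def koko_projection(previous_wins):
    """Koko's projection: .25 * previous season's wins + 6."""
    return 0.25 * previous_wins + 6


def win_errors(projected, actual):
    """Average and median absolute error between projected and actual wins."""
    errors = [abs(p - a) for p, a in zip(projected, actual)]
    return mean(errors), median(errors)


# Hypothetical teams: last season's wins and this season's actual wins.
previous = [13, 8, 4]
actual = [10, 9, 7]
projected = [koko_projection(w) for w in previous]
avg_err, med_err = win_errors(projected, actual)
print(projected, avg_err, med_err)
```

Note how strongly the rule regresses toward 8-8: a 16-0 team projects to 10 wins and an 0-16 team to 6, which is why it makes a useful "dumb" baseline for the more elaborate models.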

If we use each method’s win projections to project the post-season as well, however, things turn out a bit differently. Football Outsiders starts out in a strong position, having correctly picked 4 of 8 division champions and 9 of 12 playoff teams overall (against 2 and 8 for the NN respectively), but their performance worsens as the playoffs unfold:

The neural network correctly placed Green Bay in the Super Bowl and the Jets into the AFC championship game, while FO’s final 4 were Atlanta over Green Bay and Baltimore over Indianapolis.

Moreover, if we use these preseason projections to pick the overall results of the playoffs *as they were actually set*, the neural network outperforms its rivals by a wide margin:

^{ Note: The error measured in this table is between predicted finish and actual finish. The Super Bowl winner finishes in 1st place, the loser in 2nd place, conference championship losers each tie for 3.5th place (average of 3rd and 4th), divisional losers tie for 6.5th (average of 5th, 6th, 7th, and 8th), and wild card round losers tie for 10.5th (average of 9th, 10th, 11th, and 12th). }
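The tied-place scoring in that note can be sketched in Python as follows. The dictionary keys, function names, and the two placeholder teams are hypothetical, and averaging the absolute gaps is an assumption about how the error is summarized.

```python
def tied_place(first, last):
    """Average finishing place shared by teams tied for places first through last."""
    return (first + last) / 2


# Finishing place for each playoff outcome, as laid out in the note.
FINISH_PLACE = {
    "won_super_bowl": tied_place(1, 1),
    "lost_super_bowl": tied_place(2, 2),
    "lost_conference_championship": tied_place(3, 4),
    "lost_divisional_round": tied_place(5, 8),
    "lost_wild_card_round": tied_place(9, 12),
}


def mean_finish_error(predicted, actual):
    """Mean absolute gap between predicted and actual finishing places."""
    gaps = [
        abs(FINISH_PLACE[predicted[team]] - FINISH_PLACE[actual[team]])
        for team in actual
    ]
    return sum(gaps) / len(gaps)


# Hypothetical two-team example with placeholder names.
predicted = {"Team A": "won_super_bowl", "Team B": "lost_wild_card_round"}
actual = {"Team A": "lost_divisional_round", "Team B": "lost_super_bowl"}
print(mean_finish_error(predicted, actual))
```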

This minor victory will give me some satisfaction when I retool the model for next season—after all, this model is still essentially based on a small fraction of the variables used by its competitor, and neural networks generally get better and better with more data. On balance, though, the season clearly goes to Football Outsiders. So credit where it’s due, and congratulations to humankind for putting the computers in their place, at least for one more year.

This blog is called “Skeptical” Sports Analysis for a reason: I’m generally wary of our ability to understand anything definitively, and I believe that most people who confidently claim to *know* a lot of things other than facts—whether in sports, academics, or life—are either lying, exaggerating, or wrong. I don’t accept this as an *a priori* philosophical tenet (in college I was actually very resistant to the skeptics), but as an empirical conclusion based on many years of engaging and analyzing various people’s claims of knowledge. As any of you who happen to know me will attest, if I have any talent on this earth, it is finding fault with such claims (even when they are my own).

Keeping that in mind—and keeping in mind that, unlike most sports commentators, I don’t offer broadly conclusive superlatives very often—I offer this broadly conclusive superlative: Dennis Rodman was the greatest rebounder of all time. If there has been any loose end in the arguments I’ve made already, it is this: based on the evidence I’ve presented so far, Rodman’s otherworldly rebounding statistics could, theoretically, be a result of shenanigans. That is, he could simply have been playing at the *role* of rebounder on his teams, ignoring all else and unnaturally inflating his rebounding stats, while only marginally (or even negatively) contributing to his team’s performance. Thus, the final piece of this puzzle is showing that his rebounding *actually* helped his teams. If that could be demonstrated, then even my perversely skeptical mind would be satisfied on the point—else there be no hope for knowledge.

This is where “The Case for Dennis Rodman Was a Great Rebounder” and “The Case for Dennis Rodman” join paths: Showing that Rodman got a lot of rebounds without also showing that this significantly improved his teams proves neither that he was a great player nor that he was a great rebounder. Unfortunately, as I discussed in the last two sections, player value can be hard to measure, and the most common conventional and unconventional valuation methods are deeply flawed (not to mention unkind toward Rodman). Thus, in this post and the next, I will take a different approach.

For this analysis, I will not be looking at Dennis Rodman’s (or any other player’s) statistics directly at all. Instead, I will be looking at his team’s statistics, comparing the games in which he played to the games that he missed. I used a similar (though simpler) method in my mildly popular Quantum Randy Moss post last fall, which Brian Burke dubbed WOWRM, or “With or Without Randy Moss.” So, now I present that post’s homophonic cousin: WOWWorm, or “With or Without Worm.”

The main advantages to indirect statistics are that they are all-inclusive (everything good or bad that a player does is accounted for, whether it is reflected in the box score or not), empirical (what we do or don’t know about the importance of various factors doesn’t matter), and they can get you about as close as possible in this business to isolating actual cause and effect. These features make the approach especially trenchant for general hypothesis-testing and broader studies of predictivity that include league-wide data.

The main disadvantage for individual player analysis, however, is that the samples are almost always too small to be conclusive (in my dream universe, every player would be forced to sit out half of their team’s regular-season games at random). They are also subject to bias based on quality of the player’s team (it is harder to have a big impact on a good team), or based on the quality of their backup—though I think the latter effect is much smaller in basketball than in football or baseball. In the NBA, teams rotate in many different players and normally have a lot of different looks, so when a starter goes out, they’re rarely just replaced by one person—the whole roster (even the whole gameplan) may shift around to exploit the remaining talent. This is one reason you almost never hear of an NBA bench player finally “getting his shot” because the player in front of them was injured—if someone has exploitable skills, they are probably going to get playing time regardless. Fortunately, Dennis Rodman missed his fair share of games—aided by his proclivity for suspensions—and the five seasons in which he missed at least 15 games came on four different teams.

Note, for the past few years, more complete data has allowed people to look at minute-by-minute or play-by-play +/- in basketball (as has been done for some time in hockey). This basically eliminates the sample size problem, though it introduces a number of potential rotational, strategic and role-based biases. Nevertheless, it definitely makes for a myriad of exciting analytical possibilities.

For structural reasons, I’m going to hold off on Rodman’s Win % differentials until my next post in this series. In this post, however, I will look at everything else, starting with team point differential differential—a.k.a. “Margin of Victory”:

^{Note: Table is only the top 25 players in the dataset.}

First, the nitty-gritty: This data goes back to 1986, starting with all players who missed and played at least 15 games in a single season while averaging at least 20 minutes per game played. The “qualifying games” in a season is the *smaller* of games played or games missed. *E.g.*, if someone played 62 games and missed 20, that counts as 20 qualifying games, the same as if someone played 20 games and missed 62. Their “qualifying minutes” are then their average of minutes per game played multiplied by their total number of qualifying games. For the sample, I set the bar at 3000 qualifying minutes, or roughly the equivalent of a full season for a typical starter (82 games * 36 minutes/game is 2952 minutes), which leaves 164 qualifying players. I then calculated differentials for each team-season: *I.e.*, per-game averages were calculated separately for the set of games played and the set of games missed by each player *from within a particular season*, and each season’s “differentials” were created for each stat simply by subtracting the second from the first. Finally, I averaged the per-season differentials for each qualifying season for each player. This is necessarily different from how multiple-season per-game stats are usually calculated (which is just to sum up the stats from the various seasons and divide by total games). As qualifying games may come from different teams and different circumstances, to isolate a player’s impact it is crucially important that (as much as possible) their presence or absence is the only variable that changes, which is not even remotely possible across multiple seasons. In case anyone is interested, here is the complete table with all differential stats for all 164 qualified players.
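For readers who prefer code to prose, the qualifying rules and the per-season averaging can be sketched in Python roughly as follows. This is an illustration under stated assumptions: the function names and the toy numbers are invented, and each season is represented simply as two lists of per-game values for a single stat.

```python
from statistics import mean


def is_qualifying_season(played, missed, minutes_per_game):
    """At least 15 games played, 15 missed, and 20 minutes per game played."""
    return played >= 15 and missed >= 15 and minutes_per_game >= 20


def qualifying_games(played, missed):
    """The smaller of games played or games missed."""
    return min(played, missed)


def qualifying_minutes(played, missed, minutes_per_game):
    """Minutes per game played times the number of qualifying games."""
    return minutes_per_game * qualifying_games(played, missed)


def season_differential(with_values, without_values):
    """Per-game average with the player minus per-game average without him."""
    return mean(with_values) - mean(without_values)


def career_differential(seasons):
    """Average of the per-season differentials, one entry per qualifying season.

    Each season is a (with_values, without_values) pair taken from a single
    team-season, so games from different seasons are never pooled together.
    """
    return mean([season_differential(w, wo) for w, wo in seasons])


# Toy data: two seasons of a team's point margin with and without a player.
toy_seasons = [([10, 6], [4, 2]), ([1, 3], [2, 2])]
print(qualifying_games(62, 20), career_differential(toy_seasons))
```

The design point is in `career_differential`: each season is differenced on its own before averaging, so the player's presence is, as far as possible, the only thing that changes within each comparison.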

I first ran the differentials for Dennis Rodman quite some time ago, so I knew his numbers were very good. But when I set out to do the same thing for the entire league, I had no idea that Rodman would end up literally on top. Here is a histogram of the MOV-differential distribution for all qualified players (rounded to the nearest .5):

^{Note: Red is Dennis Rodman (and Ron Artest).}
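The binning behind a histogram like this one is easy to reproduce. Here is a small Python sketch that rounds values to the nearest .5 and counts how many land in each bin; the function names and the sample values are placeholders, not the actual dataset.

```python
from collections import Counter


def bin_to_half(value):
    """Round a value to the nearest 0.5.

    Python's round() sends exact ties to the nearest even integer, so a value
    sitting exactly on a bin boundary may go either up or down.
    """
    return round(value * 2) / 2


def histogram_counts(values):
    """Number of data points that fall in each 0.5-wide bin."""
    return Counter(bin_to_half(v) for v in values)


# Placeholder MOV differentials, not real player data.
sample = [3.8, 3.7, 3.6, -0.2]
counts = histogram_counts(sample)
print(sorted(counts.items()))
```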

3.8 points per game may not sound like much compared to league-leading scorers who score 30+, but that’s both the beauty of this method and the curse of conventional statistics: When a player’s true impact is actually only a few points difference per night (max), you know that the vast majority of the “production” reflected in their score line doesn’t actually contribute to their team’s margin.

This deserves a little teasing out, as the implications can be non-obvious: If a player who scores 30 points per game is only actually contributing 1 or 2 points to his team’s average margin, that essentially means that at least 28 of those points are either 1) redundant or 2) offset by other deficiencies. With such a low signal-to-noise ratio, you should be able to see how it is that pervasive metrics like PER can be so unreliable: If a player only scores 10 points a night, but 4 of them are points his team couldn’t have scored otherwise, he could be contributing as much as Shaquille O’Neal. Conversely, someone on the league leaderboard who scores 25 points per game could be gaining his team 2 or 3 points a night with his shooting, but then be giving it all back if he’s also good for a couple of unnecessary turnovers.

Professional basketball is a relatively low-variance sport, but winners are still determined by very small margins. Last year’s championship Lakers team had an average margin of victory of just 4.7 points. For the past 5 years, roughly three quarters of teams have had lower MOV’s than Dennis Rodman’s differential in his 5 qualifying seasons:

Now, I don’t want to suggest too much with this, but I would be derelict if I didn’t mention the many Hall of Fame-caliber players who qualified for this list *below* Rodman (my apologies if I missed anyone):

*In HoF already:*

- Hakeem Olajuwon
- Scottie Pippen
- Clyde Drexler
- Dominique Wilkins

*HoF locks:*

- Shaquille O’Neal
- Dwyane Wade
- Jason Kidd
- Allen Iverson
- Ray Allen

*HoF possible:*

- Yao Ming
- Pau Gasol
- Marcus Camby
- Carlos Boozer
- Alonzo Mourning

*Not in HoF but probably should be:*

- Toni Kukoc
- Chris Mullin
- Tim Hardaway
- Dikembe Mutombo

The master list also likely includes many players that are NOT stars but who quietly contributed a lot more to their teams than people realize. Add the fact that Rodman managed to post these differentials while playing mostly for extremely good, contending teams (where it is harder to have a measurable impact), and was never ostensibly the lynchpin of his team’s strategy—as many players on this list certainly were—and it is really quite an amazing outcome.

Now, I do *not* mean to suggest that Rodman is *actually* the most valuable player to lace up sneakers in the past 25 years, or even that he was the most valuable player on this list: 1) It doesn’t prove that, and 2) I don’t think that. Other more direct analysis that I’ve done typically places him “only” in the top 5% or so of starting players. There is a lot of variance in differential statistics, and there are a lot of different stories and circumstances involved for each player. But, at the very least, this should be a wake-up call for those who ignore Rodman for his lack of scoring, and for those who dismiss him as “merely” a role-player.

As I have discussed previously, one of the main defenses of conventional statistics—particularly *vis a vis* their failures w/r/t Dennis Rodman—is that they don’t account for defense or “intangibles.” As stated in the Wikipedia entry for PER:

Neither PER nor per-game statistics take into account such intangible elements as competitive drive, leadership, durability, conditioning, hustle, or WIM (wanting it more), largely because there is no real way to quantitatively measure these things.

This is true, for the most part—but not so much for Rodman. He does very well with indirect statistics, which actually DO account for all of these things as part of the gestalt that goes into MOV or Win% differentials. But these stats also give us a very detailed picture of where those differences likely come from. Here is a table summarizing a number of Rodman’s differential statistics, both for his teams and their opponents. The “reciprocal advantage” is the difference between his team’s differential and their opponent’s differential for the same statistic:

^{Note: Some of the reciprocals were calculated in this table, and others are taken from the dataset (like margin of victory). In the latter case, they may not necessarily match up perfectly, but this is for a number of technical and mathematical reasons that have no significant bearing on the final outcomes.}

Rodman’s Margin of Victory differential comes in part from his teams scoring more points on offense and in part from their allowing fewer points on defense. Superficially, this may look like the majority of Rodman’s impact is coming on the defensive side (-2.4 vs. +1.3), but that’s deceptive. As you can find in the master table, Rodman also has a significant *negative* effect on “Pace”—or number of possessions per game—which basically applies equally to both teams. This is almost certainly due to his large number of possession-extending offensive rebounds, especially as he was known (and sometimes criticized) for “kicking it out” and resetting the offense rather than trying to shoot it himself or draw a foul. “Scoring opportunities” are total possessions plus offensive rebounds. As you might expect intuitively, his teams generally had about the same number of these with or without him, because the possessions weren’t actually *lost*, they were only restarted.

As we can see from the reciprocal table, Rodman had a slightly positive effect on his teams’ scoring efficiency (points per opportunity), but also had a small positive (though nearly negligible) effect on his opponents’. Thus, combining the effect his rebounding had on number of scoring opportunities with any other effects he had on each side’s scoring efficiency, we can get a fairly accurate anatomy of his overall margin. In case that confused you, here it is broken down step-by-step:

So, roughly speaking, his 3.7ish margin of victory breaks down to roughly 2.8 points from effect on offensive and defensive scoring opportunities and .9 points from the actual value of those opportunities—or, visually:

Furthermore, at least part of that extra offensive efficiency likely stems from the fact that a larger proportion of those scoring opportunities began as offensive rebounds, and post-offensive-rebound “possessions” are typically worth slightly more than normal (though this may actually be less true with Rodman due to the “kicking”). Otherwise, the exact source of the efficiency differences is much more uncertain, especially as the smaller margins in the other statistics are that much more unreliable because of the sample-size issues inherent in this method.
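One way to picture the opportunities-versus-efficiency split is with a small Python sketch. To be clear, this is one possible accounting and not necessarily the one used for the breakdown above: it attributes the change in opportunities at the "without" efficiency and the change in efficiency at the "with" opportunity count, and every number in the example is hypothetical.

```python
import math


def split_points_change(opps_with, eff_with, opps_without, eff_without):
    """Split a change in points per game into two parts.

    Returns (points from extra scoring opportunities,
             points from the change in points per opportunity).
    The two parts always sum to the total change in points.
    """
    from_opportunities = (opps_with - opps_without) * eff_without
    from_efficiency = opps_with * (eff_with - eff_without)
    return from_opportunities, from_efficiency


# Hypothetical team: 100 opportunities at 1.02 points each with the player,
# 98 opportunities at 1.00 points each without him.
from_opps, from_eff = split_points_change(100, 1.02, 98, 1.00)
total_change = 100 * 1.02 - 98 * 1.00
print(round(from_opps, 2), round(from_eff, 2), round(total_change, 2))
```

Running the same split on the opponents' side and subtracting would give the corresponding pieces of the overall margin.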

The next-strongest reciprocal effects on the list above appear to be personal fouls and their corresponding free throws: with him in the lineup, his teams had fewer fouls and more free throws, and his opponents the opposite. This is particularly peculiar because Rodman himself got a lot of fouls and was a terrible free throw shooter (note: this is yet another reason why including personal fouls in your player valuation method—yes, I’m looking at you, PER—is ridiculous).

Whether Rodman was a “role player” or not is irrelevant: whatever his role, he did it well enough to contribute more to his teams than the vast majority of NBA players (role players or not) contributed to theirs. For some reason, this simple concept seems to be better understood in other sports: No one would say that Mariano Rivera hasn’t contributed much to the Yankees winning because he is “merely” a closer (though I do think he could contribute more if he pitched more innings), just as no one would say that Darrelle Revis hasn’t contributed much to the Jets because he is “merely” a cornerback.

So does this mean I am conceding that Rodman was just a very good, but one-dimensional, player? Not that there would be anything wrong with that, but definitely not. That is how I would describe it if he had *hurt* his team in other areas, but then made up for it—and then some—through excellent rebounding. This is actually probably how most people would predict that Rodman’s differentials would break down (including, initially, myself), but they don’t. E.g., the fact that his presence on the court didn’t *hurt* his team’s offensive efficiency, despite hardly ever scoring himself, is solid evidence that he was actually an excellent offensive player. Even if you take the direct effects of his rebounds out of the equation entirely, he *still* seems to have made three different championship contenders—including one of the greatest teams of all time—better. While the majority of his value *added*—that which enabled him to significantly improve already great teams—came from his ability to grab rebounds that no one else would have gotten, the full realization of that value was made possible by his not hurting those teams significantly in any other way.

As it wasn’t mystical intangibles or conveniently immeasurable defensive ability that made Rodman so valuable, I think it is time we rescind the free pass given to the various player valuation metrics that have relied on that excuse for getting this one so wrong for so long. However, this does not prove that even a perfectly-designed metric would *necessarily* be able to identify this added value directly. Though I think valuation metrics can be greatly improved (and I’m trying to do so myself), I can’t say for certain that my methods or any others will definitely be able to identify which rebounds actually helped a team get more rebounds and which points actually helped a team score more points. Indeed, a bench player who scores 8 points per game could be incredibly valuable if they were the *right* 8 points, even if there were no other direct indications (incidentally, this possibility has been supported by separate research I’ve been doing on play-by-play statistics from the last few seasons, in which I’ve found that a number of bench players have contributed much more to their teams than most people would have guessed possible). But rather than throwing our hands in the air and defending inadequate pre-existing approaches, we should be trying to figure out how and whether these sorts of problems can be addressed.

As an amusing but relevant aside, you may have already noticed that the data—at least superficially—doesn’t even seem to support the conventional wisdom that, aside from his rebounding, Rodman was primarily a defensive player. Most obviously, his own team’s points per scoring opportunity improved, but his opponents’ improved slightly as well. If his impact were primarily felt on the defensive side, we would probably expect the opposite. Breaking down the main components above into their offensive and defensive parts, our value-source pie-chart would look like this.

The red is actually slightly smaller than his contribution from defensive rebounds alone, as technically defensive efficiency was slightly lower with Rodman in the games. For fun, I’ve broken this down a bit further into an Offense vs. Defense “Tale of the Tape,” including a few more statistics not seen above:

^{Note: Differentials that help their respective side are highlighted in blue, and those that hurt their respective side are highlighted in red. The values for steals and blocks are each transposed from their team and opponent versions above, as these are defensive statistics to begin with.}

Based on this completely ridiculous and ad-hoc analysis, it would seem that Rodman was more of an offensive player than a defensive one.

Including rebounding, I suspect it is true that Rodman’s overall contribution was greater on offense than defense. However, I wouldn’t read too much into the breakdowns for each side. Rodman’s opponents scoring slightly more per opportunity with him in the game does NOT prove that he was a below-average defender. Basketball is an extremely dynamic game, and the effects of success in one area may easily be realized in others. For example, a strong defensive presence may free up other players to focus on their team’s offense, in which case the statistical consequences could be seen on the opposite side of the floor from where the benefit actually originated.

There are potential hints of this in the data. For example, why on earth would Rodman’s teams shoot better from behind the arc, considering that he was only a .231 career 3-point shooter himself? This could obviously just be noise, but it’s also possible that some underlying story exists in which more quality long-range shots opened up as a result of Rodman’s successes in other assignments. Ultimately, I don’t think we can draw any conclusions on the issue, but the fact that this is even a debatable question has interesting implications, both for Dennis Rodman and for basketball analytics broadly.

While I am the first to admit that the dataset this analysis is based on might not be sufficiently robust to settle the entire “Case” on its own, I still believe these results are powerful evidence of the truth of my previous inferences—and for very specific reasons:

Assessing the probability of propositions that have a pre-conceived likelihood of being true in light of new evidence can be tricky business. In this case, the story goes like this: I developed a number of highly plausible conclusions about Rodman’s value based on a number of reasonable observations and empirical inquiries, such as: 1) the fact that his rebounding prowess was not just great, but truly extreme, 2) the fact that his teams always seemed to do extremely well on both ends of the floor, and 3) my analysis (conducted for reasons greater than just this series) suggesting that A) scoring is broadly overrated, B) rebounding is broadly underrated, and C) rebounding has increasing marginal returns (or is exponentially predictive). Then, to further examine these propositions, I employed a completely independent method, one with virtually no overlap with the various factors involved in those previous determinations, and it not only appears to confirm my prior beliefs, but does so even more than I imagined it would.

Now, technically, it is possible that Rodman just got extremely lucky in the differential data—in fact, for this sample size, getting that lucky isn’t even a particularly unlikely event, and many of his oddball compatriots near the top of the master list probably did just that. But this situation lends itself perfectly to Bayes’ Theorem-style analysis. That is, which is the better, more likely explanation for this convergence of results: 1) that my carefully reasoned analysis has been completely off-base, AND that Rodman got extremely lucky in this completely independent metric, or 2) that Dennis Rodman actually *was* an extremely valuable player?
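To make the shape of that comparison concrete, here is a minimal sketch of the two-hypothesis version of Bayes’ Theorem in Python. Every number in it is a made-up placeholder chosen for illustration, not an estimate taken from the Rodman data, and the function name `posterior_valuable` is likewise just a label for this example.

```python
def posterior_valuable(prior_valuable, p_data_if_valuable, p_data_if_not):
    """Bayes' Theorem for two competing explanations of the same evidence.

    prior_valuable:     belief that the player was extremely valuable,
                        held before seeing the differential data
    p_data_if_valuable: chance of differentials this strong if he really was
    p_data_if_not:      chance of differentials this strong from luck alone
    """
    evidence_if_valuable = prior_valuable * p_data_if_valuable
    evidence_if_lucky = (1 - prior_valuable) * p_data_if_not
    return evidence_if_valuable / (evidence_if_valuable + evidence_if_lucky)


# Hypothetical inputs (assumptions, not measured values):
# a coin-flip prior, and luck being a tenth as likely to produce the data.
posterior = posterior_valuable(
    prior_valuable=0.5,
    p_data_if_valuable=0.5,
    p_data_if_not=0.05,
)
print(f"Posterior that the player was truly valuable: {posterior:.3f}")
```

The point of the sketch is only that the luck explanation has to carry two burdens at once: a low chance of producing the data, multiplied by whatever prior weight is left over after the earlier analysis. The stronger the prior, the less a merely plausible lucky streak can explain away.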