So I was catching up on some old blog-reading and came across this excellent post by Brian Burke, Pre-Season Predictions Are Still Worthless, showing that Football Outsiders' pre-season predictions are about as accurate as picking 8-8 for every team would be, and that a simple regression based on one variable (6 wins plus 1/4 of the previous season's wins) is significantly more accurate.
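That one-variable rule is just shrinkage toward the league mean, and it's simple enough to write down. Here's a minimal sketch (the `predicted_wins` name is mine, not Burke's):

```python
# Burke's one-variable rule of thumb: regress last season's win total
# three-quarters of the way back toward the league average of 8 wins.
def predicted_wins(last_season_wins: float) -> float:
    return 6 + 0.25 * last_season_wins

# A 12-4 team projects to just 9 wins; a 4-12 team projects to 7.
print(predicted_wins(12))  # 9.0
print(predicted_wins(4))   # 7.0
```

Note how aggressively it pulls everything toward the middle: no team, no matter how good or bad last year, projects outside 6-10 wins.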
While Brian’s anecdote about Billy Madison humorously skewers Football Outsiders, it’s not entirely fair, and I think these numbers don’t prove as much as they may appear to at first glance. Sure, a number of conventional or unconventional conclusions people have reached are probably false, but the vast majority of sports wisdom is based on valid causal inferences with at least a grain of truth. The problem is that people have a tendency to over-rely on the various causes and effects that they observe directly, conversely underestimating the causes they cannot see.
So far, so obvious. But these “hidden” causes can be broken down further, starting with two main categories, which I’ll call “random causes” and “counter-causes”:
“Random causes” are not necessarily truly random, but they do not bias your conclusions in any particular direction. This category combines the truly random with the may-as-well-be-random, and it generates the inherent variance of the system.
“Counter-causes” are those which you may not see, but which relate to your variables in ways that counteract your inferences. The salary cap in the NFL is one of the most ubiquitous offenders. For example, an analyst sees a very good quarterback, and for various reasons believes that a QB with that particular skill-set is worth an extra 2 wins per season. That QB is obtained by an 8-8 team in free agency, so the analyst predicts that team will win 10 games. But in reality, the team that signed that quarterback had to pay handsomely for that +2 addition, and may have had to cut 2 wins worth of players to do it. If you imagine this process repeating itself over time, you will see that the correlation between QBs with those skills and their teams' actual win rates may be small or non-existent (in reality, of course, the best quarterbacks are probably underpaid relative to their value, so this is not a problem). In closed systems like sports, these sorts of scenarios crop up all the time, and thus it is not uncommon for a perfectly valid and logical-seeming inference to be, systematically, dead wrong (by which I mean that it not only leads to an erroneous conclusion in a particular situation, but will lead to bad predictions routinely).
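To see how a fully offsetting counter-cause can erase a real effect from the data, here's a toy simulation of the QB scenario (every number in it, a QB worth up to 2 wins and a cap that claws back exactly that much, is an illustrative assumption, not an estimate):

```python
import random

random.seed(0)

# Toy model of the salary-cap counter-cause: each team's QB really does
# add qb_value wins, but under a hard cap the rest of the roster gives
# back exactly that much, plus everything else is noise.
qb_values, team_wins = [], []
for _ in range(10_000):
    qb_value = random.uniform(0, 2)   # wins the QB genuinely adds
    roster_cost = qb_value            # wins cut elsewhere to afford him
    noise = random.gauss(0, 1.5)      # all the other hidden causes
    qb_values.append(qb_value)
    team_wins.append(8 + qb_value - roster_cost + noise)

def corr(xs, ys):
    """Pearson correlation, computed by hand to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Close to 0: the QB's real value never shows up in win totals.
print(round(corr(qb_values, team_wins), 3))
```

The analyst's causal inference is perfectly correct here, yet any study regressing team wins on QB skill would conclude the skill is worthless.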
So how does this relate to Football Outsiders, and how does it amount to a defense of their predictions? First, I think the suggestion that FO may have created “negative knowledge” is demonstrably false: The key here is not to be fooled by the stat that they could barely beat the “coma patient” prediction of 8-8 across the board. 8 wins is the most likely outcome for any team ex ante, and every win above or below that number is less and less likely. E.g., if every outcome were the result of a flip of a coin, your best strategy would be to pick 8-8 for every team, and picking *any* team to go 10-6 or 12-4 would be terrible. Yet Football Outsiders (and others) — based on their expertise — pick many teams to have very good and very bad records. The fact that they break even against the coma patient shows that their expertise is worth something.
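To put numbers on that coin-flip intuition: under a pure Binomial(16, 0.5) model (a deliberate oversimplification, since real games aren't fair coin flips), the probability of a given record falls off quickly as you move away from 8-8:

```python
from math import comb

# If each of 16 games were a fair coin flip, the chance of finishing
# with exactly k wins is Binomial(16, 0.5).
def p_wins(k: int, games: int = 16) -> float:
    return comb(games, k) / 2 ** games

print(f"P(8-8):  {p_wins(8):.3f}")   # 0.196
print(f"P(10-6): {p_wins(10):.3f}")  # 0.122
print(f"P(12-4): {p_wins(12):.3f}")  # 0.028
```

Under this model, a 12-4 prediction is right about 1 time in 36, versus 1 in 5 for 8-8, which is why every extreme pick FO makes is, ex ante, fighting uphill.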
Second, I think there’s no shame in being unable to beat a simple regression based on one extremely probative variable: I’ve worked on a lot of predictive models, from linear regressions to neural networks, and beating a simple regression can be a lot of work for marginal gain (which, combined with the rake, is the main reason that sports-betting markets can be so tough).
Yet, getting beaten so badly by a simple regression is a definite indicator of systematic error — particularly since there is nothing preventing Football Outsiders from using a simple regression to help them make their predictions. Now, I suspect that FO is underestimating football variance, especially the extent of regression to the mean. But this is a blanket assumption that I would happily apply to just about any sports analyst — quantitative or not — and is not really of interest. However, per the distinction I made above, I believe FO is likely underestimating the “counter causes” that may temper the robustness of their inferences without necessarily invalidating them entirely. A relatively minor bias in this regard could easily lead to a significant drop in overall predictive performance, for the same reason as above: the best and worst records are by far the least likely to occur. Thus, *ever* predicting them, and expecting to gain accuracy in the process, requires an enormous amount of confidence. If Football Outsiders has that degree of confidence, I would wager that it is misplaced.
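As a rough illustration of that confidence hurdle, still using the coin-flip simplification (so the real numbers will differ, but the shape of the argument holds): the expected error of a prediction grows sharply as it moves away from 8-8.

```python
from math import comb

GAMES = 16
# Probability of exactly k wins if every game were a fair coin flip
pmf = [comb(GAMES, k) / 2 ** GAMES for k in range(GAMES + 1)]

def expected_abs_error(prediction: int) -> float:
    """Expected |actual wins - prediction| under the coin-flip model."""
    return sum(p * abs(k - prediction) for k, p in enumerate(pmf))

print(f"{expected_abs_error(8):.2f}")   # 1.57
print(f"{expected_abs_error(12):.2f}")  # 4.03
```

Even if a team really is better than average, calling it 12-4 in this model more than doubles your expected error versus the coma patient's 8-8, so such a pick only pays off when your evidence is strong enough to overcome that gap.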