Bayes’ Theorem, Small Samples, and WTF is Up With NBA Finals Markets?

Seriously, I am dying to post about something non-NBA related, and I should have my Open-era tennis ELO ratings by surface out in the next day or so. But last night I finally got around to checking the betting markets to see how the NBA Finals—and thus my chances of winning the Smackdown—were shaping up, and I was shocked by what I found. Anyway, I tossed a few numbers around, and thought you all might find them interesting. Plus, there’s a nice little object-lesson about the usefulness of small sample size information for making Bayesian inferences. This is actually one area where I think the normal stat geek vs. public dichotomy gets turned on its head: Most statistically-oriented people reflexively dismiss any empirical evidence without a giant data-set. But in certain cases—particularly those with a wide range of coherent possibilities—I think the general public may even be a little too conservative about the implications of seemingly minor statistical anomalies.

Freaky Finals Odds:

First, I found that most books seem to see the series as a tossup at this point. Here’s an example from a European sports-betting market:

Intuitively, this seemed off to me. Dallas needs to win 1 out of the 2 remaining games in Miami. Assuming the odds for both games are identical (admittedly, this could be a dubious assumption), here’s a plot of Dallas’s chances of winning the series relative to Miami’s expected winrate per home game:

So for the series to be a tossup, Miami needs to be about a 71% favorite per game. Even at home in the playoffs, this is extremely high. Depending on what dataset you use, the home team wins around 60-65% of the time in the NBA regular season and about 65%-70% of the time in the postseason. But that latter number is a bit deceptive, since the playoffs are structured so that more games are played in the homes of the better teams: aside from the 2-3-2 Finals, any series that ends in an odd number of games gives the higher-seeded team (who is often much better) an extra game at home. In fact, while I haven’t looked into the issue, that extra 5% could theoretically be less than the typical skill-disparity between home and away teams in the playoffs, which would actually make home court less advantageous than in the regular season.
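The 71% break-even figure is easy to check. Here is a minimal Python sketch, assuming (as above) that the two remaining games are independent and that Miami has the same win probability in each; the function and variable names are mine, purely for illustration.

```python
import math

def dallas_series_prob(miami_home_winrate: float) -> float:
    """Chance Dallas wins at least 1 of the 2 remaining games in Miami.

    Assumes both games are independent and share the same Miami win rate.
    """
    return 1.0 - miami_home_winrate ** 2

# The series is a coin flip when Miami sweeps half the time: p^2 = 0.5
breakeven = math.sqrt(0.5)
print(f"Break-even Miami home win rate: {breakeven:.3f}")

# A few illustrative per-game win rates for Miami
for p in (0.60, 0.65, 0.70, 0.73):
    print(f"Miami {p:.0%} per game -> Dallas wins series {dallas_series_prob(p):.1%}")
```

If the two games are not equally favorable to Miami, the square would be replaced by the product of the two per-game probabilities.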

Now, Miami has won only 73% of their home games this season, and it was against below-average competition (overall, they had one of the weakest schedules in the league). Counting the playoffs, at this point Dallas actually has a better record than Miami (by one game), and they played an above-average schedule. More importantly, the Mavs won 68% of their games on the road (compared to the league average of 35-40%). Not to mention, Dallas is 5-2 against the Heat overall, and 2-1 against them in Miami (more on that later).

So how does the market tilt so heavily to this side? Honestly, I have no idea. Many people are much more willing to dismiss seemingly incongruent market outcomes than I am. While I obviously think the market can be beaten, when my analytical results diverge wildly from what the money says, my first inclination is to wonder what I’m doing wrong, as the odds of a massive market failure are probably lower than the odds that I made a mistake. But, in this case, with comparatively few variables, I don’t really get it.

It is a well-known phenomenon in sports-betting that huge games often have the juiciest (i.e., least efficient) lines. This is because the smart money that normally keeps the market somewhat efficient can literally start to run out. But why on earth would there be a massive, irrational rush to bet on the Heat? I thought everyone hated them!

Fun With Meta-Analysis:

So, for amusement’s sake, let’s imagine a few different lines of reasoning (I’ll call them “scenarios”) that might lead us to a range of different conclusions about the present state of the series:

1. Miami won at home ~73% of the time while Dallas won on the road a fairly stunning 68% of the time. If these values are taken at face value, a generic Miami home team would be roughly 5% better than a generic Dallas road team, making Miami a 52.5% favorite in each game.
2. The average home team in the NBA wins about 63% of the time. Miami and Dallas seem pretty evenly matched, so Miami should win each game ~63% of the time as well.
3. Let’s go with the very generous end of broader statistical models (discounting early-season performance, giving Miami credit for championship experience, best player, and other factors), and assume that Miami is about 5-10% better than Dallas on a neutral site. The exact math on this is complicated (since winning is a logistic function), but, ballpark, this would translate into about a 65.5% chance at home.
4. Markets rule! The approximate market price for a Miami series win is ~50%, translating into the 71% per-game chance mentioned above.

Here’s a scatter-plot of the chances of Dallas winning the series based on those per-game estimates:

Ignore the red dots for now; we’ll get back to those. The blue dots are the probability of Dallas winning at least one of the next two games (using the same binomial formula as the function above). Now, hypothetically, if you thought each of these analyses was equally plausible, your overall probability for Dallas winning the title would simply be the average of the four scenarios’ results, or right around 60%. Note: I am NOT endorsing any of these lines of reasoning or any actual conclusions about this series here; it’s just a thought experiment.
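For anyone who wants to reproduce the equal-weight average, here is a rough Python sketch. The per-game numbers are the four scenario estimates from the text; the dictionary labels are my own shorthand, and the market scenario uses the exact break-even rate (the square root of 0.5) rather than the rounded 71%.

```python
# Miami's per-game home win probability under each scenario in the text.
scenarios = {
    "1: season home/road records": 0.525,
    "2: generic home team": 0.63,
    "3: generous broader model": 0.655,
    "4: market (break-even rate)": 0.5 ** 0.5,
}

# Dallas takes the series by winning at least 1 of 2 games in Miami.
dallas_series = {name: 1 - p ** 2 for name, p in scenarios.items()}

# Equal plausibility means a plain, unweighted average.
equal_weight_avg = sum(dallas_series.values()) / len(dallas_series)

for name, prob in dallas_series.items():
    print(f"{name}: Dallas wins series {prob:.1%}")
print(f"Equal-weight average: {equal_weight_avg:.1%}")
```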

A Little Bayesian Inference:

As I mentioned above, the Mavericks are 5-2 against the Heat this season, including 2-1 against them in Miami. Let’s focus on the second stat: Sticking with the assumption that you found each of these 4 lines of reasoning equally plausible prior to knowing Dallas’s record in Miami, how should your newly-acquired knowledge that they were 2-1 affect your assessment?

Well, wow! 3 games is such a minuscule sample, it can’t possibly be relevant, right? I think most people, stat geek and layperson alike, would find this statistical event pretty unremarkable. In the abstract, they’re right: certainly you wouldn’t let such a thing invalidate a method or process built on an entire season’s worth of data. Yet, sometimes these little details can be more important than they seem. Which brings us to perhaps the most ubiquitously useful tool discovered by man since the wheel: Bayes’ Theorem.

Bayes’ Theorem, at its heart, is a fairly simple conceptual tool that allows you to do probability backwards: Garden-variety probability involves taking a number of probabilistic variables and using them to calculate the likelihood of a particular result. But sometimes you have the result, and would like to know how it affects the probabilities of your conditions: Bayesian analysis makes this possible.

So, in this case, instead of looking at the games or series directly, we’re going to look at the odds of Dallas pulling off their 2-1 record in Miami under each of our scenarios above, and then use that information to adjust the probabilities of each. I’ll go into the detail in a moment, but the relevant Bayesian concept is that, given a result, the new probability of each precondition will be adjusted proportionally to its prior probability of producing that result. Looking at the red dots above (which are technically the cumulative binomial probability of Miami winning 0 or 1 out of 3 games), you should see that Dallas is far more likely to go 2-1 or better on Miami’s turf if they are an even match than if Miami is a huge favorite—over twice as likely, in fact. Thus, we should expect that scenarios suggesting the former will become much more likely, and scenarios suggesting the latter will become much less so.
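The "over twice as likely" claim can be checked directly. This is a small Python sketch of the cumulative binomial described for the red dots (Miami winning 0 or 1 of 3 home games), comparing the near-even scenario with the market scenario; the helper name is hypothetical, and it assumes the three games are independent with a constant win rate.

```python
from math import comb

def prob_miami_wins_at_most(k: int, n: int, p: float) -> float:
    """Cumulative binomial: Miami wins k or fewer of n games at win rate p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Dallas going 2-1 or better in Miami means Miami wins at most 1 of 3.
even_match = prob_miami_wins_at_most(1, 3, 0.525)        # scenario 1
big_favorite = prob_miami_wins_at_most(1, 3, 0.5 ** 0.5)  # market scenario

ratio = even_match / big_favorite
print(f"Near-even match: {even_match:.1%}")
print(f"Miami big favorite: {big_favorite:.1%}")
print(f"Ratio: {ratio:.2f}x")
```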

In its simplest form, Bayes’ Theorem states that the probability of A given B is equal to the probability of B given A times the prior probability of A (probability before our new information), divided by the prior probability of B:

$P(A|B)= \frac{P(B|A)*P(A)} {P(B)}$

Though our case looks a little different from this, it is actually a very simple example. First, I’ll treat the belief that the four analyses are equally likely to be correct as a “discrete uniform distribution” of a single variable. That sounds complicated, but it simply means that there are 4 separate options, one of which is actually correct, and each of which is equally likely. Thus, the odds of any given scenario are expressed exactly as above (B is the 2-1 outcome):

$P(S_x|B)= \frac{P(B|S_x)*P(S_x)} {P(B)}$

The prior probability for S_x is .25. The prior probability of our result (the denominator) is simply the sum of the probabilities of each scenario producing that result, weighted by each scenario’s original probability. But since these are our only options and they are all equal, that element will factor out, as follows:

$P(B)= P(S_x)*(P(B|S_1)+P(B|S_2)+P(B|S_3)+P(B|S_4))$

Since P(S_x) appears in both the numerator and the denominator, it cancels out, leaving our probability for each scenario as follows:

$P(S_x|B)= \frac{P(B|S_x)} {P(B|S_1)+P(B|S_2)+P(B|S_3)+P(B|S_4)}$

The calculations of P(B|S_x) are the binomial probability of Dallas winning exactly 2 out of 3 games in each case (note this is slightly different from above, so that Dallas is sufficiently punished for not winning all 3), and Excel’s binom.dist() function makes this easy. Plugging those calculations in with everything else, we get the following adjusted probabilities for each scenario:

Note that the most dramatic changes are in our most extreme scenarios, which should make sense both mathematically and intuitively: going 2-1 is much more meaningful if you’re a big dog.

Our new weighted average is about 62%, meaning the 2-1 record improves our estimate of Dallas’s chances by 2 percentage points and widens the gap between the two teams by 4: 62-38 (a 24-point difference) instead of 60-40 (a 20-point difference). That may not sound like much, but a few percentage points of edge aren’t that easy to come by. For example, to a gambler, that 4% could be pretty huge: you normally need a 5% edge to beat the house (i.e., you have to win 52.5% of the time), so imagine you were the only person in the world who knew of Dallas’s miniature triumph. In this case, that info alone could get you 80% of the way to profit-land.
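The whole update fits in a few lines of Python. This is a sketch under the same assumptions as the text: four scenarios with a uniform prior, and a likelihood equal to the binomial probability of Dallas winning exactly 2 of 3 in Miami. It uses Python's `math.comb` in place of Excel's binomial function, and the variable names are mine.

```python
from math import comb

# Miami's per-game home win probability under scenarios 1 through 4.
miami_p = [0.525, 0.63, 0.655, 0.5 ** 0.5]
prior = [0.25] * len(miami_p)  # discrete uniform prior

# P(B | S_x): Dallas wins exactly 2 of 3, i.e. Miami wins exactly 1 of 3.
likelihood = [comb(3, 1) * p * (1 - p) ** 2 for p in miami_p]

# P(B): likelihoods weighted by the prior, then Bayes' Theorem per scenario.
evidence = sum(l * w for l, w in zip(likelihood, prior))
posterior = [l * w / evidence for l, w in zip(likelihood, prior)]

# Dallas's series chances (win at least 1 of 2) under each scenario.
dallas_series = [1 - p ** 2 for p in miami_p]
prior_avg = sum(w * d for w, d in zip(prior, dallas_series))
posterior_avg = sum(w * d for w, d in zip(posterior, dallas_series))

for i, (w0, w1) in enumerate(zip(prior, posterior), start=1):
    print(f"Scenario {i}: prior {w0:.1%} -> posterior {w1:.1%}")
print(f"Dallas series chances: {prior_avg:.1%} -> {posterior_avg:.1%}")
```

With these inputs the posterior weights come out to roughly 35%, 25%, 23%, and 18%, and the weighted average moves from about 60% to about 62%.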

Making Use:

I should note that, yes, this analysis makes some massively oversimplifying assumptions (in reality, there can be gradations of truth between the various scenarios, with a variety of interactions and hidden variables, etc.), but you’d probably be surprised by how similar the results are whether you do it the more complicated way or not. One of the things that makes Bayesian inference so powerful is that it often reveals trends and effects that are relatively insulated from incidental design decisions. I.e., the results of extremely simplified models are fairly good approximations of those produced by arbitrarily more robust calculations. Consequently, once you get used to it, you will find that you can make quick, accurate, and incredibly useful inferences and estimates in a broad range of practical contexts. The only downside is that, once you get started on this path, it’s a bit like getting Tetrisized: you start seeing Bayesian implications everywhere you look, and you can’t turn it off.

Of course, you also have to be careful: despite the flexibility Bayesian analysis provides, using it in abstract situations—like a meta-analysis of nebulous hypotheses based on very little new information—is very tricky business, requiring good logical instincts, a fair capacity for introspection, and much practice. And I can’t stress enough that this is a very different beast from the typical talking head that uses small samples to invalidate massive amounts of data in support of some bold, eye-catching and usually preposterous pronouncement.

Finally, while I’m not explicitly endorsing any of the actual results of the hypo I presented above, I definitely think there are real-life equivalents where even stronger conclusions can be drawn from similarly thin data. E.g., one situation that I’ve tested both analytically and empirically is when one team pulls off a freakishly unlikely upset in the playoffs: it can significantly improve the chances that they are better than even our most accurate models (all of which have significant error margins) would indicate.

Tiger Woods Needs to Need a Therapist (and Probably Does)

Tiger Woods is obviously having a terrible season. His scoring average so far (71.66) is almost 2 strokes higher than his previous worst year (69.75 in 1997). He has no wins, no top 3’s, and has only finished top 10 in 2 of 9 tournaments. That 22%, if it holds up, would be the worst of his career by 20%. For the first time basically ever, his eventually capturing the all-time major championships record is in doubt. Of course, 9 tournaments is not a large sample, and this could just be a slump. As I see it, there are basically 4 possibilities:

1. Tiger is running very badly.
2. Tiger is in serious decline.
3. Tiger is declining somewhat and running somewhat badly.
4. Tiger needs a shrink.

So the questions of the day are: a) How likely is each of these possibilities? and b) What does each say about his chances of winning 19 majors? For reasons I will explain, I believe 1 and 2 are very unlikely, and 3 is somewhat unlikely. Which is fine, since Tiger should basically pray this is all in his head, because otherwise his chances of catching and passing Nicklaus are diminishing considerably.

I would normally be the first to promote a “bad variance” explanation of this kind of phenomenon, but in this case: a) Tiger doesn’t really have slumps like this; and b) the timing is too much of a coincidence. For some historical perspective, here’s a graph of Tiger’s overall winning %, top-10 finish %, and winning % in majors, by year:

For the record, his averages are 28.4%, 66.4% and 24.6%, respectively. As should be obvious, not only is his 2010 historically awful, but there is nothing to suggest that he was in decline beforehand. Despite having recently run slightly worse in majors than he did in the early 2000’s, his Win% and Top-10% trendlines have still been sloping upwards.
Of course, 2/3 of a season is still a small sample, and it is certainly possible that this is variance, but just because something *could* be a statistical artifact doesn’t mean that it is *likely* to be. In fact, one problem with statistically-oriented sports analysis is that its proponents can sometimes be overly (even dogmatically) committed to neutral or variance-based explanations for observed anomalies, even when the conventional explanation is highly plausible (ironically, I think this happens because people often apply Bayes’ Theorem-style reasoning implicitly, even if the statisticians forget to apply it explicitly). I believe this is one of those situations.
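To put a rough number on the pure-variance explanation, here is a crude Python sketch. It treats each of the 9 tournaments as an independent trial at Tiger's career top-10 rate of 66.4% and asks how often he would manage 2 or fewer top-10s. Both the independence and the constant rate are my simplifying assumptions, not anything established above, so read the result as an order-of-magnitude guide only.

```python
from math import comb

career_top10_rate = 0.664   # career top-10 % quoted in the text
events, top10s = 9, 2       # this season so far: 2 top-10s in 9 tournaments

# Binomial tail: probability of `top10s` or fewer top-10 finishes in `events`
# tournaments, ASSUMING independent events at the career rate.
p_two_or_fewer = sum(
    comb(events, k)
    * career_top10_rate ** k
    * (1 - career_top10_rate) ** (events - k)
    for k in range(top10s + 1)
)

print(f"P(2 or fewer top-10s in 9 events) = {p_two_or_fewer:.2%}")
```

Under those assumptions the figure lands a little under 1%, which fits the view that variance alone is a possible but unlikely explanation.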

That said, whether it stems from diminishing skills or ongoing psychological unrest, a significant and continuing Tiger decline is still a realistic possibility. From the chart above, it should be clear that Tiger circa 2009 shouldn’t have any problem blowing past Jack, but what would happen if he were a different Tiger? Fortunately for him, he has a long way to drop before being a non-factor. For comparison, let’s look at the same graph as above, but for the 2nd-best player of the recent era, Phil Mickelson:

Mickelson’s averages are 9.2%, 35.8%, and 5.6%, respectively. Half a Tiger would still be much better. Of course, Mickelson has won 4 majors in recent years, but has still been much worse than Tiger: over that period his averages are 12.2%, 40.1%, and 14.3%. It should not go without notice that if Tiger transformed into Phil Mickelson, played 7 more years, and won majors at the same rate that Mickelson has over the last 7 (Phil is about 6 years older), it would put him at exactly the magic number: 18.
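The arithmetic behind that "magic number" is simple enough to sketch in Python. The one input not stated in this paragraph is Tiger's current total of 14 majors; the 4 majors per year and the variable names are likewise my own framing.

```python
tiger_majors_now = 14          # Tiger's current major total (assumed input)
majors_per_year = 4
years_remaining = 7
mickelson_major_rate = 0.143   # Mickelson's recent major win %, from the text

# Expected additional majors if Tiger won at Mickelson's recent rate.
expected_additional = years_remaining * majors_per_year * mickelson_major_rate
projected_total = tiger_majors_now + expected_additional

print(f"Expected additional majors: {expected_additional:.1f}")
print(f"Projected career total: {projected_total:.1f}")
```

Note that 14.3% is just 4 wins in 28 majors, so this is really a restatement of Mickelson's 4 recent majors added to Tiger's total.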

Finally, let’s look at the graph for the man himself — Jack Nicklaus:

Note: For years prior to 1970, only official PGA Tour events are included.
Jack’s averages over this span (from the year he turned pro to the year of his final major) are 15.5%, 63.4%, and 18%. These numbers are slightly understated, since in truth Jack was well past his prime when he won the Masters in ’86. As we can see, Jack began to decline significantly around 1979, but still won 3 more majors after that point. A similar pattern for Woods would put him at 17, and at least in contention for the record. On the other hand, not everyone is Jack Nicklaus. Nicklaus, incredibly, won a higher percentage of majors than tournaments overall. This is especially apparent in his post-decline career: note the small amount of blue compared to the amount of green from 1979 on. Whether he just ran well in the right spots, or whether he had preternatural competitive spirit, not even Tiger Woods can count on having Nicklaus’s knack for winning majors. So if Tiger hopes to catch up, he had better be out of his mind.

A Decade of Hot Teams in the Playoffs

San Diego and Dallas were the Super Bowl-pick darlings of many sports writers and commentators heading into this postseason, in no small part because they were the two “hottest” teams in the NFL, having finished the regular season with the two longest winning streaks of any contenders (at 11 and 3, respectively). Routinely, year after year, I think that the prediction-makers in the media overvalue season-ending rushes. My reasons for believing this include:

The seedings of many teams are frequently sealed or near-sealed weeks before the playoffs begin, leaving them with little incentive to compete fully.
Teams that are eliminated from playoff contention may be dispirited, and/or players may not be giving 100% effort to winning, instead focusing on padding statistics or avoiding injury.
When non-contenders do give maximum effort, it may more often be to play the role of “spoiler,” or to save face for their season by trying to beat the most high-profile contenders.
Variance.

So the broader question to ask is “does late-season success correlate any more strongly with postseason performance than middle or early season success?” But in this case, I’m interested only in winning streaks — i.e., the “hottest” teams, for which any relevant sample would probably be too small to draw any meaningful conclusions. However, I thought it might be interesting to look at how the teams with the longest winning streaks have performed in the last decade:

2009:
AFC: San Diego: Won 11, lost divisional
NFC: Dallas: Won 3, lost divisional

2008:
AFC: Indianapolis: Won 9, lost wildcard
NFC: Atlanta: Won 3, lost wildcard

2007:
AFC: New England: Won 16, lost Superbowl
NFC: Washington: Won 4, lost wildcard

2006:
AFC: San Diego: Won 10, lost divisional
NFC: Philadelphia: Won 5, lost divisional

2005:
NFC: Redskins: Won 5, lost divisional
AFC: Tie: Won 4: Denver: lost AFC championship; Pittsburgh: won Superbowl
(the hottest team overall, Miami, won 6 but didn’t make the playoffs)

2004:
AFC: Pittsburgh: Won 14, lost AFC championship
NFC: Tie: Won 2: Seattle: lost wildcard; St. Louis: lost divisional; Green Bay: lost wildcard

2003:
AFC: New England: Won 12, won Superbowl
NFC: Green Bay: Won 4, lost divisional

2002:
AFC: Tennessee: Won 5, lost AFC championship
NFC: NY Giants: Won 4, lost wildcard

2001:
AFC: Patriots: Won 6, won Superbowl
NFC: Rams: Won 6, lost Superbowl

2000:
AFC: Baltimore: Won 7, won Superbowl
NFC: NY Giants: Won 5, lost Superbowl

From 2006 on, the hottest teams have obviously done terribly, with the undefeated Patriots being the only team to make it out of the divisional round. Prior to that, the results seem more normal: In 2005, Pittsburgh won the Superbowl after tying for the longest winning streak among AFC playoff teams (though they trailed Washington in the NFC and Miami, who didn’t make the playoffs). New England won the Superbowl as the hottest team twice, in 2001 and 2003, although both times they were one of the top seeds in their conference as well. The last hottest team to play on wildcard weekend AND win the Superbowl was the Baltimore Ravens in 2000.

So what does that tell us? Well, a decent anecdote — and not much more. The sample is small and the numbers inconclusive. On the one hand, the particular species of Cinderella team that gets predicted to win the Superbowl year after year by some — one that starts the season weakly but catches fire late and rides their momentum to the championship — has been a rarity (and going back further, it doesn’t get any more common). On the other hand, if you simply picked the hottest team to win the Superbowl every year in this decade, you would have correctly picked 3 winners out of 10, which would not be a terrible record.
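For the record, here is how the 3-out-of-10 tally can be reproduced from the list above, as a short Python sketch. It takes the single hottest playoff team in each season (by streak length, across both conferences) and counts the 2001 tie between New England and St. Louis as a correct pick, since one of the two won. That tie-handling is my assumption about how the count was made.

```python
# (season, hottest playoff team or teams, streak length, won the title?)
hottest_by_season = [
    (2009, "San Diego", 11, False),
    (2008, "Indianapolis", 9, False),
    (2007, "New England", 16, False),
    (2006, "San Diego", 10, False),
    (2005, "Washington", 5, False),   # Miami won 6 but missed the playoffs
    (2004, "Pittsburgh", 14, False),
    (2003, "New England", 12, True),
    (2002, "Tennessee", 5, False),
    (2001, "New England / St. Louis", 6, True),  # tie; New England won
    (2000, "Baltimore", 7, True),
]

correct_picks = sum(won for _, _, _, won in hottest_by_season)
hit_rate = correct_picks / len(hottest_by_season)

print(f"Hottest team won the title in {correct_picks} of "
      f"{len(hottest_by_season)} seasons ({hit_rate:.0%})")
```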