From the Live Blog: Odds and Ends (On Moneyball, Michael Vick, Cam Newton, Kerry Collins, etc.)

[Preface: With apologies to those of you who sat through or otherwise already read my NFL Live Blog from Sunday: it is incredibly long, so I thought I should split out individual posts for some of the topics I covered.  I’ve removed the time-stamps and re-organized a bit, but this is all original, so it may not be as clean or detailed as a regular article (any additional comments I’ll put in brackets or at the end).  If this sort of stuff interests you, I will be live-blogging again this Sunday.]

On Moneyball:

True story: Yesterday, my wife needed a T-shirt, and ended up borrowing my souvenir shirt from SSAC (MIT/Sloan Sports Analytics Conference). She was still wearing it when we went to see Moneyball last night, and, sure enough, she ended up liking it (nerd!) and I thought it was pretty dull.

Nit-picking: The Athletics won their last game of the season in 2004, 2005, 2007, and 2010. (It’s not that hard when you don’t make the playoffs). [If you haven’t seen it, a major theme in the whole affair is how Billy Beane really wants to win the last game of the season.]

Really, Moneyball is all about money, not statistics. Belichick would be a much better subject for a sports-analytics movie than Billy Beane.  There’s real drama in how Belichick has been willing to do whatever it takes to win, whether that means breaking the rules or breaking with convention, plus, you know, he’s had more success.

On Advanced NFL Stats’ WPA Calculator:

I haven’t really used Advanced NFL Stats’ WPA Calculator much, as I’ve been (very slowly) trying to build my own model.  But I just noticed it doesn’t take timeouts into account.  I’m curious whether that’s the same for his internal model or if that’s just the calculator.  Obv timeouts make a huge difference in endgame or even end-half scenarios (and accounting for them properly is one of the toughest things to figure out).

[This came up in the Pitt/Indy game, I believe on the play where Roethlisberger scrambled for a 1st down in Indy territory.] Oooh, depending on the timeout situation, that might have been a spot where dropping just short of the first down would have been better than making it.  Too bad Burke’s WPA Calculator doesn’t factor in timeouts!

On Andy Reid:

So both Donovan McNabb and Michael Vick have been considerably better QB’s in Philadelphia than elsewhere.  At some point, does Andy Reid get some credit?  Without a Super Bowl ring, he’s generally respected but not revered in the mainstream, and he’s such a poor tactician that he’s dismissed by most analytics types.  But he may be one of the best offensive schemers in the modern era.

[David Meyers asked what I meant by “poor tactician”]: I just mean that he has notoriously bad time management skills, makes ridiculous 4th down decisions, and generally seems clueless about win maximization, esp. in end-game scenarios.

On Michael Vick:

I kind of feel the same way about Vick that I felt about Stephen Strasburg after he hurt his arm last year: their physical skills are so unprecedented that, unfortunately, Bayesian inference suggests that their injury-proneness isn’t a coincidence.

So if the Eagles go on to lose, does this make Vick 1-0 with 2 “no decisions” for the year?

On Cam Newton:

Watching pre-game.  Strahan is taking “overreaction” to a new level, not only declaring that maybe the NFL isn’t even ready for Cam Newton, but that this has taught him to stop being critical of rookie QB’s in the future.

So should I be more or less excited about Cam Newton after his win today?  He had a much more “rookie-like” box of 18/34 for 158.  Here’s how to break that down for rookies: Low yards = bad. High attempts = good.  Completion percentage = completely irrelevant. Win = Highly predictive of length of career, not particularly predictive of quality (likely b/c a winning rookie season gets you a lot of mileage whether you’re actually good or not). Oh, and he’s still tall:  Height is also a significant indicator (all else being equal).

On Breakouts:

I remember Mike Wallace being a valuable backup on my fantasy team in 2009, otherwise, meh.  Seems to talk a lot of crap that these announcers eat up.  Ironically, though, if a rookie or a complete unknown starts a season super-hot, commentary is usually that they’re already the next big thing, while a quality-but-not-superstar veteran with a hot start is often just credited with a hot start.  But, in reality, I think the vet, despite being more of a known quantity, is still more likely to take off.  In this case, they’re busting out the hyperbole regardless.

Speaking of which, does anyone remember Ryan Moats?  A third-stringer for Houston in 2009, he ended up starting (briefly) after a rash of injuries to his teammates. In his first start (against Buffalo), he had 150 yards and 3 touchdowns, and some fantasy contestants were falling over each other to pick him up.  After that, he had 2 touchdowns the rest of the season, and then was out of football.

On Troy Polamalu:

Polamalu to the rescue, of course.  He’s so good that I think he improves the Steelers’ offense.  (And no, not kidding.)

On Skeptical Sports Analysis:

Google Search Leading to My Blog of the Day: “what sport does dennis rodman play”

Shout-out to Matt Glassman for plugging my live blog on his:

One look at his blog will convince you that he’s not only a killer sports statistician, but he’s also an engaging and humorous writer.

Though, at best, this generous praise is a game of “Two Truths and a Lie.” (I’m not even remotely a statistician.)

If I were more clever, I’d think of some riff on Jay-Z’s “99 Problems” line:

Nah, I ain’t pass the bar but i know a little bit

Enough that you won’t illegally search my shit

Incidentally, love the Rap Genius annotation for that lyric (also apt to my situation):

If you represent yourself (pro se), Bar admission is not required, actually

On Kerry Collins:

So I always think of Kerry Collins as a pretty bad QB, but damn: he’s the last man standing from the entire 1995 draft:

And, you know, he’s not dead.  So I guess he won that rivalry.

From the Live Blog: On Detroit, Quick Field Goals, and Buffalo


On Detroit’s OT 1st-Down Field Goal:

Nate asks:

Any thoughts on the Lions kicking a 32-yard FG in overtime from the left hash on first down?

I’ve thought about this situation a bit, and I don’t hate it.  Let me pull up this old graph:

So a kneel in the center is maybe slightly better: generically, they lose a percentage or two, but I’m pretty sure that even from that distance you lose a percentage or two for being on the hash.  Kickers are good enough at that length that going for extra yards or a TD isn’t really worth it, plus you’re not DOA even if you miss (while you might be if you turn the ball over).

Also from that post where the graph came from, the “OT Choke Factor” for kicks of that length is negligible.

Matt Glassman asks:

Question re: field goals — What percentage are you looking for your kicker to have at the longest range you are willing to regularly (i.e. throughout the game) use him?

I’ll use a static example: if your kicker was a known 50% from 52 yards, would you regularly take that over a punt? What about 40%, etc. Then make it dynamic, where the kicker has some shrinking probability as he moves back, and the coach has a decision about whether to kick/punt from a given distance. At what maximum distance/percentage do you regularly kick, rather than regularly punt.

This is a good question and topic, but it’s extremely hard to generalize. It depends on your game situation and what your alternatives are. Long kicks, for example, are generally bad—even with a relatively good long-range kicker.  But in late-game or late-half scenarios, clearly being able to take long kicks can be very valuable.

It is demonstrable, however, that NFL kickers have gotten incredibly good compared to past kickers.  Aside from end-game scenarios, kicking FG’s used to be almost universally dominated by going for it (or sometimes punting). But since kickers have become so accurate, the balance has gotten more delicate.

Also [sort of contra Brian Burke, I’m thinking of a link but can’t find it], I think individual team considerations are a much bigger factor in these decisions than just raw WPA.  It depends a lot on how good your offense is, how good it is at converting particular distances, how good your defense is, etc.  While the percentage differences may be fairly small for the instant decision, they pile up on each other in these types of multi-pronged calculations.

[In the later game] Cris Collinsworth said kickers prefer being on the left hash (though the justification was kind of weak). [Obv this would make the 1st-down kick even better.]

On Detroit and Buffalo’s Chances:

Detroit is currently 3-0 and leading the league in Point Differential at +55, and unlikely to be passed by anyone any time soon [by which I mean, this weekend].

That +55 would be the 16th best since 2000.  Combined with their 3-0 record, they project to win ~11 games, though with lots of variance:

Yes, this can be calculated more precisely, but it will be around 11 games regardless.
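As a rough illustration of that kind of projection (a sketch only: the 0.6 rest-of-season win probability below is a hypothetical placeholder, not an output of my actual model):

```python
import math

wins_so_far, games_left = 3, 13
p = 0.6  # hypothetical per-game win probability the rest of the way

expected_wins = wins_so_far + games_left * p   # ~10.8, i.e. "around 11"
std_dev = math.sqrt(games_left * p * (1 - p))  # ~1.8 games of spread
print(round(expected_wins, 1), round(std_dev, 1))
```

The binomial standard deviation of nearly two full games is the “lots of variance” part.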

The teams who led in MOV after 3 weeks since 2000 were:

  • 2010: Pittsburgh, +39, Lost Super Bowl
  • 2009: New Orleans, +64, Won Super Bowl
  • 2008: Tennessee, +43, Lost Divisional
  • 2007: New England, +79, Lost Super Bowl
  • 2006: San Diego, +57, Lost Divisional
  • 2005: Cincinnati, +60, Lost Wild Card
  • 2004: Seattle, +52, Lost Wild Card
  • 2003: Denver, +65, Lost Wild Card
  • 2002: Miami, +63, Missed Playoffs
  • 2001: Green Bay, +80, Lost Divisional
  • 2000: Tampa Bay, +67, Lost Wild Card

Not bad.  Only Miami missed the playoffs, and they were in a three-way tie atop the AFC East at 9-7.

David asks:

Q: The Bills for real? What do they project to over a season?

Um, I don’t know.  Generically, being 3-0 and +40 projects to 10 or 11 wins, but there’s a lot of variance in there.  The previous season’s results are still fairly significant, as are the million other things most fans could tick off.  Another statistically significant factor that most people prob wouldn’t think of is preseason results.  The Bills scored 24 and 35 points in games 2 and 3 of the preseason.  There’s a ton of work behind this result, but basically I’ve found that the points scored in games 2 and 3 of the preseason (counting backwards), combined, are approximately as predictive as the point differential from a single regular-season game.  So, loosely speaking, in this case, you might say that the Bills are more like a 4-0 team, with the extra game worth of data being the equivalent of a fairly quality win over a Denver/Jacksonville Hybrid.

I’d also note that it’s difficult to take strength of schedule into account at this point, at least in a quantitative way.  You can make projections about the quality of a team’s opponents, but the error in those projections is so large at this point that they add more variance to your target team’s projections than they are worth.  Or, maybe a simpler way to put it: it’s hard enough to adjust for quality of opponent when you *know* how good they were, and we don’t even know, we just have educated guesses.  (Even at the END of the season, I think a lot of ranking models and such don’t sufficiently account for the variance in SoS: that is, when a team beats x number of teams with good records, they can do very well in those rankings, even though some of the teams they beat overperformed in their other games.  In fact, given regression to the mean, this will almost always be the case.  Of course, a clever enough model should account for this uncertainty.)

From the Live Blog: More on Interceptions


Aaron Schatz tweeted:


Bills go for it on fourth-and-14 from NE 35… and Fitzpatrick throws his second pick (first that is his fault)

4th and 14 is a situation where I think more quarterbacks throw too few interceptions than throw too many.

[As you can note from this post, an issue I’m very interested in is how to judge interceptions more fairly.  As I said in the comments, “I’m conceptually drawn to the similarity between the stigma against interceptions and the stigma against going for it on 4th down.”  E.g., a coach who played optimal 4th down strategy would easily lead the league in 4th down turnovers.]

Though, I have to admit, Aaron Rodgers is a great QB who seems to defy my “Show me a QB who doesn’t throw interceptions, and I’ll show you a sucky quarterback” rule of thumb.  And it’s not like Tom Brady, who throws INT’s when his team is struggling and doesn’t throw them when his team is awesome (which, ofc, I have NO problem with): Rodgers has a crazy-low INT rate on a team that has been mediocre (2008), good-but-not-great (2009), or all over the place (2010) during his 3 years as a starter.

Ok, purely for fun, let’s compare the all-time single-season leaders in (low) Int% (from Pro Football Reference):

With the all-time leaders for most INT thrown (also from Pro Football Reference):

Not drawing any conclusions or doing any scientific comparisons, but both lists seem to have plenty of studs as well as plenty of duds. (Actually, when I first made this comparison a couple of years ago, the “Most” list had a much better resume than the “Least” list.  But since then, the ‘good’ list has added several quality new members.)

From the Live Blog: On Bill Belichick and Peyton Manning


This may not be the most controversial statement, but I think the two most powerful forces in the NFL over the last decade have been Peyton Manning and Bill Belichick (check out the 2nd tab as well):

[Main axes are] wins in season n against wins in season n+1. [Note (May 28, 2012): Updated through 2011 season.]

In case you haven’t seen it, the old “Graph of the Day” that I tweaked for the above is here.

Belichick, of course, is known for winning Super Bowls, going for it on 4th down, and:

Good thing he doesn’t have to worry about potential employers Googling him.

Since I [was] watching the Indy game, a few things about Peyton Manning:

First, a quick over/under: .5, for number of Super Bowls won by Peyton Manning as a coach?  I mean, I’d take the under obv just b/c of the sheer difficulty of winning Super Bowls, but I’d be curious about the moneyline.

[Here’s] something completely new to me.  Not sure exactly what to make of it, but it’s interesting:

These are QB’s with 7+ seasons of 8+ games who averaged 200+ yards per game (n=42).  These are their standard deviations, from season to season (counting only the 8+ game years), for Yards per Game vs. Adjusted Net Yards Per Attempt.

The red dot is our absentee superstar, Peyton Manning, and the green is Johnny Unitas.  The orange is Randall Cunningham, but his numbers I think are skewed a bit because of the Randy Moss effect.  The dot at the far left of the trend-line is Jim Kelly.

So what to make of it?  I’ve been mildly critical of Adjusted Net Yards Per Attempt for the same reasons I’ve been critical of Win Percentage Added: Since the QB is involved in basically every offensive play, both of these tend to track two things:

  1. Their raw offensive quality, plus (or multiplied by)
  2. The amount which the team relies on the passing game.

Neither is particularly indicative of a QB doing the best with what he has, as it is literally impossible to put up good numbers in these stats on a bad team.

So it’s interesting to me that Peyton — who most would agree is one of the most consistent QB’s in football — would have such a high ANY/A standard dev (he also has a larger sample than some of the other qualifiers).

An incredibly superficial interpretation might be that Peyton sacrifices efficiency in order to “get his yards.” OTOH, this may be counter-intuitive, but I wonder if it’s not actually the opposite: Peyton was an extremely consistent winner.  Is it possible that the ANY/A to some extent reflected the quality of his supporting cast, but the yards sort of indirectly reflect his ability to transfer whatever he had into actual production? Obv I’d have to think about it more.

With their schedule, Indy may be eliminated from playoff contention before Manning even starts thinking about a return.  Could be good for them next year, though:  San Antonio Gambit, anyone?

Okay, one last thought:  In this post, Brian Burke estimates Manning’s worth to that team, and uses the team’s total offensive WPA as a sort of “ceiling” for how valuable Manning could be:

In this case, it can tell us how many wins the Peyton Manning passing game can account for. Although we can’t really separate Manning from his blockers and receivers, we can nail down a hard number for the Colts passing game as a whole, of which Manning has been the central fixture.

The analysis, while perfectly good, does ignore two possibilities: First, the Indianapolis offense minus Manning may be below average (negative WPA), in which case the “Colts passing game” as a whole would understate Manning’s value: E.g., he could be taking it from -1 to +2.5, such that he’s actually worth 3.5, etc.  Second, even if you could get a proper measure of how much the offense would suffer without Manning, that still may not account for the degree to which the Indianapolis offense bolstered their defense’s stats.  When you’re ahead a lot, you force the other team to make sub-optimal plays that increase variance to give themselves some opportunity to catch up: this makes your defense look good. In such a scenario, I would imagine hearing things like, “Oh, the Indianapolis defense is so opportunistic!” Hmmm.

New! This Sunday: Wire-to-Wire NFL Live-Blog

With a nice vacation under my belt and the NFL season underway, I figure it’s a good time to shift some of my attention back to the blog.  I’m working on finishing and writing up some of the research and analysis I’ve been doing for a number of different sports and contexts (even baseball), so I should have some pretty interesting and diverse things to post about in the coming weeks.  But I’d also like to try some new things content-wise, and one of those that I’m very excited about is doing a regular NFL live-blog:  So, for the first time this Sunday, I’ll be conducting an all-day live blog—starting from the first kick-off and continuing all the way through the night game.

Obv I’ll be kind of making up the format as I go along, but I expect it to be a little different from your usual play-by-play with instant reactions.  We’ll see what works and what doesn’t, but my intention is for it to be a bit more of a window into how I watch the NFL, and the kinds of things I think about and explore in the process, like:

  • Random thoughts and observations related to the games and coverage that I’m watching.
  • Quick and dirty analysis (I’ve got my databases locked and loaded, and there will be graphs).
  • Relevant tidbits from or previews of some of my ongoing research.
  • Links and/or brief discussions of relevant articles, tweets, blog posts or other things that I’m reading.
  • Other random ideas (sports related or not) that grab me and won’t let go.

Additionally, if there are any reader questions, criticisms, or comments that come up, I’ll be monitoring and responding to them throughout the day (and these don’t necessarily have to be on topic: so if you have the urge to pick my brain, challenge my ideas, or point out any of my stupid mistakes, this will be a good opportunity to get an immediate response).

I’ll be starting just before the first kickoff, around 10am PST.  So, you know, be there, drop on by, I’ll make it worth your while, see you then, etc.

ESPN Stat Geek Smackdown 2011 Champion

. . . is me.

Final Standings:

  1. Benjamin Morris (68)
  2. Stephen Ilardi (65)
  3. Matthew Stahlhut (56)
  4. (Tie) Haralabos Voulgaris (54)
  5. (Tie) John Hollinger (54)
  6. David Berri (52)
  7. Neil Paine (49)
  8. Henry Abbott’s Mom (46)

To go totally obscure, I feel like Packattack must have felt when he pulled off this strat (the greatest in the history of Super Monkey Ball):

That is, he couldn’t have done it without a lot of luck, but it still feels better than just getting lucky.

As for the result, I don’t have any awesome gloating comments prepared: Like all the other “Stat Geeks,” I thought Miami was a favorite going into the Finals—and given what we knew then, I would think that again.  But at this point I definitely feel like the better team won.

For as far as they went, Miami’s experiment of putting 3 league-class primary options on the same team was essentially a failure.  I’m sure the narrative will be about how they were “in disarray” or needed more time together, but ultimately it’s a design flaw.  Without major changes, I think they’ll be in a similar spot every year: that is, they’ll be very good, and maybe even contenders, but they won’t ever be the dominant team so many imagined.

As for Dallas, they played beautiful basketball throughout the playoffs, and I personally love seeing a long-range shooting team take it down for a change.  It’s noteworthy that they defied two of the patterns I identified in my “How to Win a Championship in Any Sport” article: They became only the second NBA team since 2000 with a top-3 payroll to win it all, and they’re only the second champion in 21 years without a first-team All-NBA player.

Bayes’ Theorem, Small Samples, and WTF is Up With NBA Finals Markets?

Seriously, I am dying to post about something non-NBA related, and I should have my Open-era tennis Elo ratings by surface out in the next day or so.  But last night I finally got around to checking the betting markets to see how the NBA Finals—and thus my chances of winning the Smackdown—were shaping up, and I was shocked by what I found.  Anyway, I tossed a few numbers around, and thought you all might find them interesting.  Plus, there’s a nice little object lesson about the usefulness of small-sample information for making Bayesian inferences.  This is actually one area where I think the normal stat geek vs. public dichotomy gets turned on its head:  Most statistically-oriented people reflexively dismiss any empirical evidence without a giant data-set.  But in certain cases—particularly those with a wide range of coherent possibilities—I think the general public may even be a little too conservative about the implications of seemingly minor statistical anomalies.

Freaky Finals Odds:

First, I found that most books seem to see the series as a tossup at this point.  Here’s an example from a European sports-betting market:


Intuitively, this seemed off to me.  Dallas needs to win 1 out of the 2 remaining games in Miami.  Assuming the odds for both games are identical (admittedly, this could be a dubious assumption), here’s a plot of Dallas’s chances of winning the series relative to Miami’s expected winrate per home game:


So for the series to be a tossup, Miami needs to be about a 71% favorite per game.  Even at home in the playoffs, this is extremely high.  Depending on what dataset you use, the home team wins around 60-65% of the time in the NBA regular season and about 65-70% of the time in the postseason.  But that latter number is a bit deceptive, since the playoffs are structured so that more games are played in the homes of the better teams: aside from the 2-3-2 Finals, any series that ends in an odd number of games gives the higher-seeded team (who is often much better) an extra game at home.  In fact, while I haven’t looked into the issue, that extra 5% could theoretically be less than the typical skill-disparity between home and away teams in the playoffs, which would actually make home court less advantageous than in the regular season.
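Under the stated assumption that both remaining games are identical, the series math is one line: Dallas loses only if Miami wins both games in Miami. A minimal sketch in Python (nothing here is from my model; it’s just the binomial algebra):

```python
import math

def dallas_series_win(p_miami_per_game):
    # Dallas needs 1 of the 2 remaining games in Miami,
    # so they lose the series only if Miami sweeps both.
    return 1 - p_miami_per_game ** 2

# For the series to be a tossup, Miami's per-game edge must satisfy
# p^2 = 0.5, i.e. p = sqrt(0.5) ~= 70.7%.
tossup_p = math.sqrt(0.5)
print(round(tossup_p, 3))                     # ~0.707
print(round(dallas_series_win(tossup_p), 3))  # ~0.5
```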

Now, Miami has won only 73% of their home games this season, and it was against below-average competition (overall, they had one of the weakest schedules in the league).  Counting the playoffs, at this point Dallas actually has a better record than Miami (by one game), and they played an above-average schedule.  More importantly, the Mavs won 68% of their games on the road (compare to the league average of 35-40%).  Not to mention, Dallas is 5-2 against the Heat overall, and 2-1 against them at home (more on that later).

So how does the market tilt so heavily to this side?  Honestly, I have no idea. Many people are much more willing to dismiss seemingly incongruent market outcomes than I am.  While I obviously think the market can be beaten, when my analytical results diverge wildly from what the money says, my first inclination is to wonder what I’m doing wrong, as the odds of a massive market failure are probably lower than the odds that I made a mistake. But, in this case, with comparatively few variables, I don’t really get it.

It is a well-known phenomenon in sports-betting that huge games often have the juiciest (i.e., least efficient) lines.  This is because the smart money that normally keeps the market somewhat efficient can literally start to run out.  But why on earth would there be a massive, irrational rush to bet on the Heat?  I thought everyone hated them!

Fun With Meta-Analysis:

So, for amusement’s sake, let’s imagine a few different lines of reasoning (I’ll call them “scenarios”) that might lead us to a range of different conclusions about the present state of the series:

  1. Miami won at home ~73% of the time while Dallas won on the road a (fairly stunning) 68% of the time.  Taken at face value, these numbers suggest a generic Miami home team would be roughly 5% better than a generic Dallas road team, making Miami a 52.5% favorite in each game.
  2. The average home team in the NBA wins about 63% of the time.  Miami and Dallas seem pretty evenly matched, so Miami should win each game ~63% of the time as well.
  3. Let’s go with the very generous end of broader statistical models (discounting early-season performance, giving Miami credit for championship experience, best player, and other factors), and assume that Miami is about 5-10% better than Dallas on a neutral site.  The exact math on this is complicated (since winning is a logistic function), but, ballpark, this would translate into about a 65.5% chance at home.
  4. Markets rule!  The approximate market price for a Miami series win is ~50%, translating into the 71% per-game chance mentioned above.

Here’s a scatter-plot of the chances of Dallas winning the series based on those per-game estimates:

Ignore the red dots for now—we’ll get back to those.  The blue dots are the probability of Dallas winning at least one of the next two games (using the same binomial formula as the function above).  Now, hypothetically, if you thought each of these analyses equally plausible, your overall probability for Dallas winning the title would simply be the average of the four scenarios’ results, or right around 60%.  Note: I am NOT endorsing any of these lines of reasoning or any actual conclusions about this series here—it’s just a thought experiment.
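Treating the four analyses as equally plausible, the overall number is just an average of the four series probabilities; a quick sketch in Python using the per-game estimates from the scenarios above:

```python
# Miami's per-game win probability under scenarios 1-4 above.
scenarios = [0.525, 0.63, 0.655, 0.71]

# Blue dots: Dallas wins the series unless Miami takes both home games.
dallas_series = [1 - p ** 2 for p in scenarios]

# Equal plausibility means a straight average of the four results.
overall = sum(dallas_series) / len(dallas_series)
print(round(overall, 3))  # ~0.599, i.e. right around 60%
```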

A Little Bayesian Inference:

As I mentioned above, the Mavericks are 5-2 against the Heat this season, including 2-1 against them in Miami.  Let’s focus on the second stat: Sticking with the assumption that you found each of these 4 lines of reasoning equally plausible prior to knowing Dallas’s record in Miami, how should your newly-acquired knowledge that they were 2-1 affect your assessment?

Well, wow! 3 games is such a minuscule sample, it can’t possibly be relevant, right?  I think most people—stat geek and layperson alike—would find this statistical event pretty unremarkable.  In the abstract, they’re right: certainly you wouldn’t let such a thing invalidate a method or process built on an entire season’s worth of data. Yet, sometimes these little details can be more important than they seem.  Which brings us to perhaps the most ubiquitously useful tool discovered by man since the wheel: Bayes’ Theorem.

Bayes’ Theorem, at its heart, is a fairly simple conceptual tool that allows you to do probability backwards:  Garden-variety probability involves taking a number of probabilistic variables and using them to calculate the likelihood of a particular result.  But sometimes you have the result, and would like to know how it affects the probabilities of your conditions: Bayesian analysis makes this possible.

So, in this case, instead of looking at the games or series directly, we’re going to look at the odds of Dallas pulling off their 2-1 record in Miami under each of our scenarios above, and then use that information to adjust the probabilities of each.  I’ll go into the detail in a moment, but the relevant Bayesian concept is that, given a result, the new probability of each precondition will be adjusted proportionally to its prior probability of producing that result.  Looking at the red dots above (which are technically the cumulative binomial probability of Miami winning 0 or 1 out of 3 games), you should see that Dallas is far more likely to go 2-1 or better on Miami’s turf if they are an even match than if Miami is a huge favorite—over twice as likely, in fact.  Thus, we should expect that scenarios suggesting the former will become much more likely, and scenarios suggesting the latter will become much less so.

In its simplest form, Bayes’ Theorem states that the probability of A given B is equal to the probability of B given A times the prior probability of A (probability before our new information), divided by the prior probability of B:

P(A|B)= \frac{P(B|A)*P(A)} {P(B)}

Though our case looks a little different from this, it is actually a very simple example.  First, I’ll treat the belief that the four analyses are equally likely to be correct as a “discrete uniform distribution” of a single variable.  That sounds complicated, but it simply means that there are 4 separate options, one of which is actually correct, and each of which is equally likely. Thus, the odds of any given scenario are expressed exactly as above (B is the 2-1 outcome):

P(S_x|B)= \frac{P(B|S_x)*P(S_x)} {P(B)}

The prior probability for Sx is .25.  The prior probability of our result (the denominator) is simply the sum of the probabilities of each scenario producing that result, weighted by each scenario’s original probability.  But since these are our only options and they are all equal, that element will factor out, as follows:

P(B)= P(S_x)*(P(B|S_1)+P(B|S_2)+P(B|S_3)+P(B|S_4))

Since P(Sx) appears in both the numerator and the denominator, it cancels out, leaving our probability for each scenario as follows:

P(S_x)= \frac{P(B|S_x)} {P(B|S_1)+P(B|S_2)+P(B|S_3)+P(B|S_4)}

Each P(B|Sx) is the binomial probability of Dallas winning exactly 2 out of 3 games under that scenario (note this is slightly different from the cumulative probability above, so that Dallas is sufficiently punished for not winning all 3), and Excel’s binom.dist() function makes this easy.  Plugging those calculations in with everything else, we get the following adjusted probabilities for each scenario:
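The whole calculation can be sketched in a few lines. The per-game probabilities below are hypothetical stand-ins for the four scenarios (not the actual numbers from the table), but the machinery is exactly the formula above: likelihood times a uniform prior, with the prior canceling out.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical per-game Dallas win probabilities for the four scenarios
# (placeholders for illustration, not the article's actual figures):
scenarios = [0.35, 0.45, 0.55, 0.65]

# Likelihood of the observed result (exactly 2 wins in 3) under each scenario.
# With a uniform prior (0.25 each), the prior cancels, so the posterior is
# just each likelihood divided by the sum of all four.
likelihoods = [binom_pmf(2, 3, p) for p in scenarios]
total = sum(likelihoods)
posteriors = [lk / total for lk in likelihoods]

for p, post in zip(scenarios, posteriors):
    print(f"scenario p={p:.2f}: posterior {post:.3f}")
```

As expected, the posterior mass shifts toward the scenarios in which Dallas is stronger, since those scenarios were more likely to produce the 2-1 result.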

Note that the most dramatic changes are in our most extreme scenarios, which should make sense both mathematically and intuitively: going 2-1 is much more meaningful if you’re a big dog.

Our new weighted average is about 62%, meaning the 2-1 record improves our estimate of Dallas’s chances by 2 points and widens the gap between the two teams by 4: 62-38 (a 24-point spread) instead of 60-40 (20). That may not sound like much, but a few percentage points of edge aren’t that easy to come by.  For example, to a gambler, that 4% could be pretty huge: you normally need about a 5% edge to beat the house (i.e., you have to win roughly 52.5% of the time), so imagine you were the only person in the world who knew of Dallas’s miniature triumph—in this case, that info alone could get you 80% of the way to profit-land.
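The breakeven figure cited above follows from the arithmetic of standard vig. A minimal sketch, assuming typical -110 odds (risk 110 to win 100):

```python
# At -110 odds you risk 110 to win 100, so breakeven requires
#   p * 100 - (1 - p) * 110 = 0  =>  p = 110 / 210
breakeven = 110 / 210
print(f"breakeven win rate: {breakeven:.4f}")  # ~0.5238
```

That works out to about 52.4%, in line with the roughly 52.5% figure in the text.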

Making Use:

I should note that, yes, this analysis makes some massively oversimplifying assumptions—in reality, there can be gradients of truth between the various scenarios, with a variety of interactions and hidden variables, etc.—but you’d probably be surprised by how similar the results are whether you do it the more complicated way or not. One of the things that makes Bayesian inference so powerful is that it often reveals trends and effects that are relatively insulated from incidental design decisions.  I.e., the results of extremely simplified models are fairly good approximations of those produced by arbitrarily more robust calculations.  Consequently, once you get used to it, you will find that you can make quick, accurate, and incredibly useful inferences and estimates in a broad range of practical contexts.  The only downside is that, once you get started on this path, it’s a bit like getting Tetrisized: you start seeing Bayesian implications everywhere you look, and you can’t turn it off.

Of course, you also have to be careful: despite the flexibility Bayesian analysis provides, using it in abstract situations—like a meta-analysis of nebulous hypotheses based on very little new information—is very tricky business, requiring good logical instincts, a fair capacity for introspection, and much practice.  And I can’t stress enough that this is a very different beast from the typical talking head that uses small samples to invalidate massive amounts of data in support of some bold, eye-catching and usually preposterous pronouncement.

Finally, while I’m not explicitly endorsing any of the actual results of the hypo I presented above, I definitely think there are real-life equivalents where even stronger conclusions can be drawn from similarly thin data.  E.g., one situation that I’ve tested both analytically and empirically is when one team pulls off a freakishly unlikely upset in the playoffs: it can significantly improve the chances that they are better than even our most accurate models (all of which have significant error margins) would indicate.

Blog Changes: More Content, New Feed Options

This blog has gotten a bit more traffic and attention in recent weeks, so I think this is a good time to make a long-planned move to shift gears a bit as far as content.  I will still be writing and posting the same types of longer articles that I always have, but I will also be posting shorter, less polished, more frequent, and generally more blog-like items, such as:

  • Random thoughts, ideas, graphs, or speculative takes relating to the various sports analytics conflicts taking place in the blogosphere or in my head
  • More preliminary results from my ongoing research and works in progress.
  • Responses to reader comments and emails.  If you’ve emailed me questions or followed the comments, you’ve probably noticed that I’ve given a lot of fairly detailed replies, so I’m going to start posting some of those exchanges on the main page.
  • Brief follow-ups and updates to individual items (e.g., how I was right about Tiger Woods not being himself).
  • Links to relevant and/or interesting outside articles (though always with some comment or criticism).
  • Site news and info, such as: how things are going in the Stat Geek Smackdown (now alone in 2nd), my new policy on rotating subtitles, the silly bet I lost to Arturo, etc.
  • Occasional non-sports material.  Don’t worry, most of my abstract-thinking time is spent on three subjects: Sports analysis, day-to-day applications of Bayes’ Theorem, and Hacker-God cosmology [Claimed!].  Fortunately, these are all pretty much the same, so if I go off topic a bit it should still be somewhat relevant.

So, to reflect the change, I’ve slightly altered the blog structure and layout, including updating the style sheets and switching to 3 columns.  You should see 4 new feeds in the upper left:

  • Everything: This will still be the landing page and default feed.
  • Articles: This is for my longer (though not necessarily 30,000 word) pieces only (the most recent are also listed in the left column).
  • Non-Articles: Not that I would encourage skipping my articles, but I’m providing this as an option in case anyone wants to subscribe to the feeds separately.
  • Featured: This will be a feed of just my favorites from both sides.

For even less polished and more raw material, I’ve started tweeting more often than I used to, so I’ve expanded the Twitter feed in the right side column.  I also have some moderate to big ideas for non-post content, which I should be rolling out in the near(ish) future.

If you hate the new setup (feed structure, layout, style, me, etc.), or if you have any suggestions for improvements, per usual, please let me know in the comments or email me.

Stat Geek Smackdown Round 3: Scenarios

Update (5/22/11): Here’s an updated version of the same graphic (slightly reorganized), reflecting the latest:


As most of you know, I’m competing in ESPN’s Stat Geek Smackdown 2011. I lucked into the lead coming out of the first round, but have since dropped into a tie for 2nd.

Oklahoma City choking in the 2nd half of game six against Memphis cost me dearly: had they held on to their 10 point halftime lead to win that game, I would have remained the outright leader heading into these last three series. But by losing that one and winning the next, the Thunder have put me in a tough spot: With Ilardi and me both having the Heat in 6, this round doesn’t give me a lot of opportunities to catch up. At this point, the lead—no matter how small—will be a huge advantage heading into the Finals, and four of us are technically within striking distance:
Round 3 Scenarios

Stahlhut and Berri have put themselves in decent spots by being the only panelists currently in contention to choose OKC and Chicago, respectively. To regain a share of the lead, I need Dallas to win in 6 and Chicago not to win in 7. Dallas came through for me by winning in 6 in round one, so here’s hoping it happens again.

Google Search of the Day: Player Efficiency Rating is Useless

From the “almost too good to be true” department:


Hat tip to whoever used that search to find my blog yesterday.  See for yourself here.

Note the irony that I’m actually saying the opposite in the quoted snippet.

UPDATE:  As of right now, Skeptical Sports Analysis is the #1 result for these searches as well (no quotation marks, and all have actually been used to find the site):