I was just watching the Phillies v. Mets game on TV, and the announcers were discussing this Outside the Lines study about MLB umpires, which found that 1 in 5 "close" calls were missed over a 184-game sample. Interesting, right?
So I opened up my browser to find the details, and before even getting to ESPN, I came across this criticism of the ESPN story by Nate Silver of FiveThirtyEight, which knocks his occasional employer for framing the story around "close calls," which he sees as an arbitrary term, rather than something more objective like "blown calls per game." Nate is an excellent quantitative analyst, and I love when he ventures from the murky world of politics and polling to write about sports. But while the ESPN study is far from perfect, I think his criticism here is somewhat off-base.
The main problem I have with Nate's analysis is that the study's definition of "close call" is not as "completely arbitrary" as Nate suggests. Meanwhile, Nate's suggested alternative metric, blown calls per game, is much more arbitrary than he seems to think.
First, in the main text of the ESPN.com article, the authors clearly state that the standard for “close” that they use is: “close enough to require replay review to determine whether an umpire had made the right call.” Then in the 2nd sidebar, again, they explicitly define “close calls” as “those for which instant replay was necessary to make a determination.” That may sound somewhat arbitrary in the abstract, but let’s think for a moment about the context of this story: Given the number of high-profile blown calls this season, there are two questions on everyone’s mind: “Are these umps blind?” and “Should baseball have more instant replay?” Indeed, this article mentions “replay” 24 times. So let me be explicit where ESPN is implicit: This study is about instant replay. They are trying to assess how many calls per game could use instant replay (their estimate: 1.3), and how many of those reviews would lead to calls being overturned (their estimate: 20%).
Second, what's with a quantitative (sometimes) sports analyst suddenly being enamored with per-game rather than rate-based stats? Sure, one blown call every 4 games sounds low, but without some assessment of how many opportunities there are to blow a call, how would we know? In his post, Nate mentions that NBA insiders tell him there are "15 or 20 'questionable' calls" per game in their sport. Assuming "questionable" means "incorrect," does that mean NBA referees are 60 to 80 times worse than MLB umpires? Certainly not. NBA refs may or may not be terrible, but they have to make dozens, or even hundreds, of difficult calls every night. If you used replay to review every close call in an NBA game, it would never end. Absent some massive longitudinal study comparing how often officials miss particular types of calls from year to year or era to era, there is going to be a subjective component to evaluating officiating. Measuring performance in "close" situations is about as good a method as any.
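To make the per-game versus rate distinction concrete, here is a rough Python sketch of the arithmetic. The MLB inputs (about 1.3 close calls per game, about 20% of them missed) are the ESPN estimates quoted above; the NBA "close calls per game" figure is purely hypothetical, picked only because it falls in the "dozens or hundreds" range, and is not data from either study.

```python
# Sketch: per-game counts vs. rate-based stats for officiating.
# MLB inputs are the ESPN estimates quoted in the post.
# HYPOTHETICAL_NBA_CLOSE_CALLS is an illustrative assumption, not real data.

def blown_per_game(close_calls_per_game: float, miss_rate: float) -> float:
    """Expected blown calls per game = opportunities x miss rate."""
    return close_calls_per_game * miss_rate

def miss_rate(blown: float, close_calls_per_game: float) -> float:
    """Fraction of close calls missed = blown calls / opportunities."""
    return blown / close_calls_per_game

mlb_close_calls = 1.3
mlb_blown = blown_per_game(mlb_close_calls, 0.20)  # about 0.26 per game

# Naive per-game comparison against 15-20 "questionable" NBA calls per game.
for nba_questionable in (15, 20):
    print(f"per-game ratio: {nba_questionable / mlb_blown:.0f}x")

# Rate-based comparison under a hypothetical number of NBA close calls.
HYPOTHETICAL_NBA_CLOSE_CALLS = 100
for nba_questionable in (15, 20):
    nba_rate = miss_rate(nba_questionable, HYPOTHETICAL_NBA_CLOSE_CALLS)
    mlb_rate = miss_rate(mlb_blown, mlb_close_calls)
    print(f"NBA miss rate: {nba_rate:.0%} vs. MLB: {mlb_rate:.0%}")
```

With exactly 0.26 blown calls per game, the per-game ratios come out to roughly 58x and 77x (the "60 to 80" above rounds MLB to one blown call every 4 games). Under the assumed 100 close calls per NBA game, the miss rates land at 15% and 20%, right alongside MLB's 20%, which is the point: the per-game count mostly reflects how many chances there are to get a call wrong.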
Which is not to say that the ESPN metric couldn’t be improved: I would certainly like to see their guidelines for figuring out whether a call is review-worthy or not. In a perfect world, they might even break down the sets of calls by various proposals for replay implementation. As a journalistic matter, maybe they should have spent more time discussing their finding that only 1.3 calls per game are “close,” as that seems like an important story in its own right. On balance, however, when it comes to the two main issues that this study pertains to (the potential impact of further instant replay, and the relative quality of baseball officiating), I think ESPN’s analysis is far more probative than Nate’s.
The definition of close calls as "those for which instant replay was necessary to make a determination" is still arbitrary because it is vague. How is this necessity determined? The umpire who made the call would tell you replay wasn't necessary, and since it turns out he usually gets it right, there is indeed a case to be made. I suspect that what this really means is that a group of guys were sitting around a TV watching video. If one of them said "Hey, I want to see that one again," then it got classified as a close call. But a different group of guys might come up with a different set of "close calls."
Vague does not mean arbitrary. And though the standard is clearly subjective, I’m not even sure it is vague.
What about calls that aren't close and don't require replay, but that umpires get wrong anyway? They are left out of the study entirely, yet they are probably the worst offenders of the bunch. I remember a play recently argued on PTI where a grounder bounced fair in the infield by the 3rd base foul line, then bounced fair in the outfield, but the umpire called it foul. He claimed it went foul before reaching 3rd, then "swerved" back into fair territory behind 3rd base. I saw it live, thought it was fair, and have watched the replay multiple times without ever seeing a swerve. To me, that play doesn't require replay to make a determination, yet it was a blown call.
Another questionable example is home runs that hit the stands and bounce back into play, which frequently require review for officials to make the correct call even though it's obvious if you're watching. If a batter rounding 1st can see it's a home run but the official isn't sure, is replay necessary for a determination?
Then there's also the question of why a play required instant replay to determine the call. Was it simply a close play, such as a bang-bang play at 1st, or did the original video feed have a bad angle or miss the important part of the play?
These are all good questions that could certainly be used to design a better study (or may go to details of the methodology that just weren’t in the article).