From almost my very first day here at Bloody Elbow, I’ve received requests to review the work of Fightnomics, a.k.a. Reed Kuhn. A common theme of such requests is that in an advanced stats era his work seems simplistic, and someone with the access and ability to critique it should take a closer look. That’s exactly what we’re going to do with a series called Fact-Checking Fightnomics.
In the major sports, stat nerds like Nate Silver, John Hollinger, Dean Oliver, Dave Berri, SABR people and others regularly critique and criticize each other’s work. Criticism isn’t always fun – especially when standing in front of a room full of people – but it helps improve and advance the field and keeps a lot of statistical nonsense out of the sports analytics mainstream.
Unlike the major sports, the data for MMA is closed to the public. I’d also venture to guess fewer people are interested in analyzing our beloved “human cockfighting” relative to the NBA or MLB. The net effect is a short supply of MMA stat nerds and even fewer checking others’ work. To my knowledge, Mr. Kuhn and I are the only two people currently publishing results of detailed fight data analyses at major MMA websites (I apologize if there are others). So I’ll be Fightnomics’ peer reviewer and we’ll see if there truly is meaning behind his “hidden numbers in mixed martial arts.”
Should we be skeptical?
Fightnomics plays off the Freakonomics brand name – the New York Times best-selling book by Ph.D. economist Steven Levitt and writer Stephen Dubner. Both books tell stories, but Freakonomics is scientific at heart. Its stories are based on peer-reviewed research but are written in an easily accessible style that everyday readers can digest. Fightnomics borrows the -nomics suffix but isn’t scientific at heart and isn’t based on any underlying MMA research. It appears to be a self-published book of statistical averages constructed by the author.
Averages can be valuable and dangerous at the same time. They’re descriptive – i.e., they tell us what’s happening (on average) – which makes them useful for noting differences between weight classes and the time trends of knockouts, submissions and decisions. But averages are very bad at answering tough questions or explaining complex situations, and MMA is a pretty complex sport.
Despite the former name of the ESPN show, it’s very easy for numbers to lie. In fact, lying averages are part of the reason we have the famous quote, “There are lies, damned lies and statistics.” So data analytics, and the analysts themselves (myself included), always need to be approached with a healthy dose of skepticism.
Keep this in mind over the coming weeks and months and let’s jump into the first review. Today we’re going back in time eight months to revisit what many have called the worst decision of 2014 – the UFC Fight Night Albuquerque scrap between Ross Pearson and Diego Sanchez.
Pearson vs. Sanchez
Fightnomics’ analysis of the bout can be found here. He starts by examining significant strikes landed in each round, showing Pearson’s advantages, and then moves into significant strikes attempted as “a more likely view of what the judges actually saw.” Perhaps Sanchez’s forward movement, even if swinging and missing, helped him look like the aggressor?
This leads to the following questions, “So how outrageous is it that Sanchez got the decision? How often does the nod go the other way when two fighters end a fight this way?” These questions are addressed by examining how decision win rates vary with bout-level differentials in significant strikes landed. Here’s my version of Fightnomics’ chart:
Note: I make some adjustments to the data I get from FightMetric and Mr. Kuhn may do the same, so our charts may not be perfectly identical.
To Mr. Kuhn this shows, “The fighter who lands more Significant Strikes generally wins, and the more he lands more [sic], the more he wins.” The problem, of course, is that judges don’t score entire bouts, they score rounds, and they care about more than just significant strikes. When making a point about MMA judges while showing a chart of bout-level significant strikes and win rates, the only thing “hidden” is common sense.
So what does Fightnomics’ chart show? Does the upward sloping line show that landing additional significant strikes improves your chances of winning a decision? If so, then we also know that landing additional power strikes helps you win:
And getting additional takedowns helps you win:
And having more guard control time helps you win:
And landing additional power leg strikes at distance helps you win:
Wait, what’s going on here? Landing extra power leg strikes should improve your chances of getting the decision nod, but the chart shows a slightly negative trend line. To make matters worse, I can influence the trend by the amount of data I show you. Here’s the chart for jabs in the clinch:
It’s looking pretty flat. Now here’s the same chart again:
Not so flat anymore, huh?
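To see how the amount of data shown can move a trend line, here is a minimal sketch in Python. The differentials and win rates are synthetic numbers invented purely for illustration; they are not FightMetric data and not the values behind the charts above. The helper `ols_slope` is a hypothetical name for a plain least-squares slope.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic (made-up) bout-level differentials and decision win rates.
diffs = [-3, -2, -1, 0, 1, 2, 3]
win_rates = [0.60, 0.50, 0.45, 0.50, 0.55, 0.50, 0.40]

# Trend line fitted to everything in the table.
full_slope = ols_slope(diffs, win_rates)

# Trend line fitted only to the middle of the chart.
central = [(x, y) for x, y in zip(diffs, win_rates) if -1 <= x <= 1]
central_slope = ols_slope([x for x, _ in central], [y for _, y in central])

print("slope, full range:   ", round(full_slope, 4))
print("slope, central range:", round(central_slope, 4))
```

With these made-up numbers the full-range slope is slightly negative and the central-range slope is positive, so the same table can be drawn with a trend pointing either way depending on where the axis is cut.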
Sweeps get a fighter out of a bad ground situation and into a dominant top position, so we’d expect the act of a sweep to be a positive performance element that displays control and improves one’s chances of winning. Here’s its Fightnomics-style chart:
Sweep your opponent and your odds of winning plummet, so sayeth the chart! The foundation of this entire exercise is flawed so the results – whether upward sloping, downward sloping or flat – are unreliable. In other words, Fightnomics’ win rates aren’t relevant because his data is inappropriate for the question he’s trying to answer.
A few years ago, a group of professors performed an analysis of MMA judges while also using bout-level performance statistics. This is what they found regarding in-cage performance and the odds of getting a decision nod from judges:
Do these results make sense to you? Does busting up the opponent’s face, getting a tight submission hold or landing a takedown really have no effect on a fighter’s odds of taking the decision? Does landing a jab or power shot to the head on the ground reduce one’s odds of winning? The analytics seem to say so, so it must be true, right? Of course not. The results don’t make sense because the underlying bout-level model is flawed. These are the types of analyses that can give sports analytics a bad reputation and make Charles Barkley go crazy on live TV – to be fair, he seems to go off on all analytics.
When we use more appropriate data and look at judges’ round-by-round scoring decisions, the results on how they value striking and grappling make much more sense. Does this seem more reasonable?
Almost every element of effective in-cage performance helps fighters win a decision (and we should note the pure act of a sweep is a high point-scorer). Non-powerful strikes (jabs) to the body and standups tend to be statistically neutral to the judges while going for takedowns and missing generally costs fighters points. For more on these results, see here, here and here.
The point is that MMA judges can be analyzed so much better than with line graphs using bout-level significant strike data. It’s a pretty good bet that Marcos Rosales, Jeff Collins and Chris Tellez didn’t make a single scoring decision based on the overall significant strike differential of 18. Judges score rounds, and in rounds 1, 2 and 3 Pearson had significant strike differentials of 5, 4 and 9, respectively. But even that ignores the complexities of an MMA fight – the details of all strikes thrown and other in-cage action.
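A small Python sketch makes the bout-versus-round distinction concrete. Pearson’s round differentials are the ones quoted above. The second fight is hypothetical, invented only to show that the same bout total can hide a very different round-by-round story. Counting rounds led on significant strikes is a simplification for illustration, not a claim about how judges actually score.

```python
# Pearson's significant strike differentials by round, as quoted above.
pearson_round_diffs = [5, 4, 9]

# Hypothetical fight (invented for illustration) with the same bout total.
hypothetical_round_diffs = [20, -1, -1]

def bout_differential(round_diffs):
    """Bout-level differential: the single number a bout-level chart uses."""
    return sum(round_diffs)

def rounds_led(round_diffs):
    """Number of rounds in which the fighter out-landed the opponent."""
    return sum(1 for d in round_diffs if d > 0)

for name, diffs in [("Pearson", pearson_round_diffs),
                    ("Hypothetical", hypothetical_round_diffs)]:
    print(name, "bout differential:", bout_differential(diffs),
          "| rounds led:", rounds_led(diffs))
# Pearson bout differential: 18 | rounds led: 3
# Hypothetical bout differential: 18 | rounds led: 1
```

Both fights land on the same spot on a bout-level chart, yet only one of them involves a fighter who led every round.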
Instead of trying to make inferences from misapplied data, let’s use RoboJudge and a more detailed analysis of round-by-round MMA judging decisions. RoboJudge uses 24 pieces of performance information to score each round and you can catch his take on all 248 UFC decisions in 2014 right here at Bloody Elbow tomorrow (shameless plug). The underlying model and research paper are currently in the second round of review at the Journal of Sports Economics, but we’re using a slightly modified version.
Fightnomics uses his bout-level line graph to find that Pearson’s significant strike differential of 18 “correlates with a 75% win rate.” By examining judging decisions at the round-level and looking at more than just significant strikes, I find that Pearson’s odds were actually more like 95.3 percent. Here’s the RoboJudge report for Pearson/Sanchez:
Did Diego Sanchez’s activity (swings and misses) affect this fight? It’s extremely unlikely according to the data. Sanchez’s misses relative to Pearson’s had less than a one percent effect on the 1st round, roughly a six percent effect on the 2nd and a three percent effect on the 3rd. If we completely turn off how judges tend to value missed strikes, Pearson’s odds of winning the fight increase to 97.5 percent. At only a 2.2 percentage point difference, the effect of Sanchez’s misses relative to Pearson’s misses was minimal at best, so it seems that activity isn’t a good explanatory variable for this particular fight outcome.
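For readers curious how round-level odds can roll up into a fight-level number, here is a rough Python sketch. It is not RoboJudge’s method, which isn’t spelled out here. It assumes a single scorecard, every round scored 10-9 and rounds that are independent of one another. The round probabilities are hypothetical, and `scorecard_win_probability` is a name made up for this example. Only the 95.3 and 97.5 percent figures come from the analysis above.

```python
from itertools import product

def scorecard_win_probability(round_probs):
    """Chance of winning a majority of rounds on one scorecard.

    Assumes every round is scored 10-9 and rounds are independent.
    """
    total = 0.0
    # Enumerate every win/loss pattern across the rounds.
    for outcome in product([True, False], repeat=len(round_probs)):
        p = 1.0
        for won, prob in zip(outcome, round_probs):
            p *= prob if won else 1.0 - prob
        # Keep the patterns where the fighter takes more than half the rounds.
        if sum(outcome) > len(round_probs) / 2:
            total += p
    return total

# Hypothetical per-round win probabilities, invented for illustration.
hypothetical_round_probs = [0.80, 0.80, 0.90]
fight_prob = scorecard_win_probability(hypothetical_round_probs)
print("fight-level probability:", round(fight_prob, 3))

# Gap between the two fight-level odds quoted above, in percentage points.
missed_strike_gap = round(97.5 - 95.3, 1)
print("effect of missed strikes:", missed_strike_gap, "percentage points")
```

Under these assumptions, solid per-round edges compound into a larger fight-level edge, which is one reason a round-level view can produce a higher win probability than a bout-level chart.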
A final Fightnomics quote worth mentioning:
“Looking back at the round stats from this fight, it’s clear that judge Jeff Collins had a bias towards Sanchez for whatever reason.”
I’ve written academic articles on referee and judge bias (and been called the worst person in sports for my trouble) and no one draws bias conclusions from a sample of three rounds. Perhaps Jeff Collins had bad vantage points and wasn’t looking at his monitor. Perhaps he heard sounds that made him think Sanchez was winning or thought many of Sanchez’s missed strikes landed. He may have been sneaking peeks at Chrissy Blair. None of this is bias. Perhaps Mr. Kuhn was being loose with his words, but the data only tells us Jeff Collins’ decisions that night were very unusual. In no way is it clear he was biased towards Sanchez or away from Pearson.
I don’t mean to be hard on “The Fight Scientist.” His book and Internet work are detailed collections of averages that look like they took a lot of work and were a huge pain in the neck to put together. But those who do sports analytics for public consumption have to be careful. Answering difficult questions is…difficult, and making graphs that purport to analyze MMA judging without actually analyzing the decisions of judges is just wrong. Richard Feynman – the famous physicist of Challenger shuttle commission fame – warned of this behavior many years ago:
“We get experts on everything that sound like they’re sort of scientific…but they’ll sit there on the typewriter and make-up all this stuff as if it’s science and then become an expert on foods, organic foods and so on…I have the advantage of having found out how hard it is to get to really know something, how careful you have to be about checking the experiments, how easy it is to make mistakes and fool yourself. I know what it means to know something.”
Here’s the direct YouTube link.
We looked at judging today partially as a segue into tomorrow’s RoboJudge article on 2014 UFC decisions. Come back over the coming weeks and months as we fact check more Fightnomics topics.
Paul is Bloody Elbow’s analytics writer and an economics professor at Pepperdine University. All mistakes are his own and they’ve been known to happen sometimes. Follow him @MMAanalytics. Fight data provided by FightMetric.