Fight Prior: MMA Data Science and Statistical Analysis

The not-so-small-world network of MMA fighters

There are over 140,000 amateur and professional mixed martial artists, and the matchups between these fighters are far from random. It seems pretty obvious that Holly Holm’s next fight won’t be against Bigfoot Silva, but it is less clear how many reasonable matchups exist for most fighters.

One way to think about this question is that fighters tend to compete against other fighters who are relatively similar to them. An amateur fighter may compete against fighters in his town and in neighboring towns. His opponents may do the same. While this fighter is unlikely to compete against fighters who live two towns over, he can still be said to be closely connected to them because they are only two jumps away. At a broader scale, MMA fighters who are very different may be many jumps apart, separated by large geographical barriers or other obstructions.
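The idea of "jumps" between fighters can be made concrete by treating each bout as a link in a network and counting the fewest links needed to get from one fighter to another. The following is a minimal Python sketch of that count using breadth-first search. The function name `hops_between` and the toy fighter labels are hypothetical, chosen only for illustration, and this is not necessarily how the full post computes distances.

```python
from collections import deque

def hops_between(bouts, start, target):
    """Return the fewest bouts linking two fighters, or None if unconnected."""
    # Build an undirected opponent graph: each bout links its two fighters.
    opponents = {}
    for a, b in bouts:
        opponents.setdefault(a, set()).add(b)
        opponents.setdefault(b, set()).add(a)
    if start not in opponents or target not in opponents:
        return None
    # Breadth-first search reaches fighters in order of increasing distance,
    # so the first time we reach the target we have the shortest path.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        fighter = queue.popleft()
        if fighter == target:
            return distance[fighter]
        for opponent in opponents[fighter]:
            if opponent not in distance:
                distance[opponent] = distance[fighter] + 1
                queue.append(opponent)
    return None

# Hypothetical bouts: A fought B, B fought C, C fought D; E and F only fought each other.
bouts = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
print(hops_between(bouts, "A", "C"))  # A and C never fought, but both fought B
print(hops_between(bouts, "A", "E"))  # no chain of bouts connects A and E
```

In this toy network, A and C are two jumps apart, while A and E are not connected at all.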

Building a large database of MMA fight results III: summarizing the demographics of 140,000 MMA fighters

The essential units of analysis in MMA are the sport’s fighters and the bouts between them. In my previous post, I discussed how to standardize match data so that the major categories of finishes could be easily visualized.

In this entry, I will discuss how to clean up fighter-level data. The main goal of this post is to determine the factors that affect fighter matchups. For some fighters, our data contains useful demographics, while for others this information is missing. A second goal is to infer missing demographic data from fighters’ previous bouts. For example, if we don’t know where in the world a fighter lives, we can probably determine their location with good accuracy if all their fights have been against people from the same region.
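One simple way to picture this kind of inference is a majority vote over a fighter's opponents. The Python sketch below assigns a region only when a large enough share of the fighter's opponents with known regions agree. The function name `infer_region`, the `min_share` threshold of 0.8, and the example data are all assumptions made for illustration, not the method or values used in the full post.

```python
from collections import Counter

def infer_region(fighter, bouts, regions, min_share=0.8):
    """Guess a fighter's region from opponents whose region is known.

    Returns None when there is no evidence or the opponents disagree too much.
    """
    known = []
    for a, b in bouts:
        if fighter in (a, b):
            opponent = b if a == fighter else a
            region = regions.get(opponent)
            if region is not None:
                known.append(region)
    if not known:
        return None
    # Take the most common opponent region and check how dominant it is.
    region, count = Counter(known).most_common(1)[0]
    return region if count / len(known) >= min_share else None

# Hypothetical data: D and E have no recorded region.
regions = {"A": "Brazil", "B": "Brazil", "C": "Japan", "D": None, "E": None}
bouts = [("D", "A"), ("D", "B"), ("E", "A"), ("E", "C")]
print(infer_region("D", bouts, regions))  # both opponents are from Brazil
print(infer_region("E", bouts, regions))  # opponents split between regions
```

The threshold trades coverage for accuracy: a lower `min_share` fills in more missing values but accepts weaker evidence.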

Building a large database of MMA fight results II: quantitatively summarizing over 240,000 MMA fights

In my last post, I discussed how it was possible to extract match-level summaries of more than 240,000 unique MMA bouts between 143,602 fighters. In this entry, I will discuss how data from individual webpages can be combined into a single table with comparable entries. I will then show some high-level summaries of how fights are finished.

Building a large database of MMA fight results I: scraping with rvest

While MMA is an exciting sport that offers many interesting data analysis opportunities, there is no existing dataset that aggregates the results of the more than 400,000 fights that have occurred to date. The challenge is not that the information is unavailable, but rather that it is distributed across thousands of webpages. If we are looking for individual fighters or MMA events, we can easily find a large amount of information about fighters and their fight histories.
