
What I Learned from Building a College Basketball Bracket Prediction Model Last Year

This year, Merkle is running its first Bracket-lytics Challenge! We’re excited to launch this competition to find some of the best and brightest stars in the analytics field, much like the annual men’s college basketball tournament does for up-and-coming talent in college basketball.

Last year, I combined many of the analytical tools and skills I developed while working with clients at Merkle with my personal passion for basketball to build a prediction model for the men’s college basketball tournament. I can honestly say that building a model to predict the outcome of the tournament, and then watching the games play out with those key ingredients in mind, was one of the most fulfilling experiences of my young data analyst career.

Though the variables are different, the thought process for ranking the conversion rates of customer prospects in marketing is not all that different from the one for ranking the win rates of basketball teams. When building my model last year, I formed certain assumptions from watching the games. I then tested box-score data and game results to see which factors correlated with winning, and whether they matched what I had initially thought. Looking back at last year’s tournament, one of the reasons North Carolina won was that it led the country in offensive rebounding percentage by a wide margin. In basketball terms, that means the team rebounded many of its own misses, giving it more shots than the other team. This is also a factor that my model found to be significant.
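As a rough illustration of that kind of check, here is a minimal Python sketch. It uses the standard definition of offensive rebounding percentage (a team's offensive rebounds divided by its offensive rebounds plus the opponent's defensive rebounds) and a plain Pearson correlation against win rate. The season totals are made-up placeholders, not real team data, and the function names are my own for this example; it is not the model I actually built.

```python
from math import sqrt


def offensive_rebound_pct(team_orb, opp_drb):
    """Share of a team's own misses that it rebounded."""
    chances = team_orb + opp_drb
    return team_orb / chances if chances else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical season totals, invented for illustration only:
# (offensive rebounds, opponent defensive rebounds, win rate)
teams = [
    (450, 750, 0.80),
    (380, 800, 0.65),
    (340, 820, 0.55),
    (300, 850, 0.45),
    (280, 900, 0.35),
]

orb_pcts = [offensive_rebound_pct(orb, drb) for orb, drb, _ in teams]
win_rates = [win_rate for _, _, win_rate in teams]

r = pearson(orb_pcts, win_rates)
print(f"Correlation between ORB% and win rate: {r:.2f}")
```

A single correlation like this only tells you whether a factor is worth a closer look. With real data you would want many more teams and a check against the other factors in your model.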

While you’re working on your model and filling out your bracket, you will face many decisions where discretion is very important, just like the decisions marketing analysts make every day. For example, three-point shooting is known to be a high-variance statistic in the basketball world. A team that relies on the three ball as its main source of offense is known as a “low-floor, high-ceiling” team. This means it has a higher chance of being on either side of a blowout than a team that relies on getting to the rim and getting fouled, which is considered a lower-variance strategy. In terms of the competition, this could lead to an interesting situation: your model predicts a team to be a strong favorite in the first round because of its strong three-point shooting, but because that strategy is so variable, the team may not be a good pick to win multiple games. This concept of leverage and variability is one of the most important things to look at when building your model and making selections.
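To put some numbers on that idea, here is a small Python sketch. It assumes every shot is independent with a fixed make probability (a simplification), and the shot counts, percentages, and win probability are illustrative values I picked so that both teams average the same score. The function names are hypothetical.

```python
from math import sqrt


def scoring_profile(attempts, make_prob, points_per_make):
    """Expected points and standard deviation, assuming independent shots."""
    mean = attempts * make_prob * points_per_make
    std = sqrt(attempts * make_prob * (1 - make_prob)) * points_per_make
    return mean, std


def chance_to_win_streak(win_prob, games):
    """Chance of winning several games in a row at the same per-game odds."""
    return win_prob ** games


# Illustrative assumption: 60 shots per game for each style of team.
three_mean, three_std = scoring_profile(60, 0.35, 3)   # all threes at 35%
rim_mean, rim_std = scoring_profile(60, 0.525, 2)      # all twos at 52.5%

print(f"Three-point team: {three_mean:.0f} points, +/- {three_std:.1f}")
print(f"Rim-attacking team: {rim_mean:.0f} points, +/- {rim_std:.1f}")

# Even a strong single-game favorite is far from a lock over several rounds.
print(f"80% favorite winning 3 straight: {chance_to_win_streak(0.8, 3):.1%}")
```

Both teams average the same score here, but the three-point team's total swings much more from game to game. That wider spread is what produces both the blowout wins and the early exits.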

Along with your submission to the Bracket-lytics Challenge, you should submit an abstract in which you describe your model and the decisions you made with it. This includes what factors go into the model, descriptions of weights if they’re used, and your methodology for the picks in general.

Good luck! I’m looking forward to seeing your thought process!
