In our previous article, we laid the groundwork for predicting NCAA Men’s Basketball Tournament winners by preparing our dataset, engineering key features, and exploring the relationship between tournament seeding, Elo ratings, and historical outcomes. Now it’s time to put that data to the test. In this article, we will build predictive models in Dataiku, leveraging features like seeding, Elo, and Bart Torvik’s T-Ranking to see just how much of an edge machine learning can give us in filling out the perfect bracket.
The Elite 8: Building a Predictive Model
With our data prepped and ready, it’s time to take the next step—building a predictive model to pick the winners of the 2025 NCAA Men’s Basketball Tournament matchups. Specifically, we’re going to make use of the “diff” features we generated in the previous article:
- SEED_DIFF: the difference in seeding between two teams in a matchup.
- ELO_DIFF: the difference in Elo ratings between two teams in a matchup.
- B_DIFF: the difference in Bart Torvik rankings between two teams in a matchup.
- t1_WINNER: a binary (0 or 1) feature indicating whether “team 1” won the game (this serves as the target for our ML model training).
Utilizing these features, we’re going to train models to make game predictions and then evaluate those models against the “baseline” model of simply picking the better-seeded team in every game, to see how well these models can perform.
Seeds of Success: Establishing the Baseline
In order to establish a baseline of success, we will start by evaluating how many games we’d pick correctly from 2019-2024 if we always picked the team with the better seed. As in the feature creation steps of the previous article, we will accomplish this by creating a new Dataiku formula in a Prepare step.
This formula will output a 1 if the better seed won a given game or a 0 if they lost. By evaluating the average of this column on our 2019-2024 data, we can see that anyone using this strategy could get 68.15% of games correct - not bad for such a simple strategy! This aligns with our previous analysis, which indicated that tournament seeding really does correlate strongly with tournament wins.
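For illustration, here is a rough pandas sketch of that same baseline logic. The file name and the sign convention for SEED_DIFF are assumptions on our part; the actual formula lives in the Dataiku Prepare step.

```python
import pandas as pd

# Rough pandas equivalent of the Prepare-step formula (the real logic lives
# in the Dataiku flow). Assumes SEED_DIFF = team 1 seed minus team 2 seed,
# so a negative value means team 1 holds the better (lower-numbered) seed.
games = pd.read_csv("tourney_games_2019_2024.csv")  # placeholder file name

better_seed_won = (
    ((games["SEED_DIFF"] < 0) & (games["t1_WINNER"] == 1))
    | ((games["SEED_DIFF"] > 0) & (games["t1_WINNER"] == 0))
).astype(int)

# Equal-seed games (SEED_DIFF == 0) are counted as misses here; a real
# bracket would need a tiebreak rule for those matchups.
print(f"Baseline accuracy: {better_seed_won.mean():.2%}")
```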
Tournament Training and Testing
In the previous article, we prepared datasets with tournament game results from 2008-2024. This set gives us a substantial number of games to build and evaluate a model with. As you’ll see in the available GitHub Dataiku project download, we set up our modeling effort to train on the games from 2008-2018 and test on the holdout set from 2019-2024 (the canceled 2020 tournament is omitted). Additional details on training/testing splits and other data sampling in Dataiku can be found in our article here.
Performing this split gives us 690 games to train our model on and 314 to test - a larger test share than the typical 80/20 split, but in the right ball court for this analysis.
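The project handles this split with Dataiku’s built-in settings, but as a quick sketch, the equivalent year-based split in pandas might look like this (the file name and SEASON column are our assumptions):

```python
import pandas as pd

# Year-based split described above; assumes a SEASON column holding the
# tournament year (the canceled 2020 tournament has no games in the data).
games = pd.read_csv("tourney_games_2008_2024.csv")  # placeholder file name

train = games[games["SEASON"] <= 2018]   # 2008-2018 tournaments
test = games[games["SEASON"] >= 2019]    # 2019-2024 tournaments

print(len(train), "training games,", len(test), "test games")
```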
Building an Elo Model
As a first attempt, we’ll look at building a model using only the ELO_DIFF and SEED_DIFF features we prepared in the last article. To reiterate, these features quantify the difference between the two teams’ Elo ratings and tournament seeds. We will use accuracy as the metric for evaluating our models, which directly measures how often the model correctly picks winners in this balanced binary classification task. To keep things simple and explainable, we’ll build our prediction models using a Logistic Regression algorithm. Other algorithms may well produce better results, but given the size of our dataset, we'll go with the assumption that simpler is better.
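The model itself is built in Dataiku’s visual ML interface, but a rough scikit-learn equivalent of this first setup could look like the sketch below. The file name, SEASON column, and helper name are illustrative, not part of the project.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

games = pd.read_csv("tourney_games_2008_2024.csv")  # placeholder file name
train = games[games["SEASON"] <= 2018]
test = games[games["SEASON"] >= 2019]

def train_and_score(features):
    """Fit a logistic regression on the given diff features and report
    its accuracy on the 2019-2024 holdout games."""
    model = LogisticRegression()
    model.fit(train[features], train["t1_WINNER"])
    accuracy = accuracy_score(test["t1_WINNER"], model.predict(test[features]))
    return model, accuracy

# First attempt: Elo rating difference plus seed difference.
elo_model, elo_accuracy = train_and_score(["ELO_DIFF", "SEED_DIFF"])
print(f"Elo + seed accuracy: {elo_accuracy:.1%}")
```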
Kicking off this modeling effort in Dataiku using only the ELO_DIFF and SEED_DIFF features quickly results in a reported 67.8% accuracy, which, unfortunately, is worse than the 68.15% baseline obtained by picking the better (lower-numbered) seed in every game. We know from prior analysis that Elo rating has a strong positive correlation with game results, but this feature combination (although intriguing) does not provide any tangible lift in a simple model.
Calling In the B Team
Since our attempt to make better game picks using teams’ Elo ratings and seeds produced disappointing results, let’s do what any good machine learning engineer would do: iterate with some of our additional features. In the previous article, we briefly mentioned another team metric, the “B Ranking,” which is positively correlated with the Elo rating we analyzed in more detail. The Bart Torvik ranking (also known as the B Ranking or T-Rank) is an advanced college basketball analytics system created by analyst Bart Torvik. It provides team efficiency metrics, using statistical modeling to evaluate team performance based on offensive and defensive efficiency, strength of schedule, and other key factors.
In this iteration, we’ll try the B Ranking feature (B_DIFF) in conjunction with the SEED_DIFF used previously, applying the same simple Logistic Regression algorithm and measuring the results with the same accuracy metric. The confusion matrix from this modeling run applied to our test set is shown below; this time we jump all the way up to 70.7% accuracy, more than two percentage points better than our baseline.
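As a stand-in for that second iteration outside of Dataiku, the earlier sketch can be extended to swap in B_DIFF and inspect the confusion matrix (the helper and column names carry over from the previous illustrative snippet):

```python
from sklearn.metrics import confusion_matrix

# Second iteration: Bart Torvik ranking difference plus seed difference,
# reusing the train_and_score helper and test frame from the earlier sketch.
b_model, b_accuracy = train_and_score(["B_DIFF", "SEED_DIFF"])
print(f"B Ranking + seed accuracy: {b_accuracy:.1%}")

# Confusion matrix on the holdout games: rows are actual t1_WINNER values
# (0 or 1), columns are the model's predicted values.
predictions = b_model.predict(test[["B_DIFF", "SEED_DIFF"]])
print(confusion_matrix(test["t1_WINNER"], predictions))
```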
It’s exciting to see the meaningful lift that the B Ranking provides, on top of seeding, in predicting game results. To get a better understanding of these results, we’re going to deploy this model to the flow in Dataiku and use it to predict the winners of tournament games since 2019 (our test set), comparing them to our “baseline” model of picking the better-seeded team in each matchup. The number of correct win predictions in each year’s tournament is shown below. The 2019 results are especially impressive, with seven more wins produced by our model (dark blue) than the baseline model (light blue).
These are very positive results, but certainly no model is perfect. In the next section, we’ll see if we can further improve our model using less conventional methods to help us pick the games that the model is least certain about, aka “coin flip” games.
Final 4: Predicting Close Games With GenAI
So far, we’ve used traditional machine learning techniques and key statistical metrics to analyze and predict NCAA basketball outcomes. But what happens when our models flag a game as too close to call? That’s where we’re adding a little creativity—by calling in Generative AI for a completely different kind of tiebreaker.
Let’s try incorporating an LLM model (OpenAI GPT-4o-mini) to decide these razor-thin matchups based on an unconventional yet undeniably entertaining criterion: a hypothetical ping-pong showdown between the teams’ mascots. It may not be the most scientific approach, but when the data says it’s a toss-up, we might as well let AI have a little fun while making the call.
In the previous section, we used a logistic regression model to predict game outcomes. Although the final prediction is a binary “win” or “lose” (0 or 1), at a more granular level the model actually produces a probability: the likelihood that “Team 1” will win a given game. For any given game, our model may say, for example, that the probability of “Team 1” winning is 0.40, meaning there is a 40% chance that Team 1 wins and a 60% chance that Team 2 wins. We will call in GenAI for help with the games that our model deems to be very close battles (those with a 45%-55% Team 1 win probability): the “coin flip” matchups.
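In scikit-learn terms, flagging those coin-flip games might look like the sketch below, which builds on the earlier illustrative model rather than the deployed Dataiku model:

```python
# Flag the "coin flip" games: holdout matchups where the predicted probability
# of a team 1 win lands between 45% and 55%. Builds on b_model and test from
# the earlier sketches; predict_proba's second column is P(t1_WINNER == 1).
probabilities = b_model.predict_proba(test[["B_DIFF", "SEED_DIFF"]])[:, 1]

coin_flips = test[(probabilities >= 0.45) & (probabilities <= 0.55)]
print(len(coin_flips), "coin-flip games since 2019 to hand off to the LLM")
```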
To accomplish the LLM integration, we’re going to make use of an LLM prompt recipe in our Dataiku flow. The prompt recipe lets us define an LLM prompt and inject records from each of our games into a call to OpenAI’s API to generate predictions.
In this recipe, we write a prompt like the one shown below, asking the LLM to predict the game winner based on the mascots’ projected performance in a ping-pong match.
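Outside of Dataiku, a bare-bones version of this call with the OpenAI Python client could look like the following sketch; the prompt wording and function name are illustrative stand-ins, not the recipe’s actual prompt.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def pick_by_mascot_ping_pong(team_1, team_2):
    """Ask GPT-4o-mini which mascot would win a ping-pong match; the prompt
    text here is illustrative, not the exact prompt from the Dataiku recipe."""
    prompt = (
        f"Imagine the mascots of {team_1} and {team_2} playing a ping-pong "
        "match against each other. Decide which mascot would win and respond "
        "as JSON with the keys 'winner' (the school name) and 'reason' "
        "(one sentence explaining the pick)."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(pick_by_mascot_ping_pong("Arkansas", "Kansas"))
```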
You’ll find the technical details of how we interpreted the JSON results of this recipe in the Dataiku flow from the project export, but as you’ll see below, the per-game results are entertaining.
It is easy to see that these predictions are entertaining, but let’s evaluate how accurate they are on the close games. Looking at the 19 games since 2019 that our model considers a “coin flip,” the LLM’s prediction based on the ping-pong skills of the teams’ mascots performs admirably, with 63.2% accuracy! With this limited sample size, the ping-pong approach certainly appears to be no less accurate than a coin flip.
When combining our two different models (logistic regression for the majority of games and GenAI-pong for the close ones), we obtain an overall historical accuracy of 71.3% (224 games correctly picked since 2019), compared to 68.2% (214 games) from picking by seed alone. If we define an upset as any game where the teams’ seeds differ by more than 2, our model correctly picked 11 tournament upsets, including these highlights:
- The #8 seed Arkansas upsetting #1 seed Kansas in the 2nd round of 2023. This was determined by the regression model to be a coin flip game and chosen because “Razorbacks are fast and agile, making them great at ping-pong!”
- #8 seed North Carolina upsetting #2 seed Duke in the Final Four of 2022. This was determined by the regression model to be a coin flip game and chosen because “The Tar Heel is known for its tenacity and skill, making it a formidable opponent in ping-pong!”
While the results were fun and impressive, the additional 0.6% increase in accuracy from the GenAI picks certainly needs to be taken with a grain of salt. We’d need to evaluate a bigger set of ping-pong mascot data to truly validate any potential long-term correlation, but in the meantime, let’s make a plan to validate our model further in the upcoming 2025 tournament.
Championship Monday: Predicting the 2025 Tournament
As we’ve seen, there is a tangible lift to be gained by building a machine learning model that uses tournament seed and Bart Torvik ranking to make picks. In addition, tight tournament games are a fun and interesting place to bring in GenAI as our “coin flip” predictor. We’ve seen good results from this approach in recent tournaments. Let’s see how well we can predict the tournament in 2025.
Will we predict the winner? Follow us on LinkedIn as we post predictions with explanations and reasoning for the 1st round of the tournament to inspire your bracket picks. As the tournament progresses, we’ll provide accuracy metric updates as well as additional game predictions. Join us for an exciting and data-driven tournament season!
