Date of Completion
Spring 5-1-2025
Thesis Advisor(s)
Dr. Dongjin Song
Honors Major
Computer Science and Engineering
Disciplines
Applied Statistics | Computer Sciences | Data Science
Abstract
We present the first multimodal, multitask benchmark for NCAA basketball, synthesizing structured statistical features with large language model (LLM)-generated game summaries across 19,739 games spanning four NCAA Division I seasons (2021-2025). We evaluate three model families (XGBoost, deep neural networks, and Transformers) under tabular-only and early-fusion settings to measure the impact of LLM-derived textual embeddings. To assess practical utility, we simulate fixed-stake and Kelly criterion-based betting strategies using historical bookmaker odds, analyzing both profitability and downside risk via Monte Carlo simulation. Our results show that XGBoost with early fusion achieves the highest return on investment and the lowest risk of loss. This work is, to our knowledge, the first to integrate LLM-generated narrative data with structured inputs for calibrated forecasting in sports, offering a reproducible benchmark for multimodal decision-making under uncertainty.
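The betting evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis code. It assumes decimal odds, the standard Kelly fraction f* = (bp - q) / b with b = odds - 1, and outcomes drawn from the model's own win probabilities, which amounts to assuming perfect calibration. The function names, the starting bankroll of 100, and the definition of ROI as return on that starting bankroll are hypothetical choices made for illustration.

```python
import random


def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly stake as a fraction of bankroll; 0 when the bet has no edge."""
    b = decimal_odds - 1.0  # net profit per unit staked on a win
    if b <= 0:
        return 0.0
    f = (b * p_win - (1.0 - p_win)) / b
    return max(0.0, f)


def simulate_bankroll(bets, use_kelly, rng, start=100.0, fixed_stake=1.0):
    """Play one random path through `bets`, a list of (p_win, decimal_odds)."""
    bankroll = start
    for p_win, odds in bets:
        if use_kelly:
            stake = bankroll * kelly_fraction(p_win, odds)
        else:
            stake = fixed_stake
        stake = min(stake, bankroll)  # never stake more than is available
        # Assumption: the true win probability equals the model's p_win.
        if rng.random() < p_win:
            bankroll += stake * (odds - 1.0)
        else:
            bankroll -= stake
    return bankroll


def risk_of_loss(bets, use_kelly, n_paths=2000, seed=0, start=100.0):
    """Monte Carlo estimate of (P(final < start), mean return on start)."""
    rng = random.Random(seed)
    finals = [simulate_bankroll(bets, use_kelly, rng, start)
              for _ in range(n_paths)]
    p_loss = sum(f < start for f in finals) / n_paths
    mean_roi = (sum(finals) / n_paths - start) / start
    return p_loss, mean_roi


# Hypothetical slate: 50 bets at even money with a 55% win probability.
example_bets = [(0.55, 2.0)] * 50
kelly_result = risk_of_loss(example_bets, use_kelly=True)
fixed_result = risk_of_loss(example_bets, use_kelly=False)
```

Comparing `kelly_result` with `fixed_result` gives the kind of profitability versus downside-risk trade-off the abstract refers to. With real data, the outcome of each game would come from the historical record rather than from a simulated draw.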
Recommended Citation
Barnett, Brendan, "Multimodal Benchmarking for NCAA Basketball" (2025). Honors Scholar Theses. 1091.
https://digitalcommons.lib.uconn.edu/srhonors_theses/1091