Date of Completion

Spring 5-1-2025

Thesis Advisor(s)

Dr. Dongjin Song

Honors Major

Computer Science and Engineering

Disciplines

Applied Statistics | Computer Sciences | Data Science

Abstract

We present the first multimodal, multitask benchmark for NCAA basketball, synthesizing structured statistical features with large language model (LLM)-generated game summaries across 19,739 games spanning four NCAA Division I seasons (2021–2025). We evaluate three model families (XGBoost, deep neural networks, and Transformers) under tabular-only and early-fusion settings to measure the impact of LLM-derived textual embeddings. To assess practical utility, we simulate fixed-stake and Kelly criterion-based betting strategies using historical bookmaker odds, analyzing both profitability and downside risk via Monte Carlo simulation. Our results show that XGBoost with early fusion achieves the highest return on investment and the lowest risk of loss. This work is, to our knowledge, the first to integrate LLM-generated narrative data with structured inputs for calibrated forecasting in sports, offering a reproducible benchmark for multimodal decision-making under uncertainty.

Recommended Citation

Barnett, Brendan, "Multimodal Benchmarking for NCAA Basketball" (2025). Honors Scholar Theses. 1091.
https://digitalcommons.lib.uconn.edu/srhonors_theses/1091

Included in

Applied Statistics Commons, Computer Sciences Commons, Data Science Commons

Honors Scholar Theses

Multimodal Benchmarking for NCAA Basketball
