Graduate student Isabella Antonuccio-Amato will be defending her thesis titled “Large Language Models for Generating and Evaluating Education Finance Reports.”
Isabella Antonuccio-Amato thesis defense
- Date: Tuesday, April 28
- Time: Noon to 1 p.m.
- Location: BARC 1122
- Current major: M.S. in data science
- Thesis committee chair: Dr. Jim Dewey
- Committee members: Dr. Abdulaziz Alhamadani, Dr. Kathleen Hardesty, and Dr. Susan LeFrancois
Abstract
This study investigates the use of large language models (LLMs) for both generating and evaluating structured education finance reports. A multi-stage LLM pipeline was developed to produce state-level FY2025 reports modeled after State of the States (SOS) publications, incorporating source retrieval, drafting, validation, and iterative revision. To enable direct comparison with human-authored reports, a set of FY2024 reports was also generated using a single-prompt approach.
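The multi-stage pipeline described above (retrieval, drafting, validation, revision) might be organized along these lines. This is a hypothetical sketch only; all function names and the stub logic are illustrative assumptions, not the thesis's actual implementation.

```python
def retrieve_sources(state: str) -> list[str]:
    # Placeholder: a real pipeline would fetch finance documents for the state.
    return [f"{state} FY2025 budget summary"]

def draft_report(state: str, sources: list[str]) -> str:
    # Placeholder: a real pipeline would prompt an LLM with the sources.
    return f"Report for {state} based on {len(sources)} source(s)."

def validate(report: str) -> bool:
    # Placeholder check: a real validator would verify claims against sources.
    return report.startswith("Report")

def revise(report: str) -> str:
    # Placeholder: a real pipeline would feed validation feedback back to the LLM.
    return report + " [revised]"

def generate_report(state: str, max_revisions: int = 3) -> str:
    # Retrieve, draft, then loop validation and revision until checks pass
    # or the revision budget is exhausted.
    sources = retrieve_sources(state)
    report = draft_report(state, sources)
    for _ in range(max_revisions):
        if validate(report):
            break
        report = revise(report)
    return report
```

The loop structure is the point: drafting is followed by repeated validate/revise passes, mirroring the iterative revision stage named in the abstract.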
LLM-generated reports and human-authored SOS reports were compared by identifying and counting individual claims within each document. For each pair of reports, evaluators recorded the number of total, shared, and unique claims, and assessed whether each claim was factually correct. Evaluations were performed using multiple LLMs (GPT, Gemini, Mistral), with report order reversed to test for consistency and positional bias. A subset of 11 states was also evaluated by human annotators to provide a baseline for comparison.
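The per-pair bookkeeping described above (total, shared, and unique claims) reduces to simple set arithmetic once claims have been extracted. A minimal sketch, assuming claims arrive as normalized strings; in the study itself, extraction and matching were performed by LLMs or human annotators, which is exactly where order sensitivity can arise:

```python
def compare_claims(claims_a: list[str], claims_b: list[str]) -> dict[str, int]:
    """Tally total, shared, and unique claims for a pair of reports."""
    a, b = set(claims_a), set(claims_b)
    shared = a & b
    return {
        "total_a": len(a),
        "total_b": len(b),
        "shared": len(shared),
        "unique_a": len(a - shared),
        "unique_b": len(b - shared),
    }
```

Note that this set-based tally is symmetric: swapping the two reports merely mirrors the keys. An LLM evaluator, by contrast, can return different counts depending on which report it reads first, which is why the study reverses report order to probe for positional bias.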
Results indicate that LLM-based evaluations are sensitive to document ordering, with the first-presented report consistently receiving higher claim counts across models, demonstrating positional bias. Compared to human annotators, LLMs consistently identify fewer total, unique, and factual claims, while also tending to overestimate overlap between documents.
These findings highlight limitations in using LLMs as standalone evaluators and emphasize the need for careful evaluation design. This study contributes a structured framework for evaluating LLM performance in applied research settings, and highlights the importance of human oversight when using LLMs for scholarly work.
For more information, please contact Isabella Antonuccio-Amato.