Graduate student Sunim Acharya will be defending his thesis titled “HERA: Hierarchical Ensemble Reasoning Agent.”
Sunim Acharya thesis defense
- Date: Wednesday, April 22
- Time: 10-11 a.m.
- Location: BARC 1122
- Current major: M.S. in computer science
- Thesis committee chair: Dr. Sathish Chandra Akula
- Committee members: Dr. Luis Jaimes, Dr. Parisa Hajibabaee, Dr. Sanjeeta N. Ghimire, and Dr. Asai Asaithambi
Abstract
Autonomous AI agents are increasingly deployed to write, debug, and refactor production software. However, large language models frequently introduce subtle bugs or security vulnerabilities, and current single-agent frameworks lack formal, specialized supervision for trustworthy autonomous engineering.
In this thesis, we present HERA (Hierarchical Ensemble Reasoning Agent), a supervision architecture designed around functional, multi-agent specialization. Specifically, we design a three-layer system (orchestration, execution, and ensemble) where specialized security and performance supervisors vote via confidence-weighted consensus to reduce correlated oversight failures during code generation.
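As a rough illustration of what confidence-weighted consensus between supervisors could look like, here is a minimal Python sketch. It is not taken from the HERA implementation: the `Vote` type, the `weighted_consensus` function, the 0.5 threshold, and the rule of rejecting when no supervisor reports any confidence are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One supervisor's verdict on a proposed code change (hypothetical type)."""
    supervisor: str    # illustrative label, e.g. "security" or "performance"
    approve: bool      # True if this supervisor accepts the change
    confidence: float  # self-reported confidence, assumed to lie in [0, 1]


def weighted_consensus(votes, threshold=0.5):
    """Accept when the confidence-weighted share of approvals exceeds threshold.

    The threshold value and the tie-breaking behavior are assumptions for
    illustration, not details stated in the abstract.
    """
    total = sum(v.confidence for v in votes)
    if total == 0:
        # No votes, or no supervisor is confident: reject conservatively.
        return False
    approve_weight = sum(v.confidence for v in votes if v.approve)
    return approve_weight / total > threshold


# A confident security rejection outweighs a hesitant performance approval.
votes = [
    Vote("security", approve=False, confidence=0.9),
    Vote("performance", approve=True, confidence=0.4),
]
decision = weighted_consensus(votes)
```

Weighting by confidence rather than counting heads lets a single high-confidence objection block a change that a less certain supervisor would have let through, which is one way such a scheme might reduce correlated oversight failures.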
To establish a reproducible baseline under strict hardware constraints, we evaluate the execution agent on SWE-bench Verified (Mini) and resolve 16.0% of tasks (8/50, best-of-3 temperature sweep) using an open-weights 32-billion-parameter model on a single L4 GPU. Further evaluations on LiveCodeBench (24.5%, 98/400 tasks) and CruxEval (71.0%, 568/800 tasks) characterize the system’s broader generative and execution trace-reasoning capabilities.
These results provide preliminary evidence that strict engineering guardrails, including edit verification and false-submit prevention, can improve reliability under constrained settings. Component-level validation also suggests that ensemble supervision and world-model integration are promising directions, but full end-to-end validation of the complete stack is beyond the current hardware scope. The dominant measured failure mode is incorrect fix generation (producing a semantically wrong patch), not file navigation. The open-source HERA implementation offers a transparent baseline that future work can use to quantify the incremental value of specialized ensemble oversight.
For more information, please contact Sunim Acharya.