Category: Dates and Deadlines
April 20, 2026

April 22 – Joyce Champie’s thesis defense

This notice appeared in the Weekly Phoenix between April 20, 2026 and April 22, 2026.

Graduate student Joyce Champie will be defending her thesis titled “Evaluating False Assurance and Explainability in LLMs-based Android Vulnerability Detection.”

Joyce Champie's thesis defense

  • Date: Wednesday, April 22
  • Time: 9-10 a.m.
  • Location: BARC 1142
  • Current major: M.S. in computer science
  • Thesis committee chair: Dr. Karim Elish
  • Committee members: Dr. Ayesha Dina and Dr. Abdulaziz Alhamadani

Abstract

The rapid proliferation of Android applications has intensified demand for intelligent, automated vulnerability detection systems. While large language models (LLMs) have emerged as a promising approach for code analysis and security evaluation, existing work evaluates their performance using traditional classification metrics such as accuracy, precision, recall, and F1-score, which fail to capture the reliability and trustworthiness of model outputs in security-critical contexts.

This thesis addresses this gap through two complementary contributions. First, we present a systematic literature review of LLM-based malware detection research spanning 26 peer-reviewed studies, proposing a four-category taxonomy covering static embeddings, prompt-driven reasoning, reasoning-enhanced architectures, and security-tuned models, alongside a multi-dimensional comparative analysis across effectiveness, robustness, efficiency, explainability, and privacy dimensions. Second, we conduct a comprehensive empirical evaluation of six LLMs across two Android vulnerability benchmarks and seven vulnerability categories using structured zero-shot prompts. Each model produces a security verdict, vulnerability type, technical justification, and confidence score. We introduce the false assurance rate (FAR), a novel metric capturing high-confidence incorrect secure predictions, and an explainability framework assessing correctness, localization, and faithfulness of generated explanations.
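To make the false assurance rate concrete, the following is a minimal sketch of how such a metric could be computed, assuming FAR is the fraction of truly vulnerable samples that a model labels secure with confidence at or above a chosen threshold; the names, the 0.8 threshold, and the exact formula here are illustrative assumptions, and the thesis's precise definition may differ.

```python
# Hypothetical sketch of a false-assurance-rate (FAR) style computation.
# Assumes each prediction carries a security verdict, a ground-truth label,
# and a model-reported confidence in [0, 1]; the thesis may define FAR differently.

from dataclasses import dataclass

@dataclass
class Prediction:
    verdict: str       # model's security verdict: "secure" or "vulnerable"
    label: str         # ground truth: "secure" or "vulnerable"
    confidence: float  # model-reported confidence, 0.0-1.0

def false_assurance_rate(preds, threshold=0.8):
    """Fraction of truly vulnerable samples that the model confidently
    (confidence >= threshold) but incorrectly declares secure."""
    vulnerable = [p for p in preds if p.label == "vulnerable"]
    if not vulnerable:
        return 0.0
    false_assured = [p for p in vulnerable
                     if p.verdict == "secure" and p.confidence >= threshold]
    return len(false_assured) / len(vulnerable)

# Example: two of three vulnerable samples are confidently mislabeled secure.
preds = [
    Prediction("secure", "vulnerable", 0.95),
    Prediction("secure", "vulnerable", 0.90),
    Prediction("vulnerable", "vulnerable", 0.70),
]
print(false_assurance_rate(preds))  # 0.67 (rounded)
```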

Our results demonstrate that models with comparable detection performance differ substantially in false assurance risk, and that technically plausible explanations frequently accompany incorrect predictions. These findings establish that trustworthy LLM-based security analysis requires simultaneous evaluation across detection capability, confidence calibration, and explanation quality.

For more information, please contact Joyce Champie.