ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada

In recent years, Large Language Models for code (LLMc) have transformed the landscape of software engineering (SE), demonstrating significant efficacy in tasks such as code completion, summarization, review, tracing, translation, test case generation, clone detection, and bug fixing. Notably, GitHub Copilot and Google’s CodeBot exemplify how LLMc contribute to substantial time and effort savings in software development. However, the widespread application of these models has raised critical concerns regarding their trustworthiness. The lack of well-defined trust metrics beyond mere accuracy poses significant risks, including potential security vulnerabilities and compromised data integrity. This dissertation addresses this pressing need by developing a comprehensive framework for evaluating the trustworthiness of LLMc. We aim to establish contextualized definitions of trust, distrust, and trustworthiness specific to LLMc, identify the key factors that influence them, and create a standardized evaluation framework encompassing both model-based attributes and human-centric considerations. Through rigorous empirical studies and user evaluations, we will validate the framework’s effectiveness and provide insights for targeted improvements in LLMc development. This dissertation seeks to enhance the reliability and transparency of LLMc, fostering their responsible integration into software engineering practice and paving the way for more trustworthy AI-assisted code generation.