LLaMA 3.2 Bug Detection Recall Under Context Truncation Versus Sliding Windows Across Code Complexity

SOVEREIGN Research Kernel


Data sources: ZENODO

Publication type: Report. Status: Under curation. Language: English. Publisher: Zenodo

Authors: SOVEREIGN Research Kernel

doi: 10.5281/zenodo.20651421

Abstract

Large language models (LLMs) have demonstrated strong performance on a wide range of software engineering tasks, including code generation and analysis. However, most prior work relies on cloud-based models or specialized hardware, limiting practical applicability in privacy-sensitive or resource-constrained environments. In this paper, we present a systematic empirical evaluation of two locally deployed LLMs, LLaMA 3.2 and Mistral, for real-world Python bug detection using the BugsInPy benchmark. We evaluate 349 bugs across 17 projects using a zero-shot prompting approach at the function level...

Research goal: How does context window truncation affect LLaMA 3.2's bug detection recall on BugsInPy compared to sliding window strategies across varying code complexity levels?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.7/10.
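The two context strategies named in the research goal can be illustrated with a minimal sketch. It is not taken from the report: the helper names `truncate`, `sliding_windows`, and `position_visible`, the `max_len` and `stride` parameters, and the use of a plain token list in place of a real model tokenizer are all assumptions made for illustration.

```python
from typing import List, Sequence


def truncate(tokens: Sequence[str], max_len: int) -> List[str]:
    """Keep only the first max_len tokens (head truncation, assumed here)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return list(tokens[:max_len])


def sliding_windows(tokens: Sequence[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    Consecutive windows start `stride` tokens apart, so any stride smaller
    than max_len yields overlap between neighbouring windows.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the input
        start += stride
    return windows


def position_visible(position: int, n_tokens: int, max_len: int, stride: int = 0) -> bool:
    """Report whether a token index is seen by the model.

    stride == 0 means head truncation; otherwise sliding windows are used.
    """
    if not 0 <= position < n_tokens:
        raise ValueError("position out of range")
    if stride == 0:
        return position < max_len
    # Sliding windows with stride <= max_len cover every token in the input.
    return stride <= max_len


# Hypothetical function body of 10 tokens with a context budget of 4 tokens.
tokens = [f"t{i}" for i in range(10)]
head = truncate(tokens, max_len=4)
windows = sliding_windows(tokens, max_len=4, stride=2)
```

Under these assumptions, a defect located at token index 7 falls outside the truncated input but inside at least one sliding window, which is the kind of difference a recall comparison between the two strategies would measure.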
