ZENODO
Software . 2023
License: CC BY
Data sources: Datacite
Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?

Authors: Wen, Cheng; Su, Jie; Zhang, Bing;

Abstract

This repository contains the artifact of the paper "Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?".

## 1. Introduction

Static analysis tools for catching bugs and vulnerabilities in software programs are widely employed in practice, as they have the unique advantages of high coverage and independence from the execution environment. However, when applied to large codebases, existing tools often produce far more false warnings than genuine bug reports. As a result, developers must manually inspect and confirm each warning, a challenging and time-consuming task that calls for automation. We advocate a fast, general, and easily extensible approach called LLM4SA that automatically inspects a sheer volume of static warnings by harnessing (some of) the powers of Large Language Models (LLMs). Our key insight is that LLMs have advanced program-understanding capabilities, enabling them to act as human experts in manually inspecting bug warnings together with their relevant code snippets. In this spirit, we propose a static analysis that extracts the relevant code snippets via program dependence traversal, guided by the bug warning reports themselves. Then, by formulating customized questions that are enriched with domain knowledge and representative cases to query LLMs, LLM4SA can filter out a large share of false warnings and significantly facilitate bug discovery. Our experiments demonstrate that LLM4SA is practical for automatically inspecting thousands of static warnings from the Juliet benchmark programs and 11 real-world C/C++ projects, achieving a precision of 81.13% and a recall of 94.64% over a total of 9,547 bug warnings.
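The warning-guided snippet extraction described above can be sketched as follows. This is a deliberately simplified illustration, not the repository's code: `extract_context` and the fixed line window are assumptions standing in for the paper's program dependence traversal.

```python
# Minimal sketch of warning-guided snippet extraction (illustrative only):
# instead of the paper's program dependence traversal, we simply collect a
# fixed window of lines around each location in the warning's error trace.

def extract_context(source: str, trace_lines: list[int], radius: int = 2) -> str:
    """Return the source lines within `radius` of any trace location."""
    lines = source.splitlines()
    keep = set()
    for ln in trace_lines:  # trace line numbers are 1-based
        for i in range(max(1, ln - radius), min(len(lines), ln + radius) + 1):
            keep.add(i)
    # Prefix each kept line with its number so the LLM can refer back to it.
    return "\n".join(f"{i:4d}| {lines[i - 1]}" for i in sorted(keep))

demo = "\n".join(f"int line{i};" for i in range(1, 11))
print(extract_context(demo, [5], radius=1))
```

The real extractor follows data and control dependences across functions (the `-m` call-depth option in Section 3.3.1); a fixed window is only a stand-in for that idea.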
This research introduces new opportunities and methodologies for using LLMs to reduce human labor costs, improve the precision of static analyzers, and ensure software quality.

### Catalog Description

- `src/code-extractor`: our implementation of the code extractor (source code in Python)
- `src/bug-inspector`: our implementation of the LLM-powered inspector (source code in Python), as well as the zero-shot and few-shot prompts
- `scripts/`: experimental scripts
- `conf/`: configuration file used to set the API key and proxy for calling LLMs
- `Inspection/`: the LLM inspection results for the static analysis reports (raw data)
- `report/`: bug reports generated by the static analysis tools (raw data)
- `project/`: detailed information on our experimental dataset

## 2. Installation and Deployment

### 2.1. Requirements

- Hardware: workstation/PC with a multi-core processor
- Operating system: Ubuntu 18.04 LTS or later
- Run `sudo apt -y install cscope codequery libjansson-dev libjansson4 wget autoconf automake pkg-config cmake unzip tzdata libncurses5`
- Run `pip3 install xmltodict openai`
- Run `./scripts/install/install_ctags-6.0.0.sh` to install ctags (version 20230806)

### 2.2. Installing Static Analysis Tools (Cppcheck, CSA, Infer)

We selected three state-of-the-art static analysis tools for our experiments: Cppcheck (v2.9), CSA (llvm-12.0.1), and Facebook Infer (v1.1.0). All three are open-source static analysis tools that can detect multiple types of bugs.

- Installing Cppcheck

```sh
# Install cppcheck
$ ./scripts/install/install_cppcheck-2.9.sh
```

- Installing CSA

```sh
# Configure Clang + LLVM
$ ./scripts/requires/get_llvm.sh
# Test Clang
$ source ./scripts/requires/init_env.sh
$ clang --version
```

- Installing Infer

```sh
# Install Infer
$ ./scripts/install/install_infer-v1.1.0.sh
```

### 2.3. Configuration of ChatGPT

Currently, querying of LLMs is accomplished through an API call to ChatGPT. This design choice facilitates seamless integration with other pre-trained LLM models.
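The "seamless integration with other LLMs" claim can be pictured as a thin dispatch layer between the inspector and the model API. The registry and function names below are hypothetical, purely to illustrate the design point; they are not the repository's actual API.

```python
# Illustrative sketch of a model-agnostic query layer (hypothetical names,
# not the repository's code): each backend is a callable from prompt to text,
# so swapping ChatGPT for another LLM only means registering a new backend.

from typing import Callable, Dict

BACKENDS: Dict[str, Callable[[str], str]] = {}

def register_backend(name: str):
    def deco(fn: Callable[[str], str]):
        BACKENDS[name] = fn
        return fn
    return deco

def query_llm(prompt: str, model: str = "gpt-3.5-turbo-16k") -> str:
    if model not in BACKENDS:
        raise KeyError(f"no backend registered for {model!r}")
    return BACKENDS[model](prompt)

@register_backend("echo")  # stand-in backend for offline testing
def _echo(prompt: str) -> str:
    return f"inspected: {prompt}"

print(query_llm("is this warning a real bug?", model="echo"))
```

A real ChatGPT backend would wrap the OpenAI API call (using the key and proxy from `conf/conf.json`) behind the same one-argument interface.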
A template for the configuration file (`config template.json`) is located in the `conf` directory. Edit your own OpenAI API key and the proxy address (if no proxy is used, just set an empty string) in `conf/conf.json`, for example:

```json
{
  "open_ai_api_key": "sk-vnZt0x***********************************Oh9DGgk",
  "proxy": "http://172.0.0.1:4780",
  "https_proxy": "http://172.0.0.1:4780",
  "http_proxy": "http://172.0.0.1:4780"
}
```

## 3. Getting Started

### 3.1. Using Static Analysis Tools to Generate Analysis Reports

```sh
# Initialize environment variables
$ source ./scripts/requires/init_env.sh
# Change to the working directory
$ cd test/npd

# Use cppcheck-2.9 to scan and write the output to cppcheck_err.xml
$ cppcheck --enable=warning . --output-file=cppcheck_err.xml --xml --force
$ python3 ${ROOT_DIR}/scripts/process/cppcheck_XML2Json_for_single_err.py -f cppcheck_err.xml -p .
# Or
$ run_cppcheck.sh .

# Use CSA to scan, writing the output to the csa_report folder
$ scan-build -plist -o csa_report make
$ python3 ${ROOT_DIR}/scripts/process/csa_Plist2Json_for_single_err.py -f csa_report
# Or
$ run_csa.sh make

# Use Infer to scan, writing the output to infer-out/report.json
$ infer --keep-going --biabduction --bufferoverrun --liveness --quandary --siof --uninit run -- make
$ python3 ${ROOT_DIR}/scripts/process/infer_Json2Json_for_single_err.py -f infer-out/report.json
# Or
$ run_infer.sh make
```

Reports generated by Cppcheck, CSA, and Infer can be found in the `cppcheck_output`, `csa_output`, and `infer_output` folders, respectively. If a folder is empty, the static analyzer did not report any warnings.

### 3.2. Automatically Preprocessing and Aggregating Analysis Reports into a Unified Format

The aggregated reports in the unified format are named `Bug_XXX_0001.json`, `Bug_XXX_0002.json`, etc.
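The aggregation step above can be sketched as follows. The directory layout, helper name, and type-abbreviation table are assumptions for illustration; only the unified naming scheme (`Bug_0001_NPD.json` style) comes from this document.

```python
# Illustrative sketch of report aggregation (hypothetical helper, not the
# repo's script): collect the per-warning JSON files emitted by an analyzer
# and assign sequential unified names such as Bug_0001_NPD.json.

import json
import pathlib
import tempfile

# Assumed mapping from bug_type strings to the short codes used in filenames.
ABBREV = {"Null Pointer Dereference": "NPD", "Use After Free": "UAF"}

def unify_reports(report_dir: pathlib.Path, out_dir: pathlib.Path) -> list[str]:
    """Copy each per-warning JSON report under a sequential unified name."""
    out_dir.mkdir(exist_ok=True)
    names = []
    for i, path in enumerate(sorted(report_dir.glob("*.json")), start=1):
        bug = json.loads(path.read_text())
        abbrev = ABBREV.get(bug.get("bug_type", ""), "XXX")
        name = f"Bug_{i:04d}_{abbrev}.json"
        (out_dir / name).write_text(json.dumps(bug, indent=2))
        names.append(name)
    return names

with tempfile.TemporaryDirectory() as tmp:
    tmp = pathlib.Path(tmp)
    (tmp / "raw").mkdir()
    (tmp / "raw" / "a.json").write_text(json.dumps({"bug_type": "Null Pointer Dereference"}))
    print(unify_reports(tmp / "raw", tmp / "unified"))  # ['Bug_0001_NPD.json']
```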
For example, `Bug_0001_NPD.json` (generated here by Cppcheck from `test/npd`) represents a Null Pointer Dereference bug with the ID "0001", and its content is as follows (closest to Infer's JSON output format):

```json
{
  "bug_type": "Null Pointer Dereference",
  "line": 24,
  "column": 3,
  "procedure": "",
  "file": "npd.c",
  "qualifier": {
    "Cppcheck": "Possible null pointer dereference: buf"
  },
  "Trace": [
    {"filename": "npd.c", "line_number": 24, "column_number": 3, "description": ""},
    {"filename": "npd.c", "line_number": 21, "column_number": 21, "description": ""},
    {"filename": "npd.c", "line_number": 46, "column_number": 18, "description": ""}
  ]
}
```

### 3.3. Extracting Relevant Code Snippets from Bug Reports and Project Source Code, and Constructing Prompts for Querying the Large Language Model

#### 3.3.1. (Optional) Independent Step: Extracting Code Snippets from Bug Reports and Project Source Code

```sh
$ python3 ${ROOT_DIR}/src/code-extractor/extract_code_snippet.py -f ${ROOT_DIR}/test/npd -r ${ROOT_DIR}/test/npd/Bug_0001_NPD.json -o CodeSnippet -m 1
```

Use the `code-extractor/extract_code_snippet.py` script to extract code snippets. The `-f` parameter denotes the path to the project under analysis, `-r` is the current bug report to inspect, `-o` specifies the output folder for the code snippets, and `-m` indicates the function-call depth. The `CodeSnippet` folder then contains the bug report in addition to the code snippet.

#### 3.3.2. (Optional) Independent Step: Constructing Prompts Based on Bug Types and Querying the Large Language Model

```sh
$ python3 ${ROOT_DIR}/src/bug-inspector/inspector.py -c ${ROOT_DIR}/test/npd/CodeSnippet/Trace_None_snippets_1.txt -m "gpt-3.5-turbo-16k" -o final_result
```

Use `bug-inspector/inspector.py` to query the large language model. The `-t` parameter specifies the bug type to analyze (it can be left empty for general defect analysis.
The results can be improved if the bug type is specified, e.g. UVA, UAF, NPD, etc.), `-c` denotes the path to the extracted code snippet file, `-m` indicates the large language model to query, and `-o` specifies the output folder for the results.

#### 3.3.3. (Recommended) One-Click Automated Script

```sh
$ python3 ${ROOT_DIR}/src/bug-inspector/main.py -f ${ROOT_DIR}/test/npd -r ${ROOT_DIR}/test/npd/Bug_0001_NPD.json -m "gpt-3.5-turbo-16k" -o inspected_result.txt
```

The `-f` parameter represents the path to the project under analysis, `-r` indicates the current bug report to be inspected, `-m` represents the large language model to query, and `-o` represents the output file path for the result.

### 3.4. Final Bug Inspection Conclusion (Example)

```
INFO:root:LLM querying ...
👽: Based on the bug report and code snippet, let's analyze the issue.

The bug report describes a "Possible null pointer dereference" at line 24 of npd.c. The error trace shows that the dereference occurs at line 24 itself.

Looking at the code snippet, we can see that the `acl_read_cb` function is called at line 46 with `NULL` passed as the `priv` parameter. Inside the `acl_read_cb` function, at line 24, the `buf` pointer is dereferenced with `buf->flag += size;`. This indicates a potential null pointer dereference.

However, it is important to note that before this dereference, at line 25, `buf` is assigned `NULL`, indicating that it is intentionally set to null. Therefore, the null pointer dereference at line 24 is intentional and not a bug.

Based on this analysis, I conclude that the reported bug is a false alarm.

@@@ false alarm @@@
```

The last line of the output can only be "@@@ real bug @@@", "@@@ false alarm @@@", or "@@@ unknown @@@"; this line is the only result you need to pay attention to. The comprehensive results of multiple LLM queries are available in the `Bug_XXX_XXXX` folder (named after the corresponding bug report).
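Downstream tooling only needs the final marker line. A minimal parser for it might look like this; the `@@@ ... @@@` markers and the three verdict strings come from the output format above, while the parser itself is an illustrative sketch rather than the repository's code.

```python
# Sketch of parsing the inspector's final verdict line. The marker format
# and verdict strings are from the documented output; this parser is
# illustrative, not the repository's implementation.

import re

VERDICTS = {"real bug", "false alarm", "unknown"}

def parse_verdict(transcript: str) -> str:
    """Return the verdict from the last '@@@ ... @@@' marker, or 'unknown'."""
    matches = re.findall(r"@@@\s*(.+?)\s*@@@", transcript)
    if matches and matches[-1] in VERDICTS:
        return matches[-1]
    return "unknown"

sample = "INFO:root:LLM querying ...\n...analysis...\n@@@ false alarm @@@"
print(parse_verdict(sample))  # false alarm
```

Falling back to "unknown" on a malformed transcript keeps a batch run over thousands of warnings from crashing on one bad LLM response.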

Keywords

Large Language Model, Static Analysis, Static Warning, False Alarms
