Testing Static Analyzers via Semantic-Preserving Mutators Learned from Real-World Refactoring Practice

Anonymous

Research software (Software) | Under curation | Publisher: Zenodo

Authors: Anonymous

doi: 10.5281/zenodo.19350077

Abstract

SAFuzzer

1. SAFuzzer Project Introduction

SAFuzzer is an innovative framework for testing Static Application Security Testing (SAST) tools through semantic-preserving code mutations. The framework employs a three-phase pipeline:

- Mutator Invention: Mines patterns from real-world refactoring commits and transforms them into executable Spoon-based mutators via LLM agents.
- Mutator Refinement: Validates each mutator's semantic-preservation guarantee through rigorous dynamic equivalence checking.
- Static Analyzer Testing: Applies validated mutators at scale to test static analyzers via metamorphic testing.

SAFuzzer supports mainstream SAST tools including SpotBugs, PMD, Infer, CheckStyle, and SonarQube. The framework uses Java Spoon for AST manipulation.

2. Top 10 Mutators Causing Bugs

The following are the top 10 mutators that most frequently cause bugs in SAST tools during testing:

| Rank | Mutator Name | Issue Count | Issues |
|------|--------------|-------------|--------|
| 1 | EqualityCheckToInstanceofMutator | 4 | SpotBugs #3916, Infer #2001, Sonar S2259, PMD #6513 |
| 2 | ConditionalBlockInsertionMutator | 4 | SpotBugs #3894, Infer #2015, SpotBugs #3929, PMD #6518 |
| 3 | ParenthesesAdditionMutator | 4 | SpotBugs #3904, Infer #2015, PMD #6491, CheckStyle #19162 |
| 4 | IfConditionReorderingMutator | 3 | SpotBugs #3886, SpotBugs #3920, SpotBugs #3963 |
| 5 | VariableAdditionMutator | 3 | SpotBugs #3884, Infer #2015, Infer #1993 |
| 6 | NullCheckReorderingMutator | 3 | SpotBugs #3920, SpotBugs #3886, SpotBugs #3916 |
| 7 | ConditionalLogicInsertionMutator | 2 | PMD #6518, SpotBugs #3978 |
| 8 | MethodChainCallSwapMutator | 2 | SpotBugs #3966, PMD #6494 |
| 9 | ConditionNegationMutator | 2 | PMD #6435, SpotBugs #3963 |
| 10 | SingleLineReturnToBlockReturnMutator | 2 | PMD #6491, PMD #6519 |

3. SAFuzzer Usage Guide

Project Architecture Overview

SAFuzzer consists of three main components:

- Semantic_Equivalence_Knowledge_Base: A Python-based pipeline for mining semantic-preserving code patterns from real-world refactoring commits using LLM agents and dynamic execution validation.
  This component extracts transformation patterns from GitHub commits and validates their semantic equivalence.
- MutatorExecutor: A standalone Maven project containing the mutator implementations and a semantic equivalence validation module. This component is used by the knowledge base pipeline to dynamically validate the semantic equivalence of code transformations.
- Main SAFuzzer Framework: The core Java application that applies validated mutators to test SAST tools via metamorphic testing. This is the primary tool for detecting bugs in static analyzers.

Mutator Generation and Validation Pipeline

The framework includes three Python scripts that automate the mutator invention and refinement process:

1. Stage 1: Mutator Generation (stage1_generator.py)

- Input: Code pairs from GitHub refactoring commits (raw_diffs_chunk_*_output.json)
- Output: Mutator descriptions (JSON) and Java implementations
- Process:
  - Uses an LLM to analyze code pairs and generate mutator descriptions
  - Converts descriptions into executable Spoon-based Java mutators
- Output Structure:

    outputs/
    ├── 1_mutator_description/      # JSON descriptions
    └── 2_mutator_implementation/   # Java implementations

2. Stage 2: Compilation Verification (stage2_compilation_verification.py)

- Input: Java mutators from Stage 1
- Output: Compilable mutators
- Process:
  - Deploys mutators to a sandbox environment
  - Validates compilation using javac
  - Automatically repairs compilation errors using LLM agents (max 5 attempts)
- Key Features:
  - Parallel processing (16 workers)
  - Sandbox isolation for each mutator
  - Intelligent repair with specialized tools
- Output Structure:

    outputs/compilable_3-7/         # Compilable mutators

3. Stage 3: Fast Semantic Verification (stage3_fast_verify.py)

- Input: Compilable mutators + test seeds
- Output: Validation results + repair datasets
- Process:
  - Phase 1: Quick verification with 200 seeds
  - Phase 2: Extends to 500 seeds if pass rate < 90%
  - Phase 3: Extends to 1000 seeds if zero triggers
  - Repair loop: LLM-driven repair for failed mutators (max 3 attempts)
- Validation Criteria:
  - Pass: Trigger rate > 0 AND (pass rate ≥ 90% on 200 seeds OR pass rate ≥ 80% on extended seeds)
  - Fail: Pass rate not met OR zero triggers
- Output Structure:

    lists/
    ├── success_list_v2.txt         # Successfully validated mutators
    └── fail_list_v2.txt            # Failed mutators
    refine_dataset/                 # Detailed repair datasets (JSON)

Running the Pipeline

    # Step 1: Generate mutators from refactoring commits
    python stage1_generator.py

    # Step 2: Verify and repair compilation errors
    python stage2_compilation_verification.py [start_chunk]

    # Step 3: Validate semantic preservation
    python stage3_fast_verify.py

Environment Requirements

- Java 17 or higher (Maven compilation target)
- Python 3.8+ (for analysis scripts in Semantic_Equivalence_Knowledge_Base)
- Maven 3.6+ for building the project
- 8GB RAM minimum, 16GB RAM recommended
- 20GB free disk space for generated mutants and results

Quick Start with the Core Framework Package

Step 1: Extract and Setup

    # Extract the ZIP file
    unzip SAFuzzer_Core_Framework_*.zip
    cd SAFuzzer_Core_Framework

    # Make scripts executable
    chmod +x run_complete_pipeline.sh test_pipeline_quick.sh
    chmod +x Semantic_Equivalence_Knowledge_Base/run_pipeline.sh

Step 2: Install SAST Tools

Before running SAFuzzer, you need to install the SAST tools.
Follow the instructions in tools/README.md to download and install:

- SpotBugs 4.9.8
- PMD 7.22.0
- CheckStyle 13.3.0
- Infer 1.2.0
- SonarQube Scanner 8.0.1

Step 3: Configure Tool Paths

    # Copy the configuration template
    cp config.properties.template config.properties

    # Edit config.properties with your tool paths
    nano config.properties  # or use your favorite editor

Update the paths in config.properties:

    spotbugs.jar.path=/absolute/path/to/spotbugs-4.9.8/lib/spotbugs.jar
    pmd.cli.path=/absolute/path/to/pmd-bin-7.22.0/bin/pmd
    checkstyle.jar.path=/absolute/path/to/checkstyle-13.3.0-all.jar
    infer.cli.path=/absolute/path/to/infer-linux-x86_64-v1.2.0/bin/infer
    sonar.scanner.path=/absolute/path/to/sonar-scanner-8.0.1.6346-linux-x64/bin/sonar-scanner

Step 4: Build the Project

    # Build main SAFuzzer framework
    mvn clean compile package

    # Build MutatorExecutor (for semantic validation)
    cd MutatorExecutor
    mvn clean compile
    cd ..

Step 5: Install Python Dependencies

    # Install required Python packages
    pip install -r Semantic_Equivalence_Knowledge_Base/requirements.txt

Step 6: Run Quick Verification Test

    # Test if everything works correctly
    ./test_pipeline_quick.sh

If all tests pass, you're ready to run the full pipeline!

Running the Complete Pipeline

Option A: Run All Three Stages (Recommended)

    # This runs the complete SAFuzzer pipeline end-to-end
    ./run_complete_pipeline.sh

The script will:

- Check environment and dependencies
- Build the project if needed
- Run the Semantic Equivalence Knowledge Base pipeline (Stage 1)
- Validate mutators using MutatorExecutor (Stage 2)
- Test SAST tools with validated mutators (Stage 3)
- Generate results and summary

Option B: Run Individual Stages

Stage 1: Mutator Invention (Pattern Mining)

    cd Semantic_Equivalence_Knowledge_Base
    ./run_pipeline.sh

This stage mines refactoring patterns from GitHub commits. Note: This requires GitHub API access and may take several hours.
Stage 2: Mutator Refinement (Semantic Validation)

    cd MutatorExecutor
    mvn compile
    # The validation is integrated into Stage 1 pipeline

Stage 3: Testing Static Analyzers

    # Test a specific test case with SpotBugs
    java -cp "target/SASTFuzz-1.0-.jar:target/classes:target/dependency/*" \
        com.mutation.Main \
        --project_path "." \
        --target_case "seeds.PMD_Seeds.bestpractices_AccessorClassGeneration.AccessorClassGeneration1" \
        --target_SAST "SpotBugs" \
        --max_iter 10

    # Test all SAST tools on a test case
    java -cp "target/SASTFuzz-1.0-.jar:target/classes:target/dependency/*" \
        com.mutation.Main \
        --project_path "." \
        --target_case "seeds.SpotBugs_Seeds.bestpractices_ArrayIsStoredDirectly.ArrayIsStoredDirectly1" \
        --target_SAST "ALL" \
        --max_iter 20

Command Line Parameters

    --project_path <arg>   Source code root directory (required)
    --target_case <arg>    Target Java class (package.ClassName format) (required)
    --target_SAST <arg>    SAST tool to test: SpotBugs, PMD, CheckStyle, Infer, SonarQube, Semgrep, or ALL (required)
    --max_iter <arg>       Maximum mutation iterations (default: 50)

Output Structure

Results are organized in results/run_YYYYMMDD_HHMMSS/:

- safuzzer_output.log: Complete execution log
- final_results/: Generated mutants and SAST reports
  - 0/: Original seed code with baseline SAST analysis
  - 1..N/: Each iteration's mutated code and SAST results
- iteration_history.txt: Trace of applied mutators
- verification_summary.txt: Pipeline verification results

Advanced Configuration

Custom Mutator Selection

The framework automatically selects from all available mutators. To modify mutator behavior, edit the Scheduler.run() method in src/com/mutation/Scheduler.java.

Rule Coverage Experiment

Enable JaCoCo coverage measurement in config.properties:

    jacoco.enabled=true
    jacoco.agent.path=/path/to/jacoco-agent.jar
    jacoco.cli.path=/path/to/jacoco-cli.jar

Custom SAST Tool Integration

Implement new SAST tool classes extending the SAST abstract class in src/com/mutation/config/.

4. Detected Bug Case Demonstrations

Case 1: PMD SimplifyConditional False Negative (#6513)

Bug Description: PMD fails to detect a redundant null check before instanceof when additional conditions are interleaved in the && chain by a semantic-preserving mutation.

Original Code (PMD correctly reports SimplifyConditional):

    public class SimplifyConditionalDemo {
        public void foo() {
            String s = "a";
            if (s != null && s instanceof String) { // <- SimplifyConditional reported (TP)
                System.out.println(s);
            }
        }
    }

Mutated Code (PMD silently misses the bug):

    public class SimplifyConditionalDemo {
        public void foo() {
            String s = "a";
            String s2 = "a";
            if (s != null && s2 != null && s instanceof String) { // <- null check still redundant, but NOT reported (FN)
                System.out.println(s);
            }
        }
    }

Triggering Mutator: NonNullVarRedundantNullCheckMutator, which inserts an additional s2 != null guard into an existing && chain, a common defensive coding pattern that does not change the semantics of the original condition.

Analysis: In both versions the s != null check is redundant, since s instanceof String already evaluates to false when s is null. PMD's SimplifyConditional detector only matches the pattern when the null check and the instanceof test are directly adjacent in the && chain. Once an intervening condition is inserted between them, the rule fails to trace the relationship and produces a False Negative. The issue was reported on Mar 20, 2026 and is still open.

Case 2: SpotBugs IM_BAD_CHECK_FOR_ODD False Negative (#3886)

Bug Description: SpotBugs fails to detect the incorrect odd-number check pattern when the condition operands are reordered into Yoda-style by a semantic-preserving mutation.
Original Code (SpotBugs correctly reports IM_BAD_CHECK_FOR_ODD):

    public class TestModulo {
        public void standardCheck(int i) {
            if (i % 2 == 1) { // <- IM_BAD_CHECK_FOR_ODD reported (TP)
                System.out.println("Odd");
            }
        }
    }

Mutated Code (SpotBugs silently misses the bug):

    public class TestModulo {
        public void yodaCheck(int i) {
            if (1 == i % 2) { // <- semantically identical, but IM_BAD_CHECK_FOR_ODD NOT reported (FN)
                System.out.println("Odd");
            }
        }
    }

Triggering Mutator: IfConditionReorderingMutator, which rewrites <expr> == <literal> into the Yoda-style <literal> == <expr>, a common and semantically equivalent code transformation.

Analysis: Both i % 2 == 1 and 1 == i % 2 are semantically identical and share the same bug: this check incorrectly returns false for negative odd integers (e.g., -3 % 2 == -1, not 1). SpotBugs' IM_BAD_CHECK_FOR_ODD detector only matches the canonical operand order and fails to recognize the Yoda variant, resulting in a False Negative. This bug was subsequently fixed via PR #3935.

Case 3: PMD ForLoopCanBeForeach False Negative (#6495)

Bug Description: PMD fails to detect that a traditional index-based for loop can be replaced by an enhanced foreach loop when the array length is first extracted into a pre-declared local variable by a semantic-preserving mutation.
Original Code (PMD correctly reports ForLoopCanBeForeach):

    public class PMD_FN_Demo {
        public void testTruePositive(long[] counts) {
            double total = 0;
            for (int i = 0; i < counts.length; i++) { // <- ForLoopCanBeForeach reported (TP)
                total += counts[i];
            }
        }
    }

Mutated Code (PMD silently misses the bug):

    public class PMD_FN_Demo {
        public void testFalseNegative(long[] counts) {
            double total = 0;
            int len = counts.length; // array length extracted to a local variable
            for (int i = 0; i < len; i++) { // <- semantically identical, but ForLoopCanBeForeach NOT reported (FN)
                total += counts[i];
            }
        }
    }

Triggering Mutator: ConditionalBlockInsertionMutator (combined with loop bound extraction), which hoists the array.length expression into a pre-declared local variable, a standard performance-oriented refactoring that does not change loop semantics.

Analysis: Both loops iterate over the entire array in the same order and produce identical results. PMD's ForLoopCanBeForeach rule performs pattern matching on the loop condition and expects i < array.length literally in the for header. When the bound is stored in an intermediate variable len, the rule's detector fails to trace back to the array and misses the violation. A PR (#6521) has been submitted to address this.

5. Bugs Summary Table

Bug Statistics Overview

The following table summarizes bugs detected across different SAST tools and their current status:

| Issue Status | SpotBugs | PMD | Infer | SonarQube | CheckStyle | Overall |
|--------------|----------|-----|-------|-----------|------------|---------|
| Reported | 18 | 10 | 8 | 4 | 2 | 42 |
| Confirmed | 12 | 6 | 0 | 3 | 1 | 22 |
| Fixed | 2 | 0 | 0 | 0 | 1 | 3 |
| Won't Fix | 1 | 0 | 0 | 0 | 0 | 1 |

Bug Details

| Bug Type | Rule | Status | Issue ID |
|----------|------|--------|----------|
| FN | NN_NAKED_NOTIFY | Reported | #3884 |
| FN | IM_BAD_CHECK_FOR_ODD | Fixed | #3886 |
| FN | ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD | Confirmed | #3893 |
| FN | UCF_USELESS_CONTROL_FLOW | Confirmed | #3894 |
| FN | RV_RETURN_VALUE_IGNORED_NO_SIDE_EFFECT | Confirmed | #3900 |
| FP | IL_INFINITE_RECURSIVE_LOOP | Confirmed | #3904 |
| FN | SF_SWITCH_NO_DEFAULT | Confirmed | #3905 |
| FN | NULLPTR_DEREFERENCE | Reported | #1992 |
| FN | DIVIDE_BY_ZERO | Reported | #1993 |
| FN | UnconditionalIfStatement | Confirmed | #6435 |
| FP | INFINITE_EXECUTION_TIME | Reported | #2000 |
| FN | NP_LOAD_OF_KNOWN_NULL_VALUE | Confirmed | #3916 |
| FN | NULL_DEREFERENCE | Reported | #2001 |
| FN | S2259 (Null pointers should not be dereferenced) | Reported | #177381 |
| FP | DANGLING_POINTER_DEREFERENCE | Reported | #2002 |
| FN | INFINITE_EXECUTION_TIME | Reported | #2005 |
| FN | RCN_REDUNDANT_COMPARISON_OF_NULL_AND_NONNULL_VALUE | Confirmed | #3920 |
| FP | SA_LOCAL_SELF_ASSIGNMENT | Confirmed | #3929 |
| FN | URF_UNREAD_FIELD | Confirmed | #3955 |
| FN | NULLPTR_DEREFERENCE | Reported | #2015 |
| FN | UselessOverridingMethod | Reported | #6491 |
| FN | CloseResource | Reported | #6494 |
| FN | LeftCurly | Fixed | #19162 |
| FN | ForLoopCanBeForeach | Confirmed | #6495 |
| FN | NP_LOAD_OF_KNOWN_NULL_VALUE | Reported | #3961 |
| FN | NULLPTR_DEREFERENCE | Reported | #2019 |
| FP | CWO_CLOSED_WITHOUT_OPENED | Reported | #3962 |
| FP | IL_INFINITE_RECURSIVE_LOOP | Confirmed | #3963 |
| FN | DM_STRING_TOSTRING | Fixed | #3966 |
| FN | SimplifyConditional | Reported | #6513 |
| FN | SA_FIELD_DOUBLE_ASSIGNMENT | Reported | #3975 |
| FN | UselessPureMethodCall | Confirmed | #6517 |
| FN | NS_NON_SHORT_CIRCUIT | Reported | #3976 |
| FN | UnusedAssignment | Confirmed | #6518 |
| FP | DoNotUseThreads | Confirmed | #6520 |
| FN | SimplifyBooleanReturns | Confirmed | #6519 |
| FN | CollectionTypeMismatch | Reported | #6526 |
| FP | IL_INFINITE_RECURSIVE_LOOP | Reported | #3978 |
| FN | AvoidInstantiatingObjectsInLoops | Reported | #6560 |
| FN | NP_NULL_ON_SOME_PATH | Reported | #3985 |
| FP | INTEGER_OVERFLOW_L2 | Reported | #2027 |
| FN | Inconsistent synchronization | Reported | #3986 |
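
The FN and FP labels above follow from the metamorphic relation behind the framework: a seed and its semantic-preserving mutant should receive the same warnings, so any difference points at the analyzer. The sketch below shows one way such a comparison could be expressed. It is not SAFuzzer's actual implementation: the function name `classify_discrepancies` and the representation of a report as a set of rule IDs are assumptions made for illustration, and treating a warning that appears only on the mutant as an FP candidate is an assumed reading of the FP label.

```python
from typing import Dict, Set


def classify_discrepancies(seed_rules: Set[str], mutant_rules: Set[str]) -> Dict[str, Set[str]]:
    """Compare rule IDs reported on a seed and on its semantic-preserving mutant.

    Hypothetical helper: reports are modelled as plain sets of rule IDs,
    which is an assumption and not SAFuzzer's real report format.
    """
    return {
        # Reported on the seed but missing on the mutant: candidate false negative.
        "fn_candidates": seed_rules - mutant_rules,
        # Reported only on the mutant: candidate false positive (assumed reading).
        "fp_candidates": mutant_rules - seed_rules,
    }


# Case 2 from above: the warning disappears after the Yoda-style rewrite.
seed_report = {"IM_BAD_CHECK_FOR_ODD"}
mutant_report: Set[str] = set()

result = classify_discrepancies(seed_report, mutant_report)
print(result)
```

The results are only candidates. A missing warning could also mean the seed warning was itself spurious, so each discrepancy still needs manual triage before it is reported upstream.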
