
# Artifact: Less Is More (GPCE 2026)

## Setup

### 1. Python environment

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### 2. HuggingFace API key

```bash
cp .env.example .env
# Edit .env and set HF_TOKEN to your HuggingFace Pro token
```

### 3. Target Java projects

```bash
mkdir targets && cd targets
git clone https://github.com/apache/commons-lang.git
git clone https://github.com/WebGoat/WebGoat.git
cd ..
```

### 4. Joern

```bash
curl -L https://github.com/joernio/joern/releases/latest/download/joern-install.sh | bash
joern --server   # runs on localhost:8080
```

We used Joern 4.0.488. Later versions should work but may produce slightly different output.

## Running experiments

```bash
python scripts/run_a1.py --model qwen-72b --reps 3
python scripts/run_a2.py --model qwen-72b --reps 3
python scripts/run_a3.py --model qwen-72b --reps 3
```

Available models: `qwen-72b`, `qwen-7b`, `llama-70b`, `llama-8b`.

## Running the mapper tests (no infrastructure needed)

```bash
python -m pytest tests/test_mapper.py -v
```

22 tests, one per benchmark task plus serialisation and field-count checks.

## Reproducing paper numbers from included data

All 660 trial results are in `data/results/` as JSON files. To reproduce the paper's tables:

```bash
python scripts/run_eval.py
```

## Reading the result files

Each file in `data/results/` is named `a{1,2,3}_{model}_{timestamp}.json` and contains a `trials` array. Each trial has:

- `task_id`: which benchmark task (S01–S07, D01–D07, C01–C06)
- `result_match`: whether the output matched the ground truth (the primary metric)
- `execution_success`: whether the query executed without error
- `input_tokens`, `output_tokens`: token consumption

For A1 and A2, each trial also has `generated_cpgql` (the query the LLM produced). For A3, each trial has a `steps` array showing every tool call, its arguments, and the Joern response, followed by `final_answer`.

Example:

```python
import json

with open("data/results/a3_qwen2.5_72b_instruct_20260226_180317.json") as f:
    data = json.load(f)

trial = data["trials"][0]
print(trial["task_id"], trial["result_match"], len(trial["steps"]))
```
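If you want to aggregate across result files yourself rather than going through `scripts/run_eval.py`, the documented filename convention and per-trial fields are enough to recompute a match rate per approach and model. The following is a minimal sketch, assuming only the `a{1,2,3}_{model}_{timestamp}.json` naming (with a two-part `date_time` timestamp as in the example above) and the `result_match` field; the grouping logic and variable names are illustrative, not part of the artifact's scripts:

```python
import json
from collections import defaultdict
from pathlib import Path

# Accumulate (matched, total) counts per (approach, model) group.
counts = defaultdict(lambda: [0, 0])

for path in sorted(Path("data/results").glob("a*_*.json")):
    # Filename convention: a{1,2,3}_{model}_{timestamp}.json,
    # where the timestamp is two underscore-separated parts (date, time).
    approach, rest = path.stem.split("_", 1)
    model = rest.rsplit("_", 2)[0]
    with open(path) as f:
        trials = json.load(f)["trials"]
    for trial in trials:
        counts[(approach, model)][0] += bool(trial["result_match"])
        counts[(approach, model)][1] += 1

for (approach, model), (matched, total) in sorted(counts.items()):
    print(f"{approach} {model}: {matched}/{total} result_match")

print("total trials:", sum(total for _, total in counts.values()))
```

The final total should equal the 660 trials mentioned above, and the per-group numbers can be cross-checked against the tables reproduced by `scripts/run_eval.py`.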
