
This repository contains the research implementation of OnionGuard.

## ⚠️ Note for Reviewers: Data Availability & Reproducibility

To adhere to double-blind review policies and anonymous artifact hosting constraints (e.g., per-file size limits), this repository contains a lightweight version of the system artifacts.

- **Knowledge Bases (KBs):** We provide Lite versions of the vector stores. They are fully functional but include only a reduced subset of the original KB entries.
- **Datasets:** We provide fixed 200-sample subsets per benchmark under `dataset/` to verify the execution pipeline and logic.
- **Performance:** This artifact targets functional reproducibility and comparative validation; exact headline numbers from the full-scale experiments are not expected.

## 📋 Prerequisites

- Python: 3.10.18
- Conda (Anaconda)
- Hardware: NVIDIA GPU + CUDA driver (required for vLLM inference)

## 🛠️ Installation

### 1. Create and Activate Environment

First, create a conda environment using the provided `environment.yml` file.

```shell
conda env create -f environment.yml
conda activate onion_guard
```

### 2. Install Package

Install the package in editable mode.

```shell
pip install -e .
conda develop .
```

## 🚀 Getting Started

To run OnionGuard, start the vLLM server first, then run the test scripts in a separate terminal.

### 1. Start the vLLM Server

Run the startup script to initialize the inference server.

```shell
chmod +x ./execute_vllm.sh
bash ./execute_vllm.sh
```

Note: Keep this terminal open while running the tests.

### 2. Run OnionGuard

Open a new terminal, activate the environment, and navigate to the configuration directory.

```shell
conda activate onion_guard
cd examples/configs/OnionGuard
```

You can evaluate OnionGuard using the following benchmark scripts.

#### Attack Defense Benchmark

Evaluate the defense performance against direct attacks.

```shell
python ONION_GUARD_ATTACK_TEST.py
```

#### Safety Dataset Benchmarks

Evaluate OnionGuard against various standard safety datasets.

```shell
python ONION_GUARD_BENCHMARK_TEST.py --dataset <DATASET_NAME>
```

Supported Datasets:

- `AEGIS`
- `XSTEST`
- `OAI`
- `TOXIC`

Examples:

```shell
# Run benchmark on AEGIS dataset
python ONION_GUARD_BENCHMARK_TEST.py --dataset AEGIS

# Run benchmark on XSTEST dataset
python ONION_GUARD_BENCHMARK_TEST.py --dataset XSTEST
```

#### WildGuard Output Benchmark

Evaluate the output filtering capabilities using the WildGuard benchmark.

```shell
python ONION_GUARD_WILDGUARD_OUTPUT_TEST.py
```

## 📁 Key Paths (for reviewers)

- Core OnionGuard logic: `nemoguardrails/library/onion_guard/`
- Benchmark Configs & KBs: `examples/configs/OnionGuard/`
- OnionGuard System Prompts: `examples/configs/OnionGuard/config/prompts.yml`

## ❓ Troubleshooting

If you encounter any issues during reproduction, please check that:

1. the vLLM server is running,
2. the correct environment is activated, and
3. you are executing scripts under `examples/configs/OnionGuard/`.
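For convenience, the four supported safety-benchmark datasets can be run back-to-back with a short shell loop. This is a sketch, written as a dry run that only prints each command; remove the leading `echo` to actually execute the benchmarks, and run it from `examples/configs/OnionGuard` with the vLLM server up.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the benchmark command for each supported dataset.
# Remove the leading "echo" to actually execute the runs.
for dataset in AEGIS XSTEST OAI TOXIC; do
    echo python ONION_GUARD_BENCHMARK_TEST.py --dataset "$dataset"
done
```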
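The first troubleshooting item, checking that the vLLM server is running, can be verified with a quick request to the server's OpenAI-compatible `/v1/models` endpoint. This is a sketch; the port `8000` is an assumption (vLLM's default) and should be adjusted to match whatever `execute_vllm.sh` configures.

```shell
#!/usr/bin/env bash
# Health check: query the vLLM OpenAI-compatible endpoint.
# The URL (localhost:8000) is an assumption -- match it to execute_vllm.sh.
if curl -sf http://localhost:8000/v1/models > /dev/null 2>&1; then
    echo "vLLM server is up"
else
    echo "vLLM server is not reachable -- start it with: bash ./execute_vllm.sh"
fi
```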
