ZENODO
Other ORP type . 2026
License: CC BY
Data sources: Datacite

Title

Authors: Anonymous, Anonymous; Anonymous, Anonymous; Anonymous, Anonymous;
Abstract

This repository contains the research implementation of OnionGuard.

## ⚠️ Note for Reviewers: Data Availability & Reproducibility

To adhere to double-blind review policies and anonymous artifact hosting constraints (e.g., per-file size limits), this repository contains a lightweight version of the system artifacts.

- Knowledge Bases (KBs): We provide Lite versions of the vector stores. They are fully functional but include only a reduced subset of the original KB entries.
- Datasets: We provide fixed 200-sample subsets per benchmark under `dataset/` to verify the execution pipeline and logic.
- Performance: This artifact targets functional reproducibility and comparative validation; exact headline numbers from the full-scale experiments are not expected.

## 📋 Prerequisites

- Python: 3.10.18
- Conda: Anaconda
- Hardware: NVIDIA GPU + CUDA driver (required for vLLM inference)

## 🛠️ Installation

### 1. Create and Activate Environment

First, create a conda environment using the provided `environment.yml` file.

    conda env create -f environment.yml
    conda activate onion_guard

### 2. Install Package

Install the package in editable mode.

    pip install -e .
    conda develop .

## 🚀 Getting Started

To run OnionGuard, start the vLLM server first, then run the test scripts in a separate terminal.

### 1. Start the vLLM Server

Run the startup script to initialize the inference server.

    chmod +x ./execute_vllm.sh
    bash ./execute_vllm.sh

Note: Keep this terminal open while running the tests.

### 2. Run OnionGuard

Open a new terminal, activate the environment, and navigate to the configuration directory.

    conda activate onion_guard
    cd examples/configs/OnionGuard

You can evaluate OnionGuard using the following benchmark scripts.

#### Attack Defense Benchmark

Evaluate the defense performance against direct attacks.

    python ONION_GUARD_ATTACK_TEST.py

#### Safety Dataset Benchmarks

Evaluate OnionGuard against various standard safety datasets.

    python ONION_GUARD_BENCHMARK_TEST.py --dataset <DATASET>

Supported Datasets:

- `AEGIS`
- `XSTEST`
- `OAI`
- `TOXIC`

Examples:

    # Run benchmark on AEGIS dataset
    python ONION_GUARD_BENCHMARK_TEST.py --dataset AEGIS

    # Run benchmark on XSTEST dataset
    python ONION_GUARD_BENCHMARK_TEST.py --dataset XSTEST

#### WildGuard Output Benchmark

Evaluate the output filtering capabilities using the WildGuard benchmark.

    python ONION_GUARD_WILDGUARD_OUTPUT_TEST.py

## 📁 Key Paths (for reviewers)

- Core OnionGuard logic: `nemoguardrails/library/onion_guard/`
- Benchmark Configs & KBs: `examples/configs/OnionGuard/`
- OnionGuard System Prompts: `examples/configs/OnionGuard/config/prompts.yml`

## ❓ Troubleshooting

If you encounter any issues during reproduction, please check that:

1. the vLLM server is running,
2. the correct environment is activated, and
3. you are executing scripts under `examples/configs/OnionGuard/`.
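
The three troubleshooting checks can be automated before launching a benchmark. The script below is a minimal sketch and not part of the artifact: the helper names (`server_is_up`, `env_is_active`, `in_config_dir`, `preflight`) are hypothetical, and the health URL assumes vLLM's OpenAI-compatible server on its default port 8000, which `execute_vllm.sh` may override. The environment check relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```python
"""Preflight checks mirroring the troubleshooting list (illustrative sketch)."""
import http.client
import os
import urllib.request
from pathlib import Path

# Assumption: vLLM's OpenAI-compatible server on its default port 8000.
# execute_vllm.sh may configure a different host or port; adjust as needed.
VLLM_HEALTH_URL = "http://localhost:8000/health"
ENV_NAME = "onion_guard"  # environment name used in `conda activate onion_guard`
CONFIG_DIR = ("examples", "configs", "OnionGuard")


def server_is_up(url: str = VLLM_HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


def env_is_active(environ=None) -> bool:
    """Return True if the active conda environment is `onion_guard`."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == ENV_NAME


def in_config_dir(cwd=None) -> bool:
    """Return True if the working directory ends in examples/configs/OnionGuard."""
    cwd = Path.cwd() if cwd is None else Path(cwd)
    return cwd.parts[-3:] == CONFIG_DIR


def preflight() -> list[str]:
    """Run all three checks and return a list of human-readable problems."""
    problems = []
    if not server_is_up():
        problems.append(f"vLLM server not reachable at {VLLM_HEALTH_URL}")
    if not env_is_active():
        problems.append(f"conda environment '{ENV_NAME}' is not active")
    if not in_config_dir():
        problems.append("current directory is not examples/configs/OnionGuard")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("FAIL:", issue)
    if not issues:
        print("All preflight checks passed.")
```

Saved under a name of your choosing and run from `examples/configs/OnionGuard/`, the script prints one `FAIL:` line per unmet condition. An empty result suggests the setup matches the steps above, although it does not verify that the server has loaded the expected model.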
