ZENODO
Software · 2025
License: CC BY
Data sources: Datacite

Prompt Obfuscation for Large Language Models - Usenix Security 25' Cycle 2 #798 Artifact Evaluation

Authors: Pape, David; Mavali, Sina; Eisenhofer, Thorsten; Schönherr, Lea


Abstract

Artifact for the paper "Prompt Obfuscation for Large Language Models". This artifact contains the code to reproduce the results presented in the paper. The code allows users to perform and evaluate prompt obfuscation as an alternative to traditional system prompting for large language models. Several deobfuscation methods are also included.

The main components for reproducing the paper's results are the following groups of scripts:

• obfuscate.py and evaluate_obfuscation.py obfuscate system prompts and evaluate them (Sections 5.1–5.3 results; a conceptual sketch of the optimization follows this abstract).
• finetune.py and evaluate_finetuning.py finetune LoRA adapters and evaluate them (Section 5.4 results).
• prompt_extraction.py and evaluate_prompt_extraction.py extract the system prompt and evaluate the attack's success rate (Section 6.1 results).
• projection.py projects embedded (soft) prompts back to token space (Section 6.2 results; see the projection sketch below).
• fluency_deobfuscation.py and evaluate_fluency_deobfuscation.py deobfuscate obfuscated system prompts using fluency optimization and evaluate them (Section 6.3 results).
• Several helper scripts (generate_output.py, compare_output.py, compare_sys_prompts.py) quickly generate and compare outputs, and compare system prompts, for evaluation and baseline comparisons.

Project Structure

    prompt_obfuscation
    ├── README.md
    ├── compare_output.py
    ├── compare_sys_prompts.py
    ├── data
    │   ├── __init__.py
    │   ├── config.py
    │   ├── loader.py
    │   └── utils.py
    ├── evaluate_finetuning.py
    ├── evaluate_fluency_deobfuscation.py
    ├── evaluate_obfuscation.py
    ├── evaluate_prompt_extraction.py
    ├── extraction_prompts
    │   └── gpt4_generated.json
    ├── finetune.py
    ├── fluency_deobfuscation.py
    ├── generate_output.py
    ├── obfuscate.py
    ├── projection.py
    ├── prompt_extraction.py
    ├── requirements.txt
    └── src
        ├── __init__.py
        ├── finetuning_utils.py
        ├── logging_config.py
        ├── model.py
        ├── output_generation.py
        ├── output_similarity.py
        ├── prompt_utils.py
        ├── style_prompts.py
        ├── sys_prompt_similarity.py
        └── utils.py

The data/ directory handles dataset loading and processing. The src/ directory contains the core logic for models, generation, and evaluation. The extraction_prompts/ directory contains the extraction prompts used by the prompt extraction attack. The Python scripts in the root directory run the experiments.

Setup

A GPU is highly recommended for reasonable computation times.

Create a Python 3.12.7 environment (e.g., using conda):

    conda create -n prompt_obfuscation python=3.12.7
    conda activate prompt_obfuscation

Install the required packages:

    pip install -r requirements.txt

Hugging Face access: the main model used (Llama-3.1-8B) requires a Hugging Face account with access granted to the model. After requesting access on the model's page, log in via the command line:

    huggingface-cli login

Please see the README.md file for example usage and a full list of all command-line arguments.
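
To confirm the environment before running experiments, a quick sanity check like the following can help. This snippet is not part of the artifact; it assumes the transformers and accelerate packages from requirements.txt and the gated meta-llama/Llama-3.1-8B checkpoint:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Gated checkpoint; request access on its Hugging Face page and run
    # `huggingface-cli login` before loading it.
    model_id = "meta-llama/Llama-3.1-8B"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # A short generation confirms that the weights load and inference runs.
    inputs = tokenizer("Hello, world.", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If this prints a continuation of the prompt, the setup is ready.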
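
For orientation, the sketch below illustrates the general idea behind soft-prompt obfuscation in PyTorch. It is not the artifact's implementation, and every name in it (obfuscate_soft_prompt, soft_len, and so on) is hypothetical: a short sequence of embeddings is optimized so that the model, conditioned on it instead of the original system prompt, produces a similar output distribution.

    import torch
    import torch.nn.functional as F

    def obfuscate_soft_prompt(model, tokenizer, system_prompt, user_inputs,
                              soft_len=16, steps=100, lr=1e-3):
        embed = model.get_input_embeddings()
        # Initialize the soft prompt from random rows of the embedding matrix.
        init_ids = torch.randint(embed.weight.shape[0], (soft_len,))
        soft = embed.weight[init_ids].detach().clone().requires_grad_(True)
        opt = torch.optim.Adam([soft], lr=lr)

        sys_ids = tokenizer(system_prompt, return_tensors="pt").input_ids
        for _ in range(steps):
            for text in user_inputs:
                usr_ids = tokenizer(text, return_tensors="pt").input_ids
                # Reference: next-token distribution under the real system prompt.
                with torch.no_grad():
                    ref = model(torch.cat([sys_ids, usr_ids], dim=1)).logits[:, -1]
                # Candidate: the soft prompt stands in for the system prompt.
                inp = torch.cat([soft.unsqueeze(0), embed(usr_ids)], dim=1)
                out = model(inputs_embeds=inp).logits[:, -1]
                # Match the two distributions (only the next token here, for
                # brevity; the artifact compares full generated outputs).
                loss = F.kl_div(out.log_softmax(-1), ref.softmax(-1),
                                reduction="batchmean")
                opt.zero_grad()
                loss.backward()
                opt.step()
        return soft.detach()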
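
projection.py maps embedded (soft) prompts back to token space. One common way to do this, sketched below under the assumption of a nearest-neighbor criterion (the artifact's exact method may differ), is to match each soft-prompt vector against the model's embedding matrix:

    import torch.nn.functional as F

    def project_to_tokens(soft_prompt, model, tokenizer):
        """Map each soft-prompt vector to its most similar vocabulary token."""
        emb = model.get_input_embeddings().weight   # (vocab_size, dim)
        soft_n = F.normalize(soft_prompt, dim=-1)   # (soft_len, dim)
        emb_n = F.normalize(emb, dim=-1)
        sims = soft_n @ emb_n.T                     # cosine similarities
        token_ids = sims.argmax(dim=-1)             # nearest token per vector
        return tokenizer.decode(token_ids)

Because the obfuscation runs in continuous embedding space, the projected tokens need not form fluent text; fluency_deobfuscation.py instead searches for readable prompts via fluency optimization.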

Impact by BIP!
• selected citations (derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically): 0
• popularity (the "current" impact/attention of an article in the research community at large, based on the underlying citation network): Average
• influence (the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically): Average
• impulse (the initial momentum of an article directly after its publication, based on the underlying citation network): Average