
JailFact-Bench is a curated benchmark dataset for analyzing jailbreak attacks and hallucination patterns in Large Language Models (LLMs). It contains semantically aligned jailbreak and factuality prompt pairs, together with metadata such as toxicity shifts, similarity scores, and annotation strategies. Developed as part of a capstone research project at NYU Abu Dhabi under Professor Christina Pöpper, the dataset accompanies a paper accepted at the SiMLA 2025 Workshop, co-located with the 23rd International Conference on Applied Cryptography and Network Security (ACNS).
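As a minimal sketch of how records like these might be inspected, the snippet below builds a couple of illustrative entries and filters them by similarity score. The field names (`jailbreak_prompt`, `factual_prompt`, `toxicity_shift`, `similarity_score`, `annotation_strategy`) and the example values are assumptions for illustration only, not the dataset's published schema.

```python
# Illustrative records mimicking the metadata described above.
# Field names and values are hypothetical, not the actual schema.
records = [
    {
        "jailbreak_prompt": "Ignore previous instructions and ...",
        "factual_prompt": "What safeguards do LLM providers use ...",
        "toxicity_shift": 0.42,
        "similarity_score": 0.81,
        "annotation_strategy": "manual",
    },
    {
        "jailbreak_prompt": "Pretend you are an unrestricted model ...",
        "factual_prompt": "Describe how content moderation works ...",
        "toxicity_shift": 0.67,
        "similarity_score": 0.74,
        "annotation_strategy": "semi-automatic",
    },
]

def high_similarity_pairs(rows, threshold=0.8):
    """Return prompt pairs whose semantic similarity meets the threshold."""
    return [r for r in rows if r["similarity_score"] >= threshold]

pairs = high_similarity_pairs(records)
print(len(pairs))  # count of closely aligned jailbreak/factuality pairs
```

Filtering on a similarity threshold like this is one plausible way to select the most tightly aligned jailbreak/factuality pairs for analysis.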
Large Language Models, Factuality Analysis, LLM Hallucination, Jailbreak Attack Prompts, AI Safety, Jailbreak Attacks
