LLM-as-Specification-Judge: Multi-Model Consensus for Trustworthy Cryptographic Verification

Creator: Tarsha Kurdi, Mamone

ZENODO

Other literature type . 2025

License: CC BY

Data sources: Datacite

Publication: Other literature type, 29 Nov 2025. Publisher: SECEQ Research

Authors: Tarsha Kurdi, Mamone

doi: 10.5281/zenodo.17765295, 10.5281/zenodo.17765294

Abstract

Formal verification of cryptographic implementations using proof assistants like F* and Rocq provides strong mathematical guarantees about code correctness. However, the verification process fundamentally depends on human-written specifications that translate informal standards (e.g., NIST FIPS documents, IETF RFCs) into formal machine-checkable predicates. These specifications constitute a critical component of the Trusted Computing Base (TCB), yet remain vulnerable to human error, ambiguity in natural language interpretation, and subtle logical mistakes.

This paper presents Specification Consensus, a novel methodology that employs multiple independent Large Language Models (LLMs) as diverse specification generators, creating an N-version programming paradigm for formal specifications. By generating multiple independent formal specifications from the same authoritative standard and verifying cross-consistency through equivalence proofs, we establish implicit semantic bridges between natural language standards and verified implementations.

Key contributions:

- Identification and characterization of the specification trust problem in formal verification
- A multi-LLM consensus framework for generating and validating formal specifications
- Methodology for specification equivalence verification using proof assistants
- Theoretical analysis of TCB reduction through specification diversity
- Experimental evaluation on SHA-256, AES-128, and ML-KEM cryptographic primitives

Keywords: Formal verification, Trusted Computing Base, Large Language Models, cryptographic specifications, N-version programming, specification synthesis, F*, Rocq, high-assurance cryptography
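The consensus idea can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract describes equivalence proofs in a proof assistant, whereas this sketch substitutes randomized differential testing, which can only suggest equivalence, never prove it. The three candidate functions are hypothetical stand-ins for specifications of the SHA-256 `Ch` function that different models might produce, and the names `agree`, `consensus`, and `model_a` through `model_c` are assumptions made for illustration.

```python
import random

MASK = 0xFFFFFFFF  # SHA-256 operates on 32-bit words


def ch_spec_a(x, y, z):
    # Textbook form of the SHA-256 Ch function: (x AND y) XOR (NOT x AND z)
    return ((x & y) ^ (~x & z)) & MASK


def ch_spec_b(x, y, z):
    # Algebraically equivalent form: z XOR (x AND (y XOR z))
    return (z ^ (x & (y ^ z))) & MASK


def ch_spec_c(x, y, z):
    # Deliberately wrong candidate (hypothetical misreading of the standard)
    return ((x & y) ^ (x & z)) & MASK


def agree(f, g, trials=1000, seed=0):
    """Return True if f and g match on `trials` random 32-bit input triples.

    This is a sampling check, not an equivalence proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        args = tuple(rng.getrandbits(32) for _ in range(3))
        if f(*args) != g(*args):
            return False
    return True


def consensus(specs, trials=1000):
    """Group candidates by sampled agreement and return the largest group.

    Each candidate is compared against the first member of every existing
    group, which treats sampled agreement as if it were an equivalence.
    """
    groups = []
    for name, spec in specs.items():
        for group in groups:
            if agree(spec, specs[group[0]], trials):
                group.append(name)
                break
        else:
            groups.append([name])
    return max(groups, key=len)


SPECS = {"model_a": ch_spec_a, "model_b": ch_spec_b, "model_c": ch_spec_c}

if __name__ == "__main__":
    print("Consensus group:", consensus(SPECS))
```

In this toy setting the two equivalent candidates form the majority group and the faulty one is isolated. A real pipeline along the lines of the abstract would replace `agree` with a machine-checked equivalence proof, since agreement on sampled inputs leaves untested inputs uncovered.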
