
Formal verification of cryptographic implementations using proof assistants like F* and Rocq provides strong mathematical guarantees about code correctness. However, the verification process fundamentally depends on human-written specifications that translate informal standards (e.g., NIST FIPS documents, IETF RFCs) into formal machine-checkable predicates. These specifications constitute a critical component of the Trusted Computing Base (TCB), yet remain vulnerable to human error, ambiguity in natural language interpretation, and subtle logical mistakes. This paper presents Specification Consensus, a novel methodology that employs multiple independent Large Language Models (LLMs) as diverse specification generators, creating an N-version programming paradigm for formal specifications. By generating multiple independent formal specifications from the same authoritative standard and verifying cross-consistency through equivalence proofs, we establish implicit semantic bridges between natural language standards and verified implementations. Key contributions: Identification and characterization of the specification trust problem in formal verification A multi-LLM consensus framework for generating and validating formal specifications Methodology for specification equivalence verification using proof assistants Theoretical analysis of TCB reduction through specification diversity Experimental evaluation on SHA-256, AES-128, and ML-KEM cryptographic primitives Keywords: Formal verification, Trusted Computing Base, Large Language Models, cryptographic specifications, N-version programming, specification synthesis, F*, Rocq, high-assurance cryptography
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
