An Exact-Fixed-Point Reference Benchmark and Validation Methodology for Simulators of Signed k-State Voter Dynamics on Networks

Melegh, Janos Gabor

Preprint

Data sources: ZENODO

Publication: Preprint · Under curation · Publisher: Zenodo

Authors: Melegh, Janos Gabor;

doi: 10.5281/zenodo.20555674

Abstract

Open technical disclosure · v1.0 · 5 June 2026. This document specifies, in fully reproducible and implementation-ready detail, a tiered reference benchmark and an associated validation methodology for simulators of signed k-state voter dynamics on networks, anchored on an exact, seed-independent, machine-precision fixed point established by the companion empirical study (P02_TOPOLOGY_PHASE_MAP, v0.5). It is released for unrestricted public use, study, implementation, modification, and redistribution. The reference quantities, test definitions, data schemas, and reference code given below are placed in the public record so that any implementer, on any substrate, may verify their software against a common, freely available standard. No registration, credential, fee, or grant of permission is required to use anything described herein.

Preprint · Reproducibility & Reference Benchmarking in the Statistical Physics of Networks

An Exact-Fixed-Point Reference Benchmark and Validation Methodology for Simulators of Signed k-State Voter Dynamics on Networks

A freely reusable, implementation-agnostic conformance suite built on a machine-precision analytic invariant, a topology-invariant stress baseline, and a degree-attribution mechanism signature

János Gábor Melegh (independent researcher)

5 June 2026 · Manuscript ID: P02_BENCHMARK_SUITE · v1.0 · companion to P02_TOPOLOGY_PHASE_MAP v0.5

Abstract

Computational studies of stochastic dynamics on networks are difficult to validate because the quantities they produce are usually distributional: a result that is “close enough” can mask a defect in the update rule, the random number stream, the graph generator, the floating-point reduction, or the parallel decomposition.
We observe that the companion empirical program establishes a quantity that is not merely close but exact: for a per-node local stress Tᵢ and a per-node local admissibility ψᵢ, the across-node Spearman rank correlation H₃ = ρ_S(T, ψ) equals exactly −1 to machine precision on every degree-regular ensemble, independent of size, negative-edge fraction, and random seed (sample standard deviation of order 10⁻¹⁶). An exact, seed-independent invariant is the ideal anchor for a software conformance test, because a correct implementation must reproduce it to floating-point tolerance while almost any category of defect breaks it. Around this anchor we define a four-tier benchmark suite: Tier 0 exact/analytic tests (machine precision, seed-free), Tier 1 distributional baselines with calibrated tolerance bands (including the topology-invariant mean stress ⟨T⟩ = 0.324, Kruskal–Wallis p = 1.00), Tier 2 mechanism-attribution signatures (mean-degree variance dominance: random-forest importance 0.94, cross-validated R² = 0.99, collapse classifier AUC 0.998), and Tier 3 perturbation-response signatures (matched-degree residual ≈ 0.19; a sharp degree–collapse boundary; and a sign-and-order-of-magnitude asymmetric bistability under degree-targeted perturbation).
We give the precise observable contract, canonical generator specifications, a portable on-disk manifest schema, reference pseudocode, and acceptance criteria. We then enumerate, in deliberate breadth, the practical contexts in which the suite can be used: continuous-integration regression testing, cross-language and cross-substrate conformance, floating-point and reduction-order auditing, random-stream verification, GPU/distributed/out-of-core reproducibility, differentiable and surrogate-model validation, hardware/accelerator/neuromorphic/probabilistic substrate qualification, mean-field and closure-solver validation, teaching and assessment, peer-review artifact evaluation and conformance certification, and applied multi-agent/swarm/IoT consensus-controller qualification. Finally, we describe how the same construction generalizes to other dynamics, observables, alphabet sizes, and graph families. The entire specification is placed in the public record for free use.

Keywords: reference benchmark · software conformance testing · reproducibility · exact fixed point · signed networks · k-state voter model · Spearman rank correlation · machine-precision invariant · matched-degree control · mean-degree attribution · floating-point auditing · random-number-stream verification · GPU/distributed reproducibility · differentiable simulation · mean-field validation · multi-agent consensus · FAIR data · open technical disclosure

Contents
I. Introduction: the validation gap
II. Reference observables and ground-truth quantities
III. The four-tier benchmark specification
IV. Reference implementation, data formats, tolerances
V. Methods of use in practice (broad enumeration)
VI. Generalization and extension
VII. Worked validation protocol and checklist
VIII. Distribution, openness, and reuse posture
IX. Limitations and scope
X. Conclusions

I. Introduction: the validation gap in computational network dynamics

Simulators of stochastic dynamics on networks are now ubiquitous across statistical physics,
computational social science, epidemiology, systems biology, distributed computing, and applied control. Their outputs, however, are notoriously hard to verify. A simulation of a voter-type, Ising-type, contagion-type, or consensus-type process produces sample statistics whose expected values are themselves estimated, so a wrong implementation can sit comfortably inside the sampling noise of a correct one. When two laboratories disagree, or when a single code path is refactored, ported to a new language, accelerated on a GPU, distributed across nodes, compiled with aggressive floating-point flags, or reseeded under a new random-number generator, there is often no sharp, agreed-upon quantity against which either party can be declared correct or incorrect. The literature on reproducibility and artifact evaluation has repeatedly identified this absence of oracles (checkable ground truths) as a central obstacle [1,2].

The contribution of this disclosure is to recognize that the companion empirical program [3] furnishes exactly such an oracle, and to build a complete, freely reusable conformance methodology around it. That program studies a k-state (with k = 4) voter-like dynamics with signed edges, extracts two per-node scalars, a local stress Tᵢ ∈ [0,1] and a local admissibility ψᵢ ∈ [0,1], and reports as its principal observable the across-node Spearman rank correlation H₃ = ρ_S(T, ψ). Its headline structural result is not a fitted curve but an exact identity: on every degree-regular ensemble the two rank orderings are inverse-locked, so H₃ = −1 to machine precision, with a sample standard deviation of order 10⁻¹⁶, and this holds independently of network size, of the negative-edge fraction, and of the random seed.

Why an exact invariant is the ideal benchmark anchor

A test whose pass condition is a single, seed-independent, machine-precision number has three properties that distributional tests lack.
(i) It is unambiguous: a conforming run must return −1 to within a tolerance set by floating-point round-off, not by statistics, so “pass” and “fail” are crisp. (ii) It is broadly sensitive: the value is the joint product of the graph generator, the state alphabet, the sign assignment, the update rule, the stress and admissibility definitions, the rank-correlation reduction, and the numerical pipeline, so a defect in any of these layers tends to perturb it away from −1. (iii) It is cheap: because it is seed-independent, a single small regular-graph realization suffices, making it suitable for fast continuous-integration gates.

We therefore promote this identity to the apex of a tiered benchmark, and add to it the other stable, already-completed quantities of the companion program (a topology-invariant mean-stress baseline, a mean-degree variance-attribution signature, and a set of perturbation-response signatures), each with its own appropriate tolerance regime. The remainder of the document specifies these quantities precisely (Sec. II), defines the four-tier suite and its acceptance criteria (Sec. III), gives portable data formats and reference code (Sec. IV), enumerates in deliberate breadth the practical settings in which the suite is useful (Sec. V), explains how the construction generalizes to other dynamics and observables (Sec. VI), and provides a worked protocol and checklist (Sec. VII). This document is written as an enabling disclosure: it is intended to teach a practitioner of ordinary skill how to construct and use the benchmark on any platform, in full, without recourse to any proprietary component, and it is released for unrestricted public reuse (Sec. VIII).

II. Reference observables and ground-truth quantities

This section fixes the exact contract that an implementation must satisfy in order to be testable against the suite. All quantities are taken from, or directly derived from, the companion study [3].
II.A The model contract

A conforming simulator operates on a connected, undirected graph G = (V,E) with n = |V| nodes and m = |E| edges. Each node i carries a discrete state sᵢ ∈ {0, 1, …, k−1} with alphabet size k = 4 in the reference configuration. Each edge (i,j) carries a sign σᵢⱼ ∈ {+1, −1}; the fraction of negative edges f_neg is the principal control parameter. The update is of the sequential voter type. A run relaxes to a configuration from which the two per-node observables are computed.

II.B The two per-node observables and the principal statistic

For each relaxed configuration the implementation computes, per node, a local stress Tᵢ ∈ [0,1] (the local conflict between the focal node and its neighbours) and a local admissibility ψᵢ ∈ [0,1] (the compatibility of the focal node's state with its local equilibrium). The principal observable is the across-node Spearman rank correlation

$$ H_3 = \rho_S(T, \psi) = 1 - \frac{6 \sum_i d_i^2}{n\,(n^2 - 1)}, \tag{1} $$

where dᵢ is the difference between the rank of Tᵢ and the rank of ψᵢ. Auxiliary observables that the suite also uses are the mean admissibility ⟨ψ⟩, the fraction of zero-admissibility nodes π₀, the mean local stress ⟨T⟩, and a binary collapse flag that is true when ⟨ψ⟩ < 0.02 and π₀ > 0.95 both hold. The collapse flag is a single Boolean derived from two of the above scalars; an implementation that reports the scalars correctly reports the flag correctly by construction.

The exact identity (the apex oracle)

On any degree-regular ensemble (every node has the same degree d) the rank ordering of Tᵢ across nodes is the exact reverse of the rank ordering of ψᵢ, so all rank differences are maximal and symmetric, eq. (1) collapses to its extremal value, and H₃ = −1 exactly. The companion study confirms this on the 2D lattice and the random d-regular ensemble with a sample standard deviation of order 10⁻¹⁶, i.e. at the level of double-precision round-off, and observes that the value is invariant to size, frustration, and seed.
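The collapse of eq. (1) to its extremal value can be checked by hand. The short derivation below is a sketch under two assumptions that are not stated in this form in the text: there are no ties, so the ranks of T are a permutation of 1, …, n, and exact rank reversal means the rank of ψᵢ is n + 1 minus the rank of Tᵢ. The symbol r_i for the rank of T_i is notation introduced here only for the derivation.

```latex
\begin{align*}
d_i &= r_i - (n + 1 - r_i) = 2 r_i - (n + 1), \\
\sum_{i=1}^{n} d_i^2 &= \sum_{r=1}^{n} \bigl(2r - (n + 1)\bigr)^2
  = 4 \cdot \frac{n(n+1)(2n+1)}{6} - 2n(n+1)^2 + n(n+1)^2
  = \frac{n(n^2 - 1)}{3}, \\
H_3 &= 1 - \frac{6}{n(n^2 - 1)} \cdot \frac{n(n^2 - 1)}{3} = 1 - 2 = -1 .
\end{align*}
```

In this form 6 Σ dᵢ² equals 2n(n² − 1), an integer, so for sizes at which these integers are exactly representable the quotient and the final subtraction involve no rounding. This is offered as a consistency check on the round-off-level dispersion quoted above, not as a description of how the companion study evaluated the statistic.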
II.C Canonical ensembles and their structural descriptors

The suite fixes nine canonical generators so that every implementer draws from identical distributions. The exact generator parameters and the structural descriptors (mean degree ⟨k⟩, degree coefficient of variation CV(k), and mean clustering ⟨C⟩ at n = 1000) are reproduced in Table I from the companion study and serve as Tier-1 distributional reference values in their own right.

Table I. The nine canonical generators and their structural descriptors at n = 1000. The two perfectly regular ensembles (lattice, random regular) carry the exact H₃ = −1 oracle.

Ensemble                 Generator specification       ⟨k⟩     CV(k)   ⟨C⟩
lattice                  2D grid, relabelled           3.83    0.104   0.000
random_regular           d-regular, d = 8              8.00    0.000   0.013
watts_strogatz           WS(n, k=8, p=0.10)            8.00    0.110   0.472
modular_sbm              SBM, 4 equal blocks           6.10    0.393   0.028
erdos_renyi              G(n, p), p = 8/(n−1)          8.06    0.346   0.016
barabasi_albert          BA(n, m=4)                    7.94    1.002   0.058
powerlaw_cluster         Holme–Kim (m=4, p=0.25)       7.92    1.049   0.151
random_geometric         RGG, r=√(8/πn)×1.65           19.45   0.272   0.626
geometric_soft_radius    RGG, r=√(8/πn)×2.10           30.59   0.250   0.637

II.D The stable reference values used by the suite

The benchmark draws only on quantities that the companion program reports as complete and stable. These are summarized in Table II together with the tier into which each is placed and the kind of tolerance it admits.

Table II. Reference values and their benchmark tiers. Exact values admit a floating-point tolerance; statistical values admit a calibrated band; signature values admit a sign/ordering/magnitude criterion.

Reference quantity                   Value                     Tier   Tolerance type
H₃ on regular ensembles              −1.000                    0      machine (∼1e−12)
s.d. of H₃ on regular                ∼1e−16                    0      machine
|H₃| six-class range                 [0.80, 1.00]              1      band
mean stress ⟨T⟩                      0.324                     1      band
stress topology-invariance (KW p)    1.00                      1      band
mean-degree importance (H₃)          0.94                      2      band
mean-degree importance (⟨ψ⟩)         0.83                      2      band
cross-validated R²                   0.99                      2      band
collapse-classifier AUC              0.998                     2      band
matched-degree residual (k=8)        0.195                     3      band
degree–collapse boundary             {.00,.11,.79,.97,1.0}     3      monotone+band
frustration peak window              f_neg ∈ [.30,.50]         3      ordering
node-thinning uncollapse (10%)       −60 pp                    3      sign+magnitude
edge-addition collapse (10%)         ∼−5 to −7 pp              3      sign+magnitude

III. The four-tier benchmark specification

The suite is organized into four tiers of increasing statistical looseness and decreasing discriminatory sharpness. An implementation declares conformance at the highest tier it passes; the tiers are cumulative in spirit but independently checkable, so a lightweight continuous-integration gate may run Tier 0 alone, while a publication-grade qualification runs all four.

III.A Tier 0: exact / analytic tests (seed-free, machine precision)

Tier 0 consists of the exact identity of Sec. II.B. For each perfectly regular ensemble (the 2D lattice and the random d-regular graph), at any single small size (e.g. n = 300), any single seed, and any single f_neg, the implementation must return H₃ = −1 with |H₃ + 1| ≤ ε₀, where ε₀ is a floating-point tolerance (default 10⁻¹² for double precision; relaxable to 10⁻⁵ for single precision and documented as such). A second Tier-0 test asserts that the across-seed sample standard deviation of H₃ on a regular ensemble is below 10⁻¹², i.e. that the value does not move with the seed. Tier 0 is the recommended minimal gate for every commit.

III.B Tier 1: distributional baselines with calibrated bands

Tier 1 checks ensemble-level statistics that are not exact but are stable.
(i) The six-class ordering of |H₃| must hold: regular > quasi-regular (WS) > modular (SBM) > homogeneous-random (ER) > scale-free (BA, HK) > geometric, with class means near −1.000, −0.995, −0.948, −0.918, −0.85, [−0.84,−0.80] respectively, each within a band (default ±0.02 on the mean, with the additional requirement that within-class spread be smaller than the between-class gap). (ii) The mean local stress must satisfy ⟨T⟩ = 0.324 ± δ and must be statistically indistinguishable across the nine topologies (a Kruskal–Wallis test should fail to reject; the reference reports p = 1.00). Tier-1 bands are calibrated from the across-seed dispersion of the reference data and are published with the suite so that implementers do not set them ad hoc.

III.C Tier 2: mechanism-attribution signatures

Tier 2 checks that the causal structure of the model is reproduced, using the variance-attribution pipeline of the companion mechanism audit. A random-forest predictor (500 trees, 5-fold cross-validation) trained on the pooled data over structural descriptors must attribute dominant importance to mean degree (near 0.94 for H₃ and 0.83 for ⟨ψ⟩), with a cross-validated R² near 0.99 and a collapse-classifier AUC near 0.998, each within a band (default ±0.03 on importances and ±0.01 on R²/AUC). Tier 2 is the appropriate level at which to certify that a re-implementation, a surrogate, or an accelerated solver preserves not just the marginal statistics but the predictor structure.

III.D Tier 3: perturbation-response signatures

Tier 3 checks the controlled-comparison and perturbation behaviour. (i) At matched mean degree ⟨k⟩ = 8, the seven-family H₃ spread must be near 0.195, i.e. a residual topology effect of the same order as the uncontrolled spread must survive degree-matching. (ii) The topology-averaged collapse rate must rise sharply and monotonically through approximately {0.00, 0.11, 0.79, 0.97, 1.00} at ⟨k⟩ ∈ {6, 8, 12, 16, 24}.
(iii) The collapse rate as a function of f_neg at matched degree must be non-monotonic, peaking in [0.30, 0.50] and essentially vanishing for f_neg ≥ 0.60. (iv) In the collapsed regime, degree-targeted node-thinning at the 10% level must reduce the collapse rate by an order of magnitude more than uniform edge addition deepens it (reference: −60 percentage points versus −5 to −7); the suite checks the sign and the order-of-magnitude asymmetry rather than the exact percentage, because the underlying companion module is still completing and the exact figure is expected to settle. Tier-3 criteria are deliberately expressed as signs, orderings, and order-of-magnitude relations precisely so that they remain valid as the reference figures are finalized.

[Figure 1: schematic with four panels. Tier 0 (exact / analytic): H₃ = −1, seed-free, machine precision, minimal CI gate. Tier 1 (distributional): six-class |H₃| order, ⟨T⟩ = 0.324, KW p = 1.00, calibrated bands. Tier 2 (mechanism): RF importance 0.94, R² = 0.99, AUC = 0.998, attribution match. Tier 3 (perturbation): residual ≈ 0.195, sharp k-boundary, asymmetric bistability, sign and magnitude.]

Fig. 1. The four-tier conformance suite. Sharpness decreases left to right: Tier 0 is an exact, seed-free, machine-precision gate; Tiers 1–3 add distributional, mechanistic, and perturbation-response criteria with progressively looser tolerances expressed as bands, orderings, and sign/magnitude relations.

IV. Reference implementation, data formats, and tolerances

To make the disclosure enabling on any platform, this section gives a portable on-disk manifest schema, the observable contract in code, and a self-contained reference test for the apex oracle. None of these depend on any proprietary library; they are expressed in terms of generic primitives available in every scientific computing environment.

IV.A Portable result manifest (JSON)

Every run emits a single JSON record. Conformance tooling reads only this record, so the simulator's internal language and architecture are irrelevant.
The schema is intentionally minimal and FAIR-aligned [4] so that records are findable, accessible, interoperable, and reusable. The // comments in the listing are annotations for the reader.

    // one JSON object per (ensemble, n, f_neg, seed) cell
    {
      "schema": "p02-benchmark/result/1.0",
      "ensemble": "random_regular",        // one of the nine canonical generators
      "generator_params": { "d": 8 },
      "n": 300,
      "k": 4,
      "f_neg": 0.25,
      "seed": 7,
      "H3": -1.0,                          // principal observable, eq. (1)
      "psi_mean": 0.41,                    // ⟨ψ⟩
      "pi0": 0.00,                         // fraction zero-admissibility nodes
      "T_mean": 0.324,                     // ⟨T⟩
      "collapse": false,                   // derived: psi_mean<0.02 AND pi0>0.95
      "precision": "float64",              // declares the FP tier for ε₀
      "rng": { "name": "pcg64", "stream": 7 },
      "env": { "impl": "my_sim 2.1", "host": "x86_64", "threads": 8 }
    }

IV.B The observable contract in code

The principal statistic is computed identically everywhere from the two per-node vectors. The reference uses the rank-difference form of eq. (1); ties (which do not occur on regular ensembles for the reference observables) are handled by average ranks.

    import numpy as np
    from scipy.stats import rankdata

    def H3(T, psi):
        # Spearman rank correlation across nodes; eq. (1)
        T, psi = np.asarray(T), np.asarray(psi)   # accept lists as well as arrays
        rT, rP = rankdata(T), rankdata(psi)       # average ranks on ties
        n = T.size
        d2 = np.sum((rT - rP) ** 2)               # sum of squared rank differences
        return 1.0 - 6.0 * d2 / (n * (n * n - 1))

    def collapse_flag(psi_mean, pi0):
        # true only when both thresholds of Sec. II.B hold
        return (psi_mean < 0.02) and (pi0 > 0.95)

IV.C The apex oracle as a unit test

The Tier-0 gate reduces to a few lines. It is seed-free in the sense that any seed must pass; the test below fixes a few seeds for determinism of the test itself, not of the result.

    def test_exact_regular_fixed_point(simulate):
        # 'simulate' returns per-node (T, psi) for a relaxed config.
        for ensemble in ("lattice", "random_regular"):
            for seed in (0, 1, 2):
                T, psi = simulate(ensemble=ensemble, n=300, k=4,
                                  f_neg=0.25, seed=seed)
                assert abs(H3(T, psi) + 1.0) <= 1e-12   # float64
        # seed-independence: across-seed spread below machine noise
        vals = [H3(*simulate(ensemble="random_regular", n=300, k=4,
                             f_neg=0.25, seed=s))
                for s in range(8)]
        assert np.std(vals) <= 1e-12

IV.D Tolerance philosophy

Tolerances are layered by epistemic status. Exact quantities (Tier 0) get a floating-point tolerance keyed to the declared precision field, so that single-precision, double-precision, and extended-precision implementations are each judged fairly. Statistical quantities (Tiers 1–2) get bands calibrated from the reference across-seed dispersion and shipped with the suite. Perturbation signatures (Tier 3) are judged by sign, ordering, and order of magnitude, so that they remain meaningful while the underlying figures are finalized in the companion program. This layering is what lets the same suite serve both a fast pre-commit hook and a slow publication-grade qualification.

V. Methods of use in practice (broad enumeration)

The rest of the value of an exact, freely available oracle lies in the breadth of contexts it serves. The following enumeration is deliberately wide; each subsection states the problem, the way the suite addresses it, and the concrete tier(s) and procedures involved. The list is intended to be exhaustive of the obvious uses and suggestive of the rest, so that the methodology is publicly documented for any of them.

V.A Continuous-integration regression testing

A simulator's repository runs the Tier-0 oracle as a pre-commit hook and on every pull request. Because the value is seed-free and a single small regular graph suffices, the gate runs in well under a second and yet trips on a large fraction of accidental regressions: a wrong neighbour loop, an off-by-one in the rank reduction, a sign error in the edge assignment, a silent change in the default k.
Tiers 1–3 run nightly or on release branches as progressively heavier jobs. This converts “the numbers look about the same” into a hard pass/fail that survives refactoring.

V.B Cross-language and cross-implementation conformance

When the same model is re-implemented in a second language (e.g. a Python prototype re-coded in C++, Rust, Julia, Fortran, or a GPU kernel), the only agreed contract between the two is the manifest schema of Sec. IV.A. Both implementations emit records; conformance tooling checks that both pass Tier 0 exactly and that their Tier-1/2/3 statistics agree within band. This is a conformance test in the standards sense: it certifies that two independent codes implement the same model, not merely two plausible ones.

V.C Floating-point and reduction-order auditing

Aggressive compiler flags, fused multiply–add, reassociation, and reordered parallel reductions change rounding. The apex oracle is an unusually clean probe of this, because on a regular graph the true answer is exactly representable (−1) and any drift is pure numerical error. By tightening ε₀ and watching when Tier 0 begins to fail, a team can bound the numerical error introduced by a given build configuration, summation order, or accumulator width, and can do so without a separate, hand-derived analytic reference.

V.D Random-number-stream verification

The seed-independence Tier-0 assertion is a direct test of the random pipeline. If a code accidentally couples streams across threads, reuses a global generator, or mis-jumps a counter-based generator, the regular-graph value typically stops being seed-invariant or drifts off −1 for some seeds. The rng field of the manifest records the generator and stream so that a divergence can be localized to the random layer rather than the dynamics.
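The tolerance-tightening procedure of Sec. V.C and the seed-independence check of Sec. V.D can be combined in one small audit helper. The sketch below is illustrative only: the function name audit_tier0, the returned keys, and the particular ladder of candidate tolerances are assumptions introduced here, not part of the published suite. It takes per-seed H₃ values from a regular ensemble and reports the worst deviation from −1, the across-seed spread, and the tightest candidate ε₀ that every seed still passes.

```python
import numpy as np

def audit_tier0(h3_values, ladder=(1e-5, 1e-8, 1e-10, 1e-12, 1e-14)):
    """Summarise how far per-seed H3 values sit from the exact value -1.

    'ladder' is a hypothetical set of candidate tolerances to try,
    not a list mandated by the benchmark.
    """
    vals = np.asarray(h3_values, dtype=np.float64)
    max_dev = float(np.max(np.abs(vals + 1.0)))   # worst-case |H3 + 1| over seeds
    seed_sd = float(np.std(vals))                 # across-seed standard deviation
    passing = [eps for eps in ladder if max_dev <= eps]
    tightest = min(passing) if passing else None  # None: fails every candidate
    return {"max_dev": max_dev, "seed_sd": seed_sd, "tightest_eps": tightest}
```

Run once per build configuration, the tightest passing tolerance gives a coarse bound on the numerical error that configuration introduces; a non-zero seed_sd on a regular ensemble points toward the random layer or a non-deterministic reduction rather than the dynamics.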
V.E Parallel, GPU, distributed, and out-of-core reproducibility

Porting a serial simulator to shared-memory threads, to a GPU, to a multi-node MPI decomposition, or to an out-of-core/streaming layout introduces many opportunities for subtle divergence: race conditions in the update, boundary-exchange bugs in a partitioned graph, or non-deterministic atomic reductions. Tier 0 is the cheapest possible canary for all of these, and Tiers 1–3 confirm that the accelerated code preserves the full statistical and mechanistic behaviour, not just the easy cases. The same procedure qualifies a containerized or cloud re-deployment of an existing code.

V.F Differentiable simulators and machine-learning surrogates

Differentiable (autodiff) re-implementations and learned emulators (graph neural networks, neural operators, or other surrogate models trained to predict H₃, ⟨ψ⟩, or the collapse flag from structural inputs) can be qualified at the appropriate tier. A surrogate that reproduces Tier-1 marginals but fails Tier 2 has learned the outputs without the mechanism; one that fails Tier 3 has not captured the perturbation response. The exact Tier-0 value is a useful hard constraint or regularization target for a differentiable surrogate, since it must hold by construction on regular inputs.

V.G Hardware, accelerator, neuromorphic, and probabilistic substrates

The same suite qualifies non-conventional execution substrates: FPGA and ASIC implementations of the update rule, neuromorphic or analog hardware, fixed-point DSP pipelines, and probabilistic or sampling-based accelerators. For each, Tier 0 with a substrate-appropriate ε₀ establishes that the substrate computes the right invariant, and the higher tiers establish that its noise characteristics do not distort the statistical and mechanistic behaviour. Because the oracle is exact in the ideal limit, it gives a principled way to state how much a noisy substrate departs from ground truth.
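A substrate-appropriate ε₀ can be applied mechanically by keying the tolerance to the precision field of the manifest. The following is a minimal sketch under stated assumptions: the two default tolerances are the ones given in Sec. III.A, but the label "float32", the function name tier0_pass, and the override argument eps0_by_precision are hypothetical (the schema in Sec. IV.A only shows "float64"). Any other substrate label must be supplied, and documented, by the implementer.

```python
import json

# Defaults from Sec. III.A; the "float32" label itself is an assumed spelling.
DEFAULT_EPS0 = {"float64": 1e-12, "float32": 1e-5}

def tier0_pass(record_json, eps0_by_precision=None):
    """Check |H3 + 1| <= eps0 for one manifest record given as a JSON string.

    eps0_by_precision optionally adds or overrides tolerances, e.g. for a
    fixed-point or analog substrate whose label is not in the defaults.
    """
    table = dict(DEFAULT_EPS0)
    if eps0_by_precision:
        table.update(eps0_by_precision)
    rec = json.loads(record_json)
    precision = rec["precision"]
    if precision not in table:
        # Refuse to guess a tolerance for an undeclared substrate.
        raise KeyError(f"no eps0 declared for precision {precision!r}")
    return abs(rec["H3"] + 1.0) <= table[precision]
```

Raising on an unknown precision label, rather than falling back to a default, keeps the tolerance an explicit and recorded choice, which matches the requirement that relaxed tolerances be documented as such.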
V.H Approximate solvers, mean-field theories, and closure validation

Analytic approximations (mean-field treatments, pair approximations, message-passing or cavity methods, and moment closures) can be checked against the same reference values. A closure that does not reproduce the exact regular-graph limit is immediately disqualified; one that reproduces Tier 0 and Tier 1 but not the matched-degree residual of Tier 3 has missed the residual structural physics. This makes the suite a target for theory development as well as for software, and gives an analytic sub-project a concrete acceptance bar.

V.I Education, courseware, and skills assessment

Because Tier 0 is exact and cheap, it is an excellent teaching and assessment instrument. A course assignment to implement signed voter dynamics can be auto-graded against the oracle: a student's code is correct on the structurally hard part exactly when it returns −1. The tiered structure also teaches the difference between reproducing a number, reproducing a distribution, and reproducing a mechanism, a distinction that is otherwise hard to make concrete.

V.J Peer review, artifact evaluation, and conformance certification

Journals and conferences with artifact-evaluation tracks [1] can require that a submitted simulator emit conforming manifests and pass a declared tier; the tier becomes a citable claim (“Tier-2 conformant”). Independent re-runners can reproduce the manifests and confirm conformance without re-deriving anything. A simple, freely issuable conformance badge keyed to the highest passing tier lets the community recognize validated codes without any central authority, registration, or fee.
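The badge keyed to the highest passing tier can be computed from per-tier pass/fail results in a few lines. The sketch below is one possible reading, not a rule fixed by the suite: it assumes the cumulative interpretation, in which a tier only counts if every lower tier also passed, and the function names declare_tier and badge are hypothetical.

```python
def declare_tier(passed):
    """Return the highest tier t such that tiers 0..t all passed, else None.

    'passed' maps tier number (0-3) to a bool; missing tiers count as not run.
    The cumulative reading used here is an assumption of this sketch.
    """
    highest = None
    for tier in (0, 1, 2, 3):
        if not passed.get(tier, False):
            break                      # stop at the first failed or missing tier
        highest = tier
    return highest

def badge(passed):
    """Format the conformance claim as a short citable string."""
    tier = declare_tier(passed)
    return "non-conformant" if tier is None else f"Tier-{tier} conformant"
```

For example, badge({0: True, 1: True, 2: True}) yields "Tier-2 conformant", and a run that fails Tier 0 is reported as non-conformant regardless of its higher-tier results.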
V.K Applied multi-agent, swarm, and IoT consensus-controller qualification

Engineered systems that run voter-like or consensus-like decision rules (robot swarms, sensor and IoT meshes, distributed-ledger gossip, autonomous-vehicle coordination, multi-agent reinforcement-learning controllers) can be qualified against the suite as a pre-deployment sanity layer. The Tier-3 degree–collapse boundary and the asymmetric bistability signature are exactly the behaviours an applied team needs to verify their controller exhibits (or deliberately avoids) before fielding it; the suite turns those behaviours into checkable acceptance tests rather than informal expectations.

V.L Provenance, environment capture, and supply-chain reproducibility

The manifest's env and rng fields make every conforming run self-describing, so a divergence years later can be attributed to a compiler, a library, a host architecture, or a generator change rather than to the science. Capturing the manifest alongside the environment specification is a lightweight, format-stable provenance record that survives re-execution on future systems and supports long-term, supply-chain-aware reproducibility.

Common thread

In every case the leverage comes from the same source: an exact, seed-free, machine-precision invariant that a correct implementation must reproduce and that a wide variety of defects break, surrounded by progressively looser statistical, mechanistic, and perturbation-response checks. The breadth of the list above is a property of the oracle, not of any one application.

VI. Generalization and extension

The construction is not specific to the reference configuration; it is a template. We document the natural extensions so that they too are publicly recorded.

VI.A Other alphabet sizes and update rules

The exact regular-graph identity is driven by the inverse-locking of the two rank orderings under degree-regularity, not by the specific value k = 4.
The same Tier-0 oracle is therefore expected to hold for other alphabet sizes, and the suite admits a k parameter so that an implementation may be qualified at several alphabet sizes. The degree–collapse phase boundary is expected to shift with k, which itself becomes a higher-tier signature once characterized. Alternative nodewise update rules (Glauber–Ising, multi-state Potts, Kuramoto-type, contagion/SIR-type) define their own per-node stress/admissibility analogues; wherever those analogues inverse-lock under regularity, an exact Tier-0 oracle exists for them too, and the same four-tier scaffold applies.

VI.B Other graph families and real-world substrates

Any degree-regular family (lattices in higher dimension, circulants, cages, Cayley graphs of regular generating sets) furnishes the same exact oracle, so the suite can be broadened with additional Tier-0 ensembles at no loss of sharpness. For real-world (non-regular) graphs the exact oracle does not apply, but the matched-degree control of Tier 3 is precisely the tool that makes real-graph comparisons interpretable, so the suite extends to empirical-graph qualification by way of degree-matched comparison rather than by an exact identity.

VI.C Other observables and reductions

The methodology is agnostic to the choice of the two per-node observables and to the rank reduction. Any pair of per-node scalars that inverse-lock under degree-regularity yields an exact Tier-0 oracle; any monotone-invariant correlation (Spearman, Kendall) yields the same extremal value. Implementers who define their own per-node diagnostics can reuse the entire scaffold by identifying the regularity-induced exact value of their chosen statistic.

VII. Worked validation protocol and checklist

The following protocol is the recommended end-to-end procedure. It is stated so that a practitioner can execute it without further reference.

1. Wire the observable contract. Implement Tᵢ, ψᵢ, eq. (1), and the collapse flag exactly as in Sec. II.B; emit the manifest of Sec. IV.A.
2. Run the Tier-0 gate. On the lattice and random-regular ensembles, assert |H₃ + 1| ≤ ε₀ and across-seed s.d. ≤ 10⁻¹². Fix any failure here before proceeding; a Tier-0 failure almost always localizes to the update rule, the rank reduction, the sign assignment, or the numerical pipeline.
3. Run Tier 1. Sweep the nine ensembles; verify the six-class |H₃| ordering, ⟨T⟩ = 0.324 ± δ, and stress topology-invariance (Kruskal–Wallis fails to reject).
4. Run Tier 2. Train the random forest (500 trees, 5-fold CV) on pooled structural descriptors; verify mean-degree importance near 0.94/0.83, R² ≈ 0.99, collapse AUC ≈ 0.998.
5. Run Tier 3. Verify the matched-degree residual near 0.195, the monotone sharp degree–collapse boundary, the non-monotonic frustration window peaking in [0.30,0.50], and the order-of-magnitude asymmetry of node-thinning versus edge-addition (check sign and magnitude, not the exact percentage).
6. Declare a tier. Record the highest tier passed, the tolerances used, and the environment; publish the manifests. The declared tier is the citable conformance claim.

One-line acceptance summary

A simulator is minimally conforming if it returns H₃ = −1 to floating-point tolerance, seed-independently, on the lattice and random-regular ensembles; it is fully conforming if it additionally reproduces the Tier-1 distributional baselines, the Tier-2 mechanism attribution, and the Tier-3 perturbation-response signatures within their published tolerances.

VIII. Distribution, openness, and reuse posture

This specification, including the reference values, the tier definitions, the manifest schema, the reference code, the protocol, and every extension described in Sec. VI, is placed in the public record and released for unrestricted use. Any party may implement, run, modify, embed, redistribute, teach, or build upon the benchmark, in any setting, commercial or non-commercial, without registration, credential, fee, or any grant of permission.
The reference quantities are facts about a publicly described model; the test definitions and schemas are published here in full and enabling detail precisely so that they remain a common, freely available standard rather than a controlled resource. Conformance may be self-declared; no central authority issues or withholds it. Implementers are encouraged to publish their manifests openly under FAIR principles [4] so that conformance is independently checkable by anyone.

IX. Limitations and scope

Several caveats bound the claims of this disclosure and are stated plainly so that the benchmark is not over-trusted.

(L1) The exact Tier-0 oracle is a property of degree-regular graphs; it certifies the structurally hard part of the model but does not by itself certify behaviour on irregular graphs, which is why Tiers 1–3 exist.
(L2) The Tier-2 attribution and Tier-3 perturbation reference figures derive from companion runs of which the matched-degree audit is roughly half complete; for this reason Tier 3 is specified by sign, ordering, and order of magnitude rather than by exact figures, and its bands should be tightened only as the companion program finalizes [3].
(L3) The reference statistics here do not yet carry bootstrap confidence intervals or multiple-comparison corrections; the suite's Tier-1/2 bands are calibrated from across-seed dispersion and should be replaced with rigorously estimated bands when those become available.
(L4) The construction is currently demonstrated for a single voter-type dynamics; its extension to other dynamics (Sec. VI) is well-motivated but must be separately verified for each new rule.

None of these limitations affect the soundness of the Tier-0 oracle, which is exact and self-evidently checkable.

X. Conclusions

Computational network-dynamics codes have lacked sharp oracles, leaving validation to distributional eyeballing that hides a broad class of defects.
The companion empirical program supplies an exact, seed-independent, machine-precision invariant: the inverse-locking of local stress and local admissibility on degree-regular graphs, giving H3 = −1. This disclosure turns that invariant into the apex of a complete, freely reusable, four-tier conformance suite, with a portable manifest schema, reference code, calibrated and sign/magnitude tolerances, and a worked protocol. Its explicitly enumerated practical uses span continuous integration, cross-substrate conformance, numerical and random-stream auditing, parallel and accelerated reproducibility, differentiable and surrogate-model validation, hardware qualification, approximate-solver validation, teaching, peer review, applied consensus-controller qualification, and provenance. The same template generalizes to other alphabet sizes, dynamics, graph families, and observables wherever an analogous regularity-induced exact value can be identified. The entire specification is placed in the public record for unrestricted use, so that any implementer on any platform can verify their software against a common standard that belongs to everyone.

Data and code availability

All reference values, tier definitions, the manifest schema, and the reference code are contained in full within this document, which is itself the public record of the specification. No external, proprietary, or access-controlled component is required to implement or use the benchmark.

Companion work

The reference quantities are drawn from the companion empirical study “Topology, Mean Degree, and Admissibility Degeneracy in the Local Stress–Admissibility Coupling of k-State Voter Dynamics on Networks” (Manuscript ID P02_TOPOLOGY_PHASE_MAP, v0.5, 4 June 2026), which establishes the exact regular-graph limit, the topology-invariant mean stress, the mean-degree variance attribution, and the perturbation-response signatures used here.
Reuse

This specification is released for unrestricted public use, study, implementation, modification, and redistribution. No registration, fee, or permission is required.

References

[1] Association for Computing Machinery, Artifact Review and Badging (current version), describing reproducibility/replicability badging for computational artifacts.
[2] National Academies of Sciences, Engineering, and Medicine, Reproducibility and Replicability in Science (The National Academies Press, 2019).
[3] J. G. Melegh, “Topology, Mean Degree, and Admissibility Degeneracy in the Local Stress–Admissibility Coupling of k-State Voter Dynamics on Networks,” preprint P02_TOPOLOGY_PHASE_MAP v0.5 (2026).
[4] M. D. Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Scientific Data 3, 160018 (2016).
[5] C. Castellano, S. Fortunato, and V. Loreto, “Statistical physics of social dynamics,” Reviews of Modern Physics 81, 591 (2009).
[6] P. Clifford and A. Sudbury, “A model for spatial conflict,” Biometrika 60, 581 (1973); R. A. Holley and T. M. Liggett, Annals of Probability 3, 643 (1975).
[7] D. J. Watts and S. H. Strogatz, “Collective dynamics of small-world networks,” Nature 393, 440 (1998).
[8] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science 286, 509 (1999).
[9] P. Holme and B. J. Kim, “Growing scale-free networks with tunable clustering,” Physical Review E 65, 026107 (2002).
[10] S. Maslov and K. Sneppen, “Specificity and stability in topology of protein networks,” Science 296, 910 (2002).
[11] L. Breiman, “Random forests,” Machine Learning 45, 5 (2001).

P02_BENCHMARK_SUITE · v1.0 · 5 June 2026 · open technical disclosure.
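Illustrative sketch of the Tier-0 gate (step 2 of the Sec. VII protocol). This is a minimal example under stated assumptions and is not the reference code of this specification. It assumes that H3 is obtained as a Spearman rank correlation between the two per-node observables, which is one of the monotone-invariant reductions named in Sec. VI.C. The function names (average_ranks, spearman, tier0_gate, synthetic_h3_by_seed) are hypothetical, the value passed for ε_0 is a placeholder rather than the published tolerance, and the synthetic "stress" and "admissibility" arrays are stand-ins for T_i and ψ_i, whose actual definitions are those of Sec. II.B.

```python
import math
import random
import statistics


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("rank correlation is undefined for a constant observable")
    return cov / math.sqrt(vx * vy)


def tier0_gate(h3_by_seed, eps0, sd_tol=1e-12):
    """Check |H3 + 1| <= eps0 for every seed and across-seed s.d. <= sd_tol.

    eps0 has no default here: the caller must supply the tolerance that
    the specification publishes. Returns (passed, worst_deviation, spread).
    """
    worst = max(abs(h + 1.0) for h in h3_by_seed)
    spread = statistics.pstdev(h3_by_seed)
    return worst <= eps0 and spread <= sd_tol, worst, spread


def synthetic_h3_by_seed(seeds, n=200):
    """ASSUMPTION: placeholder observables, not the Sec. II.B definitions.

    'admissibility' is built as a strictly decreasing function of 'stress',
    which mimics inverse-locking on a degree-regular graph.
    """
    results = []
    for seed in seeds:
        rng = random.Random(seed)
        stress = [rng.random() for _ in range(n)]
        admissibility = [1.0 - t for t in stress]
        results.append(spearman(stress, admissibility))
    return results


if __name__ == "__main__":
    h3 = synthetic_h3_by_seed(range(10))
    passed, worst, spread = tier0_gate(h3, eps0=1e-12)  # placeholder eps0
    print("Tier-0 gate passed:", passed)
    print("worst |H3 + 1|:", worst, "across-seed s.d.:", spread)
```

In a real run, the list passed to tier0_gate would hold one H3 value per seed from the simulator under test on the lattice and random-regular ensembles. The gate is deliberately strict: exchanging even two node values in an otherwise inverse-locked pair moves the rank correlation away from −1 by far more than the tolerance, which is what makes a Tier-0 failure easy to detect.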
