
Initial public release of the Latent Structure Benchmark (LSB). LSB applies Cultural Domain Analysis (CDA) elicitation protocols (free listing, pile sorting, pile interview) to large language models as if they were informants. It surfaces the corpus lens: the latent categorical structure of a training corpus, refracted through training and alignment, made visible by structured elicitation. LSB is not a capability benchmark, not a leaderboard, and not a ranking.

This release includes:

- The open-data bundle (CC0 1.0 Universal, 1.55 GB): https://huggingface.co/datasets/AILLM1999/latent-structure-benchmark
- The reproducible build script (scripts/build_db.py) and full data dictionary (docs/DATA_DICTIONARY.md)
- The dashboard at https://cogstructurelab.com
- Every method-defining document under docs/ and ARCHITECTURE.md
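One possible way to fetch the open-data bundle listed above is through the `huggingface_hub` Python client. The sketch below is an illustration under stated assumptions, not part of the release itself: it assumes `huggingface_hub` is installed, and the helper names `repo_id_from_url` and `download_bundle`, as well as the `lsb_data` target directory, are hypothetical. Only the dataset URL comes from this release.

```python
from urllib.parse import urlparse

# Dataset URL as published in this release.
DATASET_URL = "https://huggingface.co/datasets/AILLM1999/latent-structure-benchmark"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repo ID from a Hugging Face dataset URL.

    Hypothetical helper, written for this example only.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_bundle(local_dir: str = "lsb_data") -> str:
    """Download the full bundle (about 1.55 GB) and return its local path.

    Hypothetical helper; the default directory name is an assumption.
    """
    # Imported lazily so repo_id_from_url works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(DATASET_URL),
        repo_type="dataset",  # the bundle is hosted as a dataset repository
        local_dir=local_dir,
    )
```

Calling `download_bundle()` would start the full download, so make sure there is enough disk space first. The layout of the downloaded files is not described here; docs/DATA_DICTIONARY.md is the place to check.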
