
Production pipeline source code, database schema, migrations, and Kubernetes deployment manifests accompanying Heasman and Eglington (2026), a methodology paper describing a reproducible Python and PostgreSQL pipeline for assembling domain-specific text corpora from the xDD Snippet API. Includes pre-flight hit checking, in-stream Counter pruning for memory-bounded streaming, pool-segregated parallel workers, and in-database information-theoretic statistics.
