geonlp-pipeline-paper-2026: A Reproducible Pipeline for Geoscientific Text Mining

Heasman, Drew; Eglington, Bruce

Software

Data source: Zenodo

Research software | Under curation | Publisher: Zenodo

DOI: 10.5281/zenodo.20546250

Abstract

This record contains the production pipeline source code, database schema and migrations, and Kubernetes deployment manifests accompanying Heasman and Eglington (2026), a methodology paper that describes a reproducible Python and PostgreSQL pipeline for assembling domain-specific text corpora from the xDD Snippet API. The release includes pre-flight hit checking, in-stream Counter pruning for memory-bounded streaming, pool-segregated parallel workers, and in-database computation of information-theoretic statistics.
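The abstract names in-stream Counter pruning as the way memory stays bounded while snippets are streamed. The record does not state the pruning rule, so the following is only a minimal sketch under assumed parameters: it uses Python's `collections.Counter` and, whenever the Counter grows past a threshold, keeps only the most frequent keys. The names `tokenize`, `count_terms_pruned`, `max_terms`, and `keep_terms` are hypothetical and are not taken from the released code.

```python
from collections import Counter
from typing import Iterable, Iterator


def tokenize(text: str) -> Iterator[str]:
    """Hypothetical tokenizer: lowercase, split on whitespace, strip basic punctuation."""
    for token in text.lower().split():
        token = token.strip(".,;:()[]\"'")
        if token:
            yield token


def count_terms_pruned(
    snippets: Iterable[str],
    max_terms: int = 100_000,  # hypothetical ceiling on distinct keys
    keep_terms: int = 50_000,  # hypothetical number of keys kept after a prune
) -> Counter:
    """Count terms over a stream of snippets while bounding the Counter's size.

    Whenever the Counter holds more than `max_terms` distinct keys, it is
    rebuilt from its `keep_terms` most frequent keys. Counts for pruned terms
    are discarded, so every returned count is a lower bound on the true count.
    """
    if keep_terms > max_terms:
        raise ValueError("keep_terms must not exceed max_terms")
    counts: Counter = Counter()
    for text in snippets:
        counts.update(tokenize(text))
        # The size check runs once per snippet, so the Counter can briefly
        # exceed max_terms by one snippet's worth of new distinct tokens.
        if len(counts) > max_terms:
            # most_common() returns (term, count) pairs; dict() turns them back
            # into a mapping so Counter does not count the tuples themselves.
            counts = Counter(dict(counts.most_common(keep_terms)))
    return counts


if __name__ == "__main__":
    stream = ["Zircon U-Pb ages", "zircon Hf isotopes", "Detrital zircon ages"]
    top = count_terms_pruned(stream, max_terms=4, keep_terms=2)
    print(top.most_common(1))  # [('zircon', 3)]
```

Keeping the top keys trades exactness for a fixed memory ceiling: a term dropped during a prune restarts from zero if it reappears later, so rare terms are undercounted while frequent terms survive every prune. A frequency-floor rule (dropping keys below a minimum count) would be an equally plausible alternative; which rule the pipeline uses is not stated in this record.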

Found an issue? Give us feedback