
Frozen dataset and benchmark release for the preprint "A Data-Centric Audit of Estonian Water-Quality Open Data". The release contains a canonical 69,447-row corpus derived from Terviseamet open data, deterministic label-reproducibility audit buckets, an 80/20 temporal split, regenerated baseline model metrics, LightGBM test predictions, SHAP summaries, Croissant metadata, and a dataset supplement PDF. The upstream measurements remain Terviseamet open data. This derived release claims no validation, endorsement, or official response from Terviseamet; a response from the provider is pending.
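
For readers unfamiliar with temporal splits, the following is a minimal sketch of one common convention: sort records by date and assign the earliest 80% to training and the latest 20% to testing. It is an illustration only, not the release's actual split code. The function name `temporal_split`, the field name `sample_date`, and the row-count cut rule are all assumptions; the release may instead cut at a calendar date or handle same-day ties differently.

```python
from datetime import date


def temporal_split(rows, date_key="sample_date", train_fraction=0.8):
    """Split rows chronologically: the earliest fraction trains, the rest tests.

    `date_key` is a hypothetical field name, not the release's schema.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # sorted() is stable, so rows sharing a date keep their input order.
    ordered = sorted(rows, key=lambda row: row[date_key])
    # Cut by row count; rows with the same date may land on both sides.
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy records in reverse chronological order (illustrative data only).
rows = [{"sample_date": date(2021, 1, 1 + i), "value": i} for i in reversed(range(10))]
train, test = temporal_split(rows)
```

Unlike a random split, every training record here predates every test record, so test metrics reflect performance on later, unseen measurements rather than on interleaved ones.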
