
Summary

Description

Fourteen-month deployment of four Sensirion SCD41 low-cost CO₂ sensors at the SMEAR Estonia station, Järvselja, Estonia. Three sensors (A, B, C) operated at 30 m on the atmospheric sensing mast; the fourth (Sensor D; SCT1) operated at 2 m. One LGR reference instrument provided ground-truth measurements at 30 m. The sensors operated under adverse environmental conditions outside manufacturer specifications. The dataset includes 10-minute aggregated readings as per-sensor and merged outputs, quality-control flags, and validation receipts. It is intended as an adverse-conditions benchmark for evaluating robust calibration methods for low-cost environmental sensors.

Preview figure

Taylor diagram summarising SCD41 sensor performance relative to the LGR reference. Radial distance encodes standard deviation (ppm); angular position encodes Pearson correlation; dashed arcs show centred RMSD. Arrows indicate sensors whose standard deviation exceeds the plot radius.

Data Dictionary

Timestamps

All timestamps are in Coordinated Universal Time (UTC), formatted as ISO 8601 strings in CSV files (YYYY-MM-DD HH:MM:SS) and as datetime64[ns] in Parquet files. Each timestamp marks the start of a non-overlapping 10-minute aggregation window aligned to the Unix epoch.

Missing Data

Missing values appear as empty cells in CSV and as NaN in Parquet. No interpolation or gap-filling is applied. The merged file uses an outer join: if a sensor has no data at a given timestamp, all of its columns are NaN for that row.
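The timestamp and missing-data conventions above can be sketched in a few lines of standard-library Python. The function names (`parse_timestamp`, `window_start`, `parse_value`) are hypothetical helpers for illustration and are not part of the dataset's pipeline; the sketch assumes that "aligned to the Unix epoch" means flooring epoch seconds to a multiple of 600.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 600  # one 10-minute aggregation window


def parse_timestamp(text: str) -> datetime:
    """Parse a CSV timestamp string (YYYY-MM-DD HH:MM:SS) as UTC."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def window_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its window.

    Assumption: windows are aligned by flooring seconds since the Unix
    epoch to a multiple of WINDOW_SECONDS.
    """
    epoch_seconds = int(ts.timestamp())
    floored = epoch_seconds - epoch_seconds % WINDOW_SECONDS
    return datetime.fromtimestamp(floored, tz=timezone.utc)


def parse_value(cell: str) -> float:
    """Read an empty CSV cell as NaN, matching the Parquet representation."""
    return float(cell) if cell.strip() else float("nan")


raw = parse_timestamp("2025-03-14 09:47:31")
print(window_start(raw))  # 2025-03-14 09:40:00+00:00
```

A raw reading taken at 09:47:31 UTC therefore belongs to the row stamped 09:40:00, and an empty cell in that row should be treated as missing rather than as zero.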
Per-Sensor Files: SCD41 (Sensors A, B, C, D)

File pattern: {SENSOR}_10min.csv, {SENSOR}_{YYYY}_{MM}.csv

Sensors: CO2_SCT1_2M (Sensor D, 2 m), CO2_30M_A, CO2_30M_B, CO2_30M_C (30 m)

| Column | Type | Unit | Description |
| --- | --- | --- | --- |
| timestamp | datetime | UTC | Start of 10-minute window |
| co2_mean | float64 | ppm | Mean CO₂ concentration |
| co2_std | float64 | ppm | Standard deviation of CO₂ within window |
| co2_count | int64 | — | Number of deduplicated raw samples in window |
| temp_mean | float64 | °C | Mean on-chip temperature (SHT4x sensor) |
| temp_std | float64 | °C | Standard deviation of temperature |
| temp_count | int64 | — | Number of temperature samples |
| humidity_mean | float64 | % RH | Mean relative humidity (SHT4x sensor) |
| humidity_std | float64 | % RH | Standard deviation of humidity |
| humidity_count | int64 | — | Number of humidity samples |
| co2_qc | string | — | Quality-control flag for CO₂ |
| temp_qc | string | — | Quality-control flag for temperature |
| humidity_qc | string | — | Quality-control flag for humidity |

13 columns per file. At the SCD41 native sampling rate of 0.2 Hz (one sample every 5 s), a full 10-minute window contains up to 120 deduplicated samples.

Per-Sensor Files: LGR Reference

File pattern: LGR_30M_10min.csv, LGR_30M_{YYYY}_{MM}.csv

Each measured variable produces up to four columns: {variable}_mean, {variable}_std, {variable}_count, and, where applicable, {variable}_sd for the pooled standard deviation of the instrument-reported uncertainty.
| Column pattern | Type | Unit | Description |
| --- | --- | --- | --- |
| CO2_dry_* | float64 | ppm | Dry-air CO₂ mole fraction (primary reference) |
| CO2_ppm_* | float64 | ppm | Wet-air CO₂ concentration |
| CH4_dry_* | float64 | ppm | Dry-air CH₄ mole fraction |
| CH4_ppm_* | float64 | ppm | Wet-air CH₄ concentration |
| H2O_ppm_* | float64 | ppm | Water vapour concentration |
| GasP_torr_* | float64 | torr | Internal gas cell pressure |
| GasT_C_* | float64 | °C | Internal gas cell temperature |
| AmbT_C_* | float64 | °C | Enclosure ambient temperature |
| RD0_us_* | float64 | µs | Ring-down time (channel 0) |
| RD1_us_* | float64 | µs | Ring-down time (channel 1) |
| quality_* | float64 | 0–100 | Instrument quality indicator |
| Fit_Flag_* | float64 | — | Spectral fit quality (3 = good) |
| co2_qc | string | — | QC flag for CO₂ range |
| quality_qc | string | — | QC flag for instrument quality and fit |

At the configured logging interval of approximately 120 s, each 10-minute window contains at most 5 LGR samples.

Merged File

File pattern: SMEAR_EE_CO2_merged.csv, SMEAR_EE_CO2_{YYYY}_{MM}_merged.csv

100 columns: 1 timestamp column, 48 SCD41 columns, 48 LGR columns, and 3 cluster-statistics columns. All sensors are aligned to a common 10-minute timestamp index via outer join.

Column Naming Convention

Merged columns are prefixed with the lowercase sensor name: {sensor}_{variable}_{statistic}

Examples:
co2_sct1_2m_co2_mean → Sensor D (2 m), CO₂, mean
co2_30m_a_temp_std → Sensor A (30 m), temperature, standard deviation
lgr_30m_CO2_dry_mean → LGR reference, dry CO₂, mean

SCD41 Sensor Columns (×4 sensors, 12 columns each)

Each SCD41 sensor contributes 12 columns: co2_mean, co2_std, co2_count, temp_mean, temp_std, temp_count, humidity_mean, humidity_std, humidity_count, co2_qc, temp_qc, humidity_qc, all prefixed with the sensor name.

LGR Columns (48 columns)

The LGR contributes 48 columns. Ten measurement variables (CO₂, CH₄, H₂O, pressure, temperatures, ring-down times) each produce four columns (mean, std, count, and pooled SD of the instrument-reported uncertainty), giving 40 columns; quality and Fit_Flag produce three columns each (mean, std, count; no instrument-reported SD exists for these), giving 6 more.
Two QC flag columns (co2_qc, quality_qc) complete the set. Pooled SD columns carry the suffix _sd__pooled_sd (double underscore).

Cluster Statistics (3 columns)

Computed from the three co-located 30 m SCD41 sensors (A, B, C):

| Column | Type | Unit | Description |
| --- | --- | --- | --- |
| scd41_30m_cluster_co2_mean | float64 | ppm | Mean CO₂ across reporting 30 m sensors |
| scd41_30m_cluster_co2_std | float64 | ppm | Standard deviation across reporting 30 m sensors |
| scd41_30m_cluster_co2_count | int64 | — | Number of 30 m sensors reporting (0–3) |

Cluster statistics use only sensors with valid data at each timestamp. If only one sensor reports, std is NaN.

Quality-Control Flags

| Flag | Applies to | Meaning |
| --- | --- | --- |
| OK | All sensors | Value within expected range |
| BELOW_RANGE | All sensors | Below minimum threshold (CO₂ < 300 ppm, T < −40 °C, RH < 0 %) |
| ABOVE_RANGE | All sensors | Above maximum threshold (CO₂ > 1000 ppm, T > 60 °C, RH > 100 %) |
| SPIKE | SCD41 only | Absolute change > 200 ppm between consecutive raw samples |
| NO_DATA | All sensors | No valid samples in aggregation window |
| LOW_QUALITY | LGR only | Quality indicator < 95 |
| BAD_FIT | LGR only | Fit flag ≠ 3 |

We recommend filtering on the relevant QC column (for example, co2_qc = 'OK') for initial analyses. In this deployment, only OK, BELOW_RANGE, ABOVE_RANGE, and NO_DATA appear in the output. The remaining flags (SPIKE, LOW_QUALITY, BAD_FIT) are defined by the pipeline but were not triggered by any record in the current dataset.

File Formats

| Format | Extension | Compression | Notes |
| --- | --- | --- | --- |
| CSV | .csv | None | Human-readable; timestamps as ISO 8601 strings |
| Parquet | .parquet | Snappy | Columnar binary; preserves native types; recommended for large-scale analysis |

CSV and Parquet files for the same time period contain identical data. Parquet files are typically 5–10× smaller.
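The cluster statistics and CO₂ range flags described above can be reproduced for a single row with the sketch below. It is an illustration, not the dataset's pipeline: `cluster_stats` and `co2_range_flag` are hypothetical names, the sample standard deviation (n − 1 denominator) is an assumption inferred from the rule that a single reporting sensor yields NaN, and applying the thresholds to the window mean rather than to raw samples is also an assumption. The SPIKE check is omitted because it needs consecutive raw samples, which the aggregated files do not contain.

```python
import math
from statistics import mean, stdev


def cluster_stats(values):
    """Return (mean, std, count) over the non-NaN CO2 means of the 30 m sensors."""
    valid = [v for v in values if v is not None and not math.isnan(v)]
    count = len(valid)
    if count == 0:
        return math.nan, math.nan, 0
    # Assumption: sample standard deviation, undefined for one sensor,
    # hence NaN when only a single sensor reports.
    std = stdev(valid) if count > 1 else math.nan
    return mean(valid), std, count


def co2_range_flag(co2_mean, co2_count):
    """Classify one aggregated CO2 value with the documented thresholds.

    Assumption: the thresholds are applied to the 10-minute window mean.
    """
    if co2_count == 0 or math.isnan(co2_mean):
        return "NO_DATA"
    if co2_mean < 300.0:
        return "BELOW_RANGE"
    if co2_mean > 1000.0:
        return "ABOVE_RANGE"
    return "OK"


# Sensor C is missing for this timestamp, so two sensors contribute.
row_mean, row_std, row_count = cluster_stats([420.0, 424.0, math.nan])
print(row_mean, row_count)            # 422.0 2
print(co2_range_flag(row_mean, 118))  # OK
```

Comparing the output of such a check against the scd41_30m_cluster_* columns is one way to confirm that a local copy of the merged file was read with empty cells treated as missing.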
File Naming

| Pattern | Example | Contents |
| --- | --- | --- |
| {SENSOR}_10min.csv | CO2_30M_A_10min.csv | Full concatenated time series |
| {SENSOR}_{YYYY}_{MM}.csv | CO2_30M_A_2025_03.csv | Single-month extract |
| {SENSOR}_{YYYY}_{MM}.parquet | CO2_30M_A_2025_03.parquet | Monthly Parquet |
| {SENSOR}_uptime.csv | CO2_30M_A_uptime.csv | Daily uptime statistics |
| SMEAR_EE_CO2_merged.csv | — | All sensors, full period |
| SMEAR_EE_CO2_{YYYY}_{MM}_merged.* | SMEAR_EE_CO2_2025_03_merged.parquet | Monthly merged |
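As a cross-check of the naming patterns and the documented 100-column layout of the merged file, the following sketch builds expected file names and an expected merged header. The helper names are hypothetical, and two details are assumptions rather than documented facts: that the merged index column is called `timestamp`, and that the columns appear in the order generated here. Only the set and number of names follow from the description above.

```python
SCD41_SENSORS = ["CO2_SCT1_2M", "CO2_30M_A", "CO2_30M_B", "CO2_30M_C"]
SCD41_COLUMNS = [
    "co2_mean", "co2_std", "co2_count",
    "temp_mean", "temp_std", "temp_count",
    "humidity_mean", "humidity_std", "humidity_count",
    "co2_qc", "temp_qc", "humidity_qc",
]
# LGR variables with a pooled SD column (four columns each).
LGR_VARIABLES = [
    "CO2_dry", "CO2_ppm", "CH4_dry", "CH4_ppm", "H2O_ppm",
    "GasP_torr", "GasT_C", "AmbT_C", "RD0_us", "RD1_us",
]
# LGR variables without an instrument-reported SD (three columns each).
LGR_NO_SD = ["quality", "Fit_Flag"]


def monthly_filename(sensor, year, month, ext="csv"):
    """Build a per-sensor monthly file name, e.g. CO2_30M_A_2025_03.csv."""
    return f"{sensor}_{year:04d}_{month:02d}.{ext}"


def merged_monthly_filename(year, month, ext="csv"):
    """Build a monthly merged file name."""
    return f"SMEAR_EE_CO2_{year:04d}_{month:02d}_merged.{ext}"


def expected_merged_columns():
    """List the column names the merged file is described as containing."""
    cols = ["timestamp"]  # assumption: name of the index column
    for sensor in SCD41_SENSORS:
        cols += [f"{sensor.lower()}_{c}" for c in SCD41_COLUMNS]
    for var in LGR_VARIABLES:
        for stat in ("mean", "std", "count", "sd__pooled_sd"):
            cols.append(f"lgr_30m_{var}_{stat}")
    for var in LGR_NO_SD:
        for stat in ("mean", "std", "count"):
            cols.append(f"lgr_30m_{var}_{stat}")
    cols += ["lgr_30m_co2_qc", "lgr_30m_quality_qc"]
    cols += [f"scd41_30m_cluster_co2_{s}" for s in ("mean", "std", "count")]
    return cols


print(monthly_filename("CO2_30M_A", 2025, 3))  # CO2_30M_A_2025_03.csv
print(len(expected_merged_columns()))          # 100
```

Comparing `set(expected_merged_columns())` with the header of a downloaded merged file gives a quick indication of whether the file matches this data dictionary.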
