
# MeteoGalicia Interpolation Case Study

This record packages inputs, documentation, scripts and results for the HF-EOLUS Task 3 interpolation experiment over the Vilano HF-radar footprint (open Atlantic, NW Iberia; radar context in [1]). The dataset was produced with the hf-eolus-interpolation-toolkit [2] by ingesting MeteoGalicia's historical WRF_HIST d03 (4 km) winds, refining the grid by a factor of four (native 4 km → ~1 km effective resolution), fitting kriging/IDW models, and exporting the resulting GeoParquet layers together with metadata, plots and comparisons against the Puertos del Estado Vilano buoy. Because of their size, the interpolation outputs live in two Zenodo supplements; this record points to those packages while keeping the case assets.

What's inside this README:

- How to reconstruct the catalog from Zenodo supplements.
- A quick snapshot of the dataset (coverage, sources, counts).
- Where everything lives in the repository (catalogs, reports, overrides).
- Key interpolation stats (mix, validation).
- Comparisons against the PdE buoy and ANN predictions.
- Makefile knobs and helper targets for reconstruction.

## Table of contents

- How to reconstruct this case study
- Dataset highlights
- Storage and catalog layout
- Interpolation mix
- Validation summary
- PdE Vilano buoy comparison
- ANN vs PdE comparisons (Vilano_buoy)
- Reproducing or extending the run
- Inspecting the GeoParquet data (Dockerised DuckDB)

## How to reconstruct this case study

You can rebuild the full STAC catalog and ancillary files from the published ZIPs on Zenodo [3][4][5] using the case-study Makefile (the URLs are already set in the companion Makefile):

- Data supplement 1 of 2: see [4] (contains `meteogalicia_interpolation_2018.zip`, `meteogalicia_interpolation_2019.zip`, `meteogalicia_interpolation_2020.zip`).
- Data supplement 2 of 2: see [5] (contains `meteogalicia_interpolation_2021.zip`, `meteogalicia_interpolation_2022.zip`, `meteogalicia_interpolation_2023.zip`).
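If you only need some of the years, the supplement listing above can be turned into a small lookup. The sketch below is illustrative and not part of the case-study tooling: the helper name `locate_year` is hypothetical, the DOIs are those given in references [4] and [5], and no download URL is assumed.

```python
# Illustrative lookup: which Zenodo supplement holds the ZIP for a given year.
# DOIs come from references [4] and [5]; `locate_year` is a hypothetical helper.
SUPPLEMENTS = {
    "10.5281/zenodo.17670561": (2018, 2019, 2020),  # Data supplement 1 of 2, ref. [4]
    "10.5281/zenodo.17678282": (2021, 2022, 2023),  # Data supplement 2 of 2, ref. [5]
}


def locate_year(year: int) -> tuple[str, str]:
    """Return (supplement DOI, ZIP filename) for one year of hourly outputs."""
    for doi, years in SUPPLEMENTS.items():
        if year in years:
            return doi, f"meteogalicia_interpolation_{year}.zip"
    raise ValueError(f"no published archive for year {year}")


if __name__ == "__main__":
    for y in range(2018, 2024):
        doi, zip_name = locate_year(y)
        print(y, doi, zip_name)
```

The ZIP names follow the `meteogalicia_interpolation_YYYY.zip` pattern listed above; resolving a DOI to concrete file URLs is left to the Makefile defaults.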
Simplest path: run `make reconstruct` inside `case_study/` and the catalog will be downloaded and unpacked for you. For finer control, see the Makefile options section below (targets `download-stac`, `unpack-stac`, etc., and overridable variables).

## Dataset highlights

| Aspect | Details |
| --- | --- |
| Spatial domain | AOI polygon stored in `area_boundary.geojson`, centred on the PdE Vilano buoy and matching the CONAS HF-radar footprint. |
| Temporal coverage (GeoParquet assets) | 2018-01-01 00:00 UTC → 2023-12-31 23:00 UTC (hourly partitions under `case_study/catalogs/meteogalicia_interpolation/assets/parquet`). |
| Temporal coverage published in STAC | 2018-01-01 00:00 UTC → 2023-12-31 23:00 UTC (complete hourly Items under `case_study/catalogs/meteogalicia_interpolation/items/parquet`). |
| MeteoGalicia inputs | WRF_HIST d03 (4 km deterministic mesh, 10 m winds) downloaded from the THREDDS service. |
| Interpolation setup | 4× refinement grid, 20 km variogram cutoff, 1 km variogram width, variogram model chosen per hour between Gaussian/Exponential/Spherical; regression kriging/universal kriging evaluated; IDW used as fallback when RSR deteriorates. |
| Total wind vectors | 902,615,976 records (`u`, `v`, `wind_speed`, `wind_dir`, diagnostics). |
| Unique grid nodes | 17,176 nodes spanning the refined mesh; stored as `node_id`. |
| Reference hold-out | `test_points.csv` lists the PdE Vilano buoy (`node_id = "Vilano_buoy"`, lon -9.21, lat 43.5). It is excluded from cross-validation/test splits but still predicted (as its own node) to enable external checks like the buoy comparisons. |
| Buoy comparison window | 2018-05-01 00:00 UTC → 2019-02-01 23:00 UTC (3,379 matched samples against the PdE catalog in `case_study/catalogs/pde_vilano_buoy`). |
| Toolkit release | hf-eolus-interpolation-toolkit v0.2.0 ([2]). |

## Storage and catalog layout

- `area_boundary.geojson` – Polygon delimiting the AOI given to the ingestion workflow.
- `test_points.csv` – CSV with points for external validation (see Validation summary below); currently only the Vilano buoy.
- `catalogs/catalog.json` – STAC root pointing to the MeteoGalicia collection and the PdE buoy catalog.
- `catalogs/pde_vilano_buoy` – Input STAC catalog for the Puertos del Estado Vilano buoy (reference observations).
- `catalogs/sar_range_final_pivots_joined` – STAC catalog with ANN wind inversion outputs (Sentinel-1 SAR) including the `Vilano_buoy` node.
- `catalogs/vilano_node_subset` – STAC subset derived from the interpolation catalog, containing only the `Vilano_buoy` node (data + metadata + plots).
- `catalogs/meteogalicia_interpolation/collection.json` – Collection metadata (extent, providers, citation). Contents (fully populated for 2018–2023):
  - `assets/parquet` – Hourly GeoParquet partitions organised as `year=YYYY/month=MM/day=DD/hour=HH/data.parquet`.
  - `assets/metadata` – JSON sidecars with CRS, provenance, variogram parameters and AOI footprints.
  - `assets/plots` – Diagnostic PNGs (quadrant grid + empirical variograms) matching each hour.
  - `items/parquet`, `items/metadata`, `items/plots` – STAC Items per asset type; each hourly ID lives inside items///.json.
- `reports/buoys/vilano` – CSV metrics, Markdown summary and PNG figures for the PdE comparison.
- `reports/ann` – CSVs and Markdown summaries for the ANN vs PdE comparisons (full, filtered, confident subsets) plus supporting aligned CSV.
- `stac_overrides/collection_override.json` and `stac_overrides/item_override.json` – JSON fragments injected into the published collection/items during catalog generation.

### Makefile options (case_study/Makefile)

Variables (can be overridden on the CLI, e.g. `make download-stac STAC_ZIP_URLS="..."`):

- `STAC_ZIP_URLS`: space-separated list of STAC ZIP URLs to download (defaults to the 2018/2019/2020 ZIPs from zenodo.17670561).
- `STAC_ZIP_DIR`: directory to store the downloaded ZIPs (default `stac_zips`).
- `STAC_DEST`: destination folder for the unzipped catalog (default `catalogs/meteogalicia_interpolation`).
- `CASE_ZIP_URL`: optional URL of an ancillary case-study package (unset by default).
- `CASE_ZIP_FILE`: local filename for the ancillary ZIP (default `case_study_package.zip`).

Targets:

- `make download-stac`: downloads the ZIPs listed in `STAC_ZIP_URLS` into `STAC_ZIP_DIR`.
- `make unpack-stac`: unzips each file in `STAC_ZIP_DIR` into `STAC_DEST` (creates it if missing).
- `make download-case`: downloads the ancillary package from `CASE_ZIP_URL` into `CASE_ZIP_FILE`.
- `make unpack-case`: unzips `CASE_ZIP_FILE` into the current directory.
- `make reconstruct`: runs `download-stac` + `unpack-stac`; if `CASE_ZIP_URL` is set, also runs `download-case` + `unpack-case`.

## Interpolation mix

The following summaries give a quick sense of how the dataset is composed. First, we break down rows by origin (interpolated grid, retained originals, and the single test point); then we describe which variogram model was selected per hour for each wind component.

### Samples per source

| Source | Samples | % of rows |
| --- | --- | --- |
| Interpolated grid nodes | 846,176,202 | 93.75 |
| Original WRF nodes retained | 56,387,223 | 6.25 |
| Hold-out/test point emissions | 52,551 | 0.01 |

Most rows stem from the refined grid (≈94%), with only ~6% coming from original WRF nodes kept for completeness and a negligible fraction from the Vilano test point, confirming the dataset is dominated by interpolated predictions.

### Model selection across hourly runs

| Component | Gaussian | Exponential | Spherical |
| --- | --- | --- | --- |
| U component | 32,445 hours (61.7 %) | 16,033 hours (30.5 %) | 4,073 hours (7.8 %) |
| V component | 28,152 hours (53.6 %) | 16,892 hours (32.1 %) | 7,507 hours (14.3 %) |

Gaussian models are chosen most often for both components, with Exponential as the main alternative and Spherical used in a smaller share of hours.

## Validation summary

This section condenses the quality checks performed inside the interpolation workflow: k-fold cross-validation on the MeteoGalicia input points (no buoy data involved) and a held-out test split drawn from the same MeteoGalicia sample. Metrics are computed per hour and then averaged across all timestamps to give a single headline for each wind component.
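The per-hour-then-average aggregation described above can be sketched as follows. This is a minimal illustration, not the toolkit's code: it assumes RSR means RMSE divided by the standard deviation of the reference values (a reading consistent with the buoy table further down, where R² = -0.77 ≈ 1 - 1.33²), and the two hours of values are invented toy data.

```python
import math
from statistics import fmean, pstdev


def rsr(observed, predicted):
    """RMSE divided by the population standard deviation of the observations.

    Assumed definition; the toolkit may use a different normalisation.
    """
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(fmean(squared_errors))
    return rmse / pstdev(observed)


def bias(observed, predicted):
    """Mean signed error (predicted minus observed), in the data's units."""
    return fmean([p - o for o, p in zip(observed, predicted)])


# Toy validation points for two hourly runs: (observed, predicted) of one component.
hours = {
    "2018-01-01T00": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    "2018-01-01T01": ([-2.0, 0.0, 2.0, 4.0], [-1.5, 0.5, 1.5, 3.5]),
}

# Step 1: one metric per hour. Step 2: average the hourly metrics.
hourly_rsr = {t: rsr(obs, pred) for t, (obs, pred) in hours.items()}
hourly_bias = {t: bias(obs, pred) for t, (obs, pred) in hours.items()}
headline_rsr = fmean(list(hourly_rsr.values()))
headline_bias = fmean(list(hourly_bias.values()))

if __name__ == "__main__":
    print(f"headline RSR: {headline_rsr:.3f}, headline bias: {headline_bias:+.3f}")
```

Averaging hourly scores gives each timestamp equal weight, regardless of how many validation points that hour contributed.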
The `test_points.csv` entries (e.g., the PdE Vilano buoy) are excluded from these splits but still predicted as dedicated nodes, so they do not influence the internal CV/test metrics; they are assessed separately in the buoy/ANN comparisons.

| Metric | U component | V component |
| --- | --- | --- |
| Cross-validation RSR | 0.148 | 0.169 |
| Cross-validation bias (m/s) | +0.000 | +0.000 |
| Hold-out RSR (test) | 0.080 | 0.089 |
| Hold-out bias (m/s) | +0.000 | -0.002 |

Figures are based on the hourly diagnostics embedded in each partition (`cv_*` and `test_*` columns) and computed with `duckdb/duckdb:latest` so they remain reproducible.

## PdE Vilano buoy comparison

Direct comparisons against the Puertos del Estado Vilano buoy gauge how well the interpolated winds follow in-situ observations at 10 m (buoy measurements are height-corrected from 3 m to 10 m using a neutral log profile). The tables below summarise speed and component errors over the full overlap window.

Observed-versus-predicted metrics (37,541 matched hours) are stored in `reports/buoys/vilano`:

| Samples | RMSE (m/s) | MAE (m/s) | Bias (m/s) | Corr | R² | Scatter index | Dir. RMSE (deg) | Comp. corr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 37,541 | 5.11 | 4.06 | +0.74 | 0.14 | -0.77 | 0.71 | 87.48 | 0.29 |

Component-level scores:

| Variable | RMSE (m/s) | RSR | Bias (m/s) |
| --- | --- | --- | --- |
| Wind speed | 5.11 | 1.33 | +0.74 |
| U component | 6.39 | 1.17 | +0.32 |
| V component | 7.16 | 1.22 | -0.07 |

The Markdown report (`vilano_report.md`) and the figures (`wind_speed_timeseries.png`, `wind_speed_scatter.png`) provide quick visual checks and cite the exact observation/prediction windows (2018-01-01 → 2023-12-31).

## ANN vs PdE comparisons (Vilano_buoy)

We benchmark both the ANN wind inversion (`case_study/catalogs/sar_range_final_pivots_joined`, [1]) and the interpolation against the in-situ PdE buoy, using the aligned PdE/interpolation pairs (`reports/buoys/vilano/aligned_records.csv`) as a shared reference frame. We look at three progressively tighter subsets:

- A full overlap set (9,926 hours) with every timestamp shared by buoy and ANN.
- A range-filtered set (5,305 hours) that keeps only buoy winds inside the ANN's operating band [5.7, 17.8] m/s.
- An ANN-confident set (3,701 hours) that applies the range filter and also requires the ANN to mark its prediction as in-range and consistent.

Headline speed metrics (see `ann_vs_pde_metrics_speed.csv`):

| Subset | Model | Samples | RMSE (m/s) | Bias (m/s) | Corr | Dir. RMSE (deg) |
| --- | --- | --- | --- | --- | --- | --- |
| full | ANN | 9,926 | 3.82 | +2.02 | 0.45 | 72.51 |
| full | Interp | 9,926 | 5.23 | +1.58 | 0.12 | 87.74 |
| filtered | ANN | 5,305 | 2.66 | +0.03 | 0.39 | 53.74 |
| filtered | Interp | 5,305 | 4.54 | -0.73 | 0.10 | 81.75 |
| ann_confident | ANN | 3,701 | 2.78 | +0.12 | 0.34 | 42.47 |
| ann_confident | Interp | 3,701 | 4.61 | -1.23 | 0.11 | 82.29 |

Component metrics are available in `ann_vs_pde_metrics_components.csv`, and a narrative summary in `ann_vs_pde_summary.md`.

In every subset the ANN has a lower speed RMSE, a higher correlation and a lower directional RMSE than the interpolation, and the RMSE gap is wider in the two filtered subsets (about 1.8 to 1.9 m/s) than in the full set (about 1.4 m/s). Bias is the exception: over the full set the ANN's bias (+2.02 m/s) exceeds the interpolation's (+1.58 m/s), whereas inside the ANN's operating range it falls to near zero (+0.03 to +0.12 m/s) while the interpolation turns negative (-0.73 to -1.23 m/s). Use the filtered or ANN-confident subsets when focusing on ANN skill; the full set shows how both models behave across the broader wind range, where each carries a positive bias.

## Reproducing or extending the run

1. Clone & restore the toolkit (outside of this repository) and make sure `renv::restore()` has been executed. The AWS prerequisites described in the root `README.md` also apply here.
2. Reconstruct locally via Makefile (recommended): run `make reconstruct` inside `case_study/` to download/unpack the published STAC ZIPs and the ancillary package.
3. Run the pipeline wrapper if you need to regenerate from S3 instead of the ZIPs:

   ```bash
   ./run_pipeline_case_study.sh \
     --toolkit-dir /path/to/hf-eolus-interpolation-toolkit \
     --profile hf_eolus --region eu-west-3 --bucket jlhc-hf-eolus \
     --region-name VILA-PRIO-HF \
     --ingest-start 2018-01-01 --ingest-end 2023-12-31 \
     --interp-start 2018-01-01 --interp-end 2023-12-31 \
     --res-factor 4 --cutoff-km 20 --width-km 1 \
     --stac-incremental --stac-by-year --stac-years 2018,2019,2020,2021,2022,2023
   ```

   - The wrapper sequentially calls every toolkit script (tests, IAM, ingestion, Batch interpolation, STAC builder) unless you opt into `--skip-*` flags.
   - To keep the committed STAC untouched while verifying a run, omit `--force-overwrite-stac`. When you need to append newly generated partitions, keep `--stac-incremental` so only the missing items are added. Use `--stac-by-year` to avoid syncing the full archive at once; `--stac-limit-years` can reduce I/O when loading existing Items during incremental rebuilds.
   - Buoy comparisons are enabled automatically via `case_study/buoy_comparison_config.json`; disable them with `--skip-buoy-comparison` or by setting `PIPELINE_ENABLE_BUOY_COMPARISON=0`.
   - At the end the script prints whether hashes of the regenerated catalog match the tracked files and where the buoy reports were written.

## Inspecting the GeoParquet data (Dockerised DuckDB)

Per project rules, both quicklooks and audits should rely on the official DuckDB image.
Examples:

```bash
# Total vectors
docker run --rm -v "$(pwd)":/work -w /work duckdb/duckdb:latest duckdb -c "
  SELECT count(*) AS rows
  FROM read_parquet('local_sync/year=*/month=*/day=*/hour=*/data.parquet');
"

# Unique grid nodes
docker run --rm -v "$(pwd)":/work -w /work duckdb/duckdb:latest duckdb -c "
  SELECT count(DISTINCT node_id) AS unique_nodes
  FROM read_parquet('local_sync/year=*/month=*/day=*/hour=*/data.parquet');
"

# Hourly diagnostics (averaged later)
docker run --rm -v "$(pwd)":/work -w /work duckdb/duckdb:latest duckdb -c "
  WITH hourly AS (
    SELECT timestamp,
           max(cv_rsr_u) AS cv_rsr_u,
           max(test_rsr_u) AS test_rsr_u
    FROM read_parquet('local_sync/year=*/month=*/day=*/hour=*/data.parquet')
    GROUP BY 1
  )
  SELECT avg(cv_rsr_u) AS avg_cv_rsr_u,
         avg(test_rsr_u) AS avg_test_rsr_u
  FROM hourly;
"
```

Adapt the queries as needed for additional diagnostics; remember to clean up any temporary files the container might leave behind before committing.

## Acknowledgements

This work has been funded by the HF-EOLUS project (TED2021-129551B-I00), financed by MICIU/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR - BDNS 598843 - Component 17 - Investment I3. Members of the Marine Research Centre (CIM) of the University of Vigo have participated in the development of this repository.

The Government of Galicia, through MeteoGalicia, runs and publishes outputs from its WRF numerical model at various resolutions. This resource uses the 4 km domain (named 'd03') from the WRF_HIST historical dataset. The files were downloaded from MeteoGalicia's THREDDS server (WRF_HIST/d03 directory). MeteoGalicia makes these data freely available; the only condition for using them is to cite MeteoGalicia as the data source. We thank MeteoGalicia for facilitating open access to these atmospheric model simulations.
We thank Puertos del Estado for providing public access to the buoy data through https://portus.puertos.es.

## Disclaimer

This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.

## References

[1] Herrera Cortijo, J. L., Fernández-Baladrón, A., Rosón, G., Gil Coto, M., Dubert, J., & Varela Benvenuto, R. (2025). HF-EOLUS. Task 2. HF-Radar Wind Inversion Models and Results for VILA and PRIO Stations. Zenodo. https://doi.org/10.5281/zenodo.17131227

[2] Herrera Cortijo, J. L., Fernández-Baladrón, A., Rosón, G., Gil Coto, M., Dubert, J., & Varela Benvenuto, R. (2025). HF-EOLUS Wind Interpolation Toolkit for MeteoGalicia models output (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.17598353

[3] Herrera Cortijo, J. L., Fernández-Baladrón, A., Rosón, G., Gil Coto, M., Dubert, J., & Varela Benvenuto, R. (2025). HF-EOLUS. Task 3. MeteoGalicia Wind Interpolation Outputs. Zenodo. https://doi.org/10.5281/zenodo.17490873

[4] Herrera Cortijo, J. L., Fernández-Baladrón, A., Rosón, G., Gil Coto, M., Dubert, J., & Varela Benvenuto, R. (2025). HF-EOLUS. Task 3. MeteoGalicia Wind Interpolation Outputs. Data Supplement 1 of 2 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17670561

[5] Herrera Cortijo, J. L., Fernández-Baladrón, A., Rosón, G., Gil Coto, M., Dubert, J., & Varela Benvenuto, R. (2025). HF-EOLUS. Task 3. MeteoGalicia Wind Interpolation Outputs. Data Supplement 2 of 2 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17678282

[6] Fernández-Baladrón, A., Varela Benvenuto, R., & Herrera Cortijo, J. L. (2020). Interrelationships between Surface Circulation and Wind in the Ría de Vigo. Zenodo. https://doi.org/10.5281/zenodo.17490675
