
This dataset contains the source data, model outputs, and supplementary materials for the study on integrating citizen science (GBIF) and professional survey (sPlot) data for global plant trait mapping.

NOTE: All trait values are represented as power-transformed (Yeo-Johnson) community-weighted means.

Primary manuscript: Lusk, D., Wolf, S., Svidzinska, D. et al. Crowdsourced biodiversity monitoring fills gaps in global plant trait mapping. Nat Commun 17, 1203 (2026). https://doi.org/10.1038/s41467-026-68996-y

Dataset Contents

SourceData.zip (692 MB)

A compressed archive containing the main source data files:

SourceData.xlsx

An Excel workbook with 5 sheets:

1. all_results (1,110 rows): Model performance metrics across all traits, resolutions, and data sources
   - Resolutions: 1 km, 22 km, 55 km, 111 km, 222 km
   - Trait sets: SCI (sPlot), CIT (GBIF), COMB (combined)
   - Metrics: Pearson's r, R², RMSE, nRMSE, MAE, MedAE
2. all_biome_results (777 rows): Per-biome model performance and uncertainty metrics
   - 7 biomes (Boreal, Desert, Mediterranean, Temperate, Tropical, Tundra, Montane)
   - Includes mean coefficient of variation (COV) and area of applicability (AOA) fraction
3. feature_importance (137,678 rows): Permutation-based feature importance scores
   - 150 environmental predictor variables from 5 datasets
   - Importance scores with standard deviations and p-values
4. splot_gbif_correlation (185 rows): Correlation between sPlot and GBIF sparse trait grids
   - Pearson correlation coefficients at each resolution
5. trait_id_mapping (37 rows): Mapping from trait IDs to human-readable names

spatial_folds.parquet (180 MB)

Spatial cross-validation fold assignments for all 37 traits (~95.6 million location-trait combinations).

- Columns: x, y, fold, trait_id
- Coordinates in EPSG:6933 (WGS 84 / NSIDC EASE-Grid 2.0 Global, a cylindrical equal-area projection)

cv_obs_vs_pred.parquet (566 MB)

Cross-validation observed vs. predicted values (~35.6 million observations).

- Columns: x, y, obs, pred, trait_id, trait_set_abbr
- Used for generating observed vs. predicted scatter plots

SCI_CIT_sparse_maps_1km.zip (7.2 GB)

1-km resolution sparse community-weighted mean (CWM) trait maps derived from:

- gbif/: 37 GeoTIFF files from GBIF citizen science observations (CIT)
- splot/: 37 GeoTIFF files from sPlot vegetation survey data (SCI)

Each GeoTIFF contains 6 bands:

1. Mean trait value
2. Standard deviation
3. Median
4. 5th percentile
5. 95th percentile
6. Observation count

Coordinate reference system: EPSG:6933 (WGS 84 / NSIDC EASE-Grid 2.0 Global, a cylindrical equal-area projection)

Data Sources

- sPlot: Global vegetation plot database (Bruelheide et al., 2019)
- GBIF: Global Biodiversity Information Facility occurrence records
- TRY: Plant trait database (Kattge et al., 2020)

File Formats

- .xlsx: Microsoft Excel Open XML Format (readable with Excel, LibreOffice, pandas)
- .parquet: Apache Parquet columnar format (readable with pandas, R arrow, etc.)
- .tif: Cloud Optimized GeoTIFF (readable with GDAL, rasterio, QGIS, etc.)

Contact

Daniel Lusk
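
Usage sketch

The metrics listed for the all_results sheet can, in principle, be recomputed from the obs and pred columns of cv_obs_vs_pred.parquet, grouped by trait_id and trait_set_abbr. The following is a minimal sketch of that idea, not the study's own code. It runs on a few synthetic rows rather than the real file, the trait ID "X1" is hypothetical, and the metric definitions are assumptions: R² is taken as the coefficient of determination and nRMSE as RMSE divided by the observed range, which may differ from the definitions used in the manuscript.

```python
import math
from collections import defaultdict
from statistics import mean, median


def cv_metrics(obs, pred):
    """Compute the six metrics named in all_results for paired values.

    Assumed definitions (not confirmed by the dataset description):
    R2 = 1 - SS_res / SS_tot, nRMSE = RMSE / (max(obs) - min(obs)).
    """
    n = len(obs)
    if n != len(pred) or n < 2:
        raise ValueError("obs and pred must have equal length >= 2")
    residuals = [p - o for o, p in zip(obs, pred)]
    abs_err = [abs(r) for r in residuals]
    obs_mean, pred_mean = mean(obs), mean(pred)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - obs_mean) ** 2 for o in obs)
    ss_pred = sum((p - pred_mean) ** 2 for p in pred)
    cov = sum((o - obs_mean) * (p - pred_mean) for o, p in zip(obs, pred))
    rmse = math.sqrt(ss_res / n)
    return {
        "pearson_r": cov / math.sqrt(ss_tot * ss_pred),
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "nrmse": rmse / (max(obs) - min(obs)),
        "mae": mean(abs_err),
        "medae": median(abs_err),
    }


def metrics_by_group(rows):
    """Group rows by (trait_id, trait_set_abbr) and compute metrics per group."""
    groups = defaultdict(lambda: ([], []))
    for _x, _y, obs, pred, trait_id, trait_set_abbr in rows:
        obs_list, pred_list = groups[(trait_id, trait_set_abbr)]
        obs_list.append(obs)
        pred_list.append(pred)
    return {key: cv_metrics(o, p) for key, (o, p) in groups.items()}


# Synthetic rows in the documented column order:
# x, y, obs, pred, trait_id, trait_set_abbr ("X1" is a made-up trait ID).
rows = [
    (0.0, 0.0, 1.0, 1.5, "X1", "SCI"),
    (1000.0, 0.0, 2.0, 2.0, "X1", "SCI"),
    (2000.0, 0.0, 3.0, 2.5, "X1", "SCI"),
    (3000.0, 0.0, 4.0, 4.0, "X1", "SCI"),
    (0.0, 0.0, 0.0, 0.0, "X1", "COMB"),
    (1000.0, 0.0, 1.0, 1.0, "X1", "COMB"),
    (2000.0, 0.0, 2.0, 2.0, "X1", "COMB"),
]

summary = metrics_by_group(rows)
for key, values in sorted(summary.items()):
    print(key, {name: round(v, 4) for name, v in values.items()})
```

With the real file, the rows could be loaded with pandas (for example via pandas.read_parquet, selecting only the obs, pred, trait_id, and trait_set_abbr columns to limit memory use) and passed to the same functions, or the grouping could be done with a DataFrame groupby instead.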
