
Overview

This Zenodo release provides a global facility-level solar photovoltaic (solar PV) inventory with facility-scale energy generation and aerosol-related loss estimates, prepared alongside the manuscript "Coal plants persist as a large barrier to the global solar energy transition" (Nature Sustainability).

The dataset was generated using the framework described in the manuscript and its Methods. In brief, a three-step workflow was used: (1) identify candidate PV facilities globally by combining existing inventories, crowd-sourced records, and a CNN-based scan of Sentinel-2 imagery; (2) extract precise panel footprints from confirmed sites using SAM-based segmentation; and (3) integrate the resulting footprints with MERRA-2 atmospheric reanalysis and a validated PV model to estimate facility-level generation and losses from clouds and aerosols.

The release includes PV facility footprints and core attributes: `PV_ID`, `latitude`, `longitude`, `country`, `year`, `area_m2`. In the main inventory files, `year` is the PV facility build/commissioning year (installation year), estimated from Sentinel-2 time-series classification as described in the manuscript Methods.

The release contains two complementary data components:

- A global geospatial PV facility inventory (`.gpkg`, `.csv`, `.parquet`).
- Annual facility-level PV generation/loss tables (`PV_facility_generation_year_YYYY.csv`, currently 2017-2023).

Package Contents

- `global_pv_facility_inventory.gpkg`
  - Layer: `global_pv_facility_inventory`
  - Geometry: `MultiPolygon`
  - CRS: `EPSG:4326` (WGS 84)
- `global_pv_facility_inventory.csv`
  - Attribute-only table (no geometry)
- `global_pv_facility_inventory.parquet`
  - GeoParquet (geometry + attributes)
- Year-specific facility-level generation/loss tables (top-level CSV files)
  - Generated to support the manuscript analysis of facility-level PV energy generation and losses.
  - Each year-specific file includes only facilities installed by that year, so facility counts differ across years.
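The "installed by that year" rule explains why facility counts differ between the year-specific files. The sketch below illustrates it on a few synthetic inventory rows using only the Python standard library. The sample rows, the `PV_ID` values, the helper names `read_inventory` and `installed_by`, and the inclusive reading of "installed by" (`year <= target_year`) are all assumptions made for illustration; they are not taken from the dataset or the manuscript.

```python
import csv
import io

# Synthetic rows for illustration only; these are NOT records from the dataset.
SAMPLE_INVENTORY_CSV = """PV_ID,latitude,longitude,country,year,area_m2
A1,10.0,20.0,Exampleland,2017,1500.0
A2,11.0,21.0,Exampleland,2019,3200.5
A3,-5.0,30.0,Otherland,2022,800.0
"""


def read_inventory(handle):
    """Parse inventory rows from a CSV file handle, converting numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        # int(float(...)) tolerates a year stored as "2019" or "2019.0".
        row["year"] = int(float(row["year"]))
        row["area_m2"] = float(row["area_m2"])
        rows.append(row)
    return rows


def installed_by(rows, target_year):
    """Keep facilities whose installation year is <= target_year (assumed inclusive)."""
    return [r for r in rows if r["year"] <= target_year]


inventory = read_inventory(io.StringIO(SAMPLE_INVENTORY_CSV))

# The number of facilities available for analysis grows with the target year.
counts_by_year = {y: len(installed_by(inventory, y)) for y in (2017, 2020, 2023)}
print(counts_by_year)
```

To try this on the released table, pass `open("global_pv_facility_inventory.csv", newline="")` to `read_inventory` instead of the in-memory sample.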
Year-specific facility-level generation/loss tables:

- `PV_facility_generation_year_2017.csv`
- `PV_facility_generation_year_2018.csv`
- `PV_facility_generation_year_2019.csv`
- `PV_facility_generation_year_2020.csv`
- `PV_facility_generation_year_2021.csv`
- `PV_facility_generation_year_2022.csv`
- `PV_facility_generation_year_2023.csv`

Each file includes:

- Core facility columns: `PV_ID`, `latitude`, `longitude`, `country`, `year`, `area_m2`
- `power_POA (kWh)`: power generation estimated from plane-of-array (POA) irradiance.
- `power_POA_clr (kWh)`: POA-based power generation under clear-sky (cloud-free) conditions.
- `power_POA_cln (kWh)`: POA-based power generation under clean-sky (aerosol-free) conditions.
- `aerosol_loss (kWh)`: facility-level aerosol-related energy loss, computed as `power_POA (kWh) - power_POA_cln (kWh)`.

How to Use This Dataset (Technical)

- If you need geometry, use:
  - `global_pv_facility_inventory.gpkg` (GIS-friendly)
  - `global_pv_facility_inventory.parquet` (fast analytics with geometry)
- If you need tabular attributes only, use:
  - `global_pv_facility_inventory.csv`
- For energy generation/loss analysis, use:
  - `PV_facility_generation_year_YYYY.csv` (currently 2017-2023)
- Linkages:
  - `PV_ID` is the facility identifier across all files.
  - `year` supports year-specific filtering and aggregation.

How This Dataset Is Used in the Paper

- To map and quantify global facility-level PV deployment (location, footprint area, and installation year).
- To estimate facility-level PV generation from POA irradiance under:
  - all-sky conditions (`power_POA (kWh)`),
  - clear-sky conditions (`power_POA_clr (kWh)`),
  - clean-sky conditions (`power_POA_cln (kWh)`).
- To quantify aerosol-related generation loss at facility level (`aerosol_loss (kWh)`), then aggregate by geography/year for manuscript analysis.

Potential Reuse in Other Research

- National/regional assessments of aerosol impacts on PV generation.
- Benchmarking climate and air-quality penalties for existing PV fleets.
- Integration with grid, policy, or emissions datasets for energy-transition studies.
- Geospatial analyses linking PV siting patterns with environmental and socioeconomic variables.

Snapshot Statistics

- Facilities: 140,945
- Countries: 181
- Inventory years: 2017-2024
- Generation/loss tables: 2017-2023
- Latitude range: 41.61° S to 68.38° N

Contact

Dr. Rui Song (rui.song@physics.ox.ac.uk or rui.song90@gmail.com)
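As a hedged illustration of working with the year-specific tables, the sketch below checks the stated relationship `aerosol_loss (kWh)` = `power_POA (kWh) - power_POA_cln (kWh)` row by row and then sums the loss by country. The sample values are synthetic and include only a subset of the columns; the function name `aerosol_loss_by_country` and the `tolerance` parameter are hypothetical. The synthetic rows assume aerosol-free generation exceeds all-sky generation, which under the stated formula yields negative values; the sign convention in the released files should be confirmed before interpreting totals.

```python
import csv
import io
import math
from collections import defaultdict

# Synthetic values for illustration only; NOT taken from the dataset.
SAMPLE_GENERATION_CSV = """PV_ID,country,year,power_POA (kWh),power_POA_cln (kWh),aerosol_loss (kWh)
A1,Exampleland,2017,900.0,1000.0,-100.0
A2,Exampleland,2019,1800.0,2000.0,-200.0
A3,Otherland,2022,450.0,500.0,-50.0
"""


def aerosol_loss_by_country(handle, tolerance=1e-6):
    """Sum aerosol_loss per country after checking it against its stated definition."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        all_sky = float(row["power_POA (kWh)"])
        clean_sky = float(row["power_POA_cln (kWh)"])
        loss = float(row["aerosol_loss (kWh)"])
        # Stated definition: aerosol_loss = power_POA - power_POA_cln.
        # The tolerance is a hypothetical allowance for rounding in the files.
        if not math.isclose(loss, all_sky - clean_sky, abs_tol=tolerance):
            raise ValueError(f"aerosol_loss mismatch for PV_ID {row['PV_ID']}")
        totals[row["country"]] += loss
    return dict(totals)


loss_totals = aerosol_loss_by_country(io.StringIO(SAMPLE_GENERATION_CSV))
print(loss_totals)
```

For a released file, the same function can be given `open("PV_facility_generation_year_2023.csv", newline="")`; a looser `tolerance` may be needed if the stored values are rounded.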
remote sensing, machine learning, solar photovoltaic, aerosols and clouds
