ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO, Datacite

PathOS Impact of Open Access Routes on Topic Persistence Case Study Research Data, Code, and Analysis Results

Authors: Stavropoulos, Petros; Papageorgiou, Haris; Grypari, Ioanna

Abstract

This repository contains the data, scripts, and results for the Impact of Open Access Routes on Topic Persistence case study, part of the PATHOS project.

Overview

Artificial intelligence methods are being rapidly mobilized to tackle the climate crisis, but the knowledge base often burns bright and fades quickly. This case study asks whether two distinct Open Access (OA) routes help AI-for-Climate research topics stay active in the literature:

• Green OA: self-archiving in repositories
• Published OA: journal-mediated open access with a clear licence

Bronze OA and dual-mode publications are excluded for treatment clarity. Closed Access (CA) articles serve as the counterfactual.

By foregrounding topic persistence as a key dimension of impact, the study goes beyond short-term citation counts and investigates whether openness helps research topics remain visible long enough to demonstrate their potential.

Repository Structure

├── README.md
├── fos_taxonomy_v0.1.2.json
├── persistent_topics_create_collection.py
├── persistent_topics_find_paper_openaireids.py
├── persistent_topics_find_paper_affiliations.py
├── persistent_topics_get_collection_author_gender.py
├── persistent_topics_calculate_indicators.py
├── persistent_topics_calculate_indicators_sdg.py
├── persistent_topics_indicators_create_data_for_vis.py
└── persistent_topics_collection_w_outcomes/
    ├── complete_collection_df.parquet / .xlsx
    ├── topic_attribution_df.parquet / .xlsx
    ├── results/
    │   ├── analysis_conclusions.txt
    │   ├── summary_statistics.xlsx
    │   ├── treatment_effects_green_oa.xlsx
    │   ├── treatment_effects_published_oa.xlsx
    │   ├── descriptive_effects_any_oa.xlsx
    │   ├── tables/
    │   │   ├── 01_executive_summary.xlsx
    │   │   ├── 02_treatment_group_characteristics.xlsx
    │   │   ├── 03_causal_effects_summary.xlsx
    │   │   ├── 04_topic_persistence_analysis.xlsx
    │   │   ├── 05_gender_equity_outcomes.xlsx
    │   │   ├── 06_economic_impact_analysis.xlsx
    │   │   ├── 07_publication_year_analysis.xlsx
    │   │   └── 08_robustness_analysis.xlsx
    │   ├── visualizations/
    │   │   ├── 01_sample_overview.png
    │   │   ├── 02_causal_effects.png
    │   │   ├── 03_outcome_analysis.png
    │   │   └── 04_temporal_and_balance.png
    │   └── final_visualization_data_figures/
    │       ├── data/
    │       └── figures/
    └── results_sdg_only/
        ├── sdg_analysis_conclusions.txt
        ├── green_matched_sdg_papers.xlsx
        ├── published_matched_sdg_papers.xlsx
        ├── closed_matched_a_sdg_papers.xlsx
        ├── closed_matched_b_sdg_papers.xlsx
        ├── tables/
        │   ├── 01_sdg_distribution_matched_samples.xlsx
        │   ├── 02_sdg_treatment_effects.xlsx
        │   ├── 03_sdg_vs_non_sdg_comparison.xlsx
        │   ├── 04_sdg_categories_by_impact.xlsx
        │   ├── 05_sdg_gender_industry_collaboration.xlsx
        │   ├── 06_sdg_analysis_summary.xlsx
        │   ├── 07_sdg_alignment_comparison_matched.xlsx
        │   └── 08_sdg_alignment_effects_summary.xlsx
        └── visualizations/
            ├── 01_sdg_distribution_overview.png
            ├── 02_sdg_treatment_effects.png
            ├── 03_sdg_impact_analysis.png
            └── 04_sdg_alignment_comparison_matched.png

Data Sources

External Data Sources (not included)

• Semantic Scholar Academic Graph: full publication metadata
• OpenAIRE Graph: European research infrastructure data
• PATSTAT: patent database for citation analysis
• ROR: Research Organization Registry
• SciNoBo toolkit: FOS classification, interdisciplinarity, SDG mapping, FWCI scores

Included Data

• Complete processed collection with outcomes
• Topic attribution dataset (paper-topic mappings, persistence scores)
• Analysis results: matched samples, treatment effects, summary statistics
• SciNoBo Field of Science taxonomy (fos_taxonomy_v0.1.2.json)

Scripts

Data Processing

• persistent_topics_create_collection.py – integrates multiple data sources, outcomes, affiliations, patent citations
• persistent_topics_find_paper_openaireids.py – maps DOIs to OpenAIRE IDs
• persistent_topics_find_paper_affiliations.py – extracts affiliations, science-industry collaboration
• persistent_topics_get_collection_author_gender.py – gender classification of authors

Analysis

• persistent_topics_calculate_indicators.py – main causal inference analysis (PSM for Green OA vs CA, Published OA vs CA)
• persistent_topics_calculate_indicators_sdg.py – SDG-focused treatment effects
• persistent_topics_indicators_create_data_for_vis.py – prepares final visualization datasets and figures

Key Findings

Sample

• Total: 132,134 papers (2000–2021)
• Green OA: 3,792 papers
• Published OA: 19,045 papers
• Closed Access: 92,998 papers

Contributions

• New Topic Persistence Metric for long-term impact
• Clean OA treatment definitions (excluding dual-mode and Bronze)
• Separate analysis of Green vs Published OA pathways

Main Results

• 8 significant causal effects across outcomes
• Enhanced topic persistence in OA papers
• Positive gender equity outcomes
• Evidence of economic impact (patents, collaborations)

SDG Findings

• 24,948 SDG-relevant papers (18.9% of sample)
• 11 significant treatment effects for SDG-related research
• Stronger knowledge sustainability for achieving SDG goals

Methodology

Design

• Propensity Score Matching (PSM) with balanced covariates
• Separate analyses for Green OA vs CA and Published OA vs CA
• Robust outcome metrics (including new persistence measure)

Treatment Definitions

• Green OA: repository-based
• Published OA: journal-based (gold, hybrid, diamond)
• Closed Access: no open provision
• Excluded: dual-mode and Bronze OA

Outcomes

• Citation impact (traditional)
• Topic persistence (novel metric)
• Gender equity in authorship
• Economic impact (patents, collaboration)
• Field effects (disciplinary and SDG)
