ZENODO
Other literature type . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Other ORP type . 2025
License: CC BY
Data sources: Datacite
ZENODO
Other ORP type . 2026
License: CC BY
Data sources: Datacite
ZENODO
Other ORP type . 2026
License: CC BY
Data sources: Datacite

OSSVul - ReplicationPackage

Authors: Al Hajj Ibrahim, Sara;

Abstract

OSS Vulnerability Dataset and Model Evaluation Framework

This archive contains the datasets and experimental code used in a study on open-source software vulnerability detection. It integrates vulnerability information from the National Vulnerability Database (NVD) with software development artifacts extracted from GitHub, and provides a unified framework for constructing datasets and evaluating multiple vulnerability detection models. The archive includes the data processing pipeline, the curated datasets, and the experimental scripts used in the study. Vulnerability detection is performed at the sample level, and results are aggregated at the CVE level to reflect practical vulnerability identification scenarios.

Contents

  • CVE Data Collection
  • Model Data Collection
  • Model Experiments
  • CVE_data.xlsx

CVE Data Collection

This component includes the scripts used to construct a unified CVE dataset. CVE records from 1999 to July 2024 were collected from the National Vulnerability Database (NVD) and consolidated into the file CVE_data.xlsx. References to GitHub artifacts (commits, pull requests, and issues) were extracted from the CVE entries and filtered to retain only valid artifacts. Artifact creation timestamps and temporal metrics were computed to support time-aware analysis.

Model Data Collection

This component provides the scripts for constructing model-specific inputs. Datasets were generated at the artifact level and include both vulnerable and non-vulnerable samples. Because of the dataset size, intermediate CSV outputs were merged during preprocessing. Temporal ordering was preserved by splitting the data into RQ2 and RQ3 subsets, which are provided with the experimental datasets.

Model Experiments

This component contains the experimental code, configurations, and datasets used to evaluate the following vulnerability detection models:

  • MemVul
  • VulCurator
  • PatchRNN
  • LineVul
  • DeepTraVul

Experiments are conducted independently for each model using a consistent evaluation protocol.
All models operate at the sample level, and a CVE is considered vulnerable if at least one associated sample is predicted as vulnerable.

Notes

  • All datasets and experimental scripts required to reproduce the reported results are included in this archive.
  • The CVE dataset is provided in the file CVE_data.xlsx.
  • This archive is intended to support reproducible research on software vulnerability detection.
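
The CVE-level aggregation rule stated under Model Experiments (a CVE counts as vulnerable if at least one of its samples is predicted as vulnerable) can be illustrated with a minimal Python sketch. This is not taken from the archive's scripts: the function name aggregate_to_cve, the (CVE ID, prediction) input shape, and the placeholder CVE IDs are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_to_cve(
    sample_predictions: Iterable[Tuple[str, bool]]
) -> Dict[str, bool]:
    """Return, per CVE ID, True if any associated sample was predicted vulnerable.

    Hypothetical helper: the archive's own scripts may structure this differently.
    """
    cve_labels: Dict[str, bool] = defaultdict(bool)
    for cve_id, predicted_vulnerable in sample_predictions:
        # Logical OR across samples implements the "at least one" rule.
        cve_labels[cve_id] = cve_labels[cve_id] or predicted_vulnerable
    return dict(cve_labels)


# Placeholder sample-level predictions: (CVE ID, predicted vulnerable?).
predictions = [
    ("CVE-0000-0001", False),
    ("CVE-0000-0001", True),
    ("CVE-0000-0002", False),
]
cve_results = aggregate_to_cve(predictions)
print(cve_results)
```

Under this rule a single positive sample is enough to flag a CVE, so CVE-level recall can only be equal to or higher than sample-level recall for the same predictions, while false positives at the sample level propagate directly to the CVE level.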

Keywords

Software Security, AI for Software Engineering, Vulnerability assessment, Vulnerability Detection, Open Source Software
