
The sheet file contains the features summarized from the literature.

This zip file contains the essential materials to demonstrate the experimental results presented in the paper "Killing Two Birds with One Stone: Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence". The package is structured as follows:

- `poisoning-dataset`: the dataset collected for model training. The samples were sourced from Backstabbers-Knife-Collection and Maloss-samples. Access to Backstabber's Knife Collection must be requested from its authors, while access to Maloss-samples can be obtained by filling out the Google Form they provide.
- `10-fold-predict-result`: the prediction results of each fold for the Effectiveness Evaluation (RQ1) in the paper. Each row has two columns: the package version name and the label (0 for benign, 1 for malicious).
- `ablation-predict-result`: the prediction results of each fold for the Ablation Study (RQ3) in the paper. Each row has two columns: the package version name and the label (0 for benign, 1 for malicious).
- `real-world`: the code and data for the Real-World Usefulness Evaluation (RQ4) in the paper. It includes:
  - `monitor-result`: tables of each month's monitoring results.
  - `real-world-monitor`: Python code to download and collect newly uploaded package versions in npm and PyPI.

The CSV files in `monitor-result` have the following columns:

| Column | Description |
| --- | --- |
| package | Package version name |
| positive | 1 if flagged by cerebro, 0 if not flagged by cerebro |
| TP | 1 if considered a malicious package by manual inspection, 0 if not |
The requirements for running the real-world-monitor code are:

```
feedparser==6.0.10
lxml==4.9.1
pandas==1.4.4
Requests==2.31.0
```

To download the package versions uploaded during the last 5 minutes, run the following commands:

```shell
# npm
$ cd PATH/TO/real-world/pipeline_npm
$ python pipeline.py

# pypi
$ cd PATH/TO/real-world/pipeline_pypi
$ python pipeline.py
```

To change the 5-minute collection period, modify the `minutes` global variable in the Python file. For 24/7 collection, use a scheduling tool such as crontab on Linux.

`collect.ipynb` is a Jupyter Notebook that concatenates the CSV files generated by cerebro for each download time period and moves/unzips the packages flagged by cerebro.
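The post-processing performed by `collect.ipynb` can be approximated as below. This is a minimal standard-library sketch, not the notebook's actual code (pandas, which is in the requirements, would work equally well). It assumes the per-period CSV files produced by cerebro sit in one directory and carry `package` and `positive` columns like the `monitor-result` tables, and that flagged packages are stored as archives `shutil.unpack_archive` can open. All function names (`merge_tables`, `flagged_packages`, `concat_csv_dir`, `unpack_flagged`) are hypothetical.

```python
import csv
import shutil
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, str]


def merge_tables(tables: Iterable[Iterable[Row]]) -> Tuple[List[str], List[Row]]:
    """Stack rows from several tables, keeping the union of columns in first-seen order."""
    fieldnames: List[str] = []
    merged: List[Row] = []
    for table in tables:
        for row in table:
            for name in row:
                # csv.DictReader uses the key None for surplus cells; skip it.
                if name is not None and name not in fieldnames:
                    fieldnames.append(name)
            merged.append(dict(row))
    return fieldnames, merged


def flagged_packages(rows: Iterable[Row]) -> List[str]:
    """Package version names that cerebro flagged (positive == "1")."""
    return [r["package"] for r in rows if str(r.get("positive", "")).strip() == "1"]


def concat_csv_dir(result_dir: Path, combined_path: Path) -> List[Row]:
    """Concatenate every per-period CSV in result_dir into one CSV at combined_path.

    Keep combined_path outside result_dir so a rerun does not read it back in.
    """
    tables = []
    for csv_path in sorted(result_dir.glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as f:
            tables.append(list(csv.DictReader(f)))
    fieldnames, rows = merge_tables(tables)
    with open(combined_path, "w", newline="", encoding="utf-8") as f:
        # Columns missing from a row are written as ""; surplus keys are ignored.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return rows


def unpack_flagged(archive: Path, dest_root: Path, fmt: Optional[str] = None) -> Path:
    """Unpack one flagged archive into dest_root/<archive name>.

    shutil infers the format from .zip, .tar.gz or .tgz; pass fmt="zip" for wheels.
    """
    dest = dest_root / archive.name
    shutil.unpack_archive(archive, dest, format=fmt)
    return dest
```

For example, `rows = concat_csv_dir(Path("results"), Path("all_results.csv"))` followed by `flagged_packages(rows)` would list the package versions to inspect; the directory and file names are placeholders.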
