This replication package can be used to replicate the results in the paper. It contains 1) a dataset of 290,255 repositories; and 2) Python scripts for training and interpreting models. The GitHub repository of the paper is available at https://github.com/mcxwx123/Sustainable_projects.

We recommend manually setting up the required environment on a commodity Linux machine with at least 1 CPU core, 8GB of memory, and 100GB of free storage space. We developed the scripts and ran all our experiments on an Ubuntu 20.04 server with two Intel Xeon Gold CPUs, 320GB of memory, and 36TB of RAID 5 storage.

We use GHTorrent to restore the historical states of 290,255 repositories with more than 57 commits, 4 PRs, 1 issue, 1 fork and 2 stars. The raw data of the repositories (collected in their first 1, 3, and 5 months) are stored in `Replication Package/data/prodata_1.pkl`, `Replication Package/data/prodata_3.pkl`, and `Replication Package/data/prodata_5.pkl`. The feature contributions produced by the LIME model are stored in `Replication Package/data/limeres_m3_t2_k1.pkl`. `Replication Package/data/X_test_m3_t2_k1.pkl` and `Replication Package/data/y_test_m3_t2_k1.pkl` store the test dataset for the LIME model.

You can run `Replication Package/fitdata.py` to get the results in Tables 3 and 4, run `Replication Package/draw_compare_variable.py` to get Figure 2, and run `Replication Package/allvari_statistics.py` to get Table 5. In `Replication Package/Variable_comparison_with_different_parameter.pdf`, we show the LIME results under different parameters. In `Replication Package/sample_pros.csv`, we also provide the list of randomly selected repositories from Section 3.1. The explanations for collecting the variables, the examples of variable effects on project sustainability, and the hyperparameter settings of the machine learning models are provided in the README.md file.
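Before running the scripts, it can help to confirm that the data files load in your environment. The following is a minimal sketch, not part of the package: the helper names `load_pickle` and `describe` and the `DATA_DIR` location are assumptions made for illustration, and the type of object stored in each `.pkl` file is not stated here, so the summary avoids assuming one. If the files contain pandas or NumPy objects, those libraries need to be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path

# Assumed location of the unpacked package; adjust to your local path.
DATA_DIR = Path("Replication Package") / "data"


def load_pickle(path):
    """Load one pickled object. Only unpickle files from a source you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Return (type name, size) without assuming a specific object type."""
    shape = getattr(obj, "shape", None)
    if shape is not None:
        size = shape          # e.g. array-like or DataFrame-like objects
    elif hasattr(obj, "__len__"):
        size = len(obj)       # e.g. lists and dicts
    else:
        size = None           # scalars and other unsized objects
    return type(obj).__name__, size


if __name__ == "__main__":
    # File names follow the prodata_<months>.pkl pattern described above.
    for months in (1, 3, 5):
        path = DATA_DIR / f"prodata_{months}.pkl"
        if path.exists():
            print(path.name, describe(load_pickle(path)))
        else:
            print(f"{path} not found; check DATA_DIR")
```

The same helper can be pointed at `limeres_m3_t2_k1.pkl`, `X_test_m3_t2_k1.pkl`, or `y_test_m3_t2_k1.pkl` to inspect the LIME outputs and test split before reading the analysis scripts.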