This archive contains all the data used in the paper "ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells". sklearn_full_cells.csv is the dataset from the paper by Pimentel et al., filtered to keep only Data Science notebooks. complete.csv is the dataset obtained after the full run of ReSplit on the dataset: both merging and splitting. split.csv is the dataset obtained after running only the splitting part of ReSplit. merged.csv is the dataset obtained after running only the merging part of ReSplit. duplicates_id.csv contains the IDs of the duplicate notebooks for deduplication. changes.csv contains the IDs of the datasets, as well as their length before and after running ReSplit. survey.csv is the table with the results of the survey. In the dataset CSVs, each line is a cell that has a unique identifier and an identifier of the corresponding notebook.
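As a rough illustration of the one-row-per-cell layout described above, the following sketch groups cells by notebook and then drops notebooks listed as duplicates. The column names (`cell_id`, `notebook_id`, `source`), the sample rows, and the way the duplicate IDs are applied are all assumptions made for this example; the actual headers in the CSV files may differ and should be checked before use.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for one of the dataset CSVs. The column names
# (cell_id, notebook_id, source) are assumptions for illustration only.
SAMPLE_CSV = """cell_id,notebook_id,source
c1,nb1,import pandas as pd
c2,nb1,df = pd.read_csv('data.csv')
c3,nb2,print('hello')
"""


def cells_per_notebook(rows, notebook_col="notebook_id", cell_col="cell_id"):
    """Group cell identifiers by the notebook they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[notebook_col]].append(row[cell_col])
    return dict(grouped)


def drop_duplicates(grouped, duplicate_ids):
    """Remove notebooks whose IDs are listed as duplicates (assumed usage)."""
    return {nb: cells for nb, cells in grouped.items() if nb not in duplicate_ids}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
grouped = cells_per_notebook(rows)

# Hypothetical set of IDs, standing in for the contents of duplicates_id.csv.
deduplicated = drop_duplicates(grouped, duplicate_ids={"nb2"})
```

To run this against a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an opened file handle, and the column arguments adjusted to match the real headers.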
jupyter notebooks, jupyter cells, refactoring
