
This dataset accompanies the study “It Makes the Code Clearer: Why Developers Adopt Modern Python Features in Open Source Projects.” It contains the tools, scripts, and data used in the quantitative and qualitative analyses presented in the paper.

BooksCodes.xlsx: Qualitative results spreadsheet containing the 77 coded excerpts (quote substrings) and their assigned themes/categories from the manual thematic analysis.
github_api_scrapper-master.zip: Python-based infrastructure used to collect pull request comments from GitHub’s REST API and assemble the raw qualitative corpus.
PullRequestCommentsDataset.xlsx: The final set of 494 manually verified pull request comments discussing Python source code rejuvenation, including repository identifiers and coding metadata.
PyMiner-develop.zip: The PyMiner tool used to mine the source code history of 424 GitHub repositories, parse ASTs to detect modern Python features, and emit project-level CSV outputs.
pyminer-postgres-backup.sql: PostgreSQL dump of the full raw PR-comments corpus (~395,702 comments) and related tables (schema, data, constraints). It can be restored with psql (e.g., psql -U <user> -d <database> -f pyminer-postgres-backup.sql, where <user> and <database> are placeholders for your own role and target database).

All materials are provided to support replication and further research on source code rejuvenation and language feature adoption in Python.
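For readers unfamiliar with AST-based feature detection, the following is a minimal sketch of the general idea, using only Python's standard ast module. It is not PyMiner's implementation: the FEATURE_NODES mapping, the count_modern_features function, and the SAMPLE snippet are hypothetical names chosen for illustration, and the set of features PyMiner actually detects should be taken from the tool itself.

```python
import ast
from collections import Counter

# Hypothetical mapping from AST node types to feature labels.
# This is an illustrative subset, not the list used by PyMiner.
FEATURE_NODES = {
    ast.JoinedStr: "f-string",
    ast.NamedExpr: "assignment expression (walrus)",
    ast.AnnAssign: "variable annotation",
    ast.AsyncFunctionDef: "async function",
}


def count_modern_features(source: str) -> Counter:
    """Parse source code and count occurrences of the node types above."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        label = FEATURE_NODES.get(type(node))
        if label is not None:
            counts[label] += 1
    return counts


# Small sample with one annotation, one walrus expression, and one f-string.
SAMPLE = '''
name: str = "world"
if (n := len(name)) > 3:
    print(f"hello {name}, {n}")
'''

if __name__ == "__main__":
    for feature, count in count_modern_features(SAMPLE).items():
        print(f"{feature}: {count}")
```

Applied to every file at every commit of a repository, counts like these could be aggregated into project-level CSV rows. This sketch requires Python 3.8 or later, since ast.NamedExpr was introduced in that version.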
