
This dataset, collected from GitHub, was used to conduct an empirical study on security issues in GitHub Copilot-generated code. We provide below a brief description of each folder and file:

1. source-data folder contains all the code files from the Code and Repository labels that we collected from GitHub and used in our study. Code snippets are included in these code files.
2. scan-result folder contains the commands used to perform security scans and all the results from security scans performed using static analysis tools.
3. filtered-result folder contains the results we kept after filtering the scan results in Step 5.
4. fix-result folder contains code snippets before and after fixes in RQ3 and the results of fixes for security issues.
5. project-url.xlsx provides the projects from the Repository label and source files from the Code label that contain Copilot-generated code from GitHub.
   --SOURCE gives the URL of the project from GitHub.
   --FILE gives the path to the source code file in the project (only for the source files from the Code label).
   --NOTE gives the statement describing the project as generated by Copilot.
   --FUNCTION gives a functional description of the project.
   --DOMAIN gives the application domain that the project containing the Copilot-generated code belongs to.
6. corresponding_cwe.xlsx provides warning messages from static analysis tools corresponding to CWEs.
7. files_with_security_issues.xlsx provides information about code snippets with security issues.
8. cwe-result.xlsx provides the types and quantities of CWEs identified in the scan results.
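As a rough illustration of how the project-url.xlsx columns might be used, the sketch below separates Code-label entries from Repository-label entries and counts entries per application domain. It relies on an assumption not stated in the dataset description: that the FILE cell is empty for Repository-label rows, since FILE is only given for Code-label source files. The function names and the sample rows (including the example.com URLs) are hypothetical placeholders, not real data. In practice the rows could be loaded with something like pandas `read_excel(...).to_dict("records")`, which is also an assumption about tooling.

```python
from collections import Counter


def split_by_label(rows):
    """Split rows into (Code-label, Repository-label) lists.

    Assumption: a row belongs to the Code label when its FILE cell holds a
    non-blank string; anything else (None, NaN, blank) is treated as a
    Repository-label row.
    """
    code_rows, repo_rows = [], []
    for row in rows:
        file_path = row.get("FILE")
        has_file = isinstance(file_path, str) and file_path.strip() != ""
        (code_rows if has_file else repo_rows).append(row)
    return code_rows, repo_rows


def count_domains(rows):
    """Count how many rows fall into each DOMAIN value."""
    return Counter(str(row.get("DOMAIN", "")).strip() for row in rows)


# Hypothetical sample rows that mimic the documented columns.
example_rows = [
    {"SOURCE": "https://example.com/org/repo-a", "FILE": None,
     "NOTE": "placeholder note", "FUNCTION": "placeholder", "DOMAIN": "Web"},
    {"SOURCE": "https://example.com/org/repo-b", "FILE": "src/app.py",
     "NOTE": "placeholder note", "FUNCTION": "placeholder", "DOMAIN": "Web"},
    {"SOURCE": "https://example.com/org/repo-c", "FILE": float("nan"),
     "NOTE": "placeholder note", "FUNCTION": "placeholder", "DOMAIN": "Game"},
]

code_rows, repo_rows = split_by_label(example_rows)
domain_counts = count_domains(example_rows)
print(len(code_rows), len(repo_rows))
print(dict(domain_counts))
```

Checking for a non-blank string, rather than simple truthiness, keeps spreadsheet loaders that represent empty cells as NaN from being misread as Code-label rows.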
