
Software Engineers' Human Errors

This dataset contains 200 GitHub comments with manual human error annotations, released as part of the following publication:

Benjamin S. Meyers. Human Error Assessment in Software Engineering. Rochester Institute of Technology. 2023.

Included Files

The "developer_human_errors.csv" file contains the full dataset of 200 software defect descriptions annotated with human error types (slips, lapses, and mistakes) and T.H.E.S.E. categories.

CSV Fields

- ID: Unique identifier for the comment.
- SOURCE: Whether the comment originates from a commit, issue, or pull request.
- COMMENT_URL: The URL linking to the comment.
- COMMENT_TEXT: The raw comment text.
- HUMAN_ERROR_TYPE: Whether the software defect described is a slip, lapse, or mistake.
- THESE_V4_ID: Manually assigned T.H.E.S.E. category ID, with labels corresponding to Version 4 of T.H.E.S.E.
- THESE_NAME: Name corresponding to the manually assigned T.H.E.S.E. category.

Annotation Details

Human error types span slips, lapses, and mistakes from James Reason's Generic Error Modelling System (GEMS):

- Slips: Failures of attention.
- Lapses: Failures of memory.
- Mistakes: Failures of planning.

T.H.E.S.E. categories are summarized below:

- S01: Typos & Misspellings
- S02: Syntax Errors
- S03: Overlooking Documented Information
- S04: Multitasking Errors
- S05: Hardware Interaction Errors
- S06: Overlooking Proposed Code Changes
- S07: Overlooking Existing Functionality
- S08: General Attentional Failure
- L01: Forgetting to Finish a Development Task
- L02: Forgetting to Fix a Defect
- L03: Forgetting to Remove Development Artifacts
- L04: Working with Outdated Source Code
- L05: Forgetting an Import Statement
- L06: Forgetting to Save Work
- L07: Forgetting Previous Development Discussion
- L08: General Memory Failure
- M01: Code Logic Errors
- M02: Incomplete Domain Knowledge
- M03: Wrong Assumption Errors
- M04: Internal Communication Errors
- M05: External Communication Errors
- M06: Solution Choice Errors
- M07: Time Management Errors
- M08: Inadequate Testing
- M09: Incorrect/Insufficient Configuration
- M10: Code Complexity Errors
- M11: Internationalization/String Encoding Errors
- M12: Inadequate Experience Errors
- M13: Insufficient Tooling Access Errors
- M14: Workflow Order Errors
- M15: General Planning Failure

Contact

Please contact Benjamin S. Meyers (email) with questions about this data and its collection.

Acknowledgments

Collection of this data was sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
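The CSV fields described above can be read with Python's standard csv module. The following is a minimal sketch: the filename and column names come from the dataset description, while the tallying helper itself is illustrative, not part of the released data.

```python
import csv
from collections import Counter

def count_error_types(path="developer_human_errors.csv"):
    """Tally the GEMS human error types (slip, lapse, mistake)
    recorded in the HUMAN_ERROR_TYPE column of the dataset CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        counts = Counter(row["HUMAN_ERROR_TYPE"] for row in reader)
    return counts
```

For example, `count_error_types()` returns a Counter mapping each of the three error types to the number of annotated comments, which is a quick sanity check that all 200 rows carry a valid label.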
Keywords: Commits, Software Engineering, Generic Error Modelling System, GitHub, Issues, Human Error, Taxonomy of Human Errors in Software Engineering (T.H.E.S.E.), Slips, Mistakes, Pull Requests, Lapses, Natural Language, GEMS
