Replication Package for "Improving the Readability of Generated Tests Using GPT-4 and ChatGPT Code Interpreter"

Name: Replication Package for "Improving the Readability of Generated Tests Using GPT-4 and ChatGPT Code Interpreter"
Creator: Gregory Gay
Keywords: Large Language Models, Automated Test Generation, Generative AI, Search-Based Test Generation, Readability

ZENODO

Dataset . 2023

License: CC BY

Data sources: Datacite

Dataset . 28 Aug 2023

Publisher: Zenodo

Authors: Gregory Gay

doi: 10.5281/zenodo.8289841, 10.5281/zenodo.8296610

Abstract

While automated test generation can decrease the human burden associated with testing, it does not eliminate this burden. Humans must still work with generated test cases to interpret testing results, debug the code, build and maintain a comprehensive test suite, and perform many other tasks. Therefore, a major challenge with automated test generation is the understandability of generated test cases.

Large language models (LLMs), machine learning models trained on massive corpora of textual data - including both natural language and programming languages - are an emerging technology with great potential for performing language-related predictive tasks such as translation, summarization, and decision support. In this study, we explore the capabilities of LLMs with regard to improving test case understandability.

This package contains the data produced during this exploration:

- The examples directory contains the three case studies we tested our transformation process on:
  - queue_example: Tests of a basic queue data structure.
  - httpie_sessions: Tests of the sessions module from the httpie project.
  - string_utils_validation: Tests of the validation module from the python-string-utils project.

  Each directory contains the modules-under-test, the original test cases generated by Pynguin, and the transformed test cases. Two trials of the transformation technique were performed per case study to assess the impact of differing results from the LLM.
- The survey directory contains the survey that was sent to assess the impact of the transformation on test readability:
  - survey.pdf contains the survey questions.
  - responses.xlsx contains the survey results.
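As a quick sanity check after downloading, the package contents described above could be verified with a short script. This is only a sketch: it assumes the archive has been unpacked into a local directory and that the directory and file names match the description exactly. The `missing_entries` helper and the `EXPECTED_ENTRIES` list are hypothetical names introduced here and are not part of the package.

```python
from pathlib import Path

# Entries named in the package description. The exact on-disk layout
# (nesting and capitalization) is an assumption, not confirmed by the record.
EXPECTED_ENTRIES = [
    "examples/queue_example",
    "examples/httpie_sessions",
    "examples/string_utils_validation",
    "survey/survey.pdf",
    "survey/responses.xlsx",
]


def missing_entries(package_root):
    """Return the expected entries that do not exist under package_root."""
    root = Path(package_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_entries("path/to/unpacked/package")` returns an empty list when every described entry is present, and otherwise lists the relative paths that could not be found.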

Related Organizations

University of Gothenburg
Sweden

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.