Text-to-text Generation for Issue Report Classification

Submission for the NLBSE Issue Report Tool Competition This package accompanies the submission titled "Text-to-text Generation for Issue Report Classification" to the NLBSE Issue Report Tool Competition. The package provides resources for replicating the experiments and results presented. Description of ZIP Files: issue_classification_t5: This archive contains the code for replicating the study, including the retrieval of the pre-trained model, fine-tuning procedures, and inference execution. code: Contains all the code files. finetuning.py: The contents of this file comprise the code for fine-tuning the VMware/flan-t5-large-alpaca model on the issue report classification task. Additionally, embedded comments provide guidance on executing the fine-tuning process. Be sure to read the embedded comments. inference.py: This file contains the codebase for conducting inference using the fine-tuned model. Similar to the fine-tuning script, instructions for running the inference process are embedded as comments within the file. download_plm.py: This file contains the code for downloading VMware/flan-t5-large-alpaca from https://huggingface.co/VMware/flan-t5-large-alpaca . requirements.txt: This file enumerates the required Python modules and their respective versions necessary for the successful execution of the provided code. data: Folder contains the NLBSE issue report classification data and model output after running inference using inference.py on issue-report-test.csv checkpoint-3000-output.csv: The contents of this CSV file present the output obtained after fine-tuning the VMware/flan-t5-large-alpaca model for 2 epochs (F1-score of 0.8297) on issue-report-train.csv and running the inference on issue-report-test.csv. Column 'label' contains the ground truth labels. Column 'Model generated output' contains the predicted label by the model. issue-report-train.csv: NLBSE24 isssue report tool competition train dataset. (Source: https://github.com/nlbse2024/issue-report-classification) issue-report-test.csv: NLBSE24 isssue report tool competition test dataset. (Source: https://github.com/nlbse2024/issue-report-classification) finetuned_model_checkpoint-3000: This zip file contains the fine-tuned model (VMware/flan-t5-large-alpaca) to 2 epochs.

Environment details: Operating System: Ubuntu 22.04 NVIDIA Driver Version: 470.141.03 NVIDIA CUDA Version: 12.2.1 Python version: 3.10 GPU Name: Nvidia A100 GPU Memory: 20 GiB CPU Memory: 60 GiB Note: We also attempted fine-tuning using a V100 GPU, and the results showed slight differences, potentially attributed to variations in GPU architecture. However, running inference on any GPU using the provided model finetuned_model_checkpoint-3000 should yield the same results as reported.

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average