
Submission for the NLBSE Issue Report Tool Competition

This package accompanies the submission titled "Text-to-text Generation for Issue Report Classification" to the NLBSE Issue Report Tool Competition. It provides the resources needed to replicate the experiments and results presented.

Description of ZIP Files:

- issue_classification_t5: This archive contains the code for replicating the study, including retrieval of the pre-trained model, fine-tuning, and inference.
  - code: Contains all the code files.
    - finetuning.py: Code for fine-tuning the VMware/flan-t5-large-alpaca model on the issue report classification task. Embedded comments explain how to run the fine-tuning process. Be sure to read them.
    - inference.py: Code for running inference with the fine-tuned model. As with the fine-tuning script, instructions for running it are embedded as comments within the file.
    - download_plm.py: Code for downloading VMware/flan-t5-large-alpaca from https://huggingface.co/VMware/flan-t5-large-alpaca .
  - requirements.txt: Lists the Python modules, with their versions, required to run the provided code.
  - data: Contains the NLBSE issue report classification data and the model output produced by running inference.py on issue-report-test.csv.
    - checkpoint-3000-output.csv: Output obtained by fine-tuning the VMware/flan-t5-large-alpaca model for 2 epochs on issue-report-train.csv and running inference on issue-report-test.csv (F1-score of 0.8297). Column 'label' contains the ground truth labels. Column 'Model generated output' contains the labels predicted by the model.
    - issue-report-train.csv: NLBSE24 issue report tool competition train dataset. (Source: https://github.com/nlbse2024/issue-report-classification)
    - issue-report-test.csv: NLBSE24 issue report tool competition test dataset. (Source: https://github.com/nlbse2024/issue-report-classification)
- finetuned_model_checkpoint-3000: This zip file contains the VMware/flan-t5-large-alpaca model fine-tuned for 2 epochs.
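
As a sanity check on checkpoint-3000-output.csv, the sketch below recomputes F1 from the two documented columns, 'label' and 'Model generated output'. It is an illustration only and is not part of the package. The averaging scheme behind the reported 0.8297 is not stated here, so both micro- and macro-averaged F1 are computed. The whitespace and case normalization and the helper names `load_pairs` and `f1_scores` are assumptions.

```python
import csv
from collections import Counter


def load_pairs(path, gold_col="label", pred_col="Model generated output"):
    """Read (gold, predicted) label pairs from an inference output CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            # Assumption: labels may differ in case or surrounding whitespace.
            (row[gold_col].strip().lower(), row[pred_col].strip().lower())
            for row in csv.DictReader(f)
        ]


def f1_scores(pairs):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in pairs:
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1

    # Assumption: macro-F1 averages only over classes seen in the ground truth.
    classes = sorted({gold for gold, _ in pairs})
    if not classes:
        return 0.0, 0.0

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

Calling `f1_scores(load_pairs("checkpoint-3000-output.csv"))` from inside the data folder (the relative path is an assumption) returns the two scores. If neither matches the reported value, the submission may have used a different averaging or label normalization.
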
Environment details:

- Operating System: Ubuntu 22.04
- NVIDIA Driver Version: 470.141.03
- NVIDIA CUDA Version: 12.2.1
- Python version: 3.10
- GPU Name: Nvidia A100
- GPU Memory: 20 GiB
- CPU Memory: 60 GiB

Note: We also attempted fine-tuning on a V100 GPU, and the results differed slightly, potentially because of differences in GPU architecture. However, running inference on any GPU using the provided model finetuned_model_checkpoint-3000 should yield the same results as reported.
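
One way to check the reproducibility claim is to compare the predictions from a fresh inference run against the provided checkpoint-3000-output.csv. The sketch below is a minimal illustration, not part of the package. It assumes the new run writes a CSV with the same 'Model generated output' column and the same row order. The helper names are hypothetical.

```python
import csv


def read_predictions(path, pred_col="Model generated output"):
    """Load the predicted labels from an inference output CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[pred_col].strip() for row in csv.DictReader(f)]


def differing_rows(reference_path, candidate_path):
    """Return 0-based data-row indices where the two runs disagree."""
    ref = read_predictions(reference_path)
    cand = read_predictions(candidate_path)
    if len(ref) != len(cand):
        raise ValueError(
            f"row count mismatch: {len(ref)} reference vs {len(cand)} candidate"
        )
    return [i for i, (a, b) in enumerate(zip(ref, cand)) if a != b]
```

An empty list from `differing_rows` means the new run reproduced every prediction in the reference file. Any returned indices point to the rows to inspect.
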
