Data sources: ZENODO

Replication Package for the Paper "Can Large Language Models Decompose User Stories into Tasks? Exploring the Role of Prompting Strategies and Models"

Authors: Anonymous, Anonymous


Abstract

This repository contains supplementary materials for the research paper "Can Large Language Models Decompose User Stories into Tasks? Exploring the Role of Prompting Strategies and Models". The paper was submitted to the Automated Software Engineering Conference 2026.

Structure

- DELTA – Pipeline implementation for decomposing user stories into tasks and evaluating the quality of the resulting decompositions
- Prompt iterations – Prompt engineering experiments and observations
- Descriptive statistics – Task title length statistics
- Evaluation session – Human evaluation session materials, including the informed consent form and the evaluation session protocol
- Statistics – Statistical results per research question (RQ)

Research Questions

- RQ1: How do persona-based zero- and few-shot prompting strategies affect text similarity between LLM-generated and human-created decompositions?
- RQ2: How do model family and size affect text similarity between LLM-generated and human-created decompositions?
- RQ3: To what extent do text similarity metrics align with human expert judgments of decomposition quality?
- RQ4: To what extent does the criteria-based automated evaluation module of DELTA agree with human expert judgments of decomposition quality?
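The specific text similarity metrics used in DELTA are not detailed in this abstract. As a minimal sketch of the kind of comparison RQ1–RQ3 refer to — scoring an LLM-generated decomposition against a human-created one — the snippet below uses Python's standard-library `difflib` to compute a character-level similarity ratio; the function name and the example task lists are hypothetical, not taken from the package.

```python
from difflib import SequenceMatcher

def decomposition_similarity(generated: list[str], reference: list[str]) -> float:
    """Hypothetical illustration: similarity in [0, 1] between two
    decompositions, each given as a list of task titles."""
    # Normalize case and join tasks into one string per decomposition.
    a = " ".join(t.lower() for t in generated)
    b = " ".join(t.lower() for t in reference)
    return SequenceMatcher(None, a, b).ratio()

# Example (invented) decompositions of the same user story.
llm_tasks = ["Design login form", "Implement authentication endpoint"]
human_tasks = ["Design the login form", "Build authentication endpoint"]

score = decomposition_similarity(llm_tasks, human_tasks)
print(round(score, 2))
```

In practice the package may use other metrics (e.g., token- or embedding-based similarity); this only illustrates the shape of the comparison, and RQ3 then asks how well such scores track human expert judgments.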
