This repository contains all data from protocols, guidelines, and slides, among other resources, from the paper "Ontology-based Automatic Reasoning and NLP for Software Traceability with OntoTrace", currently under review at REFSQ 2023.

How to use: Inside the first folder, you will find two sub-folders: OntoTraceV2.0-Experiment and OntoTraceV2.0-Extended-description. The first sub-folder contains all data related to the quasi-experiment performed to validate effectiveness, efficiency, and satisfaction. The second sub-folder includes extended information about SPARQL queries, vectorizer validation tests, guidelines, and architecture. We describe the content of each sub-folder in the following paragraphs.

OntoTraceV2.0-Experiment:

Guidelines and slides. Contains all the procedures, slides, and questionnaires given to the experimental subjects during the quasi-experiment. You can find the following files in this sub-folder:

- G3-S1-Guidelines. PDF file containing the guidelines for the first session (manual traceability).
- G3-S2-Guidelines. PDF file containing the guidelines for the second session (OntoTraceV2.0 traceability).
- M1-DemographicQuestionnaire. PDF file containing the ethical agreement and the demographic questionnaire.
- RASOPHS21-L5 Traceability. PDF file containing slides about traceability terminology, the traceability process and strategy, and traceability costs and benefits. These slides are part of the subjects' training in traceability, since most had no experience with software traceability before the experiment.
- RASOPHS21-L5-Exp-O1-P1. PDF file containing user stories from experimental object O1. GR1 (Group 1 of experimental subjects) uses this version of the user stories during the first session (manual traceability).
- RASOPHS21-L5-Exp-O2-P1. PDF file containing user stories from experimental object O2. GR2 (Group 2 of experimental subjects) uses this version of the user stories during the first session (manual traceability).
- RASOPHS21-L6 Traceability-OntoTrace.
PDF file containing slides about traceability tools (especially OntoTrace). These slides are part of the subjects' training in traceability, since most had no experience with software traceability before the experiment.
- RASOPHS21-L6-Exp-O1-P2. PDF file containing user stories from experimental object O1 (version O1.2). GR2 uses this version of the user stories during the second session (OntoTraceV2.0 traceability).
- RASOPHS21-L6-Exp-O2-P2. PDF file containing user stories from experimental object O2 (version O2.2). GR1 uses this version of the user stories during the second session (OntoTraceV2.0 traceability).

O#. Sub-folders matching the pattern O# contain information about the experimental objects (i.e., O0, O1, and O2). They share the same structure and files:

- O#-base. PDF file containing a text-based requirements description of O#. This is an overview of the software we use to later create the user stories of each O#.
- O#-ExperimentalObjectDesign. MS Excel file containing all the O# design information. This file has eight tabs:
  - UserStories. All user stories in the O#: the ID, the source it is taken from (based on O#-base), user role, action, objects, reason, and the final result concatenating the last four columns.
  - Sources. All source artefacts based on the previously defined user stories: the user story ID, source ID, name, and type (Role, Action, Object, or Goal).
  - Targets. All target artefacts based on the Merlin EDG model: the target ID, name, and type (Class, Relationship, Method, or Attribute).
  - UserStroiesAndRulesClass. All rules used to create the base traceability solution. We use transformation rules and our experience transforming user stories into EDG models to devise a base traceability solution, and we link each trace to a rule from the literature. This tab includes the element, the rules based on Nasiri et al. 2020 or Barigolvski et al. 2022, the resulting element, target ID, target, and target type.
  - Traces. Summarizes all traces found in the experimental object using the previously defined source and target IDs.
  - Traces_interpeted. Interprets the Traces tab to create a more readable view of the traces in the experimental object.
  - Nasiri2020. The rules based on Nasiri et al. 2020 to transform user stories into EDG models (or UML-like class-diagram models): a Rule column and a description of each rule.
  - Barigolvski2022. The rules based on Barigolvski et al. 2022 to transform user stories into EDG models (or UML-like class-diagram models): a Rule column, a description of each rule, the expected outcome, and a set of remarks.
- O#-Exp-Master-v1. MXP file containing the EDG model used in each experimental object. To open this file, visit the merlin-academic web page, create an account, and import the model contained in the file.
- O#-Paper-based. DRAWIO file containing the EDG model and user stories provided during the first session of the experiment. To open this file, download draw.io or visit the diagrams.net website and import the file.

Results. This folder contains all data regarding the results of the quasi-experiment. You will find the following files in this folder:

- Demographics. MS Excel file containing the anonymized demographic data of the experimental subjects: the study program, how long they have been studying in that program (in months), experience in software development (in months), experience in traceability (in months), experience in testing (in months), industry experience field, and how long they have worked in industry (in months).
- ExperimentalResults. MS Excel file containing the anonymized experimental data of the experimental subjects. We describe each column in the following list:
  - ID. Numeric ID that we assigned to each experimental subject.
  - Method.
Column to differentiate whether the subject used the "Manual" strategy or the "Tool" strategy (OntoTrace).
  - Group. Column to distinguish the two groups (G1 and G2).
  - Problem. Column indicating the experimental object each subject faced (O1 or O2).
  - Starting time. The time the subject started the experimental task.
  - Ending time. The time the subject completed the experimental task.
  - Q1-Q14. Likert-scale satisfaction questions rated from 1 to 5. You can find each question in G3-S1-Guidelines and G3-S2-Guidelines; the IDs follow the order of the questions.
  - Number of correctly discovered traceability links (TP; True Positive). Generated trace links that match what we expected to be traced based on the experimental object design.
  - Number of wrongly discovered traceability links (FP; False Positive). Generated trace links that do not match what we expected to be traced based on the experimental object design.
  - Number of missed correct traceability links (FN; False Negative). Trace links that were not generated but match what we expected to be traced based on the experimental object design.
  - Number of missed wrong traceability links (TN; True Negative). Trace links that were not generated and do not match what we expected to be traced based on the experimental object design.
  - Precision. The precision computed as TP / (TP + FP), in percent.
  - Efficiency. The efficiency computed as (number of traces / (ending time - starting time)), in traces/min.
  - PEU, PU, ITU AVG. The satisfaction measurements (Perceived Ease of Use, Perceived Usefulness, and Intention to Use) as averages.
  - PEU, PU, ITU MED. The satisfaction measurements (Perceived Ease of Use, Perceived Usefulness, and Intention to Use) as medians.
- RegressionResults. MS Excel file containing all the detailed information about the regression we performed in the experiment. Sheet 1 includes an extended version of our paper's Table 5. Sheet 2 contains extra information about each variable, including the R-squared and Root MSE, among other statistics.
- Experiment-Protocol. PDF file containing a detailed version of the quasi-experiment protocol, including detailed information about dependent and independent variables, subjects, timelines, procedures, etc.

OntoTraceV2.0-Extended-description:

SPARQL_queries. This sub-folder contains the SPARQL queries we implemented in the paper to retrieve information from the automatic reasoner. Each query aims to answer one artefact-related question. You will find the following files in this sub-folder:

- IsThereAnyUntracedSourceArtefact. Answers the question "Is there any untraced source artefact?"
- IsThereAnyUntracedTargetArtefact. Answers the question "Is there any untraced target artefact?"
- WhatAreThePossibleTraceableSourcesGivenATargetArtefact. Answers the question "Given a specific target artefact, which are the suggested source artefacts that may be traced to it?"
- WhatAreThePossibleTraceableTargetsGivenASourceArtefact. Answers the question "Given a specific source artefact, which are the suggested target artefacts that may be traced to it?"
- WhatAreTheTracesBetweenMyArtifacts. Answers the question "What are the traces between my artefacts?"

Validation_test. This sub-folder contains the validation tests we ran to evaluate different vectorization techniques and how they affect the accuracy of OntoTraceV2.0's recommendations. Inside this sub-folder, you will find the following files and folders:

- O#. Folders matching the pattern O# (i.e., O0, O1, and O2) share the same structure and files. Inside each O# sub-folder, you will find an MS Excel file with a design similar to the O#-ExperimentalObjectDesign files (see above). This structure is used and generated by the validation scripts.
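The Precision and Efficiency columns recorded in the ExperimentalResults file follow the formulas given above: TP / (TP + FP) in percent, and number of traces per minute of task time. A minimal sketch of these computations; the subject's counts and timestamps below are hypothetical, not taken from the dataset:

```python
from datetime import datetime

def precision_pct(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP), expressed in percent."""
    return 100.0 * tp / (tp + fp)

def efficiency_traces_per_min(n_traces: int, start: datetime, end: datetime) -> float:
    """Efficiency = number of generated traces / task duration, in traces/min."""
    minutes = (end - start).total_seconds() / 60.0
    return n_traces / minutes

# Hypothetical subject: 12 correct (TP) and 4 wrong (FP) links in a 40-minute session.
tp, fp = 12, 4
start = datetime(2022, 5, 10, 10, 0)
end = datetime(2022, 5, 10, 10, 40)

print(precision_pct(tp, fp))                           # 75.0
print(efficiency_traces_per_min(tp + fp, start, end))  # 0.4
```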
- O#-OntoTrace-Validation-Script. Python Jupyter Notebook file containing the validation script for each experimental object.

Sec.3.1-Guidelines-Extendend. PDF file containing an extended version of Sec. 3.1, "Traceability context: Tracing user stories and EDG models", including a complete instantiation of the Ontology101 steps into the Traceability Ontology.

If you have any doubts, contact mosq@zhaw.ch / ruiz@zhaw.ch
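The validation scripts described above evaluate how different vectorization techniques affect recommendation accuracy, i.e., how well text similarity between source and target artefacts predicts the base traces. A minimal, stdlib-only sketch of one such technique (bag-of-words cosine similarity); the artefact texts, IDs, and threshold below are hypothetical, and the actual notebooks may use different vectorizers:

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words term-frequency vector over lowercased whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(source: str, targets: dict, threshold: float = 0.1) -> list:
    """Rank target artefacts by similarity to a source artefact, keeping those above the threshold."""
    sv = vectorize(source)
    scores = {tid: cosine(sv, vectorize(text)) for tid, text in targets.items()}
    return [tid for tid, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s >= threshold]

# Hypothetical artefacts: a user-story fragment and two EDG model elements.
source = "create a new booking for a customer"
targets = {"T1": "Booking class", "T2": "Customer class"}
print(recommend(source, targets))
```

Accuracy can then be estimated by comparing the recommended pairs against the base traces in the O# design files, using the TP/FP/FN counts described earlier.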
Keywords: Automatic reasoning, Ontology, OntoTrace, Traceability, NLP