Multi-Label Requirements Classification with Large Taxonomies


Abdeen, Waleed; Unterkalmsteiner, Michael; Wnuk, Krzysztof; Chirtoglou, Alexandros; Schimanski, Christoph; Goli, Heja


arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

Electronic Research Archive - Blekinge Tekniska Högskola

Conference object . 2024 . Peer-reviewed

Data sources: Electronic Research Archive - Blekinge Tekniska Högskola

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Conference object . 2024 . Peer-reviewed

Data sources: Digitala Vetenskapliga Arkivet - Academic Archive On-line

https://doi.org/10.1109/re5906...

Article . 2024 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: CC BY NC SA

Data sources: Datacite


Publication: Article, Preprint, Conference object
Date: 24 Jun 2024
Embargo end date: 01 Jan 2024
Publisher: IEEE
Journal: 2024 IEEE 32nd International Requirements Engineering Conference (RE)


doi: 10.1109/re59067.2024.00033 , 10.48550/arxiv.2406.04797

arXiv: 2406.04797


Abstract

Classification aids software development activities by organizing requirements in classes for easier access and retrieval. The majority of requirements classification research has, so far, focused on binary or multi-class classification. Multi-label classification with large taxonomies could aid requirements traceability but is prohibitively costly with supervised training. Hence, we investigate zero-shot learning to evaluate the feasibility of multi-label requirements classification with large taxonomies. Together with domain experts from industry, we associated 129 requirements with 769 labels from taxonomies ranging between 250 and 1183 classes. Then, we conducted a controlled experiment to study the impact of the type of classifier, the hierarchy, and the structural characteristics of taxonomies on the classification performance. The results show that: (1) the sentence-based classifier had a significantly higher recall compared to the word-based classifier; however, the precision and F1-score did not improve significantly; (2) the hierarchical classification strategy did not always improve the performance of requirements classification; (3) the total number of nodes and the number of leaf nodes of the taxonomies have a strong negative correlation with the recall of the hierarchical sentence-based classifier. We investigate the problem of multi-label requirements classification with large taxonomies, illustrate a systematic process to create a ground truth involving industry participants, and provide an analysis of different classification pipelines using zero-shot learning.

Published by IEEE at the Requirements Engineering Conference (2024) - Industrial Innovation Track
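
To make the setting in the abstract concrete, the following is a minimal sketch of zero-shot multi-label classification and its evaluation. It is not the authors' pipeline: the word-overlap cosine similarity is a simple stand-in for the word-based and sentence-based classifiers the paper compares, and the taxonomy, requirement, expected labels, and the `top_k` cutoff are all hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(requirement, taxonomy, top_k=2):
    """Rank taxonomy labels by similarity to the requirement.

    No labelled training data is used (the zero-shot setting): each label
    is represented only by its own description text.
    """
    req_vec = Counter(tokenize(requirement))
    scored = [
        (cosine(req_vec, Counter(tokenize(description))), label)
        for label, description in taxonomy.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for score, label in scored[:top_k] if score > 0]


def precision_recall_f1(predicted, expected):
    """Set-based precision, recall, and F1 for one requirement."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy taxonomy: label -> short description (assumption).
taxonomy = {
    "Door": "door opening entrance",
    "Wall": "wall partition",
    "Fire safety": "fire resistance safety",
    "Lighting": "light lamp illumination",
}
requirement = "The door shall have a fire resistance of 60 minutes"
expected = {"Door", "Fire safety", "Wall"}  # hypothetical ground truth

predicted = classify(requirement, taxonomy, top_k=2)
precision, recall, f1 = precision_recall_f1(predicted, expected)
print(predicted, precision, recall, f1)
```

With a fixed `top_k`, recall is capped whenever a requirement has more ground-truth labels than the cutoff allows, which is one reason precision and recall can move independently in a multi-label evaluation like the one reported above.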

Related Organizations

Blekinge Institute of Technology
Sweden
HOCHTIEF VICON GMBH
Germany

Keywords

FOS: Computer and information sciences, Domain specific, Multi-label classifications, Programvaruteknik, Large-scales, Requirements traceability, requirements classification, Requirements classifications, multi-label, Computer Science - Software Engineering, Multi-labels, Multi-class classification, Software design, domain-specific taxonomy, Requirements engineering, Software Engineering, Taxation, Software Engineering (cs.SE), Development activity, large-scale, Multiprogramming, Sentence-based, Taxonomies, Zero-shot learning

Impact by BIP!

- selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Open access routes: Green, bronze


SDGs: 9. Industry and infrastructure

Related to Research communities

Knowmad Institut