Non-Functional Requirements from 269 Software Development Projects

Automated NLP-Based Classification of Non-Functional Requirements in Blockchain and Cross-Domain Software Systems Using BERT and Machine Learning

Abstract: Automated classification of non-functional requirements (NFRs) improves consistency and traceability by labeling requirements systematically. It saves effort, supports early architectural and testing decisions, improves stakeholder communication, and supports quality across diverse software domains. While prior work has applied natural language processing (NLP) and machine learning (ML) to NFR classification, existing datasets are often limited in size, domain diversity, and contextual richness. This study presents a novel dataset comprising over 2,400 NFRs from 269 software projects across 26 software application domains, including nine blockchain projects. The raw requirements are standardized using Rupp’s boilerplate to reduce vagueness and ambiguity, and NFR types are classified according to ISO/IEC 25010 definitions. We employ traditional machine learning models, deep learning models, and a transformer-based model (BERT-base) for automated NFR classification, evaluating performance on cross-domain and blockchain-specific NFRs. Results highlight that domain-aware adaptation significantly enhances classification accuracy, with traditional ML and deep learning models showing strong performance on blockchain requirements. This work contributes a publicly available, context-rich dataset and provides empirical insights into the effectiveness of NLP-based NFR classification in both general and blockchain-specific settings.
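To make the classification task in the abstract concrete, the following is a minimal sketch of a bag-of-words Naive Bayes classifier that assigns a requirement sentence to an ISO/IEC 25010 quality characteristic. It is only an illustration of the "traditional ML" family mentioned above, not the study's actual pipeline: the choice of Naive Bayes, the `TRAIN` sentences, and the names `tokenize` and `NaiveBayes` are all assumptions introduced here, and none of the example requirements come from the dataset.

```python
import math
import re
from collections import Counter, defaultdict

# Toy requirements invented for illustration only; NOT taken from the dataset.
# Labels are ISO/IEC 25010 quality characteristics.
TRAIN = [
    ("The system shall encrypt all stored user passwords.", "security"),
    ("The system shall authenticate users before granting access.", "security"),
    ("The system shall respond to search queries within two seconds.",
     "performance efficiency"),
    ("The system shall process 500 transactions per second.",
     "performance efficiency"),
    ("The system shall provide an interface that new users can learn within one hour.",
     "usability"),
    ("The system shall display clear error messages to users.", "usability"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.class_counts = Counter()            # label -> number of samples
        self.vocab = set()
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.class_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total)  # log prior
            n_words = sum(self.word_counts[label].values())
            for tok in tokens:
                smoothed = self.word_counts[label][tok] + 1
                score += math.log(smoothed / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(TRAIN)
print(model.predict("The system shall encrypt all transaction data."))
```

A transformer-based model such as BERT-base would replace the word counts with contextual embeddings, but the input (a standardized requirement sentence) and the output (one quality characteristic) would have the same shape as in this sketch.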

Related Organizations

University of Roehampton
United Kingdom
COMSATS University Islamabad
Pakistan

