
arXiv: 2202.04431
Programming language documentation refers to the set of technical documents that provide application developers with a description of the high-level concepts of a language (e.g., manuals, tutorials, and API references). Such documentation is essential to support application developers in effectively using a programming language. One of the challenges faced by documenters (i.e., personnel that design and produce documentation for a programming language) is to ensure that documentation has relevant information that aligns with the concrete needs of developers, defined as the missing knowledge that developers acquire via voluntary search. In this article, we present an automated approach to support documenters in evaluating the differences and similarities between the concrete information need of developers and the current state of documentation (a problem that we refer to as the topical alignment of a programming language documentation). Our approach leverages semi-supervised topic modelling that uses domain knowledge to guide the derivation of topics. We initially train a baseline topic model from a set of Rust -related Q&A posts. We then use this baseline model to determine the distribution of topic probabilities of each document of the official Rust documentation. Afterwards, we assess the similarities and differences between the topics of the Q&A posts and the official documentation. Our results show a relatively high level of topical alignment in Rust documentation. Still, information about specific topics is scarce in both the Q&A websites and the documentation, particularly related topics with programming niches such as network, game, and database development. For other topics (e.g., related topics with language features such as structs, patterns and matchings, and foreign function interface), information is only available on Q&A websites while lacking in the official documentation. Finally, we discuss implications for programming language documenters, particularly how to leverage our approach to prioritize topics that should be added to the documentation.
Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Programming Languages, Programming Languages (cs.PL)
Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Programming Languages, Programming Languages (cs.PL)
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 4 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
