On the Applicability of Language Models to Block-Based Programs

Publication: Article, Preprint, Conference object
Date: 01 May 2023
Embargo end date: 01 Jan 2023
Publisher: IEEE
Journal: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)

Authors: Elisabeth Griebl; Benedikt Fein; Florian Obermüller; Gordon Fraser; René Just

doi: 10.1109/icse48619.2023.00199, 10.48550/arxiv.2302.03927

arXiv: 2302.03927


Abstract

Block-based programming languages like Scratch are increasingly popular for programming education and end-user programming. Recent program analyses build on the insight that source code can be modelled using techniques from natural language processing. Many of the regularities of source code that support this approach are due to the syntactic overhead imposed by textual programming languages. This syntactic overhead, however, is precisely what block-based languages remove in order to simplify programming. Consequently, it is unclear how well this modelling approach performs on block-based programming languages. In this paper, we investigate the applicability of language models for the popular block-based programming language Scratch. We model Scratch programs using n-gram models, the most essential type of language model, and transformers, a popular deep learning model. Evaluation on the example tasks of code completion and bug finding confirms that blocks inhibit predictability, but the use of language models is nevertheless feasible. Our findings serve as a foundation for improving tooling and analyses for block-based languages.

To appear at the 45th IEEE/ACM International Conference on Software Engineering (ICSE'2023)
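To make the idea of n-gram-based code completion concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that each Scratch script has already been flattened into a sequence of block opcodes; the toy corpus and the helper names `train_bigram` and `suggest_next` are hypothetical. A bigram model (n = 2) counts which block follows which and suggests the most frequent successors.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each block follows another across all scripts."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pad with start/end markers so the first and last block are modelled too.
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest_next(counts, prev, k=3):
    """Return up to k most frequent successors of `prev` with relative frequencies."""
    followers = counts.get(prev)
    if not followers:
        return []  # block never seen during training
    total = sum(followers.values())
    return [(tok, n / total) for tok, n in followers.most_common(k)]


# Hypothetical toy corpus: each script is a sequence of Scratch block opcodes.
scripts = [
    ["event_whenflagclicked", "control_forever", "motion_movesteps", "motion_ifonedgebounce"],
    ["event_whenflagclicked", "control_forever", "motion_movesteps", "motion_turnright"],
    ["event_whenkeypressed", "motion_movesteps", "motion_ifonedgebounce"],
]

model = train_bigram(scripts)
suggestions = suggest_next(model, "motion_movesteps")
print(suggestions)
```

A practical model would use longer contexts and smoothing for unseen block sequences; this sketch omits both for brevity.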

Related Organizations

University of Mary
United States
University of Passau
Germany

Keywords

Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Programming Languages, D.2.5, D.2.3, 68-04, Programming Languages (cs.PL)

Impact by BIP!

- selected citations: 2
- popularity: Average
- influence: Average
- impulse: Average


Access route: Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
