
The increasing integration of AI-powered tools into software development raises crucial questions about the quality of the code they generate, particularly in rapidly evolving fields such as mobile application development. This study addresses a gap in current research: the need for up-to-date evaluations of AI-generated code quality in non-native applications. To investigate this problem, we conducted an experiment in which five prominent AI code generation tools (Gemini Code Assist, GitHub Copilot, ChatGPT, Windsurf IDE, and DeepSeek) were prompted to generate code for a chess game in two mobile development frameworks: React Native and Kotlin. This yielded a comparative analysis of ten AI-generated applications. The quality of the generated code was assessed using software quality metrics selected through a comprehensive literature review. Our analysis revealed moderate to high variation across the generated applications in key metrics such as cyclomatic complexity, lines of code, and cognitive complexity. However, the results did not provide conclusive evidence identifying a single AI tool as consistently producing the highest quality code across both frameworks. While the study offers valuable insights into the variability of code quality among different AI tools, the findings indicate that further research is needed to understand the factors influencing the quality of AI-generated code, to identify the optimal AI tools for specific development contexts, and to explore strategies for consistently generating high-quality code with AI assistance.
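The metrics named above can be made concrete with a minimal sketch. The study does not specify its measurement tooling, so the following is an illustrative approximation only: cyclomatic complexity is estimated as one plus the number of decision points found by walking a Python AST (McCabe-style; real analyzers count boolean sub-expressions and comprehension filters more precisely), and lines of code are counted as non-blank physical lines. The chess-move snippet being measured is hypothetical.

```python
import ast

# Node types treated as decision points (an approximation of
# McCabe's cyclomatic complexity; not the study's actual tooling).
DECISION_NODES = (ast.If, ast.For, ast.While,
                  ast.ExceptHandler, ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))

def lines_of_code(source: str) -> int:
    """Physical, non-blank lines of code."""
    return sum(1 for line in source.splitlines() if line.strip())

# Hypothetical generated snippet; helpers like forward_squares are
# assumed to exist elsewhere (the code is only parsed, never run).
snippet = """
def legal_pawn_moves(board, square):
    moves = []
    for target in forward_squares(square):
        if board.is_empty(target):
            moves.append(target)
        elif board.is_enemy(target):
            moves.append(target)
    return moves
"""

print(cyclomatic_complexity(snippet))  # 4: the for, the if, and the elif
print(lines_of_code(snippet))          # 8 non-blank lines
```

Comparing such per-function scores across the ten generated applications is one simple way to quantify the variation the study reports.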
Software Engineering, AI-generated code, Code quality, Native and non-native applications, Large Language Models in coding, AI-assisted development
