Microservices Performance Testing with Causality-enhanced Large Language Models

Keywords: Microservices; Performance testing; Large Language Models; Causal reasoning; Retrieval-augmented generation

Cristian Mascia; Roberto Pietrantuono; Antonio Guerriero; Luca Giamattei; Stefano Russo

Data sources:

- Archivio della ricerca - Università degli studi di Napoli Federico II (Conference object, 2025). Full-Text: https://www.iris.unina.it/bitstream/11588/1003236/1/FORGE25%20%282%29.pdf
- FEDOA - IRIS Università degli Studi Napoli Federico II (Conference object, 2025)
- Crossref: https://doi.org/10.1109/forge6... (Article, 2025, Peer-reviewed). License: STM Policy #29
- Archivio della Ricerca - Università di Salerno (Conference object, 2025)
- Sygma: http://dx.doi.org/10.1109/forg... (Conference object). License: STM Policy #29. Full-Text: http://xplorestaging.ieee.org/ielx8/11052780/11052781/11052809.pdf?arnumber=11052809
- European Union Open Data Portal: http://dx.doi.org/10.1109/forg... (Conference object, 2025)

Publication: Article, Conference object
Date: 27 Apr 2025
Publisher: IEEE
Journal: 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)
Funded by: EC | uDEVOPS

doi: 10.1109/forge66646.2025.00022

handle: 11588/1003236 , 11386/4918560

Abstract

Efficient performance testing of microservices is essential for engineers to ensure that deviations of performance and resource-usage metrics from expectations are promptly identified within rapid release cycles. To this end, engineers need to explore the space of possible workload configurations and focus only on the critical ones, e.g., low-load configurations that unexpectedly cause performance issues. This requires great effort and can be infeasible in short release cycles. We present CALLMIT, a framework that uses Large Language Models (LLMs) enhanced by causal reasoning to automatically generate critical workloads for microservices performance testing. Engineers query CALLMIT to generate workload configurations expected to expose deviations from performance requirements, so that they run only the tests that exercise critical configurations. We present an experimental evaluation on three subjects, with a comparison to a conventional Retrieval-Augmented Generation technique. The results show that causal models improve the LLM's ability to correctly identify performance-critical workload configurations.

Related Organizations

- Università degli studi di Salerno, Italy
- University Federico II of Naples, Italy

3 Research products

- sock-shop-demo: software on GitHub (IsRelatedTo)
- callmito: software on GitHub (IsRelatedTo)
- muBench: software on GitHub (IsRelatedTo)

Impact by BIP!

- Selected citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average