
This work presents a context-aware framework for applying Large Language Models (LLMs) to automated test case generation in enterprise Point-of-Sale (POS) systems. Modern retail platforms operate as highly distributed, transaction-intensive environments involving complex interactions across payment gateways, promotion engines, inventory systems, and regulatory constraints. Ensuring correctness and reliability in such systems remains a significant challenge for traditional test design approaches. This paper introduces the Context-Aware LLM Test Generation Framework (CALTGF), a structured four-layer architecture that integrates domain-specific business rules, regulatory constraints (including PCI-DSS and EBT compliance), structured prompt engineering, automated validation, and human-in-the-loop governance. The framework addresses key limitations of general-purpose LLMs, including business rule hallucination, transaction state inconsistencies, and domain misalignment in generated test scenarios. The approach is evaluated across multiple enterprise POS transaction categories, including payment processing, promotion and coupon validation, return and refund workflows, self-checkout operations, and multi-tender transactions. Empirical results indicate improved test case quality, expanded edge case coverage, and a reduction in manual test authoring effort compared to conventional approaches. In addition to the architectural contribution, this work presents a taxonomy of LLM failure modes specific to enterprise transaction systems and proposes mitigation strategies through domain context injection and validation pipelines. The findings highlight the importance of combining AI-driven automation with governance mechanisms to ensure reliability in production-grade testing environments. 
This research contributes to the broader field of AI-assisted quality engineering by demonstrating how LLMs can be effectively adapted for domain-critical enterprise systems where correctness, compliance, and transactional integrity are essential.
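To make the ideas of domain context injection and automated validation more concrete, the following minimal Python sketch shows one way they might fit together for multi-tender scenarios. It is not CALTGF itself. The rule texts, the names `DOMAIN_RULES`, `Tender`, `GeneratedCase`, `build_prompt`, and `validate`, and the simplified EBT eligibility check are all hypothetical assumptions made for this illustration. The LLM call is left out, so the sketch only covers prompt construction and the checking of a candidate test case.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical, simplified business rules injected into the prompt as domain context.
DOMAIN_RULES = [
    "Tender amounts must sum exactly to the transaction total.",
    "EBT tenders may only cover EBT-eligible items.",
    "Every tender amount must be greater than zero.",
]


@dataclass
class Tender:
    kind: str          # e.g. "CARD", "CASH", "EBT" (illustrative values)
    amount: Decimal


@dataclass
class GeneratedCase:
    """A candidate test case, as it might be parsed from LLM output (assumed shape)."""
    name: str
    total: Decimal
    ebt_eligible_total: Decimal
    tenders: list


def build_prompt(scenario: str, rules: list) -> str:
    """Prepend the domain rules to the scenario so the model sees them as context."""
    rule_block = "\n".join(f"- {rule}" for rule in rules)
    return f"Business rules:\n{rule_block}\n\nGenerate test cases for: {scenario}"


def validate(case: GeneratedCase) -> list:
    """Return a list of rule violations; an empty list means the case passed."""
    errors = []
    paid = sum((t.amount for t in case.tenders), Decimal("0"))
    if paid != case.total:
        errors.append(f"tenders sum to {paid}, expected {case.total}")
    ebt_paid = sum((t.amount for t in case.tenders if t.kind == "EBT"), Decimal("0"))
    if ebt_paid > case.ebt_eligible_total:
        errors.append(f"EBT tender {ebt_paid} exceeds eligible total {case.ebt_eligible_total}")
    if any(t.amount <= 0 for t in case.tenders):
        errors.append("non-positive tender amount")
    return errors
```

In a sketch like this, cases that fail `validate` would be rejected or routed to a human reviewer rather than added to the test suite, which is one plausible reading of the human-in-the-loop governance described above.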
Keywords: Prompt Engineering, Automated Testing, Retail Systems, Distributed Systems, Test Case Generation, Software Testing, Quality Engineering, Transaction Validation, Large Language Models (LLMs), Multi-Tender Transactions, AI-Assisted Testing, Enterprise POS Systems
