
The study investigates whether Large Language Models (LLMs) can support inter-model consistency assessment across heterogeneous modeling paradigms. Specifically, the evaluation focuses on consistency reasoning between AUTOSAR and ROS2 modeling frameworks within cyber-physical systems (CPS). The replication package provides all artifacts necessary to reproduce the experimental results, including dataset construction, prompt templates, baseline implementations, evaluation scripts, aggregated results, and figure-generation utilities.

Reproducibility

All experiments were conducted under controlled conditions:
- Fixed LLM model versions
- Standardized prompt templates
- Consistent inference parameters
- Structured JSON output enforcement
- Identical evaluation pipeline across models

Vendor-level aggregation is computed as the arithmetic mean across all configurations, since each configuration is evaluated on the same dataset (50 instances). Detailed step-by-step reproduction instructions are provided in the included README file.

Research Context

This dataset enables controlled evaluation of LLM-based reasoning for multi-dimensional inter-model consistency in heterogeneous CPS modeling environments. The work contributes empirical evidence regarding:
- The comparative performance of LLMs versus traditional heuristic baselines
- The effect of prompting strategies on architectural reasoning
- Vendor-level differences in semantic and behavioral consistency assessment
- Limitations of LLMs in safety-critical modeling contexts

To our knowledge, this represents one of the first systematic empirical evaluations of LLM-based reasoning for inter-model architectural consistency in heterogeneous CPS frameworks.
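The vendor-level aggregation described above is a plain arithmetic mean over per-configuration scores, which is valid because every configuration is evaluated on the same 50 instances. A minimal sketch of that step follows; the record layout and field names ("vendor", "score") are illustrative assumptions, not taken from the replication package.

```python
# Hypothetical sketch of vendor-level aggregation: each configuration
# is scored on the identical 50-instance dataset, so a vendor's score
# is the unweighted arithmetic mean over its configurations.
from collections import defaultdict
from statistics import mean


def aggregate_by_vendor(config_results):
    """config_results: one {"vendor": str, "score": float} dict per
    evaluated configuration. Returns {vendor: mean score}."""
    by_vendor = defaultdict(list)
    for record in config_results:
        by_vendor[record["vendor"]].append(record["score"])
    # Equal weighting is justified only because all configurations
    # share the same evaluation dataset.
    return {vendor: mean(scores) for vendor, scores in by_vendor.items()}


if __name__ == "__main__":
    demo = [
        {"vendor": "VendorA", "score": 0.80},
        {"vendor": "VendorA", "score": 0.90},
        {"vendor": "VendorB", "score": 0.70},
    ]
    print(aggregate_by_vendor(demo))
```

Note that this unweighted mean would over-represent vendors with few configurations if the datasets differed per configuration; the shared 50-instance dataset is what makes the simple mean appropriate here.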
Keywords: Inter-Model Consistency, Cyber-Physical Systems, Large Language Models, Model-Driven Engineering
