
This dataset comprises synthetic UML diagrams, explicitly focusing on activity and sequence diagrams generated using PlantUML—a text-based tool for creating visual diagrams. By leveraging randomized text strings based on PlantUML syntax, we produced a diverse and scalable collection that emulates standard UML diagrams. Each diagram is accompanied by its corresponding PlantUML code, facilitating a clear understanding of the visual representation's textual foundation. Data from smaller datasets is reused in the larger datasets, as each model was trained on the data separately, as described in the original paper. It's recommended just to use the Extra Large dataset when interested in using the data in its entirety. Each category is divided into four subsets based on size (approximately): Small: 6,000 training diagrams and 1,500 testing diagrams. Medium: 12,000 training diagrams and 3,000 testing diagrams. Large: 24,000 training diagrams and 6,000 testing diagrams. Extra Large: 120,000 training diagrams and 30,000 testing diagrams.
Software Engineering, Unified Modeling Language
Software Engineering, Unified Modeling Language
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
