
Achieving semantic interoperability of research data is key to enabling cross-domain data integration, reuse, and knowledge discovery [1]. While the need to align heterogeneous datasets using shared vocabularies and ontologies is widely recognized, doing so remains a considerable challenge in practice [2], [3]. Researchers face several challenges:

• (C1) Lack of expertise in ontologies: Many researchers are unfamiliar with ontology engineering and semantic annotation.
• (C2) Absence of established domain ontologies: While some domains, such as medicine, have well-established vocabularies, other domains, such as production engineering, may lack suitable or widely adopted ontologies, making it difficult to identify reusable options.
• (C3) Technical barriers: The knowledge required to work with technologies such as RDF or mapping tools often presents an entry barrier.
• (C4) Tool heterogeneity: Working with multiple disconnected tools adds cognitive and technical overhead.
• (C5) Limited resources: Researchers typically face time constraints, making it difficult to invest in familiarizing themselves with complex tools or processes.
• (C6) Proprietary solutions: Many semantic mapping tools (e.g., Talend [4]) are proprietary and not suitable for scientific work.

To address these challenges, we present KONDA, an LLM-based tool that supports semantic enrichment of research datasets and the construction of explorable knowledge graphs within a single integrated workflow. The KONDA workflow is as follows:

• An interface prompts the user to upload their research dataset, along with optional supplementary documents (e.g., protocols, DMPs, README files) to provide the tool with context.
• The user is supported in the selection of suitable ontologies via a direct integration with the TIB Terminology Service [5], with the option to add custom ontologies.
• The tool performs automated LLM-based semantic annotation of the dataset using the provided context and selected ontologies.
• A feedback screen enables the user to review and correct annotations.
• The annotated data is provided in RDF format with an immediate visualization as a knowledge graph.

KONDA's architecture comprises a user interface, a server backend managing sessions and data processing, and an API layer that connects the tool to an LLM, where the semantic enrichment is conducted with techniques such as named entity recognition, relation extraction, and ontology-based annotation.

Through KONDA, a guided, interactive tool is provided in which users receive LLM-assisted suggestions and the opportunity to intuitively explore their enriched data directly through automated knowledge graph creation, thus reducing required technical or formal training in semantic technologies (C1, C3). The discovery of reusable ontologies is enabled through the integration of terminology services (C2). KONDA unifies the pipeline within a single, cohesive environment (C4). The tool's semi-automated workflow provides fast and visually supported results with minimal manual effort (C5) while retaining opportunities for human feedback to ensure output quality. Finally, KONDA's modular backend supports the deployment of both proprietary and open LLMs (C6).

KONDA empowers researchers to semantically enrich their datasets with minimal effort, offering an integrated and adaptable solution. Future development will focus on persistent graph storage, automated ontology recommendations, and evaluation in real-world settings. By leveraging LLMs and emphasizing usability, KONDA provides a robust foundation for advancing data interoperability across disciplines.
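The annotate, review, and export steps of the workflow described above can be pictured with a minimal Python sketch. It is not KONDA's implementation: every name in it (`Annotation`, `keyword_suggest`, `suggest_annotations`, `review`, `to_ntriples`) and the `example.org` IRIs are hypothetical, the LLM call is replaced by a trivial keyword matcher, and the RDF output is written by hand as N-Triples using only the standard library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical types and names throughout; none of them come from KONDA itself.
Suggester = Callable[[str, Dict[str, str]], Optional[str]]


@dataclass
class Annotation:
    column: str             # column header from the uploaded dataset
    term_iri: str           # ontology term proposed for that column
    accepted: bool = False  # set to True once the user has reviewed it


def keyword_suggest(column: str, terms: Dict[str, str]) -> Optional[str]:
    """Toy stand-in for the LLM call: find a term label inside the column name."""
    for label, iri in terms.items():
        if label.lower() in column.lower():
            return iri
    return None


def suggest_annotations(columns: List[str], terms: Dict[str, str],
                        suggest: Suggester) -> List[Annotation]:
    """Collect one unreviewed suggestion per column that has a matching term."""
    annotations = []
    for column in columns:
        iri = suggest(column, terms)
        if iri is not None:
            annotations.append(Annotation(column, iri))
    return annotations


def review(annotations: List[Annotation],
           corrections: Dict[str, Optional[str]]) -> List[Annotation]:
    """Apply user feedback: a new IRI replaces a suggestion, None rejects it."""
    reviewed = []
    for a in annotations:
        iri = corrections.get(a.column, a.term_iri)
        if iri is not None:
            reviewed.append(Annotation(a.column, iri, accepted=True))
    return reviewed


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(rows: List[dict], annotations: List[Annotation],
                base_iri: str) -> str:
    """One subject per row, one predicate per accepted, annotated column."""
    lines = []
    for index, row in enumerate(rows):
        subject = f"<{base_iri}row/{index}>"
        for a in annotations:
            if a.accepted and a.column in row:
                literal = escape_literal(str(row[a.column]))
                lines.append(f'{subject} <{a.term_iri}> "{literal}" .')
    return "\n".join(lines)


# Illustrative data: term labels mapped to made-up IRIs, and a one-row dataset.
terms = {"temperature": "http://example.org/onto#temperature",
         "pressure": "http://example.org/onto#pressure"}
rows = [{"Temperature_C": 21.5, "Pressure_bar": 1.2, "operator": "A"}]

suggested = suggest_annotations(list(rows[0]), terms, keyword_suggest)
reviewed = review(suggested, {"Pressure_bar": None})  # the user rejects one suggestion
ntriples = to_ntriples(rows, reviewed, "http://example.org/data/")
print(ntriples)
```

Passing the suggestion function in as a parameter loosely mirrors the modular backend mentioned above, in which either a proprietary or an open LLM can be plugged in. A real pipeline would more likely rely on an RDF library than on hand-written serialization.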
Large Language Models, Semantic Annotation, Ontologies, Research Data Management, Knowledge Graphs, Interoperability
