ZENODO
Article . 2025
License: CC BY
Data sources: Datacite

AI4DiTraRe: Towards LLM-Based Information Extraction for Standardising Climate Research Repositories

Authors: Jacyszyn, Anna M.; Jiang, Shufan; Gesese, Genet Asefa; Hertling, Sven; Kerzenmacher, Tobias; Nowack, Peer; Barthlott, Sabine; +2 Authors

Abstract

In the petabyte era of climate research, harmonising diverse environmental and geoscientific datasets is critical to improving data interoperability and supporting effective interdisciplinary studies. This paper presents the idea of an LLM-based tool that extracts and standardises metadata from climate research repositories. The solution leverages the adaptability of LLMs and their ability to understand contextual nuances. By addressing common inconsistencies such as varying parameters (observation types), units, and definitions, the proposed tool aims to make data integration substantially more effective. It would be a first step towards a unified metadata schema that adheres to the FAIR principles.
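To make the kind of inconsistency named in the abstract concrete, the following is a minimal sketch of harmonising one metadata record: variant parameter names are mapped to a single canonical name and values are converted to a common unit. It is not the authors' tool and involves no LLM; a hand-written lookup table stands in for the mapping step that the proposed tool would infer from context. The alias table, the unit table, the `Observation` type, and the `harmonise` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical alias table: variant parameter names -> one canonical name.
# In the proposed tool, an LLM would suggest such mappings; here they are fixed.
PARAMETER_ALIASES = {
    "air_temperature": "air_temperature",
    "air temp": "air_temperature",
    "temp": "air_temperature",
    "t2m": "air_temperature",
}

# Hypothetical unit table: source unit -> conversion to kelvin.
TO_KELVIN = {
    "K": lambda v: v,
    "degC": lambda v: v + 273.15,
    "degF": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}


@dataclass(frozen=True)
class Observation:
    """One harmonised record: canonical parameter name, value, and unit."""
    parameter: str
    value: float
    unit: str


def harmonise(raw: dict) -> Observation:
    """Map a raw record onto the canonical parameter name and kelvin."""
    name = raw["parameter"].strip().lower()
    if name not in PARAMETER_ALIASES:
        raise KeyError(f"unknown parameter name: {raw['parameter']!r}")
    unit = raw["unit"].strip()
    if unit not in TO_KELVIN:
        raise KeyError(f"unknown unit: {raw['unit']!r}")
    value_k = round(TO_KELVIN[unit](float(raw["value"])), 2)
    return Observation(PARAMETER_ALIASES[name], value_k, "K")


# Two records that describe the same quantity in different conventions.
a = harmonise({"parameter": "T2M", "value": 20, "unit": "degC"})
b = harmonise({"parameter": "air temp", "value": 68, "unit": "degF"})
print(a == b)
```

Records that fall outside the tables raise an error rather than being guessed at, which is precisely the gap a context-aware extraction step would be expected to close.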

This position paper was accepted for publication at the First AAAI Bridge on Artificial Intelligence for Scholarly Communication (AI4SC), 25-26 February 2025, Philadelphia, Pennsylvania, USA, co-located with the 39th AAAI Conference on Artificial Intelligence (AAAI-25).

This short publication consists of two pages of main body together with two pages of references and an appendix.

Keywords

climate research, metadata standardisation, large language models, digitalisation, information extraction
