ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite
Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text

Authors: Nandana Mihindukulasooriya; Sanju Tiwari; Carlos F. Enguix; Kusum Lata

Abstract

This is the repository for the ISWC 2023 Resource Track submission "Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text". Text2KGBench is a benchmark for evaluating the ability of language models to generate KGs from natural language text guided by an ontology. Given an input ontology and a set of sentences, the task is to extract facts from the text while complying with the given ontology (concepts, relations, domain/range constraints) and remaining faithful to the input sentences.

It contains two datasets: (i) Wikidata-TekGen, with 10 ontologies and 13,474 sentences, and (ii) DBpedia-WebNLG, with 19 ontologies and 4,860 sentences.

An example

An example test sentence:

Test Sentence:

    {"id": "ont_music_test_n", "sent": "\"The Loco-Motion\" is a 1962 pop song written by American songwriters Gerry Goffin and Carole King."}

An example of ontology:

Ontology: Music Ontology

Expected Output:

    {
      "id": "ont_k_music_test_n",
      "sent": "\"The Loco-Motion\" is a 1962 pop song written by American songwriters Gerry Goffin and Carole King.",
      "triples": [
        { "sub": "The Loco-Motion", "rel": "publication date", "obj": "01 January 1962" },
        { "sub": "The Loco-Motion", "rel": "lyrics by", "obj": "Gerry Goffin" },
        { "sub": "The Loco-Motion", "rel": "lyrics by", "obj": "Carole King" }
      ]
    }

The data is released under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY 4.0) License.

The repo is structured as follows.

Text2KGBench
    src: the source code used for generation, evaluation, and baselines
        benchmark: the code used to generate the benchmark
        evaluation: evaluation scripts for calculating the results
        baseline: code for generating the baselines, including prompts, sentence similarities, and LLM client
    data: the benchmark datasets and baseline data. There are two datasets: wikidata_tekgen and dbpedia_webnlg.
        wikidata_tekgen: Wikidata-TekGen Dataset
            ontologies: 10 ontologies used by this dataset
            train: training data
            test: test data
            manually_verified_sentences: ids of a subset of test cases that were manually validated
            unseen_sentences: new sentences added by the authors that are not part of Wikipedia
                test: unseen test sentences
                ground_truth: ground truth for unseen test sentences
            ground_truth: ground truth for the test data
            baselines: data related to running the baselines
                test_train_sent_similarity: for each test case, the 5 most similar train sentences, generated using the SBERT T5-XXL model
                prompts: prompts corresponding to each test file
                unseen prompts: unseen prompts for the unseen test cases
                Alpaca-LoRA-13B: data related to the Alpaca-LoRA model
                    llm_responses: raw LLM responses and extracted triples
                    eval_metrics: ontology-level and aggregated evaluation results
                    unseen results: results for the unseen test cases
                        llm_responses: raw LLM responses and extracted triples
                        eval_metrics: ontology-level and aggregated evaluation results
                Vicuna-13B: data related to the Vicuna-13B model
                    llm_responses: raw LLM responses and extracted triples
                    eval_metrics: ontology-level and aggregated evaluation results
        dbpedia_webnlg: DBpedia Dataset
            ontologies: 19 ontologies used by this dataset
            train: training data
            test: test data
            ground_truth: ground truth for the test data
            baselines: data related to running the baselines
                test_train_sent_similarity: for each test case, the 5 most similar train sentences, generated using the SBERT T5-XXL model
                prompts: prompts corresponding to each test file
                Alpaca-LoRA-13B: data related to the Alpaca-LoRA model
                    llm_responses: raw LLM responses and extracted triples
                    eval_metrics: ontology-level and aggregated evaluation results
                Vicuna-13B: data related to the Vicuna-13B model
                    llm_responses: raw LLM responses and extracted triples
                    eval_metrics: ontology-level and aggregated evaluation results

This benchmark contains data derived from the TekGen corpus (part of the KELM corpus) [1], released under the CC BY-SA 2.0 license, and the WebNLG 3.0 corpus [2], released under the CC BY-NC-SA 4.0 license.

[1] Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. 2021. Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3554–3565, Online. Association for Computational Linguistics.

[2] Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating Training Corpora for NLG Micro-Planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 179–188, Vancouver, Canada. Association for Computational Linguistics.
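As an illustration of the task described in the abstract, the following minimal Python sketch takes the expected-output record from the example, checks a set of predicted triples against an ontology's relations, and scores them against the ground truth by exact match. It is not the repository's evaluation code: the relation set `MUSIC_RELATIONS`, the `PREDICTED` triples, and all function names are hypothetical, and exact-match precision/recall/F1 is an assumed scoring scheme chosen only for the example.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical subset of relations, for illustration only; the benchmark's
# music ontology defines its own relations and domain/range constraints.
MUSIC_RELATIONS: Set[str] = {"publication date", "lyrics by"}

# Ground-truth record copied from the example in the abstract.
GROUND_TRUTH: Dict = {
    "id": "ont_k_music_test_n",
    "sent": '"The Loco-Motion" is a 1962 pop song written by American '
            "songwriters Gerry Goffin and Carole King.",
    "triples": [
        {"sub": "The Loco-Motion", "rel": "publication date", "obj": "01 January 1962"},
        {"sub": "The Loco-Motion", "rel": "lyrics by", "obj": "Gerry Goffin"},
        {"sub": "The Loco-Motion", "rel": "lyrics by", "obj": "Carole King"},
    ],
}

# Hypothetical system output: one correct triple and one triple whose
# relation ("genre") is not in the assumed relation set.
PREDICTED: Set[Triple] = {
    ("The Loco-Motion", "lyrics by", "Gerry Goffin"),
    ("The Loco-Motion", "genre", "pop"),
}


def as_triples(record: Dict) -> Set[Triple]:
    """Convert a record's "triples" list into a set of (sub, rel, obj) tuples."""
    return {(t["sub"], t["rel"], t["obj"]) for t in record["triples"]}


def nonconforming(triples: Set[Triple], relations: Set[str]) -> List[Triple]:
    """Return the triples whose relation is not defined in the ontology."""
    return sorted(t for t in triples if t[1] not in relations)


def precision_recall_f1(predicted: Set[Triple], expected: Set[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of triples."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


expected = as_triples(GROUND_TRUTH)
bad = nonconforming(PREDICTED, MUSIC_RELATIONS)
p, r, f1 = precision_recall_f1(PREDICTED, expected)

print("non-conforming triples:", bad)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Representing triples as tuples in a set keeps the comparison order-independent. Exact string matching is a deliberate simplification: it would, for instance, treat "1962" and "01 January 1962" as different objects.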

Keywords

Large Language Models, Knowledge Graph, Knowledge Graph Generation, Benchmark, Relation Extraction
