Publication: Article, Preprint, 01 Jan 2021
Embargo end date: 01 Jan 2019
Publisher: IEEE
Journal: 2021 IEEE 15th International Conference on Semantic Computing (ICSC)

Authors: Röder, Michael; de Souza, Geraldo; Kuchelev, Denis; Desouki, Abdelmoneim Amer; Ngomo, Axel-Cyrille Ngonga

doi: 10.1109/icsc50631.2021.00054, 10.48550/arxiv.1912.08026

arXiv: 1912.08026

ORCA - a Benchmark for Data Web Crawlers

Abstract

The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open-source and available at https://github.com/dice-group/orca.
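The idea of a synthetic, decoupled Web that allows repeatable comparisons can be illustrated with a small sketch. This is not Orca's implementation: the function names, the random link-graph model, the fetch budget, and the recall measure below are all assumptions chosen for illustration. The sketch only shows why a seeded generator makes runs comparable, since every crawler faces exactly the same graph.

```python
import random
from collections import deque


def generate_synthetic_web(num_nodes, avg_out_links, seed):
    """Hypothetical generator: build a random directed link graph.

    The same seed always yields the same graph, so runs are repeatable.
    """
    rng = random.Random(seed)
    graph = {node: set() for node in range(num_nodes)}
    for node in range(1, num_nodes):
        # Link from an earlier node so every node is reachable from node 0.
        graph[rng.randrange(node)].add(node)
    # Add further random links; the final average out-degree is only
    # approximate because self-links are skipped and duplicates collapse.
    extra_links = num_nodes * avg_out_links - (num_nodes - 1)
    for _ in range(max(0, extra_links)):
        source, target = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if source != target:
            graph[source].add(target)
    return graph


def crawl(graph, seeds, max_fetches):
    """Hypothetical crawler: breadth-first, stops after max_fetches nodes."""
    visited = set()
    queue = deque(seeds)
    while queue and len(visited) < max_fetches:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        queue.extend(sorted(graph[node] - visited))
    return visited


def recall(graph, visited):
    """Fraction of the synthetic Web's nodes that the crawler fetched."""
    return len(visited) / len(graph)


if __name__ == "__main__":
    web = generate_synthetic_web(num_nodes=50, avg_out_links=3, seed=42)
    for budget in (10, 25, 50):
        print(budget, recall(web, crawl(web, seeds=[0], max_fetches=budget)))
```

Because the graph is generated locally from a fixed seed, two crawlers (or two runs of the same crawler) can be scored against an identical ground truth, which is not possible on the live Web.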

8 pages, submitted to a conference

Related Organizations

University of Paderborn
Germany
Institute of Applied Building Informatics
Germany

Keywords

Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Computer Science - Databases, Databases (cs.DB)

Impact by BIP!

	selected citations: 1
	popularity: Average
	influence: Average
	impulse: Average