PG-SB: A Benchmark for Schema Discovery in Property Graphs

PG-SB unifies ten datasets (five real, five synthetic) spanning social networks, neuroscience connectomes, biomedicine, finance/leaks, communications, stream analytics, and internet measurements. For each dataset, we provide the ground-truth schema and enumerate the type patterns (node and edge patterns) observed in the data, capturing the structural variability of label and property co-occurrence. The benchmark includes a noise injection framework that (i) randomly removes 0-40% of node/edge properties and (ii) varies label availability across three settings: 100% (all labels retained), 50% (half retained), and 0% (no labels). Together, these settings yield 150 test cases. All dataset resources are included in datasets.zip.
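The test-case grid and the two noise dimensions can be illustrated with a minimal sketch. It is not the benchmark's own code: the function names, the dictionary layout of graph elements, and the independent per-element sampling are all hypothetical. The description gives only the 0-40% range for property removal; the 10-point steps used below are an assumption, chosen because five noise levels, three label settings, and ten datasets are consistent with the stated total of 150.

```python
import itertools
import random

# Assumed grid: the description states only the 0-40% range and the three
# label settings; the 10-point step size is an assumption.
NOISE_LEVELS = [0, 10, 20, 30, 40]   # % of properties removed (assumed steps)
LABEL_AVAILABILITY = [100, 50, 0]    # % of labels retained (from the description)
NUM_DATASETS = 10                    # five real, five synthetic


def enumerate_test_cases(num_datasets=NUM_DATASETS):
    """Return every (dataset index, noise level, label availability) combination."""
    return list(itertools.product(range(num_datasets), NOISE_LEVELS, LABEL_AVAILABILITY))


def inject_noise(elements, property_removal_pct, label_pct, seed=0):
    """Return a noisy copy of nodes or edges (hypothetical element layout).

    Each element is a dict with "labels" (list) and "properties" (dict).
    Sampling is independent per property and per element, so the requested
    percentages are met only approximately on small inputs.
    """
    rng = random.Random(seed)
    noisy = []
    for elem in elements:
        # Keep each property with probability 1 - property_removal_pct / 100.
        props = {
            key: value
            for key, value in elem["properties"].items()
            if rng.random() >= property_removal_pct / 100
        }
        # Keep the element's labels with probability label_pct / 100.
        labels = list(elem["labels"]) if rng.random() < label_pct / 100 else []
        noisy.append({"labels": labels, "properties": props})
    return noisy


if __name__ == "__main__":
    print(len(enumerate_test_cases()))
    sample = [{"labels": ["Person"], "properties": {"name": "a", "age": 1}}]
    print(inject_noise(sample, property_removal_pct=40, label_pct=50))
```

A seeded random generator keeps each noisy variant reproducible, which matters when different schema discovery methods are compared on the same test case.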

Related Organizations

Grenoble Computer Science Laboratory, France
Harokopio University, Greece
French National Centre for Scientific Research, France
Foundation for Research and Technology Hellas, Greece
University of Crete, Greece
