
PG-SB unifies ten datasets (five real, five synthetic) spanning social networks, neuroscience connectomes, biomedicine, finance/leaks, communications, stream analytics, and internet measurements. For each dataset, we provide the ground-truth schema and enumerate the type patterns (node and edge patterns) observed in the data, capturing the structural variability in how labels and properties co-occur. The benchmark also includes a noise injection framework that (i) randomly removes 0 to 40% of node and edge properties and (ii) varies label availability across three settings: 100% (all labels retained), 50% (half retained), and 0% (no labels). Together, these settings yield 150 test cases. All dataset resources are provided in `datasets.zip`.
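The sketch below illustrates how a test grid and noise injection of this kind could be built. It is not the benchmark's own code. It assumes the property-removal levels are 0, 10, 20, 30, and 40%, because ten datasets × three label settings × five levels gives the stated 150 cases. It also assumes that the 50% label setting means half of the elements keep their labels. The function names `enumerate_test_cases` and `inject_noise`, and the `{"labels": [...], "properties": {...}}` element shape, are hypothetical.

```python
import random
from itertools import product

# Assumption: five property-removal levels in 10% steps (10 x 5 x 3 = 150 cases).
PROPERTY_REMOVAL_LEVELS = [0.0, 0.1, 0.2, 0.3, 0.4]
# Label availability settings described in the text: 100%, 50%, 0%.
LABEL_AVAILABILITY = [1.0, 0.5, 0.0]


def enumerate_test_cases(datasets):
    """Hypothetical helper: full factorial grid of dataset x removal level x label setting."""
    return [
        {"dataset": d, "property_removal": p, "label_availability": l}
        for d, p, l in product(datasets, PROPERTY_REMOVAL_LEVELS, LABEL_AVAILABILITY)
    ]


def inject_noise(elements, removal_rate, label_rate, seed=0):
    """Hypothetical helper: return noisy copies of node/edge elements.

    Removes round(removal_rate * total) properties chosen uniformly across all
    elements, and keeps labels on round(label_rate * n) randomly chosen elements.
    The input list is not modified.
    """
    rng = random.Random(seed)
    # Flatten every (element index, property key) pair so the rate applies globally.
    slots = [(i, k) for i, el in enumerate(elements) for k in el["properties"]]
    removed = set(rng.sample(slots, round(removal_rate * len(slots))))
    # Pick which elements retain their labels (assumed per-element semantics).
    keep_labels = set(rng.sample(range(len(elements)), round(label_rate * len(elements))))

    noisy = []
    for i, el in enumerate(elements):
        props = {k: v for k, v in el["properties"].items() if (i, k) not in removed}
        labels = list(el["labels"]) if i in keep_labels else []
        noisy.append({"labels": labels, "properties": props})
    return noisy
```

Sampling exact counts, rather than flipping a coin per property, guarantees that each case removes precisely the target fraction. That keeps noise levels comparable across runs with different seeds.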
