OpenBioLink2020

The OpenBioLink2020 dataset is a highly challenging biomedical benchmark dataset containing over 5 million positive and negative edges. To provide a more realistic edge prediction scenario, the test set contains edges of all edge types but excludes trivially predictable edges, i.e., inverses of edges already present in the training set. For further information, please check out the GitHub repository.

"OpenBioLink2020: directed, high quality" is the default dataset and should be used for benchmarking purposes. To allow analyzing the effect of data quality as well as the directionality of the evaluation graph, four variants of OpenBioLink2020 are provided: directed and undirected, each with and without a quality cutoff. Additionally, each graph is available in RDF N3 format (without train-validation-test splits) and in BEL.

OpenBioLink is a resource and evaluation framework for evaluating link prediction models on heterogeneous biomedical graph data. It contains benchmark datasets as well as tools for creating custom benchmarks and for training and evaluating models.
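The exclusion of trivially predictable inverse edges from the test set can be illustrated with a minimal sketch. It assumes, for illustration only, that edges are (head, relation, tail) triples and that a test edge counts as trivially predictable when the same triple, or its inverse with head and tail swapped, already appears in the training set. The function names and identifiers are hypothetical and do not reflect OpenBioLink's actual dataset generation code.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a triple is (head, relation, tail).
Triple = Tuple[str, str, str]


def inverse(triple: Triple) -> Triple:
    """Swap head and tail while keeping the relation type."""
    head, relation, tail = triple
    return (tail, relation, head)


def remove_trivial_test_edges(train: Iterable[Triple], test: Iterable[Triple]) -> List[Triple]:
    """Keep only test triples that neither occur in train nor have their inverse in train."""
    train_set: Set[Triple] = set(train)
    return [t for t in test if t not in train_set and inverse(t) not in train_set]


# Made-up identifiers, purely for illustration.
train = [
    ("gene_a", "interacts", "gene_b"),
    ("gene_b", "binds", "drug_x"),
]
test = [
    ("gene_b", "interacts", "gene_a"),  # inverse of a training edge: dropped
    ("gene_a", "binds", "drug_x"),      # not inferable from train: kept
    ("gene_b", "binds", "drug_x"),      # duplicate of a training edge: dropped
]
print(remove_trivial_test_edges(train, test))
# [('gene_a', 'binds', 'drug_x')]
```

In an undirected variant, (h, r, t) and (t, r, h) describe the same edge, so a check of this kind is what keeps one fact from appearing on both sides of the split.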
The OpenBioLink benchmark aims to meet the following criteria:

- Openly available
- Large-scale
- Wide coverage of current biomedical knowledge and entity types
- Standardized, balanced train-test split
- Open-source code for benchmark dataset generation
- Open-source code for evaluation (independent of model)
- Integration and differentiation of multiple types of biological entities and relations (i.e., formalized as a heterogeneous graph)
- Minimized information leakage between train and test sets (e.g., avoiding the inclusion of trivially inferable relations in the test set)
- Coverage of true negative relations, where available
- Differentiation of high-quality data from noisy, low-quality data
- Separate benchmarks for directed and undirected graphs, so that a wide variety of link prediction methods can be applied
- Clearly defined release cycle with versioned benchmarks and a public leaderboard

Please note that the OpenBioLink benchmark files contain data derived from external resources. Licensing terms of these external resources are detailed here.
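The criteria of differentiating high-quality from low-quality data and of covering true negative relations, together with the "with and without quality cutoff" variants, can be illustrated with a minimal sketch. It assumes, hypothetically, that every edge carries a numeric quality score and a positive/negative label; the `Edge` class, its field names, and the cutoff value 0.5 are illustrative only and are not the actual OpenBioLink file format or threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Edge:
    head: str
    relation: str
    tail: str
    quality: float  # hypothetical per-edge quality score
    positive: bool  # True for a positive edge, False for a true negative


def apply_quality_cutoff(edges: List[Edge], cutoff: float) -> List[Edge]:
    """Keep only edges whose quality score reaches the (hypothetical) cutoff."""
    return [e for e in edges if e.quality >= cutoff]


def split_by_label(edges: List[Edge]) -> Tuple[List[Edge], List[Edge]]:
    """Separate positive edges from true negative edges."""
    positives = [e for e in edges if e.positive]
    negatives = [e for e in edges if not e.positive]
    return positives, negatives


# Made-up edges, purely for illustration.
edges = [
    Edge("gene_a", "interacts", "gene_b", quality=0.9, positive=True),
    Edge("gene_c", "interacts", "gene_d", quality=0.2, positive=True),
    Edge("gene_a", "interacts", "gene_d", quality=0.8, positive=False),
]

high_quality = apply_quality_cutoff(edges, cutoff=0.5)
positives, negatives = split_by_label(high_quality)
print(len(edges), len(high_quality), len(positives), len(negatives))
# 3 2 1 1
```

Running the same pipeline with and without the cutoff yields the two quality variants of a graph; the low-quality positive edge between gene_c and gene_d only survives in the variant without a cutoff.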

References

Anna Breit, Simon Ott, Asan Agibetov, Matthias Samwald. OpenBioLink: A benchmarking framework for large-scale biomedical link prediction. Bioinformatics, btaa274. https://doi.org/10.1093/bioinformatics/btaa274

Related Organizations

Medical University of Vienna
Austria

Keywords

biomedical link prediction, knowledge graph, large scale
