Pattern Sampling in Distributed Databases

Type: Part of book or chapter of book, Conference object, Article
Date: 01 Jan 2020
Country: France
Language: English
Publisher: Springer International Publishing
Funded by: FCT | D4

Authors: Diop, Lamine; Diop, Cheikh Talibouya; Giacometti, Arnaud; Soulet, Arnaud

doi: 10.1007/978-3-030-54832-2_7

Abstract

Many applications rely on distributed databases. However, only a few discovery methods can extract patterns without centralizing the data. In fact, centralizing the data is often less expensive than communicating the extracted patterns from the different nodes. To circumvent this difficulty, this paper revisits the problem of pattern mining in distributed databases by leveraging pattern sampling. Specifically, we propose the DDSampling algorithm, which randomly draws a pattern from a distributed database with a probability proportional to its interest. We demonstrate the soundness of DDSampling and analyze its time complexity. Finally, experiments on benchmark datasets highlight its low communication cost and its robustness. We also illustrate its usefulness on real-world Semantic Web data by detecting outlier entities in DBpedia and Wikidata.
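
The abstract does not describe how DDSampling works internally. As a minimal sketch of the general idea only, the Python code below shows the well-known two-step procedure for drawing an itemset from a single, centralized transaction database with probability proportional to its frequency, with frequency standing in for "interest". It is not DDSampling and does not handle data spread across nodes. The function name `sample_pattern` and the toy transactions are hypothetical, and the empty itemset is treated as a valid pattern.

```python
import random
from collections import Counter


def sample_pattern(transactions, rng):
    """Draw one itemset with probability proportional to its frequency.

    Illustrative two-step sampling on a centralized database;
    this is an assumption-based sketch, not the DDSampling algorithm.
    """
    # Step 1: pick a transaction with weight 2^|t|, the number of its subsets.
    weights = [2 ** len(t) for t in transactions]
    chosen = rng.choices(transactions, weights=weights, k=1)[0]
    # Step 2: pick a uniform random subset by keeping each item with prob. 1/2.
    return frozenset(item for item in chosen if rng.random() < 0.5)


# Hypothetical toy database: three transactions over items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("c")]
rng = random.Random(0)
draws = Counter(sample_pattern(transactions, rng) for _ in range(20000))
```

In this toy database the itemset {a} occurs in two transactions and {a, b, c} in one, so over many draws {a} should come up roughly twice as often as {a, b, c}.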

Country

France

Related Organizations

National Research Institute for Agriculture, Food and Environment
France
Laboratory of Fundamental and Applied Computer Science of Tours
France
Centre Val de Loire
France
François Rabelais University
France
Université François-Rabelais Tours
France


Keywords

[INFO.INFO-WB] Computer Science [cs]/Web, [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]

Impact by BIP!

selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Funded by

FCT | D4