
doi: 10.1145/3639363
This article proposes a notion of parametric simulation to link entities across a relational database 𝒟 and a graph G. Taking functions and thresholds for measuring vertex closeness, path associations, and important properties as parameters, parametric simulation identifies tuples t in 𝒟 and vertices v in G that refer to the same real-world entity, based on both topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation can be computed in quadratic time by providing such an algorithm. Moreover, we develop an incremental algorithm for parametric simulation; we show that the incremental algorithm is bounded relative to its batch counterpart, i.e., it incurs the minimum cost for incrementalizing the batch algorithm. Putting these together, we develop HER, a parallel system to check whether (t, v) makes a match, find all vertex matches of t in G, and compute all matches across 𝒟 and G, all in quadratic time; moreover, HER supports incremental computation of these in response to updates to 𝒟 and G. Using real-life and synthetic data, we empirically verify that HER is accurate, with an F-measure of 0.94 on average, and scales with database 𝒟 and graph G for both batch and incremental computations.
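To make the idea of threshold-parameterized matching concrete, the following is a toy sketch only; it is not the article's definition of parametric simulation, nor the HER system. Everything in it is an assumption made for illustration: the token-level Jaccard function stands in for a learned vertex-closeness function, the names `is_match`, `sigma`, and `delta` are hypothetical, and path associations are reduced to single edges whose labels are compared with attribute names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical graph encoding: vertex id -> list of (edge label, neighbor id).
Graph = Dict[str, List[Tuple[str, str]]]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (a stand-in for a learned closeness function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_match(
    t: Dict[str, str],            # tuple as attribute -> value
    key: str,                     # attribute naming the entity (assumption)
    v: str,                       # candidate vertex id
    labels: Dict[str, str],       # vertex id -> label
    graph: Graph,
    closeness: Callable[[str, str], float] = jaccard,
    sigma: float = 0.5,           # per-pair closeness threshold (assumption)
    delta: float = 1.0,           # aggregate score threshold (assumption)
) -> bool:
    # Semantic check: the vertex label must be close to the tuple's key value.
    if closeness(t[key], labels[v]) < sigma:
        return False
    # Topological check: each remaining attribute looks for its best-scoring
    # neighbor of v, reached over an edge whose label resembles the attribute.
    total = 0.0
    for attr, value in t.items():
        if attr == key:
            continue
        best = 0.0
        for edge_label, w in graph.get(v, []):
            if closeness(attr, edge_label) >= sigma:
                best = max(best, closeness(value, labels[w]))
        if best >= sigma:
            total += best
    return total >= delta


# Made-up example data, not taken from the article.
t = {"name": "Ada Lovelace", "born": "London", "field": "mathematics"}
labels = {"v1": "Ada Lovelace", "v2": "London", "v3": "mathematics",
          "v4": "Ada Byron", "v5": "Paris"}
graph = {"v1": [("born", "v2"), ("field", "v3")], "v4": [("born", "v5")]}

print(is_match(t, "name", "v1", labels, graph))
print(is_match(t, "name", "v4", labels, graph))
```

In this sketch both the per-pair threshold and the aggregate threshold are plain arguments, which mirrors the abstract's point that the closeness functions and thresholds are parameters to be supplied or learned rather than fixed in the matching procedure.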
Database theory, Learning and adaptive systems in artificial intelligence, incremental algorithm, entity resolution, knowledge graph, relational database, Graph theory (including graph drawing) in computer science, parallelization, relative boundedness, Parallel algorithms in computer science
