A central problem in data integration and data cleansing is to identify
pairs of entities in data sets that describe the same real-world object.
Many existing methods for matching entities rely on explicit linkage rules,
which specify how two entities are compared for equivalence. Unfortunately,
writing accurate linkage rules by hand is a non-trivial task that
requires detailed knowledge of the data sets involved. Another important
issue is the efficient execution of linkage rules.
In this thesis, we propose a set of novel methods that cover the complete
entity matching workflow from the generation of linkage rules using genetic