TuneR: Fine tuning of rule-based entity matchers

Publication date

2019-11-03

Authors

Paganelli, Matteo
Guerra, Francesco
Sottovia, Paolo
Velegrakis, Yannis (ORCID 0000-0001-6332-0296; ISNI 0000000125737584)

Document Type

Part of book

Abstract

A rule-based entity matching task requires the definition of an effective set of rules, which is a time-consuming and error-prone process. The typical approach is trial and error: rules are incrementally added and modified until satisfactory results are obtained. This approach requires significant human intervention, since a typical dataset needs a large number of rules, with possible interconnections among them, that cannot be managed manually. In this paper, we propose TuneR, a software library supporting developers (i.e., coders, scientists, and domain experts) in tuning sets of matching rules. It aims to reduce human intervention by offering a tool for optimizing rule sets according to user-defined criteria (such as effectiveness and interpretability). Our goal is to integrate the framework into the Magellan ecosystem, thus completing the functionality developers need to perform entity matching tasks.

Keywords

Data deduplication, Data integration, Entity resolution, General Business, Management and Accounting, General Decision Sciences

Citation

Paganelli, M, Guerra, F, Sottovia, P & Velegrakis, Y 2019, TuneR: Fine tuning of rule-based entity matchers. in CIKM 2019 - Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, pp. 2945-2948, 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, 3 November 2019. https://doi.org/10.1145/3357384.3357854