Abstract: Duplicate detection, which is an important subtask of data cleaning, is the task of identifying multiple representations of a same real-world object and necessary to improve data quality.