Exploiting relationships for object consolidation
Appeared in ACM IQIS Workshop co-located with ACM SIGMOD 2005.
Zhaoqi Chen, Dmitri V. Kalashnikov, and Sharad Mehrotra
Computer Science Department

Abstract
Data mining practitioners frequently have to spend
a significant portion of their project time on data preprocessing
before they can apply their algorithms to real-world datasets.
Such preprocessing is required because many real-world
datasets are not perfect: they contain missing,
erroneous, and duplicate data, along with other problems that call for data cleaning.
It is well established that, in general, if
such problems are not corrected, applying
data mining algorithms can lead to wrong results.
This is known as the "garbage in, garbage out" principle.
Given the significance of the problem, numerous data cleaning
techniques have been designed in the past to address these
problems.

Categories and Subject Descriptors: H.2.m [Database Management]: Miscellaneous – Data cleaning; H.2.8 [Database Management]: Database Applications – Data mining; H.2.5 [Information Systems]: Heterogeneous Databases; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

Keywords: GDF, relationship-based data cleaning, object consolidation, record linkage, data mining
Downloadable files:
Paper: IQIS05_dvk.pdf
Presentation: IQIS05_dvk.ppt

BibTeX entry:
@inproceedings{IQIS05::dvk,
  author    = {Zhaoqi Chen and Dmitri V. Kalashnikov and Sharad Mehrotra},
  title     = {Exploiting relationships for object consolidation},
  booktitle = {Proc. of International ACM SIGMOD Workshop on Information Quality in Information Systems (ACM IQIS 2005)},
  year      = {2005},
  month     = {June 17},
  address   = {Baltimore, MD, USA}
}