ICDE
- D. V. Kalashnikov, Z. Chen, R. Nuray-Turan, S. Mehrotra, and N. Ashish. Disambiguation algorithm for people search on the Web. In ICDE, April 2007.
- B. On, N. Koudas, D. Lee, and D. Srivastava. Group linkage. In ICDE, April 2007.
- I. Mansuri and S. Sarawagi. A system for integrating unstructured data into relational databases. In ICDE, 2006.
- S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, 2006.
- S. Chaudhuri, V. Ganti, and R. Motwani. Robust identification of fuzzy duplicates. In ICDE, 2005.
- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE, 2002.
VLDB
- D. Menestrina, O. Benjelloun, and H. Garcia-Molina. Generic Entity Resolution with Data Confidences. In First Int'l VLDB Workshop on Clean Databases, 2006.
- A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, 2006.
- L. Jin and C. Li. Selectivity Estimation for Fuzzy String Predicates in Large Datasets. In VLDB, 2005.
- M. Michalowski, S. Thakkar, and C. A. Knoblock. Exploiting Secondary Sources for Unsupervised Record Linkage. In VLDB, 2004.
- V. Verykios, G. V. Moustakides, and M. Elfeky. A Bayesian decision model for cost optimal record matching. VLDB Journal, 2003.
- R. Ananthakrishna, S. Chaudhuri, and V. Ganti. Eliminating fuzzy duplicates in data warehouses. In VLDB, 2002.
- L. Gravano, P. Ipeirotis, H. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, 2001.
- Y. Zhuang and L. Chen. In-network outlier cleaning for data collection in sensor networks. In CleanDB Workshop.
SIGMOD
- S. Chaudhuri, K. Ganjam, V. Ganti, R. Kapoor, V. Narasayya, and T. Vassilakis. Data cleaning in Microsoft SQL Server. In SIGMOD, 2005.
- X. Dong, A. Y. Halevy, and J. Madhavan. Reference reconciliation in complex information spaces. In SIGMOD, 2005.
- S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani. Robust and efficient fuzzy match for online data cleaning. In SIGMOD, 2003.
- A. E. Monge and C. P. Elkan. An efficient domain-independent algorithm for detecting approximately duplicate database records. In SIGMOD, 1997.
- M. Hernandez and S. Stolfo. The merge/purge problem for large databases. In SIGMOD, 1995.
- W. W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity. In SIGMOD, 1998.
SDM
- B. On and D. Lee. Scalable Name Disambiguation using Multi-level Graph Partition. In SIAM SDM, April 2007.
- I. Bhattacharya and L. Getoor. A latent Dirichlet model for unsupervised entity resolution. In SIAM SDM, 2006.
- D. V. Kalashnikov, S. Mehrotra, and Z. Chen. Exploiting relationships for domain-independent data cleaning. In SIAM SDM, 2005.
- B. Malin. Unsupervised name disambiguation via social network similarity. In Workshop on Link Analysis, Counterterrorism, and Security, 2005.
KDD
- E. Agichtein and V. Ganti. Mining reference tables for automatic text segmentation. In SIGKDD, 2004.
- I. Bhattacharya and L. Getoor. Deduplication and group detection using links. In LinkKDD, 2004.
- M. Bilenko and R. J. Mooney. Adaptive Duplicate Detection Using Learnable String Similarity Measures. In SIGKDD, 2003.
- M. Bilenko and R. J. Mooney. On Evaluation and Training-Set Construction for Duplicate Detection. In KDD 2003 Workshop, 2003.
- W. W. Cohen and J. Richman. Learning to match and cluster high-dimensional data sets for data integration. In SIGKDD, 2002.
- S. Sarawagi and A. Bhamidipaty. Interactive deduplication using active learning. In SIGKDD, 2002.
- S. Tejada, C. A. Knoblock, and S. Minton. Learning domain-independent string transformation weights for high accuracy object identification. In SIGKDD, 2002.
- A. E. Monge and C. Elkan. The field matching problem: Algorithms and applications. In SIGKDD, 1996.
- W. Cohen, H. Kautz, and D. McAllester. Hardening soft information sources. In SIGKDD, 2000.
- I. Bhattacharya and L. Getoor. Query-time entity resolution. In SIGKDD, 2006.
- R. Holzer, B. Malin, and L. Sweeney. Email alias detection using social network analysis. In SIGKDD Workshop, 2005.
- A. McCallum, K. Nigam, and L. H. Ungar. Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. In SIGKDD, Boston, MA, 2000.
DASFAA
- R. Nuray-Turan, D. V. Kalashnikov, and S. Mehrotra. Self-tuning in graph-based reference disambiguation. In DASFAA, April 2007.
- L. Jin, C. Li, and S. Mehrotra. Efficient Record Linkage in Large Data Sets. In DASFAA, 2003.
ICDM
1. B. On, E. Elmacioglu, D. Lee, J. Kang, and J. Pei. Improving grouped-entity resolution using quasi-cliques. In ICDM, December 2006.
2. P. Singla and P. Domingos. Entity resolution with Markov logic. In ICDM, December 2006.
3. M. Bilenko, B. Kamath, and R. J. Mooney. Adaptive Blocking: Learning to Scale Up Record Linkage and Clustering. In ICDM, 2006.
4. M. Bilenko, S. Basu, and M. Sahami. Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping. In ICDM, 2005.
PKDD
- L. Bolelli, S. Ertekin, and C. L. Giles. Clustering Scientific Literature Using Sparse Citation Graph Analysis. In 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006), pages 30-41, 2006.
- J. Huang, S. Ertekin, and C. L. Giles. Efficient Name Disambiguation for Large-Scale Databases. In 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006), pages 536-544, 2006.
JCDL
- B. On, E. Elmacioglu, D. Lee, J. Kang, and J. Pei. An effective approach to entity resolution problem using quasi-clique and its application to digital libraries. In JCDL, June 2006.
- Y. F. Tan, M.-Y. Kan, and D. Lee. Search Engine Driven Author Disambiguation. In JCDL, June 2006.
- H. Han, H. Zha, and C. L. Giles. Name disambiguation in author citations using a K-way spectral clustering method. In Joint Conference on Digital Libraries (JCDL 2005), pages 334-343, 2005.
IQIS
- Z. Chen, D. V. Kalashnikov, and S. Mehrotra. Exploiting relationships for object consolidation. In IQIS, 2005.
- A. Al-Lawati, D. Lee, and P. McDaniel. Blocking-aware private record linkage. In IQIS, 2005.
- D. Lee, B. On, J. Kang, and S. Park. Effective and scalable solutions for mixed and split citation problems in digital libraries. In IQIS, 2005.
IJCAI
1. P. Kanani, A. McCallum, and C. Pal. Improving author coreference by resource-bounded information gathering from the web. In IJCAI, 2007.
2. S. Hill. Social network relational vectors for anonymous identity matching. In IJCAI, 2005.
3. B. Milch, B. Marthi, D. Sontag, S. Russell, D. L. Ong, and A. Kolobov. BLOG: Probabilistic models with unknown objects. In IJCAI, 2005.
AAAI
- X. Li, P. Morie, and D. Roth. Identification and tracing of ambiguous names: discriminative and generative approaches. In AAAI, 2004.
- W. Shen, X. Li, and A. Doan. Constraint-based entity matching. In AAAI, 2005.
NIPS
1. A. McCallum and B. Wellner. Conditional models of identity uncertainty with application to noun coreference. In NIPS, 2004.
2. H. Pasula, B. Marthi, B. Milch, S. Russell, and I. Shpitser. Identity uncertainty and citation matching. In NIPS, 2002.
SIGIR
1. E. Minkov, W. W. Cohen, and A. Y. Ng. Contextual Search and Name Disambiguation in Email using Graphs. In SIGIR, 2006.
2. J. Artiles, J. Gonzalo, and S. Sekine. A testbed for people searching strategies in the WWW. In SIGIR, 2005.
Journals and Other Conferences/Workshops
- D. V. Kalashnikov and S. Mehrotra. Domain-independent data cleaning via analysis of entity-relationship graph. ACM TODS, June 2006.
- G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 2001.
- R. Bekkerman and A. McCallum. Disambiguating web appearances of people in a social network. In WWW, 2005.
- A. Culotta and A. McCallum. Joint deduplication of multiple record types in relational data. In CIKM, 2005.
- P. Ravikumar and W. W. Cohen. A hierarchical graphical model for record linkage. In UAI, 2004.
- A. McCallum, K. Bellare, and F. Pereira. A conditional random field for discriminatively-trained finite-state string edit distance. In UAI, 2005.
- E. Ristad and P. Yianilos. Learning string edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.
- I. Bhattacharya and L. Getoor. Relational clustering for multi-type entity resolution. In MRDM, 2005.
- P. Singla and P. Domingos. Multi-relational record linkage. In MRDM, 2004.
- V. Sehgal, L. Getoor, and P. Viechnicki. Entity resolution in geospatial data integration. In GIS, 2006.
- I. Bhattacharya and L. Getoor. Iterative record linkage for cleaning and integration. In DMKD, 2004.
- O. Benjelloun, H. Garcia-Molina, H. Kawai, T. E. Larson, D. Menestrina, Q. Su, S. Thavisomboon, and J. Widom. Generic Entity Resolution in the SERF Project. IEEE Data Engineering Bulletin, June 2006.
- L. Gravano, P. Ipeirotis, H. Jagadish, N. Koudas, S. Muthukrishnan, L. Pietarinen, and D. Srivastava. Using q-grams in a DBMS for approximate string processing. IEEE Data Engineering Bulletin, 24(4):28-34, 2001.
- M. Lee, W. Hsu, and V. Kothari. Cleaning the spurious links in data. IEEE Intelligent Systems, 2004.
- R. Al-Kamha and D. W. Embley. Grouping Search-Engine Returned Citations for Person Name Queries. In WIDM'04, 2004.
- W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In IIWeb Workshop, 2003.
- P. Christen, T. Churches, and J. X. Zhu. Probabilistic name and address cleaning and standardization. In Australian Data Mining Workshop, 2002.
- W. E. Winkler. Methods for record linkage and Bayesian networks. Technical Report, US Census Bureau, 2002.
- E. Cohen and D. Lewis. Approximating matrix multiplication for pattern recognition tasks. Journal of Algorithms, 30(2):211-252.
- I. Fellegi and A. Sunter. A theory for record linkage. Journal of the American Statistical Association, 1969.
- J. Maletic and A. Marcus. Data cleansing: Beyond integrity checking. In Conf. on Information Quality, 2000.
- M. Jaro. Probabilistic linkage of large public health data files. Statistics in Medicine, 1995.
- M. Jaro. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association, 1989.
- M. Lee, H. Lu, T. Ling, and Y. Ko. Cleansing data for mining and warehousing. In DEXA, 1999.
- S. Tejada, C. A. Knoblock, and S. Minton. Learning object identification rules for information integration. Information Systems Journal, 2001.
- W. E. Winkler. The state of record linkage and current research problems. Technical Report, US Census Bureau, 1999.
- M. Bilgic, L. Licamele, L. Getoor, and B. Shneiderman. D-Dupe: An interactive tool for entity resolution in social networks. In IEEE VAST, 2006.
- A. Culotta and A. McCallum. Tractable learning and inference with higher-order representations. In ICML Workshop on Open Problems in Statistical Relational Learning, 2006.
- E. Minkov and W. W. Cohen. An Email and Meeting Assistant using Graph Walks. In CEAS, 2006.
- J. Hassell, B. Aleman-Meza, and I. B. Arpinar. Ontology-driven automatic entity disambiguation in unstructured text. In 5th International Semantic Web Conference (ISWC 2006), 2006.
- J. Kang, D. Lee, and P. Mitra. Identifying value mappings for data integration: an unsupervised approach. In WISE, 2005.
- X. Li, P. Morie, and D. Roth. Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine, Special Issue on Semantic Integration, 2005.
- H. Newcombe, J. Kennedy, S. Axford, and A. James. Automatic linkage of vital records. Science, 1959.