ICS215 - Paper Readings (Tentative)
Web: Search, Structure, and Data Extraction
- Steve Lawrence and C. Lee Giles,
Searching the World Wide Web, Science, 1998.
- Sergey Brin and Lawrence Page, The Anatomy of a
Large-Scale Hypertextual Web Search Engine, WWW7/Computer
Networks 30(1-7): 107-117, 1998.
- Jon M. Kleinberg, Authoritative
Sources in a Hyperlinked Environment, Journal of the ACM 46(5):
604-632, 1999.
- Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and
Andrew Tomkins,
Trawling the Web for emerging cyber-communities, WWW 1999.
- Sergey Brin, Extracting
Patterns and Relations from the World Wide Web, WebDB Workshop,
1998.
- Anand Rajaraman and Jeffrey D. Ullman,
Querying Websites Using Compact Skeletons, PODS 2001.
Data Integration
Peer-Based Data Integration and Sharing
- Bernstein et al., "Data Management for Peer-to-Peer
Computing: A Vision", WebDB 2002.
- Wee Siong Ng et al., "PeerDB: A P2P-based System for
Distributed Data Sharing", ICDE 2003.
- Halevy et al., "Schema Mediation in Peer Data Management
Systems", ICDE 2003.
- Jayant Madhavan and Alon Halevy, "Composing Mappings Among
Data Sources", VLDB 2003.
Data Cleansing
- Mauricio Hernandez and Salvatore Stolfo, The
Merge/Purge Problem for Large Databases, SIGMOD 1995.
- L. Gravano, P. Ipeirotis, H. V. Jagadish, N. Koudas,
S. Muthukrishnan, and D. Srivastava, Approximate
String Joins in a Database (Almost) for Free, VLDB 2001.
- Liang Jin, Chen Li, and Sharad Mehrotra. Efficient
Similarity String Joins in Large Data Sets. DASFAA, 2003.
- Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev
Motwani. Robust and Efficient Fuzzy Match for Online Data
Cleaning, SIGMOD 2003.
- Rohit Ananthakrishna, Surajit Chaudhuri, Venkatesh Ganti.
Eliminating Fuzzy Duplicates in Data Warehouses. VLDB 2002.
- Mohamed G. Elfeky, Ahmed K. Elmagarmid, Vassilios S. Verykios.
TAILOR: A Record Linkage Tool Box. ICDE 2002.
- Vijayshankar Raman and Joseph M. Hellerstein. Potter's Wheel: An
Interactive Data Cleaning System. VLDB 2001.
- Sunita Sarawagi, Anuradha Bhamidipaty, Alok Kirpal, and Chandra
Mouli. Alias: An active learning led interactive deduplication
system. VLDB 2002.
- Dominik Luebbers, Udo Grimmer, Matthias Jarke. Systematic
Development of Data Mining-Based Data Quality Tools. VLDB 2003.
- Flip Korn, S. Muthukrishnan, Yunyue Zhu. Checks and Balances:
Monitoring Data Quality Problems in Network Traffic Databases.
VLDB 2003.
- S. Cenk Sahinalp, Murat Tasan, Jai Macker, Z. Meral Ozsoyoglu.
Distance Based Indexing for String Proximity Search. ICDE 2003.
- H. Jagadish, N. Koudas, S. Muthukrishnan. On Effective
Multi-Dimensional Indexing For Strings. SIGMOD 2000.
Ranking Queries
Data Mining
Disclaimer
These documents are made available as a means to ensure timely
dissemination of scholarly and technical work on a non-commercial
basis. Copyright and all rights therein are maintained by the
authors or by other copyright holders, notwithstanding that they
have offered their works here electronically. It is understood that
all persons copying this information will adhere to the terms and
constraints invoked by each copyright holder. These works may not
be reposted without the explicit permission of the copyright
holder.