The UCI Database Group
Introduction
Traditional data-integration systems use a centralized mediation approach, in which a central "mediator" accepts user queries and collects information from heterogeneous sources to compute answers. Recent database applications increasingly need to support data integration in distributed, peer-based environments, in which autonomous peers (sources) connected by a network are willing to exchange data and services with each other. Such applications include supply-chain management, B2B integration, and Web-based file-sharing services (e.g., Napster, Gnutella, Morpheus).

For instance, several labs at UC Irvine are conducting research related to the Human Genome Project, and they are eager to share their experimental results with each other. This sharing requires a distributed infrastructure in which each lab provides its own data to other participants and accesses data from other labs. As another example, recent terrorist attacks show the great need for new intelligence-sharing technologies that can strengthen the ability to prevent, detect, and respond to existing and emerging homeland safety threats. Distributed data integration can make intelligence sharing more effective and thereby strengthen homeland security.

The goal of the Raccoon Project is to allow different information sources to share and query their data with each other. We are developing an infrastructure that allows each source to publish its own data and to query information from other peers. In addition, the Internet provides a huge amount of rich information, and it is desirable to let users of different sources access information from the Web as well. XML is becoming a standard for data exchange. However, XML only makes sources "homogeneous" at the syntactic level; it does not resolve the heterogeneity of sources at the semantic level. We also plan to pursue many open problems in the XML domain.
This project is partially supported by the NSF CAREER Award, No. IIS-0238586, titled "CAREER: Peer-Based Data Integration and Sharing of Heterogeneous Sources," and the NSF RESCUE project under Award Number 0331707.