---------------------------------------------------------------------
RACCOON: A Peer-Based System for Data Integration and Sharing

The RACCOON Project, http://www.ics.uci.edu/~raccoon/
Release v 2.0
November 29, 2004

Copyright (c) 2004 by
Database Group
Department of Computer Science
University of California, Irvine
Irvine, CA 92697

Qi Zhong (qzhong@ics.uci.edu)
Jia Li (jiali@ics.uci.edu)
Chen Li (chenli@ics.uci.edu)
University of California, Irvine

Partially supported by the National Science Foundation under the CAREER-Award Grant IIS-0238586.

Please send technical questions about this release to raccoon@ics.uci.edu .
---------------------------------------------------------------------

This software was created by members of the Database Group at UC Irvine and is distributed free of charge. It is placed in the public domain and permission is granted for anyone to use, duplicate, modify and redistribute it provided this notice is attached.

There is absolutely NO WARRANTY OF ANY KIND with respect to this software; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL ANY PARTY BE LIABLE TO ANYONE FOR ANY DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE, INCLUDING, WITHOUT LIMITATION, DAMAGES RESULTING FROM LOST DATA OR LOST PROFITS, OR FOR ANY SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES.

The package complies to the GNU copyright terms at http://www.gnu.org/copyleft/gpl.html. We include a copy of the terms in gpl.txt in this distribution package. It also complies to the copyright terms of University of California, Irvine.

The information in this software is subject to change without notice and should not be construed as a commitment by any employees of the Database Group or any other employee of University of California.
---------------------------------------------------------------------
A brief description of this demo is given in the following paper:

RACCOON: A Peer-Based System for Data Integration and Sharing.
Chen Li, Jia Li, Qi Zhong.
International Conference on Data Engineering (ICDE), Demo Track, Boston MA, USA, March, 2004.
---------------------------------------------------------------------

- Introduction

Traditional data-integration systems use a centralized mediation approach, in which a centralized mediator accepts user queries and collects information from heterogeneous sources to compute answers. Recent database applications are seeing an emerging need to support data integration and sharing in distributed, peer-based environments. In such an environment, autonomous peers (sources) connected by a network are willing to exchange data and services with each other. The goal of the Raccoon Project is to allow these different information sources to share and query their data with each other.

This release includes the implementation of the Raccoon system as of the release date. This readme file describes how to run the demo.

- Platform Requirements:

Any Java-compatible environment with a Java 2 SDK. (We tested the code on Java(TM) 2 Runtime Environment, Standard Edition, version "1.4.1_01".)

- Installation

1. Download the source code from http://www.ics.uci.edu/~raccoon/ .

2. Unzip the source code.

3. Type the following commands to compile the code.
   On a Windows environment:

   javac -classpath .;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar -d .\class .\Raccoon\Raccoon.java

   rmic -classpath .;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar -d .\class Raccoon.NET.NetModuleImpl

   On a Unix environment:

   javac -classpath .:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar -d ./class ./Raccoon/Raccoon.java

   rmic -classpath .:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar -d ./class Raccoon.NET.NetModuleImpl

- Configuration

Before starting to run the demo, you need to configure the peer-to-peer database network. If you want to use some databases provided by the Raccoon Project, you can skip this step and go to the "Running the demo" step by using the provided files "10001.xml", ..., "10005.xml". In this case, you will be using five peers running on two databases (MySQL) on two machines of the Raccoon Project.

If you do want to set up your own peers and network, you need to do the following. For each peer, create an XML file similar to the sample file "10001.xml" provided in the package. The following is part of the example file 10004.xml.
(1)  <PEERVIWER>
(2)    <RMIPORT port = "10004"/>
(3)    <NEIGHBOR ip="127.0.0.1" port="10001"/>
(4)    <NEIGHBOR ip="127.0.0.1" port="10005"/>
(5)    <DATABASE system ="MySQL" host="128.195.38.176" dbName = "peerDB" user="demo" passwd="demo" />
(6)    <NODE_INFO>
(7)      <NODE_INFO_NAME>OCHousing</NODE_INFO_NAME>
(8)      <NODE_INFO_DESC>Orange County Housing</NODE_INFO_DESC>
(9)      <NODE_INFO_BW>56Kbps</NODE_INFO_BW>
(10)     <NODE_INFO_SIZE>100M</NODE_INFO_SIZE>
(11)     <NODE_INFO_TYPE>Physical Data</NODE_INFO_TYPE>
(12)     <RELATION name = "OCHousing" />
(13)   </NODE_INFO>
(14) </PEERVIWER>

Line (2) shows that this peer is using port 10004 to communicate with other peers. Any unoccupied port can be used here.

Line (3) shows that this peer has a neighbor running on machine "127.0.0.1" and using port 10001. You can specify any IP, but you have to make sure that there is a peer running on that IP with that port. In general the neighbor may be on a remote machine, different from the machine on which you are running this demo. The port given here is the neighbor's port, so it is different from the local RMI port number in Line (2).

Line (4) shows another neighbor peer. (It happens to be on the same machine, i.e., the same IP address, as the previous peer. In general you can specify any IP address with a port as a neighbor, as long as a peer is really running on that port.) If you run this demo with several peers on the same machine, then you need to specify the new IP address in all these peers' XML files.

Line (5) specifies the database for the local peer. The system is MySQL and it is running on 128.195.38.176. The database name is "peerDB", with "demo" and "demo" as the user and password. Currently our system supports MySQL, Informix, and DB2. The database host is not limited to the local machine. The following two example databases are available for the demo:

  <DATABASE system ="MySQL" host="CSE104g.ics.uci.edu" dbName="peerDB" user="demo" passwd="demo"/>
  <DATABASE system ="MySQL" host="RESCUE15.ics.uci.edu" dbName="peerDB" user="demo" passwd="demo"/>

!!!NOTICE!!!
You have to make sure that the username and password are correct and that the user has the right to run queries on the database. In addition, KEEP THIS XML FILE IN A SAFE PLACE, SINCE IT MAY CONTAIN THE DATABASE USERNAME AND PASSWORD.

Lines (6)~(13) specify the information about this local peer.
Line (7) gives the name of the peer.
Line (8) gives the description of the peer.
Line (9) shows the network bandwidth. Currently this information is not used.
Line (10) gives the size of the data set at this peer.
Line (11) gives the type of the data set.
Line (12) gives the name of the relation you want to share. This relation must be defined in the database specified in Line (5).

- Running the Demo

After writing configuration XML files (e.g. file 10001.xml) to specify the peer network, you can start each peer using the corresponding configuration file by typing the following command. (The following uses 10001.xml as an example.)

Windows:

java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

Unix:

java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

This command will start a peer with a PeerViewer browser on your system.

Alternatively, you could also type the following command.
Windows:

java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml dummy

Unix:

java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml dummy

This command will start the peer in a "dummy" mode. In this mode, the peer can only answer queries from other peers; it cannot issue queries. In addition, there will be no browser for a peer in dummy mode. A peer in dummy mode uses far fewer resources.

Because of some limitations, currently our implementation can access only one database on a DBMS server from an IP address. Thus, if you are simulating multiple peers on one machine, either configure each of them to access a different DBMS server, or configure them to use the same database on the same DBMS server.

- Functionalities:

The current implementation provides the following features.

o A user can navigate the peer network. Each peer is represented by a colored node in the browser, called PeerViewer. By right-clicking a node and choosing "Expand Node," the user can explore its neighboring nodes. If the user holds the mouse over a node for a second, all the information about the node will be listed on the right-hand side, including the network bandwidth, relation schema, etc.

o The user can search in the network for relations that are "similar" to a given relation. This search can be done by right-clicking the local node and choosing "Search Node." The user needs to specify a relation in the local database. This search will be propagated in the network. At each peer, we use schema-mapping techniques to identify relations that are "similar" to the given relation. All the peers with similar relations will be returned.
o After a search is done, the system will return all the peers with similar relations. If the user is interested in a returned relation, she can click the "Add Mapping" button, and the system will compute an attribute-level mapping based on the similarity between each pair of attributes. (Each mapping is directed.) The user can validate the mapping and add it to the system. This mapping will be stored at the two peers and used to answer an "Extended Query" (see below).

o Querying the peers. A user can issue queries on the system. There are two querying modes.

  (1) Focused Querying Mode: The user can pose a query on peer relations. In this mode, the system will answer the query by accessing the specified relations only, without expanding the query to other relations. For instance, suppose peer A has a relation student(id, name), and peer B has a relation exam(id, grade). The user issues the following query:

      SELECT A:student.name, B:exam.grade
      FROM A:student, B:exam
      WHERE A:student.id = B:exam.id;

  Here each relation name should have a prefix to indicate the peer name. Notice that each query statement should end with a semicolon. In the focused querying mode, the system will compute the answers by using the two specified relations.

  (2) Extended Querying Mode: In this mode, for each query, a peer with a specified relation will utilize available mappings of the relation, and expand the "subquery" on this relation to other relations that have mappings with this relation. The motivation for supporting this querying mode is to allow the user to get as much information as possible to answer a query. For instance, in the example above, suppose peer B has a mapping between B:exam and another relation "exam2" at a different peer C. Then peer B will translate the condition on B:exam to the relation C:exam2 to get more information to answer the query.
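The translation step in the extended querying mode can be sketched as follows. This is only an illustration written for a current JDK, not the code used in Raccoon: the class ExtendedQuerySketch, the type RelationMapping, and the attribute names "sid" and "score" assumed for C:exam2 are all hypothetical, and a plain textual rewrite stands in for whatever the real Query Manager does on its query tree.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtendedQuerySketch {

    /** Hypothetical directed attribute-level mapping from one peer relation to another. */
    public static final class RelationMapping {
        private final String sourceRelation;   // e.g. "B:exam"
        private final String targetRelation;   // e.g. "C:exam2"
        private final Map<String, String> attributes = new LinkedHashMap<>();

        public RelationMapping(String sourceRelation, String targetRelation) {
            this.sourceRelation = sourceRelation;
            this.targetRelation = targetRelation;
        }

        /** Records that sourceAttr of the source relation corresponds to targetAttr of the target. */
        public RelationMapping map(String sourceAttr, String targetAttr) {
            attributes.put(sourceAttr, targetAttr);
            return this;
        }

        /** Rewrites a subquery on the source relation into one on the target relation. */
        public String translate(String subquery) {
            String result = subquery;
            // First rewrite qualified attributes, e.g. B:exam.grade -> C:exam2.score.
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                result = replaceWhole(result,
                        sourceRelation + "." + e.getKey(),
                        targetRelation + "." + e.getValue());
            }
            // Then rewrite the remaining bare relation references, e.g. in the FROM clause.
            return replaceWhole(result, sourceRelation, targetRelation);
        }

        /** Replaces only whole-name occurrences, so "B:exam" never matches inside "B:exam2". */
        private static String replaceWhole(String text, String from, String to) {
            String regex = "(?<!\\w)" + Pattern.quote(from) + "\\b";
            return text.replaceAll(regex, Matcher.quoteReplacement(to));
        }
    }

    public static void main(String[] args) {
        // Assumed mapping: exam(id, grade) at peer B corresponds to exam2(sid, score) at peer C.
        RelationMapping mapping = new RelationMapping("B:exam", "C:exam2")
                .map("id", "sid")
                .map("grade", "score");

        String subquery = "SELECT B:exam.id, B:exam.grade FROM B:exam;";
        System.out.println(mapping.translate(subquery));
    }
}
```

With the assumed mapping, the subquery on B:exam is rewritten into a subquery on C:exam2, which peer B could then forward to peer C and merge with its own answers. Because each mapping is directed, a sketch like this would need a separate RelationMapping for the reverse direction.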
- Sample Testing Scripts

If you want to do a quick test, you can use five peers provided by the Raccoon Project. Currently these peers are running on two MySQL databases on two machines of the Raccoon Project at UC Irvine. Please contact the authors if you have problems starting these peers. To use these five peers, do the following:

1) Open a command line window and start a peer with 10001.xml.

   Windows:

   java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

   Unix:

   java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

2) Open a command line window and start a peer with 10002.xml.

3) Open a command line window and start a peer with 10003.xml.

4) Open a command line window and start a peer with 10004.xml.

5) Open a command line window and start a peer with 10005.xml.

Be sure to modify these five XML files to specify the IP address (your machine) properly. You don't have to modify anything if you are running all the peers on one machine.

If you want to use your own database, you need to create the database. You may use the commands in the file "sample/sample-data.sql" to create and populate the different tables/databases. You can use the queries in the file "sample/test_script.txt" to test searches and queries.

- System Architecture

The current system has five modules: Net, GUI, Resource Manager, Search Manager, and Query Manager, as illustrated by the file "./architecture.jpg".

(1) Net: This module handles all communication among different peers. It is implemented using RMI.

(2) GUI: It displays the network and supports different operations. It is based on the GraphLayout library offered by TouchGraph Inc.
(3) Resource Manager: It stores the following resources:
    * Peers known to a local node;
    * Mapping links created by the user;
    * An interface dealing with data stored in local databases.

(4) Search Manager: It supports searches for similar relations. Currently we use a basic schema-mapping algorithm to compute the similarity between two schemas.

(5) Query Manager: It processes user queries. It contains a parser, which parses an SQL query and builds a query tree. It has an engine that executes the query tree. In the implementation, we do simple optimization, e.g., pushing selections and projections down the tree.

- Used Libraries:

o java_cup and JLEX: used to generate the SQL parser.

o db2java.zip: JDBC driver for IBM DB2.

o ifxjdbc.jar and ifxjdbc-g.jar: JDBC driver for Informix.

o mysql-connector-java-3.0.8-stable-bin.jar: JDBC driver for MySQL.

o nanoxml-2.1.1.jar: a lightweight XML parser.

o BrowserLauncher.jar: a package to display an HTML page in a Java component.

o TouchGraph: a GraphLayout package to display a network. The original package can be downloaded from www.touchgraph.com . It has been modified in this implementation. The following are the major changes:
    * Change some methods in LB to "public." Otherwise we cannot extend them to call their methods.
    * Change Node and LBNode to be "implements Serializable," because we are using RMI.
    * Node has edge info in it. When you pass a node, you pass the edge along with it.
  We unpack the code and put it under the com/ subdirectory.

For the latest information about the Raccoon Project, visit its home page: http://www.ics.uci.edu/~raccoon

Enjoy.

Last updated: November 29, 2004.