---------------------------------------------------------------------
RACCOON: A Peer-Based System for Data Integration and Sharing
The RACCOON Project, http://www.ics.uci.edu/~raccoon/
Release v 2.0
November 29, 2004

Qi Zhong (qzhong@ics.uci.edu)
Jia Li (jiali@ics.uci.edu)
Chen Li (chenli@ics.uci.edu)
University of California, Irvine

Partially supported by the National Science Foundation under the
CAREER-Award Grant IIS-0238586.

Please send technical questions about this release to raccoon@ics.uci.edu .
---------------------------------------------------------------------

This software was created by members of the Database Group at UC
Irvine and is distributed free of charge. It is placed in the public
domain and permission is granted for anyone to use, duplicate, modify
and redistribute it provided this notice is attached.

There is absolutely NO WARRANTY OF ANY KIND with respect to this
software; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL ANY PARTY BE LIABLE
TO ANYONE FOR ANY DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE,
INCLUDING, WITHOUT LIMITATION, DAMAGES RESULTING FROM LOST DATA OR
LOST PROFITS, OR FOR ANY SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES.

The package complies with the GNU copyright terms at
http://www.gnu.org/copyleft/gpl.html. We include a copy of the terms
in gpl.txt in this distribution package. It also complies with the
copyright terms of the University of California, Irvine.

The information in this software is subject to change without notice
and should not be construed as a commitment by any employees of the
Database Group or any other employee of University of California.
---------------------------------------------------------------------

This demo is briefly described in the following paper:

RACCOON: A Peer-Based System for Data Integration and Sharing. Chen
Li, Jia Li, Qi Zhong. International Conference on Data Engineering
(ICDE), Demo Track, Boston MA, USA, March, 2004.
---------------------------------------------------------------------

- Introduction

Traditional data-integration systems use a centralized mediation
approach, in which a centralized mediator accepts user queries and
collects information from heterogeneous sources to compute answers.
Recent database applications are seeing the emerging need to support
data integration and sharing in distributed, peer-based
environments. In such an environment, autonomous peers (sources)
connected by a network are willing to exchange data and services with
each other. The goal of the Raccoon Project is to allow these
different information sources to share and query their data with each
other. This release includes the implementation of the Raccoon
system as of the release date. This readme file describes how to run
the demo.

- Platform Requirements:

Any Java-compatible environment with a Java 2 SDK. (We tested the code
on Java(TM) 2 Runtime Environment, Standard Edition, version "1.4.1_01".)

- Installation

1. Download the source code from http://www.ics.uci.edu/~raccoon/ .

2. Unzip the source code.

3. Type the following commands to compile the code.

On a Windows environment:

javac -classpath .;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar -d .\class .\Raccoon\Raccoon.java
rmic -classpath .;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar -d .\class Raccoon.NET.NetModuleImpl

On a Unix environment:

javac -classpath .:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar -d ./class ./Raccoon/Raccoon.java
rmic -classpath .:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar -d ./class Raccoon.NET.NetModuleImpl
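
The Windows and Unix commands above differ only in the directory
separator ("\" vs. "/") and the classpath separator (";" vs. ":").
The following is a minimal sketch that assembles the same classpath
for either platform. The class RaccoonClasspath and its method are
hypothetical helpers, not part of this release, and the code is
written for a current JDK rather than the Java 1.4 we tested on.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Raccoon distribution): assembles the
// classpath used by the javac/rmic commands from the archives they list.
class RaccoonClasspath {
    // Library archives named in the compile commands of this README.
    static final String[] LIBS = {
        "nanoxml-2.1.1.jar", "BrowserLauncher.jar", "db2java.zip",
        "ifxjdbc.jar", "ifxjdbc-g.jar", "java_cup_v10k.zip",
        "mysql-connector-java-3.0.8-stable-bin.jar"
    };

    // fileSep separates directories ("\" or "/");
    // pathSep separates classpath entries (";" or ":").
    static String build(String fileSep, String pathSep) {
        List<String> entries = new ArrayList<String>();
        entries.add(".");
        for (String lib : LIBS) {
            entries.add("." + fileSep + "lib" + fileSep + lib);
        }
        return String.join(pathSep, entries);
    }

    public static void main(String[] args) {
        // Use the separators of the platform this JVM is running on.
        System.out.println(build(File.separator, File.pathSeparator));
    }
}
```

The printed string can be passed to the -classpath option of javac
and rmic.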

- Configuration

Before starting to run the demo, you need to configure the
peer-to-peer database network. If you want to use some databases
provided by the Raccoon Project, you can skip this step and go to the
"Running the demo" step by using the provided files "10001.xml", ...,
"10005.xml". In this case, you will be using five peers running on
two databases (MySQL) on two machines of the Raccoon Project.

If you do want to set up your own peers and network, you need to do
the following. For each peer, create an XML file similar to the
sample file "10001.xml" provided in the package. The following is
part of the example file 10004.xml.

(1) <PEERVIWER>
(2) <RMIPORT port = "10004"/>
(3) <NEIGHBOR ip="127.0.0.1" port="10001"/>
(4) <NEIGHBOR ip="127.0.0.1" port="10005"/>
(5) <DATABASE system ="MySQL" host="128.195.38.176" dbName = "peerDB" user="demo" passwd="demo" />
(6) <NODE_INFO>
(7) <NODE_INFO_NAME>OCHousing</NODE_INFO_NAME>
(8) <NODE_INFO_DESC>Orange County Housing</NODE_INFO_DESC>
(9) <NODE_INFO_BW>56Kbps</NODE_INFO_BW>
(10) <NODE_INFO_SIZE>100M</NODE_INFO_SIZE>
(11) <NODE_INFO_TYPE>Physical Data</NODE_INFO_TYPE>
(12) <RELATION name = "OCHousing" />
(13) </NODE_INFO>
(14) </PEERVIWER>

Line (2) shows that this peer is using port 10004 to communicate with
other peers. Any unoccupied port can be used here.

Line (3) shows that this peer has a neighbor running on machine
"127.0.0.1" and using port 10001. You can specify any IP, but you
have to make sure that there is a peer running on that IP with that
port. Notice that a neighbor is a different peer from the local one
(in general it runs on a remote machine; in this example it runs on
the local machine, since 127.0.0.1 is the loopback address). Thus
this port number is different from the RMI port number on Line (2).

Line (4) shows another neighbor peer. (It happens to be on the same
machine as the previous peer. In general you can specify any IP
address with a port as a neighbor, as long as a peer is really
running on that port.) If you run this demo with several peers on
the same machine, then you need to specify the new IP address in all
these peers' XML files.

Line (5) specifies the database for the local peer. The system is
MySQL and it is running on 128.195.38.176. The database name is
"peerDB", with "demo" and "demo" as the user and password. Currently
our system supports MySQL, Informix, and DB2. The database host is
not limited to the local machine. The following two database examples
are available for the demo:

!!!NOTICE!!! You have to make sure that the username and password
are correct and the user has permission to query the database.
In addition, KEEP THIS XML FILE IN A SAFE PLACE, SINCE IT MAY CONTAIN
INFORMATION ABOUT THE USER AND PASSWORD.

Lines (6)~(13) specify the information about this local peer.

Line (7) gives the name of the peer.

Line (8) gives the description of the peer.

Line (9) shows the network bandwidth. Currently this information is
not used.

Line (10) gives the size of the data set at this peer.

Line (11) gives the type of the data set.

Line (12) gives the name of the relation you want to share. This
relation must be defined in the database specified in Line (5).
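
To make the layout of the configuration file concrete, the following
is a minimal sketch that reads the elements described above. It is
an illustration only: Raccoon itself uses NanoXML, whereas this
sketch uses the XML parser bundled with the JDK, and the class
PeerConfigSketch, its fields, and its methods are hypothetical names
that do not come from the Raccoon source. The JDBC URL assumes the
MySQL driver's "jdbc:mysql://host/database" form with the default
port.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for a peer configuration file (illustration only).
class PeerConfigSketch {
    int rmiPort;                                             // Line (2)
    final List<String> neighbors = new ArrayList<String>();  // "ip:port", Lines (3)-(4)
    String dbSystem, dbHost, dbName;                         // Line (5)
    String peerName;                                         // Line (7)
    final List<String> relations = new ArrayList<String>();  // Line (12)

    // Abridged copy of the 10004.xml excerpt shown in this README.
    static final String SAMPLE =
        "<PEERVIWER>"
        + "<RMIPORT port = \"10004\"/>"
        + "<NEIGHBOR ip=\"127.0.0.1\" port=\"10001\"/>"
        + "<NEIGHBOR ip=\"127.0.0.1\" port=\"10005\"/>"
        + "<DATABASE system =\"MySQL\" host=\"128.195.38.176\""
        + " dbName = \"peerDB\" user=\"demo\" passwd=\"demo\" />"
        + "<NODE_INFO>"
        + "<NODE_INFO_NAME>OCHousing</NODE_INFO_NAME>"
        + "<RELATION name = \"OCHousing\" />"
        + "</NODE_INFO>"
        + "</PEERVIWER>";

    static PeerConfigSketch parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable configuration", e);
        }
        PeerConfigSketch c = new PeerConfigSketch();
        c.rmiPort = Integer.parseInt(first(root, "RMIPORT").getAttribute("port"));
        NodeList ns = root.getElementsByTagName("NEIGHBOR");
        for (int i = 0; i < ns.getLength(); i++) {
            Element n = (Element) ns.item(i);
            c.neighbors.add(n.getAttribute("ip") + ":" + n.getAttribute("port"));
        }
        Element db = first(root, "DATABASE");
        c.dbSystem = db.getAttribute("system");
        c.dbHost = db.getAttribute("host");
        c.dbName = db.getAttribute("dbName");
        c.peerName = first(root, "NODE_INFO_NAME").getTextContent().trim();
        NodeList rs = root.getElementsByTagName("RELATION");
        for (int i = 0; i < rs.getLength(); i++) {
            c.relations.add(((Element) rs.item(i)).getAttribute("name"));
        }
        return c;
    }

    // Returns the first element with the given tag name under root.
    private static Element first(Element root, String tag) {
        return (Element) root.getElementsByTagName(tag).item(0);
    }

    // JDBC URL for a MySQL database on the default port (assumption: MySQL only).
    String mysqlJdbcUrl() {
        return "jdbc:mysql://" + dbHost + "/" + dbName;
    }

    public static void main(String[] args) {
        PeerConfigSketch c = parse(SAMPLE);
        System.out.println(c.peerName + " uses RMI port " + c.rmiPort
            + " with neighbors " + c.neighbors);
        System.out.println(c.mysqlJdbcUrl());
    }
}
```

The sketch does not read the user and password attributes, so that
they are never printed by accident.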

- Running the Demo

After writing configuration XML files (e.g. file 10001.xml) to
specify the peer network, you can start each peer using the
corresponding configuration file by typing the following command.
(The following is using 10001.xml as an example.)

Windows:

java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

Unix:
java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

This command will start a peer with a PeerViewer browser on your
system.

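Alternatively, you can type the following command.

Windows:
java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml dummy

Unix:
java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml dummy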

This command will start the peer in "dummy" mode. In this mode,
the peer can only answer queries from other peers, but it cannot
issue queries. In addition, there will be no browser for a peer
in dummy mode. A peer in dummy mode uses far fewer resources.

Because of some limitations, currently our implementation can access
only one database on a DBMS server from an IP address. Thus, if you
are simulating multiple peers on one machine, either configure each
of them to access different DBMS servers, or configure them to use
the same database on the same DBMS server.
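
The restriction above can be checked before the peers are started.
The following is a minimal sketch, assuming each peer on the machine
is described by the host and dbName values from the DATABASE element
of its XML file. The class SharedMachineCheck and its method are
hypothetical and are not part of this release.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-flight check for several peers simulated on one machine.
class SharedMachineCheck {
    // Each entry is {dbHost, dbName} for one peer on the same machine.
    // Returns the first DBMS host that two peers use with different
    // database names, or null if the configuration is acceptable.
    static String firstConflict(String[][] peers) {
        Map<String, String> dbByHost = new HashMap<String, String>();
        for (String[] peer : peers) {
            // putIfAbsent returns the name already recorded for this host, if any.
            String previous = dbByHost.putIfAbsent(peer[0], peer[1]);
            if (previous != null && !previous.equals(peer[1])) {
                return peer[0];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[][] peers = {
            {"128.195.38.176", "peerDB"},
            {"128.195.38.176", "peerDB"},   // same server, same database: fine
        };
        String conflict = firstConflict(peers);
        System.out.println(conflict == null
            ? "configuration is acceptable"
            : "peers use different databases on " + conflict);
    }
}
```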

- Functionalities:

The current implementation provides the following features.

o A user can navigate the peer network. Each peer is represented by
a colored node in the browser, called PeerViewer. By
right-clicking a node and choosing "Expand Node," the user can explore
its neighboring nodes. If the user hovers the mouse over a node for a
second, all the information about the node will be listed on the
right-hand side, including the network bandwidth, relation schema,
etc.

o The user can search in the network for relations that are "similar"
to a given relation. This search can be done by right-clicking the
local node and choosing "Search Node." The user needs to specify a
relation in the local database. This search will be propagated in
the network. At each peer, we use schema-mapping techniques to
identify relations that are "similar" to the given relation. All
the peers with similar relations will be returned.

o After a search is done, the system will return all the peers with
similar relations. If the user is interested in a returned
relation, she can click the "Add Mapping" button, and the system
will compute an attribute-level mapping based on the similarity
between each pair of attributes. (Each mapping is directed.) The
user can validate the mapping, and add this mapping into the
system. This mapping will be stored in the two peers to be used to
answer an "Extended Query" (see below).

o Querying the peers. A user can issue queries on the system. There
are two kinds of querying modes.

(1) Focused Querying Mode: The user can pose a query on peer
relations. In this mode, the system will answer the query by
accessing the specified relations only, without expanding the
query to other relations. For instance, suppose peer A has a
relation student(id, name), and peer B has a relation exam(id,
grade). The user issues the following query:

SELECT A:student.name, B:exam.grade
FROM A:student, B:exam
WHERE A:student.id = B:exam.id;

Here each relation should have a prefix to indicate the peer
name. Notice that each query statement should end with a
semicolon.

In the focused querying mode, the system will compute the answers
by using the two specified relations.

(2) Extended Querying Mode: In this mode, for each query, a peer
with a specified relation will utilize available mappings of the
relation, and expand the "subquery" on this relation to other
relations that have mappings with this relation. The motivation
for supporting this querying mode is to allow the user to get as
much information as possible to answer a query.

For instance, in the example above, suppose peer B has a mapping
between B:exam and another relation "exam2" at a different peer
C. Then peer B will translate the condition on B:exam to the
relation C:exam2 to get more information to answer the query.
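
The difference between the two querying modes can be sketched as
follows. This is an illustration under stated assumptions, not the
Query Manager's code: the class ExtendedQuerySketch and its methods
are hypothetical, mappings are kept in a plain map from a relation to
the relations it maps to, and the expansion follows mappings one step
only (this README does not say whether the system follows them
further).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of choosing the relations that answer a subquery.
class ExtendedQuerySketch {
    // Splits a peer-qualified relation such as "A:student" into {"A", "student"}.
    static String[] splitQualified(String qualified) {
        int colon = qualified.indexOf(':');
        if (colon <= 0 || colon == qualified.length() - 1) {
            throw new IllegalArgumentException("expected peer:relation, got " + qualified);
        }
        return new String[] {qualified.substring(0, colon), qualified.substring(colon + 1)};
    }

    // Focused mode: only the named relation is used.
    // Extended mode: relations it is directly mapped to are used as well.
    static List<String> relationsToQuery(String relation,
                                         Map<String, List<String>> mappings,
                                         boolean extended) {
        List<String> result = new ArrayList<String>();
        result.add(relation);
        if (extended) {
            List<String> targets = mappings.getOrDefault(relation, Collections.<String>emptyList());
            for (String target : targets) {
                if (!result.contains(target)) {
                    result.add(target);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Directed mapping from the example above: B:exam maps to C:exam2.
        Map<String, List<String>> mappings = new HashMap<String, List<String>>();
        mappings.put("B:exam", Arrays.asList("C:exam2"));

        System.out.println("focused:  " + relationsToQuery("B:exam", mappings, false));
        System.out.println("extended: " + relationsToQuery("B:exam", mappings, true));
    }
}
```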

- Sample Testing Scripts

If you want to do a quick test, you can use five peers provided by
the Raccoon Project. Currently these peers are running on two MySQL
databases on two machines of the Raccoon Project at UC Irvine.
Please contact the authors if you have problems starting these peers.
To use these five peers, do the following:

1) Open a command line window and start a peer with 10001.xml.

2) Open a command line window and start a peer with 10002.xml.

3) Open a command line window and start a peer with 10003.xml.

4) Open a command line window and start a peer with 10004.xml.

5) Open a command line window and start a peer with 10005.xml.

Be sure to modify these five XML files to specify the IP address
(your machine) properly. You don't have to modify anything if you
are running all the peers on one machine.

If you want to use your own database, you need to create the
database. You may use the commands in the file
"sample/sample-data.sql" to create and populate the different
tables/databases.

You can use the queries in the file
"sample/test_script.txt" to test searches and queries.

- System Architecture

The current system has five modules: Net, GUI, Resource Manager,
Search Manager, and Query Manager, as illustrated by the file
"./architecture.jpg"

(1) Net: This module does all the communication tasks among
different peers. It is implemented using RMI.

(2) GUI: It displays the network and supports different
operations. It is based on the GraphLayout library offered by
TouchGraph Inc.

(3) Resource Manager: It stores the following resources:
* Peers known to a local node;
* Mapping links created by the user;
* An interface dealing with data stored in local databases.

(4) Search Manager: It supports searches for similar relations.
Currently we use a basic schema-mapping algorithm to compute
the similarity between two schemas.

(5) Query Manager: It processes user queries. It contains a parser,
which parses an SQL query and builds a query tree. It has an
engine that executes the query tree. In the implementation, we
do simple optimization, e.g., pushing selections and projections
down the tree.
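
As an illustration of the kind of computation the Search Manager
performs, the following sketch maps each attribute of one relation to
its most similar attribute in another relation. It is not the
algorithm used in Raccoon, which this README does not specify: the
similarity measure (normalized edit distance on attribute names), the
threshold, and the class AttributeMappingSketch are all assumptions
made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical name-based attribute matcher (a stand-in, not Raccoon's algorithm).
class AttributeMappingSketch {
    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1 means the names are equal ignoring case.
    static double similarity(String a, String b) {
        String x = a.toLowerCase();
        String y = b.toLowerCase();
        int longest = Math.max(x.length(), y.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(x, y) / longest;
    }

    // Directed mapping: each source attribute maps to its best target
    // attribute, provided the similarity reaches the threshold.
    static Map<String, String> map(String[] source, String[] target, double threshold) {
        Map<String, String> mapping = new LinkedHashMap<String, String>();
        for (String s : source) {
            String best = null;
            double bestScore = -1.0;
            for (String t : target) {
                double score = similarity(s, t);
                if (score > bestScore) {
                    best = t;
                    bestScore = score;
                }
            }
            if (best != null && bestScore >= threshold) {
                mapping.put(s, best);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        String[] exam = {"id", "grade"};
        String[] exam2 = {"sid", "grades", "term"};
        System.out.println(map(exam, exam2, 0.5));
    }
}
```

A name-based measure is only a starting point, which is why the user
is asked to validate each mapping before it is added to the system.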

- Used Libraries:

o java_cup and JLex: used to generate the SQL parser.

o db2java.zip: JDBC driver for IBM DB2.

o ifxjdbc.jar and ifxjdbc-g.jar: JDBC Driver for Informix.

o mysql-connector-java-3.0.8-stable-bin.jar: JDBC Driver for MySQL.

o nanoxml-2.1.1.jar: a lightweight XML parser.

o BrowserLauncher.jar: a package to display an HTML page in a Java
component.

o TouchGraph: a graph-layout package to display a network. The
original package can be downloaded from www.touchgraph.com . It
has been modified in this implementation. The following are the
major changes:

* Change some methods in LB to "public." Otherwise we cannot extend
them to call their methods.

* Change Node and LBNode to be "implements Serializable," because we
are using RMI.

* Node has edge info in it. When you pass a node, you pass the edge
along with it.

We unpack the code and put it under the com/ subdirectory.
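
The last two changes matter because RMI passes non-remote arguments by
serializing them. The following sketch shows the idea with a
stand-in class: SerializableNodeSketch is hypothetical and is not
the modified TouchGraph Node. Because the node holds its own edges,
serializing one node carries its neighbors along with it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a graph node that keeps its edges inside it.
class SerializableNodeSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final String id;
    // Edge information lives in the node, so it is serialized with the node.
    final List<SerializableNodeSketch> neighbors = new ArrayList<SerializableNodeSketch>();

    SerializableNodeSketch(String id) {
        this.id = id;
    }

    // Adds an undirected edge between this node and the other node.
    void connect(SerializableNodeSketch other) {
        neighbors.add(other);
        other.neighbors.add(this);
    }

    // Serializes and deserializes a node in memory, which is what happens
    // to a non-remote argument of an RMI call.
    static SerializableNodeSketch roundTrip(SerializableNodeSketch node) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableNodeSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        SerializableNodeSketch a = new SerializableNodeSketch("10004");
        SerializableNodeSketch b = new SerializableNodeSketch("10005");
        a.connect(b);
        SerializableNodeSketch copy = roundTrip(a);
        System.out.println(copy.id + " arrived with neighbor " + copy.neighbors.get(0).id);
    }
}
```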

For the latest information about the Raccoon Project, visit its home page:

http://www.ics.uci.edu/~raccoon

Enjoy.

Last updated: November 29, 2004.