---------------------------------------------------------------------
 RACCOON: A Peer-Based System for Data Integration and Sharing 
      The RACCOON Project, http://www.ics.uci.edu/~raccoon/
                Release v 2.0
                November 29, 2004

Copyright (c) 2004 by Database Group
Department of Computer Science 
University of California, Irvine
Irvine, CA 92697
  
   Qi Zhong (qzhong@ics.uci.edu)
   Jia Li (jiali@ics.uci.edu)  
   Chen Li (chenli@ics.uci.edu)
   University of California, Irvine

   Partially supported by the National Science Foundation under the
   CAREER-Award Grant IIS-0238586.

   Please send technical questions about this release to raccoon@ics.uci.edu .
---------------------------------------------------------------------

This software was created by members of the Database Group at UC
Irvine and is distributed free of charge. It is placed in the public
domain and permission is granted for anyone to use, duplicate, modify
and redistribute it provided this notice is attached.
  
There is absolutely NO WARRANTY OF ANY KIND with respect to this
software; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL ANY PARTY BE LIABLE
TO ANYONE FOR ANY DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE,
INCLUDING, WITHOUT LIMITATION, DAMAGES RESULTING FROM LOST DATA OR
LOST PROFITS, OR FOR ANY SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES.

The package complies to the GNU copyright terms at
http://www.gnu.org/copyleft/gpl.html. We include a copy of the terms
in gpl.txt in this distribution package. It also complies to the
copyright terms of University of California, Irvine.
  
The information in this software is subject to change without notice
and should not be construed as a commitment by any employees of the
Database Group or any other employee of University of California.
---------------------------------------------------------------------

A brief description of this demo is described in the following paper:

  RACCOON: A Peer-Based System for Data Integration and Sharing. Chen
  Li, Jia Li, Qi Zhong.  International Conference on Data Engineering
  (ICDE), Demo Track, Boston MA, USA, March, 2004.
---------------------------------------------------------------------

- Introduction

 Traditional data-integration systems use a centralized mediation
 approach, in which a centralized mediator accepts user queries and
 collects information from heterogeneous sources to compute answers.
 Recent database applications are seeing the emerging need to support
 data integration and sharing in distributed, peer-based
 environments. In such an environment, autonomous peers (sources)
 connected by a network are willing to exchange data and services with
 each other.  The goal of the Raccoon Project is to allow these
 different information sources to share and query their data with each
 other.  This release includes the implementation of the Raccoon
 system as of the release date.  This readme file describes how to run
 the demo.

- Platform Requirements:

  Any Java-compatible environment with a Java 2 SDK.  (We tested the code 
  on Java(TM) 2 Runtime Environment, Standard Edition, version "1.4.1_01".)

- Installation

 1. Download the source code from http://www.ics.uci.edu/~raccoon/ .

 2. Unzip the source code.

 3. Type the following commands to compile the code.

 On a Windows environment:

 javac -classpath .;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar -d .\class .\Raccoon\Raccoon.java
 rmic -classpath .;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar -d .\class Raccoon.NET.NetModuleImpl
 
 On a Unix environment:
 
 javac -classpath .:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar -d ./class ./Raccoon/Raccoon.java
 rmic -classpath .:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar -d ./class Raccoon.NET.NetModuleImpl

- Configuration

 Before starting to run the demo, you need to configure the
 peer-to-peer database network.  If you want to use some databases
 provided by the Raccoon Project, you can skip this step and go to the
 "Running the demo" step by using the provided files "10001.xml", ...,
 "10005.xml".  In this case, you will be using five peers running on
 two databases (MySQL) on two machines of the Raccoon Project.

 If you do want to set up your own peers and network, you need to do
 the following.  For each peer, create an XML file similar to the
 sample file "10001.xml" provided in the package.  The following is
 part of the example file 10004.xml.

(1)  <PEERVIWER>
(2)      <RMIPORT port = "10004"/>
(3)      <NEIGHBOR ip="127.0.0.1" port="10001"/> 
(4)      <NEIGHBOR ip="127.0.0.1" port="10005"/> 
(5)      <DATABASE  system ="MySQL" host="128.195.38.176" dbName = "peerDB" user="demo" passwd="demo" />
(6)      <NODE_INFO>
(7)     	<NODE_INFO_NAME>OCHousing</NODE_INFO_NAME>
(8)     	<NODE_INFO_DESC>Orange County Housing</NODE_INFO_DESC>
(9)     	<NODE_INFO_BW>56Kbps</NODE_INFO_BW>
(10)    	<NODE_INFO_SIZE>100M</NODE_INFO_SIZE>
(11)     	<NODE_INFO_TYPE>Physical Data</NODE_INFO_TYPE>
(12)    	<RELATION name = "OCHousing" />
(13)     </NODE_INFO>   
(14) </PEERVIWER>


 Line (2) shows that this peer is using port 10004 to communicate with
 other peers. Any unoccupied port can be used here.

 Line (3) shows that this peer has a neighbor running on machine
 "127.0.0.1" and using port 10002.  You can specify any IP, but you
 have to make sure that there is a peer running on that IP with that
 port.  Notice that this machine is a remote machine, which is
 different from the machine on which you are running this demo.  Thus
 this port number is different from the RMI port number.

 Line (4) shows another neighbor peer.  (It happens to be on the same
 IP machine as the previous peer.  In general you can specify any IP
 machine with a port as a neighbor, as long as that a peer is really
 running on that port.)  If you run this demo with several peers on
 the same machine, then you need to specify the new IP address for all
 these peers' XML files.

 Line (5) specifies the database for the local peer. The system is
 MySQL and it is running on 128.195.38.176. The database name is
 "peerDB" using "demo" and "demo" as the user and password. Currently
 our system supports MySQL, Informix, and DB2.  The IP address is not
 limited on the local machine.  The following two databases examples
 are available for the demo:

 <DATABASE  system ="MySQL" host="CSE104g.ics.uci.edu" dbName="peerDB" user="demo" passwd="demo"/>
 <DATABASE  system ="MySQL" host="RESCUE15.ics.uci.edu" dbName="peerDB" user="demo" passwd="demo"/>
	
 !!!NOTICE!!!  You have to make sure that the username and password
 are correct and the user has the right to do queries on the database.
 In addition, KEEP THIS XML FILE PROPERLY, SINCE IT MAY CONTAIN
 INFORMATION ABOUT THE USER AND PASSWORD.

 Lines (6)~(13) specify the information about this local peer.

 Line (7) gives the name of the peer.

 Line (8) gives the description of peer.

 Line (9) shows the network bandwidth.  Currently this information is
 not used.

 Line (10) gives the size of the data set at this peer.

 Line (11) gives the type of the data set.

 Line (12) gives the name of the relation you want to share. This
 relation must be defined in the database specified in Line (5)

- Running the Demo

 After writing configuration XML files (e.g. file 10001.xml) to
 specify the peer network, you can start each peer using the
 corresponding configuration file by typing the following command.
 (The following is using 10001.xml as an example.)

 Windows:
  
  java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml
  
 
 Unix:
  java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

 This command will start a peer with a PeerViewer browser on your
 system.

 Alternatively, you could also type the following command.

 Windows:
java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml dummy

 Unix:
  java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml dummy

 This command will start the peer in a "dummy" mode.  In this mode,
 the peer can only answer queries from other peers, but it cannot
 issue queries.  In addition, there will be no browser for a peer
 in a dummy mode.  A peer in a dummy mode uses much fewer resources.

 Because of some limitations, currently our implementation can access
 only one database on a DBMS server from an IP address. Thus, if you
 are simulating multiple peers on one machine, either configure each
 of them to access different DBMS servers, or configure them to use
 the same database on the same DBMS server.

- Functionalities:

 The current implementation provides the following features.

 o A user can navigate the peer network.  Each peer is represented by
   a colored node in the browser, called PeerViewer.  By right
   clicking each node and choosing "Expand Node," the user can explore
   its neighboring nodes. If the user puts the mouse on a node for a
   second, all the information about the node will be listed on the
   right-hand side, including the network bandwidth, relation schema,
   etc.

 o The user can search in the network for relations that are "similar"
   to a given relation.  This search can be done by right clicking the
   local node and choosing "Search Node."  The user needs to specify a
   relation in the local database.  This search will be propagated in
   the network.  For each peer, we use schema-mapping techniques to
   identify relations that are "similar" to the given relation.  All
   the peers with similar relations will be returned.

 o After a search is done, the system will return all the peers with
   similar relations.  If the user is interested in a returned
   relation, she can click the "Add Mapping" button, and the system
   will compute an attribute-level mapping based on the similarity
   between each pair of attributes.  (Each mapping is directed.)  The
   user can validate the mapping, and add this mapping into the
   system. This mapping will be stored in the two peers to be used to
   answer an "Extended Query" (see below).

 o Querying the peers.  A user can issue queries on the system.  There
   are two kinds of querying modes.

  (1) Focused Querying Mode: The user can pose a query on peer
      relations.  In this mode, the system will answer the query by
      accessing the specified relations only, without expanding the
      query to other relations.  For instance, suppose peer A has a
      relation student(id, name), and peer B has a relation exam(id,
      grade).  The user issues the following query:

          SELECT A:student.name, B:exam.grade
          FROM   A:student, B:exam
          WHERE  A:student.id = B:exam.id;

     Here each query should have a prefix to indicate the peer
     name. Notice that each query statement should end with a
     semicolon.

     In the focused querying mode, the system will compute the answers
     by using the two specified relations.

 (2) Extended Querying Mode.  In such a mode, for each query, a peer
     with a specified relation will utilize available mappings of the
     relation, and expand the "subquery" on this relation to other
     relations that have mappings with this relation.  The motivation
     of supporting this querying mode is to allow the user to get as
     much information as possible to answer a user query.

     For instance, in the example above, suppose peer B has a mapping
     between B.exam with another relation "exam2" at a different peer
     C.  Then peer B will translate the condition on B.exam to the
     relation C.exam2 to get more information to answer the query.

- Sample Testing Scripts

 If you want to do a quick test, you can use five peers provided by
 the Raccoon Project.  Currently these peers are running on two MySQL
 databases on two machines of the Raccoon Project at UC Irvine.
 Please contact the authors if you have problems starting these peers.
 To use these five peers, do the following:

  1) Open a command line window and start a peer with 10001.xml.

 Windows:
 java -cp .;.\class;.\lib\nanoxml-2.1.1.jar;.\lib\BrowserLauncher.jar;.\lib\db2java.zip;.\lib\ifxjdbc.jar;.\lib\ifxjdbc-g.jar;.\lib\java_cup_v10k.zip;.\lib\mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml

 Unix:
  java -cp .:./class:./lib/nanoxml-2.1.1.jar:./lib/BrowserLauncher.jar:./lib/db2java.zip:./lib/ifxjdbc.jar:./lib/ifxjdbc-g.jar:./lib/java_cup_v10k.zip:./lib/mysql-connector-java-3.0.8-stable-bin.jar Raccoon.Raccoon 10001.xml dummy

  2) Open a command line windows and start a peer with 10002.xml.

  3) Open a command line windows and start a peer with 10003.xml.

  4) Open a command line windows and start a peer with 10004.xml.

  5) Open a command line windows and start a peer with 10005.xml.

 Be sure to modify these five XML files to specify the IP address
 (your machine) properly. You don't have to modify anything if you
 are running all the peers on one machine.

 If you want to use your own database, you need to create the
 database.  You may use the commands in the file
 "sample/sample-data.sql" to create and populate the different
 tables/databases. 
 

 You can use the queries in the file
 "sample/test_script.txt" to test searches and queries.

- System Architecture

 The current system has five modules: Net, GUI, Resource Manager,
 Search Manager, and Query Manager, as illustrated by the file
 "./architecture.jpg"

  (1) Net: This module does all the communication tasks among
      different peers.  It is implemented using RMI.

  (2) GUI: It displays the network and supports different
      operations. It is based on GraphLayout library offered by
      TouchGraph Inc.

  (3) Resource Manager: It stores the following resources:
        * Peers known to a local node;
        * Mapping links created by user;
        * An interface dealing with data stored in local databases.

  (4) Search Manager: It supports searches for similar relations.
      Currently we use a basic schema-mapping algorithm to compute
      the similarity between two schemas.

  (5) Query Manager: It processes user queries.  It contains a parser,
      which parses an SQL query and builds a query tree.  It has an
      engine that executes the query tree. In the implementation, we
      do simple optimization, e.g., pushing selections and projections
      down the tree.

- Used Libraries:

 o java_cup and JLEX: used to generate the SQL Parser.

 o db2java.zip: JDBC driver for IBM DB2.

 o ifxjdbc.jar and ifxjdbc-g.jar: JDBC Driver for Informix.

 o mysql-connector-java-3.0.8-stable-bin.jar: JDBC Driver for MySQL.

 o nanoxml-2.1.1.jar: a lightweight XML parser.

 o BrowserLauncher.jar: a package to display an HTML page in a java
   component.

 o TouchGraph: a GraphLayOut software to display a network. The
   original package could be downloaded from www.touchgraph.com .  It
   has been modified in this implementation.  The following are the
   major changes:

  * Change some methods in LB to "public."  Otherwise we cannot extend
    them to call their methods.

  * Change Node and LBNode to be "implements Serializable," because we
    are using RMI.

  * Node has edge info in it. When you pass a node, you pass the edge
    along with it.

  We unpack the code and put it under the com/ subdirectory.
  
For the latest information about the Raccoon Project, visit its home page:

  http://www.ics.uci.edu/~raccoon

Enjoy.

Last updated: November 29, 2004.