Enterprise DB Systems Architecture

System Architecture Concepts
Repositories
Centralized Processing
Distributed Processing
DB Application Integration Mechanisms

System Architecture Concepts

Enterprise System Architecture

A significant Enterprise System Architecture example appears in an article from DB2 Magazine (optional material)

Components

Executable software systems, programs, functions, or computational methods (in object-oriented programs) with an interface (a composition of I/O resources).

Interfaces

Composition of I/O resources that Require or Provide application data object types, instance values, or component control signals

I/O Resources

Application resources -- data object types, instance values, or component control signals (start, stop, suspend, resume, error, etc.) -- Input to, or Output from, a component
(Optional) Functions or logical predicates that specify the range of acceptable values for each resource, or special values (e.g., "singularities") that signify component processing exceptions.

Connectors

Data communication mechanisms that enable the exchange or interchange of messages (data or control signals) between communicating components
Point-to-Point Application Program (Component) Interfaces (APIs) that link a Component's Outputs to other Components' Inputs
Messaging system or "information bus" for broadcasting messages from one sending component to many other listening components.

Configuration and Versions

Configuration: An arrangement of components ("floorplan") whose interfaces and I/O resources are interconnected to other components using connectors.
Version: An instance of a configuration, component or connector, with a unique interface
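As an illustration only, the vocabulary above (components, interfaces, I/O resources, connectors, configurations) can be sketched as a few small data types. The class names, the component names (order_entry, billing), and the resource names are hypothetical and are not part of these notes or of any standard notation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interface:
    """I/O resources a component requires (inputs) and provides (outputs)."""
    requires: frozenset = frozenset()
    provides: frozenset = frozenset()


@dataclass(frozen=True)
class Component:
    name: str
    interface: Interface


@dataclass(frozen=True)
class Connector:
    """Point-to-point link carrying one resource from source to target."""
    source: str
    target: str
    resource: str


@dataclass
class Configuration:
    """An arrangement of components wired together by connectors."""
    components: dict = field(default_factory=dict)
    connectors: list = field(default_factory=list)

    def add(self, component):
        self.components[component.name] = component

    def connect(self, source, target, resource):
        src, dst = self.components[source], self.components[target]
        # A connector may only link a provided output to a required input.
        if resource not in src.interface.provides:
            raise ValueError(f"{source} does not provide {resource}")
        if resource not in dst.interface.requires:
            raise ValueError(f"{target} does not require {resource}")
        self.connectors.append(Connector(source, target, resource))

    def unsatisfied(self):
        """Return (component, resource) pairs whose required input is not wired."""
        wired = {(c.target, c.resource) for c in self.connectors}
        return sorted(
            (name, res)
            for name, comp in self.components.items()
            for res in comp.interface.requires
            if (name, res) not in wired
        )


config = Configuration()
config.add(Component("order_entry", Interface(provides=frozenset({"order"}))))
config.add(Component("billing", Interface(requires=frozenset({"order", "price_list"}))))
config.connect("order_entry", "billing", "order")
print(config.unsatisfied())  # [('billing', 'price_list')]
```

The unsatisfied() check shows why interfaces are worth making explicit: a configuration can be inspected for missing connections before anything is executed.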

An (Advanced) Architectural Description Language (xArch)

For your information only :)
(Optional) xArch Notation: an XML data description schema (in progress!)
XML schemas can be sent/received over the Web

This enables portable, reusable or mobile specification of System Architectures
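To make the portability point concrete, here is a minimal sketch of reading an architecture description from XML. The element and attribute names below are invented for illustration and do not follow the actual xArch schema; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical description format (NOT the real xArch schema).
DESCRIPTION = """
<architecture name="order-processing">
  <component id="order_entry"/>
  <component id="billing"/>
  <connector from="order_entry" to="billing" resource="order"/>
</architecture>
"""

root = ET.fromstring(DESCRIPTION)

# Recover the components and the connectors that link them.
components = [c.get("id") for c in root.findall("component")]
links = [
    (c.get("from"), c.get("to"), c.get("resource"))
    for c in root.findall("connector")
]

print(components)  # ['order_entry', 'billing']
print(links)       # [('order_entry', 'billing', 'order')]
```

Because the description is plain text, the same document could be stored, versioned, or sent to another site and parsed there without sharing any code.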

Repositories

Client-side

Pervasive, dominant approach to information and data storage
Data content is not organized and administered using a DBMS
Lack explicit database data model, lack standard query or data definition languages, updates are ad hoc, end-user acts as data administrator.
PC/Workstation file systems (e.g., "Windows Explorer"), directories, files, and others (e.g., "Bookmark" or "Favorites" hierarchical lists for Web browsers) are pervasive data storage mechanisms, with little support for consistent data organization, access, or update.
Centralized system architecture

Server-side

Pervasive, dominant approach to information and data storage
Data content is not organized and administered using a DBMS
Lack explicit database data model, lack standard query or data definition languages, updates are ad hoc, end-user acts as data administrator.
Network file systems (e.g., NFS, NTFS, NDS), directories and files are pervasive data storage mechanisms, with little support for consistent data organization, access, or update.

Storage Area Network (SAN) vs. Network Attached Storage (NAS)
SAN -- a network of storage repository servers
NAS -- a storage repository server attached to a network

Centralized server, distributed client system architecture

Centralized Processing Architectures

Centralized Servers

Mainframe (centralized processor, like an IBM System/390)
Star/Hub (e.g., centralized processor connected to networked PCs and other processors)
Cluster (physically centralized, logically distributed)

Multiple, tightly-coupled servers, all identical
Provides reliability through redundancy

DB may be replicated across multiple processors, each with a copy of the same DBMS.
DBMS servers can be configured to "take turns" (e.g., round-robin processing) in processing a stream of DB transactions.
Redundancy provides reliability through a fail-safe configuration of transaction processing.

If a single processor fails, then data processing can be automatically migrated to other processors.
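The "take turns" and failover behavior described above can be sketched as a small dispatcher. This is a simplified illustration, not how any particular DBMS cluster is implemented; the class name and the server names (db1, db2, db3) are hypothetical, and real clusters also have to detect failures and keep replicas consistent.

```python
from itertools import cycle


class ClusterDispatcher:
    """Assign transactions to replicated servers in round-robin order."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.failed = set()
        self._turns = cycle(self.servers)

    def mark_failed(self, server):
        self.failed.add(server)

    def dispatch(self, transaction):
        # Try each server at most once, skipping any that have failed.
        for _ in range(len(self.servers)):
            server = next(self._turns)
            if server not in self.failed:
                return server, transaction
        raise RuntimeError("no healthy server available")


dispatcher = ClusterDispatcher(["db1", "db2", "db3"])
first = [dispatcher.dispatch(t)[0] for t in ("t1", "t2", "t3")]

dispatcher.mark_failed("db2")  # simulate one processor failing
after = [dispatcher.dispatch(t)[0] for t in ("t4", "t5", "t6")]

print(first)  # ['db1', 'db2', 'db3']
print(after)  # ['db1', 'db3', 'db1']
```

After db2 fails, its turns are simply taken by the remaining servers, which is the migration of processing described above.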

Provides scalability through workload distribution (load-sharing)

Enterprises may acquire two or more mainframes, then add more as transaction processing load increases. Well-suited choice for scaling large DBMS processors.
Enterprises may acquire 100s of PC-class processors as a cluster, then buy more as workload increases. Well-suited choice for scaling many small DBMS processors.

Note: Redundancy is not the same as scalability. Having one does not imply the existence of the other. They are distinct concepts and capabilities.
Clusters are increasingly being brought together to form application/computing service grids.

Grids may be physically distributed, but logically centralized (i.e., act together as if a single system or repository server).
Grids are implemented using Web-based application services (described below)

Centralized processing is most often employed when large numbers of transactions must be processed at high rates (e.g., 100s-1000s transactions per second), in a highly reliable manner.

Distributed Processing Architectures

Client-Server

Two-tier (centralized server, distributed clients)

Least expensive to configure
Vulnerable
Generally a legacy solution

Three-tier (clients, proxy/gateway, servers)

Most common contemporary solution
Proxy/gateway accommodates, hides/isolates, and protects multiple servers and multiple clients
Well-suited to small-medium size enterprises that are not in a DBM-specific business

N-tier or multi-tier (clients, proxy, application server,…, data servers)

Most common future solution
Proxy/gateway networks accommodate, hide/isolate, and protect multiple servers and multiple clients
Well-suited to medium-very large size enterprises that can be in a DBM-specific business
N-tier architectures enable massively decentralized systems (Freenet, SETI@Home, etc.)

All tiers are increasingly being configured as clusters or multi-processors.

Client-side Platforms

Processors that request or provide data on-demand

Desktop/Laptop PC

Web/Browser User Interface and associated "helper applications"

Mobile computers: Remote, wireless/disconnected data bases
Handheld Personal Digital Assistants (PDAs -- e.g., Palm Pilot)
Internet Devices (PCS/Internet Phone with local "address book" database)
Smart Cards
Coming attractions in 2-5 years:

Mobile DBs and Virtual DBs constituted on-demand from ad-hoc network or virtual private network (VPN) of mobile computers!

Server-side Platforms

Processors that continuously wait for and service requests for data or application processing from clients

DBMSs operate on servers.
PCs, laptops, or mobile devices can be configured to operate as a client, server, or both!
Servers organized following

Centralized Processing Architectures, as well as
According to data communication strategy.

Serverless messaging/communication

Peer-to-Peer (i.e., Client-to-Client, Client-to-Client-to-Client-to…)

Instant messaging systems like "ICQ" employ peer-to-peer communication
Supports high bandwidth messaging, but doesn't scale up to large numbers of peers.
Best for highly interactive or "bursty" applications
Also called Point-to-Point or "Narrowcasting"

Multicast (i.e., shared messaging "trunks" which can be hierarchically organized)

Shares messaging bandwidth, so scaling up is manageable
Clients "subscribe" to messages "publish(ed)" by other clients/servers
No central server bottleneck
Difficult to manage
Still seems to consume substantial network bandwidth as subscriber base grows
Seems to work best when "proxy" servers subscribe to multicast message server, and clients communicate with these proxy servers
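The publish/subscribe pattern above can be sketched in a few lines. This in-process version only illustrates the subscription logic; it does not use network multicast, and the single bus object here stands in for what would be shared messaging trunks in a real deployment. The class name, topic name, and message fields are hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Deliver each published message to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The sender does not know who is listening; it only names the topic.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = MessageBus()
inbox_a, inbox_b = [], []
bus.subscribe("price_updates", inbox_a.append)
bus.subscribe("price_updates", inbox_b.append)

delivered = bus.publish("price_updates", {"sku": "A-100", "price": 9.99})
print(delivered)  # 2
print(inbox_a == inbox_b)  # True
```

The publisher sends one message and both subscribers receive it, which is the bandwidth-sharing property that distinguishes multicast from sending one point-to-point message per listener.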

Proxy servers, gateways, brokers, as well as routers, firewalls, etc.

A proxy server (or gateway, or broker) is just a special-purpose or limited-purpose server that can filter or aggregate data transactions or data flow between clients and application/storage servers
Proxy servers are commonly used to isolate and hide a DBMS server behind a firewall to protect against attacks.
Coordinating activities, application services or data across N-tier system architectures is a major technical problem
Coordination demands tend to consume processing resources and available bandwidth
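A minimal sketch of the filtering role of a proxy follows, assuming an in-memory SQLite database stands in for the protected DBMS server. The class name, table, and the simple "SELECT only" rule are hypothetical; a prefix check like this illustrates filtering and is not a real security mechanism.

```python
import sqlite3


class ReadOnlyProxy:
    """Forward only SELECT statements to the database hidden behind it."""

    def __init__(self, connection):
        self._connection = connection  # clients never see this object directly

    def execute(self, sql, params=()):
        # Filter: reject anything that is not a read request.
        if not sql.lstrip().upper().startswith("SELECT"):
            raise PermissionError("only SELECT statements pass through the proxy")
        return self._connection.execute(sql, params).fetchall()


# The "server" behind the proxy.
backend = sqlite3.connect(":memory:")
backend.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
backend.execute("INSERT INTO accounts (owner) VALUES ('alice')")

proxy = ReadOnlyProxy(backend)
rows = proxy.execute("SELECT owner FROM accounts")
print(rows)  # [('alice',)]
```

A request such as proxy.execute("DROP TABLE accounts") raises PermissionError before it ever reaches the backend.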

Other Server Platforms

Query servers: virtual server that routes queries and manages connections and data from multiple DBMSs.
Connection-less servers

The World-Wide Web is a repository architecture organized as an uncoordinated, multiple-server information sharing system

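The HyperText Transfer Protocol (HTTP) implements a connection-less (stateless) data communication protocol: each request/response exchange is handled independently, even though HTTP messages are normally carried over TCP connections.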
A communication protocol is a computational framework that implements a particular scheme for controlling the exchange of data between communicating processors.

Telnet and SMTP (simple mail transfer protocol) are examples of connection-oriented data communication protocols.
Microsoft .NET -- Software/DBM applications as Web-based application services

HTTP -- Web-based object transfer protocol for transmitting views of remote objects (Web pages, data entry forms, etc.)
XML -- eXtensible Markup Language for publishing or sharing database schemas (data models) or data across repositories
SOAP -- Simple Object Access Protocol allows clients to access remote/networked applications as services
UDDI -- Universal Description, Discovery, and Integration is a distributed Web directory service (a registry or shared repository) that applications and services use to discover one another in order to interact and share information
WSDL -- Web Services Description Language, a relatively new ("untried") approach to specifying application/DB services, where data are specified using XML, data are transported via HTTP/SOAP, and UDDI provides a registry which indicates the "address" (e.g., URL) for the data, applications, users, repositories, etc.

At present, an "unproven" technology
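As a small illustration of using XML to share data across repositories, the sketch below publishes rows from one database as an XML document and reads them back on the receiving side. The table, element, and attribute names are hypothetical, and this is plain XML handling with the Python standard library, not SOAP, WSDL, or any .NET API.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sending side: a small source repository.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parts (sku TEXT, qty INTEGER)")
db.executemany("INSERT INTO parts VALUES (?, ?)", [("A-100", 5), ("B-200", 12)])

# Publish the rows as XML; attribute values must be strings.
root = ET.Element("parts")
for sku, qty in db.execute("SELECT sku, qty FROM parts ORDER BY sku"):
    ET.SubElement(root, "part", sku=sku, qty=str(qty))
document = ET.tostring(root, encoding="unicode")

# Receiving side: parse the text and recover typed values.
received = [(p.get("sku"), int(p.get("qty"))) for p in ET.fromstring(document)]
print(received)  # [('A-100', 5), ('B-200', 12)]
```

The receiver needs only the XML text and an agreement on element names, not access to the sender's DBMS.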

To make .NET work requires the following software systems

.NET enterprise servers (SQL Server, Exchange Server, etc.)
.NET Web-based service framework (runtime environment, class libraries, advanced ASP (HTTP+XML))
.NET application building block services (middleware for identity (UDDI), notification (SOAP), schematized storage (ODBC))
MS Passport (password, license server, user profile, personal calendar, contact list, your current location, etc.)
.NET does not provide for application data/service routing, which is necessary for Web-based workflow or EBusiness.
Does this look like a vendor "lock-in" strategy?

The mono project is developing an open source version of .NET

(Optional) Advanced Hybrid Peer-Server -- (Bleeding Edge!!!)

Peer-to-Peer and Peer-Server and Client-Server, together.

Peers act as "servents" (SERVers and cliENTS) which may only differ by access PORT attribute on URL
Example: http://www.gsm.uci.edu:80 (where "80" is a port id)
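The port component of such a URL can be inspected with the standard library, as in this short sketch. The two peer addresses (peer.example, ports 8001 and 8002) are hypothetical and only illustrate servents that share a host and differ by port.

```python
from urllib.parse import urlsplit

# The example URL from the notes: host plus explicit port id.
parts = urlsplit("http://www.gsm.uci.edu:80")
print(parts.hostname, parts.port)  # www.gsm.uci.edu 80

# Two hypothetical servents on one host, distinguished only by port.
peer_a = urlsplit("http://peer.example:8001")
peer_b = urlsplit("http://peer.example:8002")
same_host = peer_a.hostname == peer_b.hostname
different_port = peer_a.port != peer_b.port
print(same_host, different_port)  # True True
```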

Clients coordinate through servers to determine which peers to interact with directly.
Requires concurrent use of multiple data communication protocols (e.g., UDP, HTTP, Telnet, SMTP).

DB Application Integration Mechanisms

Enterprise Application Integration Connectors

Goal: Maintain autonomy or isolate underlying database to hide data heterogeneity (application data model, DBMS data model, format, layout, etc.) when integrating to other databases or repositories, in order to provide access transparency and scalability.
DBMS as the ultimate "fat" multi-application architecture connector, via SQL-based API
Middleware: ORBs and Connectors that isolate or "pave over" differences when accessing multiple remote DBMSs that embody vendor-proprietary differences (e.g., MS SQL 7--RDBMS vs. Oracle 9i--ORDBMS)

CORBA Object Request Brokers for handling remote database procedure (transaction) invocation:
API-style connectors
Open Database Connectivity (ODBC): an API-style connector
Java Database Connectivity (JDBC): a Java-based API-style connector
Compare API-style middleware connectors to protocols like HTTP and connection-less servers.
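By way of analogy, Python's DB-API (PEP 249) plays a role similar to ODBC and JDBC: application code calls one common interface, and a driver handles the specific DBMS. The sketch below uses two in-memory SQLite databases as stand-ins for separate repositories; the table, function, and variable names are hypothetical, and drivers for other DBMSs may use a different parameter placeholder style than the "?" shown here.

```python
import sqlite3


def fetch_all(connection, sql, params=()):
    """Run a query through the common DB-API cursor interface."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


def make_store(rows):
    """Create a stand-in repository with one customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


sales_db = make_store([("Acme", "west"), ("Globex", "east")])
support_db = make_store([("Initech", "west")])

# The same call works against every connection that follows the API.
west = []
for db in (sales_db, support_db):
    west.extend(
        fetch_all(db, "SELECT name FROM customers WHERE region = ?", ("west",))
    )
print(sorted(west))  # [('Acme',), ('Initech',)]
```

The calling code never depends on which database sits behind the connection, which is the access transparency that API-style connectors aim for.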

Integration process overview:

Create database tables or schemas
Develop and compile "servlet" code
Register application and associated transaction type, query type or application data type in integration directory server
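The three steps above can be sketched end to end as follows. This is an illustration under simplifying assumptions: an in-memory SQLite database stands in for the DBMS, a plain Python function stands in for compiled servlet code, and a dictionary stands in for the integration directory server. All names are hypothetical.

```python
import sqlite3

# Step 1: create the database table (schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")


# Step 2: develop the request handler (stand-in for a servlet).
def place_order(item, qty):
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)
        )
    return cursor.lastrowid


# Step 3: register the transaction type in a directory (stand-in registry).
directory = {}


def register(transaction_type, handler):
    directory[transaction_type] = handler


register("place_order", place_order)

# A client looks up the transaction type by name, then invokes it.
order_id = directory["place_order"]("widget", 3)
print(order_id)  # 1
print(db.execute("SELECT item, qty FROM orders").fetchall())  # [('widget', 3)]
```

Clients depend only on the registered transaction name, so the handler or the underlying schema can change without the clients being rewritten.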

Multimedia Databases:

Add their own special integration constraints, due to their support and use of:

Multi-media data object types (e.g., audio (MP3, .wav, .au) and video (.mov, .avi, .mpeg))
Special-purpose multi-media processors

Streaming media servers and clients
Real-time data streaming protocols (e.g., RSVP, RTSP)