Integration and interoperability of heterogeneous and
distributed information systems
|
Table of Contents |
The Problem - Accessing and managing data from several existing independent databases
The Architecture - Architecture of federated databases
The Methodology - Reverse engineering, schema integration and mapping building
The CASE tool - Support for the methodology and for generating the architecture components
|
The Problem |
Accessing and managing data from several existing independent databases
poses complex problems that can be classified into platform, DMS (data management system), location
and semantic levels. The platform level copes with the fact that databases
reside on different brands of hardware, under different operating systems,
and interact through various network protocols. Leveling these differences
leads to platform independence. DMS level independence allows programmers
to ignore the technical details of data implementation in a given family
of models. It can also hide the model that the DMS implements by providing
a more abstract model. Location independence isolates the user from knowing
where the data reside. Finally, semantic level independence solves the
problem of multiple, replicated and conflicting representations of similar
facts.
Current technologies such as de facto standards (e.g. ODBC and JDBC),
or proposals from standards bodies (e.g. CORBA),
now ensure a high level of platform independence at a reasonable cost,
so that this level can be ignored from now on. DMS level independence is
effective for some families of DBMS (e.g. through ODBC or JDBC for RDB),
but the general problem is still unsolved when several DMS models are to
cooperate. Location independence is addressed either by specific DBMS (e.g.
distributed RDBMS) or through distributed object managers such as CORBA middleware
products. Despite much effort spent by the scientific community, semantic
independence is still an open and largely unsolved problem.
The InterDB project proposes a general
architecture, a methodology and a CASE
environment intended to address the problem of providing users and
programmers with an abstract interface to independent, heterogeneous and
distributed databases.
|
The Architecture |
The architecture comprises a hierarchy of mediators that dynamically
transform actual data into a virtual homogeneous database. Each layer of
mediators provides a certain kind of independence. DMS independence
is provided by local servers dedicated to each database. A local server
comprises two modules, namely the logical and the conceptual modules. A
logical module hides the syntactic idiosyncrasies and the technical details
of the DMS of a given model family. All the logical modules offer a common
interface and present the physical data according to a common data model
called the generic logical model. For instance, relational databases and
flat COBOL files appear as ODBC/JDBC components. Each model family is defined
as a specialisation of the generic logical model. A conceptual module expresses
the logical data according to a high level Object/Relationship conceptual
model. Therefore, each local server appears as a conceptual database that
can be processed through a unique interface. Both logical and conceptual
interfaces include a variant of the OQL language, so that they can be integrated
into a great variety of architectures.
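The two-module structure of a local server can be sketched as follows. This is a minimal illustration, not the InterDB implementation: the class names, the `scan` and `instances` methods and the sample schemas are all hypothetical, SQLite and a delimited text file stand in for a relational DBMS and a COBOL file, and a plain scan replaces the OQL variant mentioned above.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List, Protocol

Record = Dict[str, str]


class LogicalModule(Protocol):
    """Common interface assumed for every logical module (hypothetical)."""

    def scan(self, collection: str) -> Iterator[Record]: ...


class RelationalLogicalModule:
    """Hides the SQL details of a relational DMS (SQLite as a stand-in)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def scan(self, collection: str) -> Iterator[Record]:
        # The identifier is quoted because it cannot be a bound parameter.
        cursor = self._connection.execute(f'SELECT * FROM "{collection}"')
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            yield {column: str(value) for column, value in zip(columns, row)}


class FlatFileLogicalModule:
    """Hides the layout of delimited flat files (stand-in for COBOL files)."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = files  # collection name -> file content

    def scan(self, collection: str) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(self._files[collection]))


class ConceptualModule:
    """Presents logical collections and fields under conceptual names."""

    def __init__(
        self,
        logical: LogicalModule,
        entity_to_collection: Dict[str, str],
        attribute_to_field: Dict[str, Dict[str, str]],
    ) -> None:
        self._logical = logical
        self._entity_to_collection = entity_to_collection
        self._attribute_to_field = attribute_to_field

    def instances(self, entity_type: str) -> List[Record]:
        collection = self._entity_to_collection[entity_type]
        fields = self._attribute_to_field[entity_type]
        return [
            {attribute: record[field] for attribute, field in fields.items()}
            for record in self._logical.scan(collection)
        ]


# Two heterogeneous sources exposed through the same conceptual interface.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE CUST (CNUM TEXT, CNAME TEXT)")
connection.execute("INSERT INTO CUST VALUES ('C1', 'Dupont')")
relational_server = ConceptualModule(
    RelationalLogicalModule(connection),
    {"Customer": "CUST"},
    {"Customer": {"id": "CNUM", "name": "CNAME"}},
)
flat_server = ConceptualModule(
    FlatFileLogicalModule({"CLIENT.DAT": "NO,NAME\nC2,Martin\n"}),
    {"Customer": "CLIENT.DAT"},
    {"Customer": {"id": "NO", "name": "NAME"}},
)
print(relational_server.instances("Customer"))
print(flat_server.instances("Customer"))
```

In this sketch both servers answer the same `instances("Customer")` request although one reads a table and the other a file, which is the kind of DMS independence the local servers are meant to provide.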
Location and semantic independence are ensured by a global server.
This module processes the global queries, that is, queries addressing the
data independently of their distribution across the different sites. The
module is based on a repository that describes the conceptual schema of
each local server, its location, and the relationships between their data
structures. Information such as data replication, semantic conflicts and
data heterogeneity allows the server to interpret and distribute the global
queries, and to collect and integrate the result sets sent back by the
local servers.
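As a rough illustration of this behaviour, the sketch below distributes a global query to every site listed in a repository, reconciles one representation conflict (a unit difference) and merges replicated facts. All names (`Source`, `GlobalServer`, `grams_to_kilograms`, the sites and the sample products) are hypothetical, and the merge rule (first copy wins, later copies only fill gaps) is one simple assumption among many possible policies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
LocalServer = Callable[[str], List[Record]]  # entity type -> instances


@dataclass
class Source:
    """Repository entry: where an entity type lives and how to reconcile it."""

    site: str                               # location of the local server
    fetch: LocalServer                      # call that reaches the local server
    normalise: Callable[[Record], Record]   # resolves representation conflicts


class GlobalServer:
    def __init__(self, repository: Dict[str, List[Source]], key: str) -> None:
        self._repository = repository
        self._key = key

    def query(self, entity_type: str) -> List[Record]:
        merged: Dict[Any, Record] = {}
        for source in self._repository[entity_type]:
            for record in source.fetch(entity_type):
                unified = source.normalise(record)
                # Replicated facts share a key: keep the first copy and let
                # later copies only fill in the attributes it lacks.
                known = merged.setdefault(unified[self._key], {})
                for attribute, value in unified.items():
                    known.setdefault(attribute, value)
        return list(merged.values())


def grams_to_kilograms(record: Record) -> Record:
    """Reconcile a unit conflict: one site stores grams, the global schema kilograms."""
    unified = {name: value for name, value in record.items() if name != "weight_g"}
    unified["weight_kg"] = record["weight_g"] / 1000
    return unified


def site_a(entity_type: str) -> List[Record]:
    return [{"id": "P1", "name": "bolt", "weight_kg": 0.5}]


def site_b(entity_type: str) -> List[Record]:
    return [
        {"id": "P1", "weight_g": 500, "colour": "grey"},
        {"id": "P2", "weight_g": 1200, "colour": "red"},
    ]


repository = {
    "Product": [
        Source("site-a", site_a, dict),  # dict() copies the record unchanged
        Source("site-b", site_b, grams_to_kilograms),
    ]
}
products = GlobalServer(repository, key="id").query("Product")
print(products)
```

The caller names only the entity type; which sites are contacted and how their answers are reconciled is decided entirely from the repository.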
Finally, platform independence is ensured by both the local servers
and ad hoc middleware such as commercial ORBs.
|
The Methodology |
Such an architecture involves controlling complex mappings: physical/logical,
logical/conceptual, local/global. The problem is complicated by the fact
that the databases have been developed independently, and naturally suffer
from severe problems of replication and semantic conflicts. In addition,
most legacy databases, like legacy programs, no longer have any documentation.
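To make the chain of mappings concrete, here is a small sketch that treats each mapping as a name-to-name dictionary and composes them from the global level down to the physical one. The `compose` helper and every schema name in it are invented for illustration. Real mappings also transform structures and values, not just names.

```python
from typing import Dict

Mapping = Dict[str, str]


def compose(outer: Mapping, inner: Mapping) -> Mapping:
    """Chain two name mappings: outer name -> inner name -> innermost name.

    A KeyError signals that the inner mapping does not cover a target of the
    outer one, i.e. that the mapping chain is incomplete.
    """
    return {name: inner[target] for name, target in outer.items()}


# Hypothetical mappings for a single attribute, one per layer.
global_to_local = {"Customer.name": "Client.label"}
conceptual_to_logical = {"Client.label": "CLIENT.CLI_NAME"}
logical_to_physical = {"CLIENT.CLI_NAME": "CLIFILE bytes 11-40"}

global_to_physical = compose(
    compose(global_to_local, conceptual_to_logical), logical_to_physical
)
print(global_to_physical)
```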
Recovering the logical and conceptual schemas of an existing database
is the main goal of database reverse engineering, an important software
engineering discipline that can now be considered mature.
Solving the syntactic and semantic conflicts of independent schemas
has long been studied in the database realm. However, coping with conceptual
schemas from populated databases brings new problems. A complete methodology,
encompassing schema recovery and database integration, is provided to practitioners.
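One problem specific to populated databases is that replicated facts may disagree at the instance level even when the schemas match. The sketch below shows one way such conflicts could be listed. It is not taken from the methodology: the `find_conflicts` function and the sample data are hypothetical, and it assumes both databases identify the same fact with the same key value.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Conflict = Tuple[str, str, str, str]  # key value, attribute, first, second


def find_conflicts(first: List[Record], second: List[Record], key: str) -> List[Conflict]:
    """List shared attributes whose values disagree for replicated records."""
    indexed = {record[key]: record for record in second}
    conflicts: List[Conflict] = []
    for record in first:
        twin = indexed.get(record[key])
        if twin is None:
            continue  # the fact is not replicated in the second database
        for attribute in sorted(record.keys() & twin.keys()):
            if record[attribute] != twin[attribute]:
                conflicts.append(
                    (record[key], attribute, record[attribute], twin[attribute])
                )
    return conflicts


sales = [{"id": "C1", "city": "Namur"}, {"id": "C2", "city": "Liege"}]
billing = [{"id": "C1", "city": "NAMUR"}, {"id": "C3", "city": "Mons"}]
conflicts = find_conflicts(sales, billing, key="id")
print(conflicts)
```

Here customer C1 is replicated with two spellings of the same city, a conflict that an integration step would have to resolve before a unified view can be offered.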
|
The CASE Support |
Deriving a common, abstract and conflict-free image of independent
databases and defining the mappings between the specification layers are
complex tasks. Building the local servers and writing the interface components
for the application programs are also two complex and error-prone activities.
All these processes are supported by the DB-MAIN
CASE tool. This graphical, repository-based software engineering environment
includes, among others, a sophisticated reverse engineering toolkit, schema
mapping specification facilities and a generator development environment.
The generation of local servers (logical and conceptual) is automated.
It relies on the results of reverse engineering activities through which
the exact logical and conceptual structures of each database have been
elicited. The global server exploits the same repository, in which the
inter-database mappings have been made explicit.
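The idea of generating server components from recovered schema descriptions can be illustrated with a deliberately small stand-in: a function that turns a conceptual-to-logical mapping into an SQL view definition. This is only an assumption about what a generated artefact might look like for a relational source. The `generate_view` function and the sample schema are hypothetical and do not describe the DB-MAIN generators.

```python
import sqlite3
from typing import Dict


def generate_view(entity_type: str, table: str, attribute_to_column: Dict[str, str]) -> str:
    """Generate a view that exposes a table under its conceptual names."""
    columns = ", ".join(
        f'"{column}" AS "{attribute}"'
        for attribute, column in attribute_to_column.items()
    )
    return f'CREATE VIEW "{entity_type}" AS SELECT {columns} FROM "{table}"'


# A mapping such as reverse engineering might have elicited (invented here).
statement = generate_view("Customer", "CUST", {"id": "CNUM", "name": "CNAME"})

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE CUST (CNUM TEXT, CNAME TEXT)")
connection.execute("INSERT INTO CUST VALUES ('C1', 'Dupont')")
connection.execute(statement)
rows = connection.execute('SELECT id, name FROM "Customer"').fetchall()
print(statement)
print(rows)
```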
DB-MAIN programme - Head of DB-MAIN programme: Jean-Luc Hainaut, jlh@info.fundp.ac.be
Address: Dept. of Computer Science, University of Namur,
21, rue Grandgagnage, B-5000 Namur, BELGIUM
Telephone: (+32) 81 72 49 96 (jlh) or 72 49 85 (staff)
Fax: (+32) 81 72 49 67 |