A metadata modeling approach to database interoperability

Date of Completion

January 1998


Computer Science




As scientific knowledge evolves, knowledge within a scientific domain accumulates in multiple databases maintained by separate groups of researchers. While the information across these databases is semantically related, the individual databases are characterized by different data models, query languages, and user interfaces. There is a growing need to exchange such semantically related data in order to detect redundancies and inconsistencies in both the primary (experimental) data and the secondary (inferred) data contained in the databases. However, because of the heterogeneities involved, it is difficult for users of these structurally different databases to query each other's data. To address this problem, this dissertation proposes a system that allows users to query a set of semantically similar databases based on their existing schema knowledge.

The proposed system includes a query mapping module that automatically maps a (source) query issued against one database schema to an equivalent (target) query against another database schema. The source query may be composed directly by the user in a query language such as SQL. Alternatively, the user can specify the query via a graphical front end (e.g., a query form). Such a graphical query interface is generated dynamically and is tailored to the schema knowledge of the user. The user input accepted by the graphical interface is converted into the corresponding database query by a "query generator".

Underlying our system is a metadata model that extends the entity-relationship (ER) model to describe the components of the individual databases and to establish mappings between them. The metadata model also includes constructs such as inheritance and aggregation. To facilitate database comparisons, additional information that is not modeled explicitly in the original database schemas is described in the metamodel. Such additional information is modeled in the form of hidden entities/attributes/relationships, meta-attributes, and domain constraints.

In addition to describing database objects, the model captures metadata that are used to dynamically generate schema-specific graphical query interfaces. As a demonstration, we apply our system to perform query mappings between two genome databases, namely, DB/12 and GDB.
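The idea of mapping a source query to an equivalent target query can be illustrated with a minimal sketch. It is not the dissertation's implementation: the `Query` structure, the entity and attribute names (`Locus`, `Marker`, `name`, `symbol`, `chromosome`, `chrom`), and the two lookup tables are all hypothetical, and the sketch assumes simple one-to-one correspondences between a single source entity and a single target entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A very small query representation: one entity, selected attributes, filters."""
    entity: str
    select: tuple          # attribute names to return
    where: tuple = ()      # (attribute, operator, value) triples


# Hypothetical mapping metadata between a source schema and a target schema.
# These names are illustrative only and do not come from any real database.
ENTITY_MAP = {"Locus": "Marker"}
ATTRIBUTE_MAP = {
    ("Locus", "name"): "symbol",
    ("Locus", "chromosome"): "chrom",
}


def map_query(source: Query) -> Query:
    """Rewrite a source-schema query in terms of the target schema."""
    def map_attr(attr: str) -> str:
        # A KeyError here means the metadata holds no correspondence for attr.
        return ATTRIBUTE_MAP[(source.entity, attr)]

    return Query(
        entity=ENTITY_MAP[source.entity],
        select=tuple(map_attr(a) for a in source.select),
        where=tuple((map_attr(a), op, v) for a, op, v in source.where),
    )


def to_sql(query: Query) -> str:
    """Render the query as SQL text (for display only; values are not escaped)."""
    sql = f"SELECT {', '.join(query.select)} FROM {query.entity}"
    if query.where:
        conditions = " AND ".join(f"{a} {op} {v!r}" for a, op, v in query.where)
        sql += f" WHERE {conditions}"
    return sql


source = Query("Locus", ("name",), (("chromosome", "=", "12"),))
target_sql = to_sql(map_query(source))
# target_sql == "SELECT symbol FROM Marker WHERE chrom = '12'"
```

Keeping the correspondences in data rather than in code reflects the role the abstract assigns to the metadata model, although constructs such as hidden entities, meta-attributes, and domain constraints are outside the scope of this sketch.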