Data-mart integration of the proteome

Date of Completion

January 2012


Biology, Bioinformatics|Computer Science




A broad range of tasks in modern bioinformatics analysis requires the integration of data from disparate sources. The explosion of data in the post-genomic era blazes a trail for integrative bioinformatics: the use of disparate information repositories to solve problems in data visualization, interpretation, and normalization that have previously been difficult to address. To integrate such repositories, we must maintain a dynamic data-integration framework that is capable of processing large amounts of data efficiently. Although these requirements may be opposed, we can reconcile them by combining the attributes of a federated database environment with data marts: high-performance, task-specific databases which, owing to their small footprint, can be rapidly generated and torn down. This thesis shows the power of data marts for solving a broad range of emergent problems in protein bioinformatics, including functional annotation, integrated methods for visualizing and interpreting biomolecular data, and protein sequence mining. These examples demonstrate that data-mart integration of the proteome is an efficient and practical alternative to monolithic approaches to integration.