Advancement of support vector clustering and interval analysis technologies in bioinformatics

Date of Completion

January 2007


Engineering, Chemical|Biology, Bioinformatics




Bioinformatics has emerged in response to the growth of experimental data in molecular biology. Clustering methods are among the most popular bioinformatics tools and are widely used for analyzing microarray datasets. Microarray experiments allow the expression profiles of thousands of genes to be analyzed simultaneously, with the goal of finding genes or groups of genes with similar responses under prescribed conditions.
In most clustering algorithms, tunable parameters are set arbitrarily or by trial and error, resulting in less than optimal clustering. We developed a global optimization strategy for the systematic and optimal selection of the parameter values associated with a clustering method. We tested the framework using Support Vector Clustering on lymphoma and breast cancer data. The optimal tuning parameters were determined efficiently, and a new cluster labeling approach that we proposed, contour plotting, yielded significant reductions in computational effort (CPU time), especially for large datasets.
Systems of ordinary differential equations (ODEs) arising in biology, chemistry, physics, and engineering usually contain parameters that are known only approximately. Interval analysis replaces point values of variables and parameters with bounds and plays a vital role in computing validated solutions of initial value problems (IVPs). A significant advantage over standard numerical methods is that the computed result is guaranteed to enclose the true solution.
Various algorithms based on interval analysis are available for solving IVPs. A common problem is that each technique converges only over a limited time interval. In this dissertation, we tested a new approach that combines Taylor model integration methods with an interval solver that employs constraint satisfaction techniques. The results improved on those of the classical methods, but the convergence problem persisted.
We analyzed a biological model describing the infection dynamics between a lytic RNA phage (MS2) and its host (E. coli). To estimate the most sensitive parameters in the model accurately, we employed an interval global optimization algorithm (branch-and-bound). The number of parameters that could be studied simultaneously was limited by the convergence problems of the interval ODE solver.
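As an illustration of the interval branch-and-bound idea mentioned above, the following minimal Python sketch encloses the global minimum of a one-dimensional function. It is not the dissertation's implementation: the objective x^2 - 2x, the names Interval, f_point, f_interval, and branch_and_bound, and the tolerance are all hypothetical choices made for this example. The sketch also uses ordinary floating-point arithmetic without outward rounding, so the enclosure it returns is not rigorously validated the way a true interval library's would be.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; plain floats, no outward rounding (assumption)."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def width(self):
        return self.hi - self.lo

    def mid(self):
        return 0.5 * (self.lo + self.hi)


def f_point(x):
    """Hypothetical objective f(x) = x^2 - 2x (true minimum -1 at x = 1)."""
    return x * x - 2.0 * x


def f_interval(x):
    """Natural interval extension of f: bounds f over the whole box x."""
    return x * x - Interval(2.0, 2.0) * x


def branch_and_bound(box, tol=1e-6):
    """Return an interval enclosing the global minimum of f over box."""
    best_upper = f_point(box.mid())  # any point value is an upper bound
    heap = [(f_interval(box).lo, box.lo, box.hi)]  # ordered by lower bound
    while heap:
        lower, lo, hi = heapq.heappop(heap)  # box with the smallest lower bound
        current = Interval(lo, hi)
        if current.width() < tol:
            # No remaining box has a smaller lower bound, so the minimum
            # lies between this lower bound and the best point value found.
            return Interval(lower, best_upper)
        m = current.mid()
        for half in (Interval(lo, m), Interval(m, hi)):  # bisect
            best_upper = min(best_upper, f_point(half.mid()))
            lb = f_interval(half).lo
            if lb <= best_upper:  # keep only boxes that may hold the minimum
                heapq.heappush(heap, (lb, half.lo, half.hi))
    raise RuntimeError("all boxes were discarded; check the bounds")
```

Calling branch_and_bound(Interval(-3.0, 3.0)) returns a narrow interval around -1. In a parameter estimation setting, the objective would instead measure the misfit between model and data, and each bound would require an interval ODE solve, which is where the convergence limitations described above would enter.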