Fast and cost-effective algorithms for information extraction in some computational domains

Date of Completion

January 2008


Computer Science




In this research, we propose and test algorithms for several problems of interest in computational biology and data mining, as follows.

Privacy-Preserving Association Rule Mining in Vertically Partitioned Data. Privacy-preserving data mining has recently become an attractive research area, mainly due to its numerous applications. Within this area, privacy-preserving association rule mining has received considerable attention, and most algorithms proposed in the literature focus on the case in which the database to be mined is distributed, usually horizontally or vertically. We focus on the case in which the database is distributed vertically. First, we propose an efficient multi-party protocol for evaluating itemsets that preserves the privacy of the individual parties. The protocol is algebraic and recursive in nature, and is based on a recently proposed two-party protocol for the same problem. It is shown to be not only much faster than similar protocols, but also more secure. Second, we propose two cryptographic protocols for the same problem that are shown to be secure and fast.

Haplotype Reconstruction. The haplotype reconstruction problem has received a great deal of attention in the bioinformatics literature. The algorithms proposed thus far for this problem fall into two main categories: statistical and combinatorial. We focus on combinatorial algorithms and propose several self-optimizing parallel algorithms for haplotype reconstruction based on Clark's method. Experimental results show that the proposed algorithms can be viewed as an efficient alternative not only to Clark's algorithm, but also to some previously proposed (integer) linear programming formulations.

DNA Probe Placement. One of the current problems in microarray design and manufacturing is the border length minimization problem (BLMP). We propose two parallel algorithms for the BLMP. The proposed algorithms have the local-search paradigm at their core, but are slightly more complicated and were developed specifically for the BLMP. The reported results show that, for small microarrays with at most 1156 probes, the proposed parallel algorithms perform better than the best previous algorithms. We also discuss some possible extensions of the proposed algorithms.

We also point out possible future work directions in all of these areas.
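
To make the border length objective concrete, the following is a minimal sequential sketch in Python. It is not the parallel algorithms proposed in this work. It assumes a simplified cost model in which the border length of a placement is the sum of Hamming distances between horizontally and vertically adjacent probes (constant factors omitted), and it uses a plain swap-based local search. The names `hamming`, `border_length`, and `local_search` are hypothetical and chosen for illustration only.

```python
import random
from typing import List

Grid = List[List[str]]  # rectangular grid of equal-length probe strings


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def border_length(grid: Grid) -> int:
    """Sum of Hamming distances over all adjacent probe pairs (assumed cost model)."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbour
                total += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:  # lower neighbour
                total += hamming(grid[r][c], grid[r + 1][c])
    return total


def local_search(grid: Grid, iterations: int = 1000, seed: int = 0) -> Grid:
    """Swap two random cells; keep the swap only if the border length decreases."""
    rng = random.Random(seed)
    current = [row[:] for row in grid]  # work on a copy, leave the input intact
    cost = border_length(current)
    rows, cols = len(current), len(current[0])
    for _ in range(iterations):
        r1, c1 = rng.randrange(rows), rng.randrange(cols)
        r2, c2 = rng.randrange(rows), rng.randrange(cols)
        current[r1][c1], current[r2][c2] = current[r2][c2], current[r1][c1]
        new_cost = border_length(current)
        if new_cost < cost:
            cost = new_cost
        else:
            # Undo a swap that does not improve the placement.
            current[r1][c1], current[r2][c2] = current[r2][c2], current[r1][c1]
    return current


if __name__ == "__main__":
    placement = [["AAAA", "TTTT"], ["TTTT", "AAAA"]]
    improved = local_search(placement)
    print(border_length(placement), border_length(improved))
```

Recomputing the full border length after every swap keeps the sketch short. A practical implementation would update only the terms around the two swapped cells.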