Grammatical formalisms for RNA structure analysis

Date of Completion

January 2010


Biology, Bioinformatics|Computer Science




Since the function of a molecular sequence depends on its structure, analyzing RNA structures is essential for developing new drugs and understanding genetic diseases. Pseudoknots are one type of RNA structure that has attracted considerable interest in recent years, especially as it has become possible to address the computational complexity of modeling them. Pseudoknots are functionally important: they appear, for example, in viral genome RNAs and in ribozyme active sites. For predicting RNA structures, computational methods are less expensive than experimental methods such as nuclear magnetic resonance and X-ray crystallography. A relatively new approach to structure analysis, the grammatical approach, has attracted the attention of many researchers because it can model long-range interactions. Grammars offer a natural and concise way to model DNA, RNA, and protein sequences. In this research, we aim to make grammatical models for RNA structure analysis easier for biologists to use by automating the grammar-building step. We focus on grammatical models capable of representing pseudoknots.

The main contribution of this research is the development of an RNA structure analysis framework, TAGRNAInf. The framework is capable of analyzing RNA structures, including pseudoknots. It currently addresses two RNA structure analysis problems, structure identification and RNA folding, and it can be extended to other problems such as structural classification and motif search. The approach adopted in this solution is grammatical inference: at the core of its learning phase is a learning algorithm for a grammatical model capable of representing RNA pseudoknots (Tree Adjoining Grammars for RNA, TAG RNA). Previous research has applied the grammatical approach to RNA structure analysis, including pseudoknots, by building a specific model for a particular family of RNAs.
However, there has been limited research on the use of grammatical inference for RNA structure analysis. TAGRNAInf is the first complete framework for RNA structure analysis, including pseudoknots, that is based on grammatical inference, has been experimentally tested, and has yielded results competitive with other available methods.

As part of this research, we also developed a new grammatical model, Linked Single Adjoining-Tree Adjoining Grammars (LSA–TAG), capable of representing pseudoknots. We have developed a grammatical inference algorithm for LSA–TAG that can learn the grammar for a family of RNA structures from example sequences. This inference algorithm has proven to be mainly of theoretical interest.
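
To make concrete what sets pseudoknots apart from ordinary nested secondary structure, the following minimal sketch checks a structure for crossing base pairs. It is an illustration only and is not part of TAGRNAInf or LSA–TAG. It assumes the structure is written in the common extended dot-bracket notation, in which `(` `)` and `[` `]` mark paired positions and `.` marks unpaired ones. The function names `parse_pairs` and `has_pseudoknot` are hypothetical. Two pairs (i, j) and (k, l) cross when i < k < j < l. Context-free grammars can describe only nested pairings, which is why crossing pairs call for more expressive formalisms such as tree adjoining grammars.

```python
def parse_pairs(structure):
    """Return sorted base pairs (i, j) from an extended dot-bracket string.

    Assumption for this sketch: only '(', ')', '[', ']' and '.' occur.
    Each bracket type is matched on its own stack, so pairs written with
    different bracket types are allowed to cross one another.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


nested = parse_pairs("((..))")            # a plain hairpin stem
knotted = parse_pairs("((..[[..))..]]")   # two stems that interleave
print(has_pseudoknot(nested), has_pseudoknot(knotted))
```

The pairwise check is quadratic in the number of base pairs, which keeps the sketch short; it is meant to show the crossing condition, not to be efficient.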