Genetic code evolution and amino acid composition analysis

Date of Completion

January 2009


Biology, Genetics|Biology, Bioinformatics

The genetic code is extremely old, predating the most recent common ancestor of the three domains of life, as well as almost every other biological system considered essential for life as we know it today. However, like any other component of a living system, the code itself must also have evolved, along with the components of its interpretation and actualization, the "translation machinery". Since the origin of the translation machinery also marks the origin of anything resembling today's life (likely evolving from an "RNA world", which was subsequently almost completely superseded), this is a unique and especially difficult evolutionary problem. Typically, evolutionary events are studied within the context of a genetic system with specific rules; how, then, does one study the evolution of the rules themselves?

Despite these difficulties, an understanding of genetic code evolution is absolutely necessary for any true understanding of the origin of life on Earth. For this reason, it has been the subject of numerous investigations over the last several decades, with a wide variety of methodologies and assumptions generating diverse lines of evidence, frequently resulting in highly speculative models that often conflict with one another.

Here, I present a novel empirical approach for studying the evolution of the genetic code, relying on the amino acid composition of ancient fixed positions in a subset of proteins conserved across all domains of life. Using ribosomal proteins, aminoacyl-tRNA synthetases, and ATP synthases, I show that a subset of amino acids is consistently underrepresented at these positions, suggesting they were later additions to the code and had less opportunity for widespread fixation by the time of the most recent common ancestor. Conversely, I show that some other amino acids are significantly overrepresented at ancient positions, supporting their presence in an earlier version of the code.
Various extensions of this approach are also presented here, including mapping amino acid usage trends across a chronological ordering of ribosomal protein subunits, inferring ancestral specificities of paralogous aminoacyl-tRNA synthetases, and rooting the ribosomal tree of life through analysis of unique compositional biases within deep branches.
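
The core comparison described above (amino acid composition at fixed positions versus overall composition) can be illustrated with a minimal sketch. This is not the procedure used in the dissertation, whose details are not given in this abstract. It assumes that a "fixed position" can be approximated by an alignment column in which every sequence carries the same residue, and it reports a simple log2 ratio rather than a statistical test. The function names `conserved_columns`, `composition`, and `enrichment` are hypothetical, and the toy alignment is invented for illustration only.

```python
from collections import Counter
from math import log2


def conserved_columns(alignment):
    """Return indices of columns where all sequences share one residue (no gaps).

    Assumption: full identity across the alignment stands in for an
    'ancient fixed position'; the original work may define this differently.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    cols = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            cols.append(i)
    return cols


def composition(residues):
    """Return the relative frequency of each residue in an iterable."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def enrichment(alignment):
    """log2(frequency at conserved columns / background frequency) per residue.

    Residues that never occur at a conserved column are omitted from the
    result; positive values indicate over-representation, negative values
    under-representation, relative to all non-gap residues in the alignment.
    """
    cols = conserved_columns(alignment)
    conserved = [alignment[0][i] for i in cols]
    background = [aa for seq in alignment for aa in seq if aa != "-"]
    cons_freq = composition(conserved)
    back_freq = composition(background)
    return {aa: log2(cons_freq[aa] / back_freq[aa]) for aa in cons_freq}


if __name__ == "__main__":
    # Invented toy alignment, not real protein data.
    toy = ["GAVKW", "GAVRW", "GAV-W"]
    for aa, score in sorted(enrichment(toy).items()):
        print(aa, round(score, 3))
```

In a real analysis the alignment would come from orthologous protein families spanning all three domains, and the ratios would need a significance test against an appropriate null model; both choices are left open here.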