Date of Completion


Embargo Period


Major Advisor

J. Peter Gogarten

Associate Advisor

R. Thane Papke

Associate Advisor

Joerg Graf

Associate Advisor

Jonathan Klassen

Associate Advisor

Spencer Nyholm

Field of Study

Molecular and Cell Biology


Doctor of Philosophy

Open Access



Whole genome sequencing has opened enormous worlds of opportunity in recent years as the number of sequenced organisms has continued to skyrocket. As the cornucopia of data grows ever larger, keeping track of what we have sequenced and added to our databases is essential. Maintaining order in our classification of genomes and understanding how they relate to each other are likely only to grow in importance over time. A second major consequence of the explosion in genome sequencing is the ever-increasing opportunity to explore the distribution and role of rare genes in the pan-genomes of phylogenetic groups and communities. Such exploration lets us glean how these genes at the seeming periphery of a group can direct organismal interactions and perhaps shape or repress the emergence of new lineages.

The first section of this thesis discusses how established methodologies can elucidate both phylogeny and taxonomy. Tools such as multi-locus sequence analysis, average nucleotide identity (ANI), and core genome phylogenies are shown to converge on the same answers to how genomes relate to one another and how they should be classified. The second section discusses a novel extension to the ANI concept that allows the inference of statistically supported phylogenies from whole genome data. As a byproduct of this new method, deeper taxonomic ranks can now be delimited by in silico genomic distance. The third section presents a detection and identification pipeline for restriction-methylation systems in the class Halobacteria and analyzes the strong proclivity of these genes to be transferred across the breadth of the class. The last section discusses a hypothesis of how apparently mutualistic interactions could arise through a process of mutual cheating, and compares it with prior hypotheses that also invoke distributed genomes and shared functions.