Date of Completion
8-21-2017
Embargo Period
8-20-2017
Advisors
Dr. Jill Wegrzyn, Dr. Kevin Brown, Dr. Rachel O'Neill
Field of Study
Biomedical Engineering
Degree
Master of Science
Open Access
Open Access
Abstract
EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. The software addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates. Transcripts are first filtered by assessing true expression and selecting reading frames; open-source tools are then leveraged to functionally annotate the translated proteins. Downstream features include fast similarity search across three repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology term assignment. The final annotation integrates results across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install and configure, and it runs much faster than comparable functional annotation packages. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.
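As a rough illustration of the weighted selection described in the abstract, the sketch below combines three per-hit metrics into one score and keeps the highest-scoring hit. It is not EnTAP's implementation. The weights, the 0-to-1 normalization of each metric, the `Hit` fields, and the database names `db_a` and `db_b` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical weights chosen for illustration; not taken from EnTAP.
WEIGHTS = {"similarity": 0.5, "taxonomy": 0.3, "informativeness": 0.2}


@dataclass
class Hit:
    database: str           # source database of the hit (placeholder names below)
    description: str        # description line of the matched sequence
    similarity: float       # similarity search score, assumed normalized to [0, 1]
    taxonomy: float         # taxonomic closeness to the target, assumed in [0, 1]
    informativeness: float  # assumed 0.0 for vague descriptions, 1.0 for specific ones


def score(hit: Hit) -> float:
    """Weighted sum of the three metrics for one hit."""
    return sum(weight * getattr(hit, name) for name, weight in WEIGHTS.items())


def best_hit(hits: list[Hit]) -> Hit:
    """Return the hit with the highest combined score."""
    return max(hits, key=score)


# Two made-up hits for the same transcript, from two placeholder databases.
hits = [
    Hit("db_a", "hypothetical protein", similarity=0.95, taxonomy=0.2, informativeness=0.0),
    Hit("db_b", "chlorophyll a-b binding protein", similarity=0.85, taxonomy=0.8, informativeness=1.0),
]
best = best_hit(hits)
print(best.database, best.description)
```

Under these assumed weights, the slightly weaker but taxonomically closer and more descriptive hit from `db_b` is preferred over the stronger "hypothetical protein" hit, which is the kind of trade-off a combined metric is meant to capture.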
Recommended Citation
Hart, Alexander, "EnTAP: Software to Improve the Quality and Functional Annotation of De Novo Assembled Non Model Eukaryotic Transcriptomes" (2017). Master's Theses. 1127.
https://digitalcommons.lib.uconn.edu/gs_theses/1127
Major Advisor
Dr. Jill Wegrzyn