Date of Completion

Spring 5-1-2018

Thesis Advisor(s)

Leighton Core

Honors Major

Molecular and Cell Biology


Gene expression is essential for every cellular process in our bodies. Improper regulation of gene expression can cause serious conditions such as cancer. Single nucleotide polymorphisms (SNPs) are single base-pair positions in the genome that vary between individuals. Some SNPs are located in regions of the genome known to be essential for regulation of gene expression, while others are located in non-coding regions where their effect, if any, is currently unknown.

Chronic lymphocytic leukemia (CLL) mainly affects adults and has a high familial risk component compared to other cancers. Several SNPs have been associated with the disease; however, many are located in non-coding areas of the genome and their function is not yet known. Our hypothesis is that these SNPs function as regulatory elements that affect the transcription of nearby genes. It is believed that multiple SNPs, rather than a single SNP, work together to generate the CLL disease phenotype. The objective of this research is to locate previously unidentified SNPs that are in linkage disequilibrium (LD) with known CLL-associated SNPs and to identify their function.
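As an illustration of what "in LD" means quantitatively, the sketch below computes the standard pairwise LD statistics (D, |D'|, and r²) for two biallelic SNPs from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D_max is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: the two alleles always occur on the same haplotype.
d, d_prime, r2 = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
print(f"D={d:.3f}  |D'|={d_prime:.3f}  r^2={r2:.3f}")
```

An r² near 1 indicates that the two SNPs are almost always inherited together, which is the property used here to nominate previously unidentified SNPs near known CLL-associated SNPs.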

Since many CLL-associated SNPs are located in non-coding areas of the genome, functional genomics studies must be done to determine whether they are located in areas with functional activity. Toward this end, precision run-on sequencing (PRO-seq) was performed on 5 normal and 18 CLL subject samples. Discriminative regulatory-element detection from GRO-seq (dREG) allowed us to identify transcriptional regulatory elements (TREs) and to quantify their activity from the PRO-seq data (Danko et al., 2015). TREs with a genotype-specific change in activity became candidates for further analysis. Primers were designed around TREs in regions close to three known CLL-associated SNPs that were also in areas of LD. PCR and Sanger sequencing were performed on these regions to identify other SNPs in the region.

Additional SNPs were identified during the sequencing of the candidate TREs, and transcription factor (TF) binding site analysis was performed. A SNP on chr16, position 85,934,116, was identified in a TRE in LD with the known SNP rs305088. TF binding site analysis showed that this SNP falls within a binding site for multiple TFs. Because the SNP is in an area of LD with rs305088, the two SNPs are more likely to segregate together during meiosis. This suggests that multiple linked SNPs together contribute to the genetic inheritance of CLL, possibly through the regulation of TF binding sites. The linked inheritance of this SNP may contribute to the genetic component and the progression of CLL by preventing or enhancing TF binding to the region, thereby preventing normal function or enhancing a new, abnormal function.
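One common way to assess whether a SNP could prevent or enhance TF binding is to score the reference and alternate alleles against a position weight matrix (PWM) for the factor. The sketch below is a minimal illustration of that idea, not the analysis pipeline used in this work; the motif counts, the sequence, and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PWM."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_score(seq, pwm, snp_index):
    """Best PWM score among all windows that overlap the SNP position."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    best = float("-inf")
    for start in range(first, last + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def allele_effect(ref_seq, snp_index, alt_base, pwm):
    """Return (reference score, alternate score) for a single-base change."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return best_score(ref_seq, pwm, snp_index), best_score(alt_seq, pwm, snp_index)


# Hypothetical 4-bp motif strongly preferring "TGAC" (toy counts, not a real TF).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = log_odds_matrix(toy_counts)

# Hypothetical sequence with the SNP at index 3 (G in the reference, C in the alternate).
ref_score, alt_score = allele_effect("AATGACAA", 3, "C", toy_pwm)
print(f"ref={ref_score:.2f}  alt={alt_score:.2f}  delta={alt_score - ref_score:.2f}")
```

A strongly negative difference between the alternate and reference scores would suggest that the variant weakens a predicted binding site, while a positive difference would suggest it strengthens or creates one. In practice the motif would come from a curated database rather than toy counts.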