A novel forensic approach to DNA database construction and population genetic analysis

Date of Completion

January 2008


Biology, Genetics




With the increased use of genetic markers such as short tandem repeats (STRs) for human identification, concerns about possible population stratification have arisen. Autosomal STRs are the most commonly used form of DNA-based human identification because they offer several advantages for forensic applications: high levels of polymorphism, relatively small sizes, and the requirement for only small amounts of DNA. The forensic use of Y chromosomal markers for human identification has also increased rapidly. We have addressed the issue of population substructure and conclude that there may be substructure associated with autosomal STRs, but that the statistical difference is negligible when dealing with forensically relevant samples. Consideration of population stratification and of the appropriateness of databases is particularly important for Y STR data, because Y STRs occur as lineage-specific haplotypes that can theoretically be differentially partitioned among populations. We have addressed possible substructure and demonstrated the need for larger databases.

In this study we have compiled a novel database comprising STR profiles and allelic frequencies from a Connecticut subpopulation to determine the level of stratification relative to national databases. We have used a novel approach to sample acquisition and data grouping that takes into account an individual's geographical genetic history in addition to their self-identified race. We have examined the construction of STR allelic frequency databases as it relates to marker frequency bias and conclude that race may be an adequate criterion for grouping individuals, but that deep ancestral origins may be more important for determining the statistical meaning of a match. We have also demonstrated that variant alleles may have their molecular genetic basis in an ancestral type perpetuated in the population through drift or selection, but that the resulting impact on major forensic statistics is minimal. Overall, we have applied a molecular genetic perspective to the current uses of DNA for forensic identification, which should help to accurately convict the guilty and exonerate the innocent.
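As a rough illustration of why substructure can have only a small effect on a match statistic, the following minimal Python sketch estimates allele frequencies at one STR locus by allele counting and then computes a single-locus genotype match probability, with and without a substructure correction. It is not taken from the study: the genotypes are invented, the function names are hypothetical, and the correction shown is the widely used Balding-Nichols form with a coancestry coefficient `theta`, which may differ from the statistics the author actually applied.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by counting alleles.

    `genotypes` is a list of (allele, allele) pairs, one pair per individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def match_probability(a, b, freqs, theta=0.0):
    """Single-locus genotype match probability for genotype (a, b).

    Assumption: Balding-Nichols substructure correction. With theta = 0
    this reduces to the Hardy-Weinberg values p^2 (homozygote) and
    2pq (heterozygote).
    """
    p_a, p_b = freqs[a], freqs[b]
    denom = (1 + theta) * (1 + 2 * theta)
    if a == b:
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


# Invented genotypes at a hypothetical locus (allele = repeat number).
sample = [(12, 13), (12, 12), (13, 14), (12, 14), (12, 13)]
freqs = allele_frequencies(sample)

no_structure = match_probability(12, 13, freqs)                # theta = 0
with_structure = match_probability(12, 13, freqs, theta=0.01)  # assumed theta
print(freqs, no_structure, with_structure)
```

In this toy case a small `theta` raises the heterozygote match probability only slightly, which is the kind of comparison needed when asking whether a local database departs meaningfully from a national one. A real calculation would multiply across many loci and use far larger samples.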