RISS Search - Dissertation Details

Multilingual Abstract

Advancements in DNA sequencing over the past decade have transformed our ability to characterize genetic variation in large populations and to study the genetics of many complex traits. For population geneticists, information on genetic variation alone (i.e., which sites in the genome are mutated and at what frequency) is valuable because it allows aspects of a population, such as demographic history, natural selection, and mutation rates, to be studied. For statistical geneticists and genetic epidemiologists, the availability of phenotypic information in the same set of sequenced individuals allows the genetic basis of a complex trait to be studied. In this dissertation, I present three separate projects that leverage genetic information derived from DNA sequencing.

In the first project, I focused on analyzing genetic variation without consideration of a phenotype, as is often done in population genetics to make inferences about demographic history or natural selection. A commonly used summary statistic of genetic variation for population genetic inference is the allele frequency spectrum. However, methods based on the allele frequency spectrum make a simplifying assumption: all sites are interchangeable (e.g., an A->T mutation is treated the same as a C->T mutation). In this project, I first extended previous work to show that heterogeneity in the allele frequency spectrum exists across mutation types at finer levels of resolution. I then illustrated how inferences of demographic history and natural selection are affected when this assumption is violated.

In the second project, I focused on combining phenotypic information with genetic data through genome-wide association studies (GWAS) and polygenic risk scores (PRS). GWAS estimate per-variant genetic effects on a complex trait, and these estimates can be used to summarize an individual's genetic risk for that trait as a PRS, constructed as the GWAS-weighted sum of the individual's risk variants.
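As a minimal sketch of the PRS definition above (a plain weighted sum, not the slaPRS method), assume each individual's genotypes are given as effect-allele dosages coded 0, 1, or 2 and the GWAS supplies a per-allele effect size for each variant. The variant IDs and numbers used here are hypothetical.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float],
    gwas_betas: Mapping[str, float],
) -> float:
    """Return the GWAS-weighted sum of one individual's effect-allele dosages.

    dosages: variant ID -> number of copies of the effect allele (0, 1, or 2).
    gwas_betas: variant ID -> per-allele effect size estimated by a GWAS.
    Variants present in the GWAS but not genotyped in the individual are skipped.
    """
    # Multiply each variant's effect size by the individual's dosage and sum.
    return sum(
        beta * dosages[snp]
        for snp, beta in gwas_betas.items()
        if snp in dosages
    )
```

For example, with hypothetical effect sizes `{"rs1": 0.12, "rs2": -0.05}` and dosages `{"rs1": 2, "rs2": 1}`, the score is 2(0.12) + 1(-0.05), approximately 0.19. Practical PRS pipelines also handle allele alignment, variant selection or shrinkage for linkage disequilibrium, and score standardization, all of which this sketch omits.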
However, PRS have a portability problem: phenotype prediction worsens as the ancestry of the target sample diverges from that of the GWAS sample. In admixed individuals, the genome can be traced back to multiple ancestral populations, so ancestry lies on a continuum. This continuum makes PRS performance ancestry-dependent, because PRS perform better for individuals whose ancestry more closely matches that of the external GWAS. To help resolve this issue, I developed slaPRS, a stacking-based framework that integrates GWAS from multiple ancestral populations to construct PRS in admixed individuals. In simulations and real data, slaPRS performed well and reduced ancestry dependence compared with existing approaches.

In the third project, I focused on how genotype-phenotype associations are shared across two or more phenotypes through pleiotropy. Pleiotropy can be characterized at several resolutions: genome-wide, regional, or at the SNP or gene level. One approach to studying pleiotropy is local genetic correlation (LGC), which quantifies the extent of genetic sharing in a local region through the similarity of GWAS effect sizes. However, LGC cannot identify SNP- or gene-level pleiotropy, so it cannot reveal which variants or genes in a region drive an LGC signal. To resolve this issue, I developed LDSC-MIX, a Bayesian mixture-of-regressions method that infers latent groups of likely shared causal variants across two traits. In simulations and real data, LDSC-MIX identified SNP sets that recovered the true LGC, and it was used to test whether genes in a region are enriched for such SNPs.
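To make the intuition behind LGC concrete, the sketch below computes a naive regional similarity of GWAS effects as the Pearson correlation of per-SNP z-scores for two traits. This is an illustrative assumption, not LDSC-MIX or any published LGC estimator. It ignores linkage disequilibrium between SNPs, sample overlap between the two GWAS, and the heritability of each trait, all of which proper LGC methods must model. The function name is hypothetical.

```python
import math
from typing import Sequence


def naive_local_correlation(z1: Sequence[float], z2: Sequence[float]) -> float:
    """Pearson correlation of per-SNP GWAS z-scores for two traits in one region.

    Illustrative only: treats SNPs as independent and ignores LD, sample
    overlap, and per-trait heritability.
    """
    if len(z1) != len(z2) or len(z1) < 2:
        raise ValueError("need two equal-length sequences with at least 2 SNPs")
    n = len(z1)
    mean1 = sum(z1) / n
    mean2 = sum(z2) / n
    # Sum of cross-products and squared deviations around each trait's mean.
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(z1, z2))
    var1 = sum((a - mean1) ** 2 for a in z1)
    var2 = sum((b - mean2) ** 2 for b in z2)
    if var1 == 0 or var2 == 0:
        raise ValueError("z-scores must vary within the region")
    return cov / math.sqrt(var1 * var2)
```

Z-scores that rise together across the region's SNPs give a value near 1, and opposing effects give a value near -1. Because it collapses the region to a single number, this summary shares the limitation described above: it does not indicate which SNPs drive the signal.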

Statistical and Machine Learning Methods for the Analysis of Summary Statistics Derived From Large Genomic Datasets.