RISS (Research Information Sharing Service)

      Statistical and Machine Learning Methods for the Analysis of Summary Statistics Derived From Large Genomic Datasets.

      https://www.riss.kr/link?id=T17036510

      • Author
      • Publication details

        Ann Arbor : ProQuest Dissertations & Theses, 2023

      • Degree-granting institution

        University of Michigan Biostatistics

      • Year conferred

        2023

      • Language

        English

      • Subject terms
      • Degree

        Ph.D.

      • Pages

        158 p.

      • Advisor/Committee

        Advisor: Zollner, Sebastian.


      Multilingual Abstract

      Advancements in DNA sequencing over the past decade have transformed our ability to characterize genetic variation in large populations and study the genetics of many complex traits. For population geneticists, information on the genetic variation (i.e., which sites in the genome are mutated and at what frequency) alone is interesting as it allows for studying aspects of a population (e.g., demographic history, natural selection, and mutation rates). For statistical geneticists and genetic epidemiologists, the availability of phenotypic information in the same set of genetically sequenced individuals allows for studying the genetic basis of a complex trait. In this dissertation, I present three separate projects that leverage genetic information originating from DNA sequencing.

      In the first project I focused on analyzing genetic variation without consideration of a phenotype, as is often done in the field of population genetics to make inferences on demographic history or natural selection. A commonly used summary statistic of genetic variation for population genetics inference is the allele frequency spectrum. However, methods based on the allele frequency spectrum make a simplifying assumption: all sites are interchangeable (i.e., an A->T mutation is treated the same as a C->T mutation). In this project, I first extended previous literature to show that heterogeneity in the allele frequency spectrum exists across mutation types at finer levels of resolution. I then illustrated how inferences of demographic history and natural selection are impacted by the violation of this assumption.

      In the second project I focused on combining phenotypic information with genetic data through genome-wide association studies (GWAS) and polygenic risk scores (PRS). GWAS estimate per-variant genetic effects on a complex trait, which can be used to summarize the genetic risk of that trait for an individual in PRS (constructed as the GWAS-weighted sum of their risk variants). However, PRS have a portability issue where phenotype predictions worsen as the ancestry of the target sample diverges from that of the GWAS sample. In admixed individuals, the genome can be traced back to multiple ancestral populations, and ancestry lies on a continuum. Such a continuum causes an ancestry dependence of PRS performance, as the PRS for samples whose ancestry better matches the external GWAS perform better. To help resolve this issue, I developed slaPRS, a stacking-based framework to integrate GWAS from multiple ancestral populations to construct PRS in admixed individuals. In simulations and real data, slaPRS performed well and reduced the ancestry dependence compared to existing approaches.

      In the third project I focused on how genetic-phenotypic associations are shared across two or more phenotypes through pleiotropy. Pleiotropy can be characterized at several resolutions: genome-wide, regional, or at the SNP/gene level. One approach to studying pleiotropy is local genetic correlation (LGC), which quantifies the extent of genetic sharing in a local region through the similarity in GWAS effect sizes. However, one problem of LGC is that it remains unable to identify SNP- or gene-level pleiotropy, making it impossible to identify which variants or genes in a region drive a signal of LGC. To resolve this issue, I developed LDSC-MIX, a Bayesian mixture-of-regressions method to infer latent groups of likely shared causal variants across two traits. In simulations and real data, LDSC-MIX identified SNP sets recovering the true LGC and tested whether genes in a region are enriched for such SNPs.
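      As an illustration of the PRS construction mentioned in the abstract (a GWAS-weighted sum of an individual's risk variants), the following is a minimal sketch and not the dissertation's slaPRS method. The function name `polygenic_risk_score`, the 0/1/2 dosage coding, and the toy effect sizes are assumptions made only for this example.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         effect_sizes: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele dosages for one individual.

    dosages:      risk-allele counts per variant (assumed coded 0, 1, or 2)
    effect_sizes: per-variant GWAS effect estimates, in the same variant order
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    # Each variant contributes (dosage * effect size) to the score.
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical toy data: three variants for a single individual.
toy_dosages = [2, 1, 0]
toy_effects = [0.5, 0.25, 0.125]
toy_score = polygenic_risk_score(toy_dosages, toy_effects)
print(toy_score)  # 2*0.5 + 1*0.25 + 0*0.125 = 1.25
```

      The portability issue described in the abstract arises because the effect sizes come from an external GWAS; the sketch simply takes them as given and does not model ancestry.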