Introduction to statistical methods in genome-wide association studies

doi:10.1017/CBO9781107337459.005

3 - Introduction to statistical methods in genome-wide association studies

from Part I - Genome-wide association studies

Published online by Cambridge University Press: 18 December 2015

Can Yang, Cong Li, Dongjun Chung, Mengjie Chen, Joel Gelernter and Hongyu Zhao

Edited by Krishnarao Appasani

Foreword by Stephen W. Scherer and Peter M. Visscher

Can Yang: Hong Kong Baptist University
Cong Li: Yale University
Dongjun Chung: Yale University
Mengjie Chen: Yale University
Joel Gelernter: Yale University School of Medicine
Hongyu Zhao: Yale University
Krishnarao Appasani: GeneExpression Systems, Inc., Massachusetts
Stephen W. Scherer: University of Toronto
Peter M. Visscher: University of Queensland

Summary

Introduction

After the completion of the Human Genome Project (Lander et al., 2001; Venter et al., 2001) and the initiation of the International HapMap Project (Sachidanandam et al., 2001), genome-wide association studies (GWAS) were designed to survey the role of common genetic variation in complex human diseases. Because GWAS assay a dense set of single-nucleotide polymorphisms (SNPs) across the whole genome, they were expected to have an advantage over “candidate gene” studies (Tabor et al., 2002; Wang et al., 2005): they do not rely on prior knowledge of biological pathways. This allows GWAS to avoid the bias that incomplete prior knowledge introduces into candidate gene studies. GWAS were also expected to have higher power and finer resolution than family-based linkage studies for identifying genetic variants of modest effect (Risch & Merikangas, 1996).

The success of identifying genes for age-related macular degeneration (AMD) under the GWAS paradigm (Klein et al., 2005) convinced the genetics community of the efficiency and feasibility of the GWAS approach for identifying previously unknown disease-associated variants. This study used a commercial genotyping array to assay about 100,000 SNPs throughout the human genome and identified an association between complement factor H (CFH) and AMD. Finding a common risk allele with an odds ratio (OR) of 4.6 in a small sample of 96 cases and 50 controls generated considerable excitement in the genetics community. The p-value of the strongest SNP association fell below the genome-wide significance threshold after Bonferroni correction. More importantly, this finding was replicated in follow-up studies (Donoso et al., 2010). This encouraging result increased researchers' confidence that the genetic variants underlying various complex diseases could be detected through GWAS. In 2007, the Wellcome Trust Case Control Consortium (WTCCC) published the results of GWAS of seven diseases: bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes, and type 2 diabetes (The Wellcome Trust Case Control Consortium, 2007). The WTCCC study is considered the starting point of large-scale GWAS (Visscher et al., 2012). Since then, an increasing number of GWAS have been conducted, and over 10,000 loci have been reported to be significantly associated with at least one complex trait (see the GWAS Catalog web resource (Hindorff et al., 2009), http://www.genome.gov/gwastudies/).
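As a minimal illustration of the two quantities mentioned above, the Python sketch below computes a Bonferroni-corrected per-SNP significance threshold for about 100,000 tests, and an allelic odds ratio with a Pearson chi-square test from a 2×2 table of allele counts. The allele counts are invented for illustration and are not the data of Klein et al. (2005). The helper names are hypothetical. The allelic test treats the two alleles of each individual as independent draws (e.g., under Hardy–Weinberg equilibrium), which is one simplifying choice among several; genotypic or trend tests are common alternatives.

```python
import math

# Illustrative helpers (hypothetical names); not the chapter authors' code.


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def allelic_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, tuple[float, float]]:
    """Odds ratio for the risk allele and a Wald 95% CI on the log scale.

    2x2 table of allele counts:
        cases:    a = risk allele, b = other allele
        controls: c = risk allele, d = other allele
    All counts must be positive (no zero-cell correction is applied).
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)


def allelic_chi2_test(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square statistic (1 df, no continuity correction) and p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 100_000)  # approximately 5e-07
    # Invented allele counts: 96 cases (192 alleles) and 50 controls (100 alleles).
    a, b, c, d = 120, 72, 40, 60
    odds_ratio, (lower, upper) = allelic_odds_ratio(a, b, c, d)
    chi2, p = allelic_chi2_test(a, b, c, d)
    print(f"Bonferroni threshold: {threshold:.1e}")
    print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}, genome-wide significant: {p < threshold}")
```

With these made-up counts the odds ratio is 2.5 and the p-value is on the order of 1e-4, far above the Bonferroni threshold. This illustrates why, in a sample this small, only a large effect is likely to reach genome-wide significance.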

Type: Chapter
Information: Genome-Wide Association Studies: From Polymorphism to Personalized Medicine, pp. 26–52

DOI: https://doi.org/10.1017/CBO9781107337459.005

Publisher: Cambridge University Press

Print publication year: 2016

References

Allen, H.L., Estrada, K., Lettre, G., et al. (2010). Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature, 467(7317), 832–838.

Andreassen, O.A., Djurovic, S., Thompson, W.K., et al. (2013). Improved detection of common variants associated with schizophrenia by leveraging pleiotropy with cardiovascular-disease risk factors. Am. J. Hum. Genet., 92(2), 97–109.

Asimit, J. and Zeggini, E. (2010). Rare variant association analysis methods for complex traits. Annu. Rev. Genet., 44, 293–308.

Balding, D. (2006). A tutorial on statistical methods for population association studies. Nature Rev. Genet., 7(10), 781–791.

Bansal, V., Libiger, O., Torkamani, A. and Schork, N.J. (2010). Statistical analysis strategies for association studies involving rare variants. Nature Rev. Genet., 11(11), 773–785.

Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer, New York.

Boyle, A., Hong, E., Hariharan, M., et al. (2012). Annotation of functional variation in personal genomes using RegulomeDB. Genome Res., 22(9), 1790–1797.

Browning, S.R. and Browning, B.L. (2013). Identity-by-descent-based heritability analysis in the northern Finland birth cohort. Hum. Genet., 132(2), 129–138.

Cantor, R., Lange, K. and Sinsheimer, J. (2010). Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am. J. Hum. Genet., 86(1), 6–22.

Chatterjee, N., Wheeler, B., Sampson, J., et al. (2013). Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nature Genet., 45(4), 400–405.

Chen, G. and Witte, J. (2007). Enriching the analysis of genomewide association studies with hierarchical modeling. Am. J. Hum. Genet., 81(2), 397–404.

Cordell, H.J. (2009). Detecting gene–gene interactions that underlie human diseases. Nature Rev. Genet., 10, 392–404.

Cowper-Sal-lari, R., Zhang, X., Wright, J., et al. (2012). Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nature Genet., 44(11), 1191–1200.

Cross-Disorder Group of the Psychiatric Genomics Consortium. (2013a). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature Genet., 45(9), 984–994.

Cross-Disorder Group of the Psychiatric Genomics Consortium. (2013b). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet, 381(9875), 1371–1379.

de los Campos, G., Gianola, D. and Allison, D.B. (2010). Predicting genetic predisposition in humans: the promise of whole-genome markers. Nature Rev. Genet., 11(12), 880–886.

de los Campos, G., Vazquez, A.I., Fernando, R., Klimentidis, Y.C. and Sorensen, D. (2013). Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet., 9(7), e1003608.

Devlin, B. and Roeder, K. (1999). Genomic control for association studies. Biometrics, 55(4), 997–1004.

Donoso, L.A., Vrabec, T. and Kuivaniemi, H. (2010). The role of complement Factor H in age-related macular degeneration: a review. Surv. Ophthalmol., 55(3), 227–246.

Falconer, D.S. (1996). Introduction to Quantitative Genetics (ed.). Longman, London.

Fisher, R.A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb., 52(2), 399–433.

Goddard, M.E., Wray, N.R., Verbyla, K. and Visscher, P.M. (2009). Estimating effects and making predictions from genome-wide marker data. Statist. Sci., 24(4), 517–529.

Golan, D. and Rosset, S. (2013). Narrowing the gap on heritability of common disease by direct estimation in case-control GWAS. arXiv preprint arXiv:1305.5363.

Hartley, S.W. and Sebastiani, P. (2013). PleioGRiP: genetic risk prediction with pleiotropy. Bioinformatics, 29(8), 1086–1088.

Hartley, S.W., Monti, S., Liu, C.-T., Steinberg, M.H. and Sebastiani, P. (2012). Bayesian methods for multivariate modeling of pleiotropic SNP associations and genetic risk prediction. Front. Genet., 3, 176.

Hastie, T. and Tibshirani, R. (2004). Efficient quadratic regularization for expression arrays. Biostatistics, 5(3), 329–340.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (ed.). Springer, New York.

Hindorff, L., Sethupathy, P., Junkins, H., et al. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA, 106(23), 9362–9367.

Hopper, J. (1993). Variance components for statistical genetics: applications in medical research to characteristics related to human diseases and health. Statist. Meth. Med. Res., 2(3), 199–223.

Hopper, J. and Mathews, J.D. (1982). Extensions to multivariate normal models for pedigree analysis. Ann. Hum. Genet., 46(4), 373–383.

Hopper, J. and Mathews, J.D. (1983). Extensions to multivariate normal models for pedigree analysis: II. Modeling the effect of shared environment in the analysis of variation in blood lead levels. Am. J. Epidemiol., 117(3), 344–355.

Jiang, J., Li, C., Paul, D., Yang, C. and Zhao, H. (2013). High dimensional genome-wide association study and mis-specified mixed model analysis. arXiv preprint arXiv:1404.2355 [math.ST].

Kang, H.M., Sul, J.H., Zaitlen, N.A., et al. (2010). Variance component model to account for sample structure in genome-wide association studies. Nature Genet., 42(4), 348–354.

Klein, R., Zeiss, C., Chew, E., et al. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308(5720), 385–389.

Korte, A., Vilhjálmsson, B.J., Segura, V., et al. (2012). A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nature Genet., 44(9), 1066–1071.

Lander, E., Linton, L., Birren, B., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921.

Lange, K., Westlake, J. and Spence, M. (1976). Extensions to pedigree analysis III. Variance components by the scoring method. Ann. Hum. Genet., 39(4), 485–491.

Lee, S.H., Wray, N.R., Goddard, M.E. and Visscher, P.M. (2011). Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet., 88(3), 294–305.

Lee, S.H., DeCandia, T.R., Ripke, S., et al. (2012). Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nature Genet., 44(3), 247–250.

Lee, S.-I., Dudley, A., Drubin, D., et al. (2009). Learning a prior on regulatory potential from eQTL data. PLoS Genet., 5(1), e1000358.

Li, C., Yang, C., Gelernter, J. and Zhao, H. (2013). Improving genetic risk prediction by leveraging pleiotropy. arXiv preprint arXiv:1304.7417.

Li, M., Wang, L., Xia, Z., Sham, P. and Wang, J. (2013). GWAS3D: detecting human regulatory variants by integrative analysis of genome-wide associations, chromosome interactions and histone modifications. Nucl. Acids Res., 41, W150–W158.

Lippert, C., Listgarten, J., Liu, Y., et al. (2011). Fast linear mixed models for genome-wide association studies. Nature Meth., 8(10), 833–835.

Lippert, C., Quon, G., Kang, E.Y., et al. (2013). The benefits of selecting phenotype-specific variants for applications of mixed models in genomics. Sci. Rep., 3.

Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.

Maher, B. (2008). Personal genomes: the case of the missing heritability. Nature, 456(7218), 18–21.

Manolio, T. (2010). Genomewide association studies and assessment of the risk of disease. New Engl. J. Med., 363(2), 166–176.

Manolio, T.A., Collins, F.S., Cox, N.J., et al. (2009). Finding the missing heritability of complex diseases. Nature, 461(7265), 747–753.

Marchini, J., Cardon, L.R., Phillips, M.S. and Donnelly, P. (2004). The effects of human population structure on large genetic association studies. Nature Genet., 36(5), 512–517.

Morris, A.P., Voight, B.F., Teslovich, T.M., et al. (2012). Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genet., 44(9), 981–990.

Price, A.L., Patterson, N.J., Plenge, R.M., et al. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet., 38(8), 904–909.

Price, A.L., Zaitlen, N.A., Reich, D. and Patterson, N. (2010). New approaches to population stratification in genome-wide association studies. Nature Rev. Genet., 11(7), 459–463.

Purcell, S., Neale, B., Todd-Brown, K., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet., 81(3), 559–575.

Risch, N. and Merikangas, K. (1996). The future of genetic studies of complex human diseases. Science, 273(5281), 1516–1517.

Sachidanandam, R., Weissman, D., Schmidt, S., et al. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409(6822), 928–933.

Sakoda, L.C., Jorgenson, E. and Witte, J.S. (2013). Turning of COGS moves forward findings for hormonally mediated cancers. Nature Genet., 45(4), 345–348.

Schaub, M., Boyle, A., Kundaje, A., Batzoglou, S. and Snyder, M. (2012). Linking disease associations with regulatory information in the human genome. Genome Res., 22, 1748–1759.

Searle, S.R., Casella, G. and McCulloch, C.E. (2006). Variance Components. Wiley-Interscience, New York, NY.

Sivakumaran, S., Agakov, F., Theodoratou, E., et al. (2011). Abundant pleiotropy in human complex diseases and traits. Am. J. Hum. Genet., 89(5), 607–618.

Speed, D., Hemani, G., Johnson, M.R. and Balding, D.J. (2012). Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet., 91(6), 1011–1021.

Stahl, E.A., Wegmann, D., Trynka, G., et al. (2012). Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nature Genet., 44(5), 483–489.

Sul, J.H. and Eskin, E. (2013). Mixed models can correct for population structure for genomic regions under selection. Nature Rev. Genet., 14(4), 300.

Svishcheva, G.R., Axenovich, T.I., Belonogova, N.M., van Duijn, C.M. and Aulchenko, Y.S. (2012). Rapid variance components-based method for whole-genome association analysis. Nature Genet., 44(10), 1166–1170.

Tabor, H., Risch, N. and Myers, R. (2002). Candidate-gene approaches for studying complex genetic traits: practical considerations. Nature Rev. Genet., 3(5), 391–397.

The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

The Wellcome Trust Case Control Consortium. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145), 661–678.

Thompson, R. (1977a). The estimation of heritability with unbalanced data: II. Data available on more than two generations. Biometrics, 33(3), 497–504.

Thompson, R. (1977b). The estimation of heritability with unbalanced data: I. Observations available on parents and offspring. Biometrics, 33(3), 485–495.

Tipping, M.E. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res., 1, 211–244.

Tipping, M.E. and Faul, A.C. (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In Bishop, C.M. and Frey, B.J. (Eds), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (Vol. 1), Jan 3–6, Key West, FL.

Vattikuti, S., Guo, J. and Chow, C.C. (2012). Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet., 8(3), e1002637.

Venter, J., Adams, M., Myers, E., et al. (2001). The sequence of the human genome. Science, 291(5507), 1304–1351.

Veyrieras, J.-B., Kudaravalli, S., Kim, S., et al. (2008). High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet., 4(10), e1000214.

Visscher, P.M. (2008). Sizing up human height variation. Nature Genet., 40(5), 489–490.

Visscher, P.M., Hill, W.G. and Wray, N.R. (2008). Heritability in the genomics era – concepts and misconceptions. Nature Rev. Genet., 9(4), 255–266.

Visscher, P.M., Brown, M.A., McCarthy, M.I. and Yang, J. (2012). Five years of GWAS discovery. Am. J. Hum. Genet., 90(1), 7–24.

Wan, X., Yang, C., Yang, Q., et al. (2010). Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics, 26, 30–37.

Wang, W., Barratt, B., Clayton, D. and Todd, J. (2005). Genome-wide association studies: theoretical and practical concerns. Nature Rev. Genet., 6(2), 109–118.

Ward, L. and Kellis, M. (2012a). HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucl. Acids Res., 40(D1), D930–D934.

Ward, L. and Kellis, M. (2012b). Interpreting noncoding genetic variation in complex traits and human disease. Nature Biotechnol., 30, 1095–1106.

Wray, N.R., Yang, J., Hayes, B.J., et al. (2013). Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet., 14(7), 507–515.

Yang, J., Benyamin, B., McEvoy, B.P., et al. (2010). Common SNPs explain a large proportion of the heritability for human height. Nature Genet., 42(7), 565–569.

Yang, J., Manolio, T.A., Pasquale, L.R., et al. (2011a). Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genet., 43(6), 519–525.

Yang, J., Weedon, M.N., Purcell, S., et al. (2011b). Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet., 19(7), 807–812.

Zaitlen, N., Kraft, P., Patterson, N., et al. (2013). Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet., 9(5), e1003520.

Zhou, X. and Stephens, M. (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genet., 44(7), 821–824.

Zhu, X., Zhang, S., Zhao, H. and Cooper, R.S. (2002). Association mapping, using a mixture model for complex traits. Genet. Epidemiol., 23(2), 181–196.