
3 - Introduction to statistical methods in genome-wide association studies

from Part I - Genome-wide association studies

Published online by Cambridge University Press: 18 December 2015

Can Yang (Hong Kong Baptist University)
Cong Li (Yale University)
Dongjun Chung (Yale University)
Mengjie Chen (Yale University)
Joel Gelernter (Yale University School of Medicine)
Hongyu Zhao (Yale University)
Krishnarao Appasani (GeneExpression Systems, Inc., Massachusetts)
Stephen W. Scherer (University of Toronto)
Peter M. Visscher (University of Queensland)

Summary

Introduction

After the completion of the Human Genome Project (Lander et al., 2001; Venter et al., 2001) and the initiation of the International HapMap Project (Sachidanandam et al., 2001), genome-wide association studies (GWAS) were designed to survey the role of common genetic variation in complex human diseases. Because GWAS assay a dense set of single-nucleotide polymorphisms (SNPs) across the whole genome, they were expected to have an advantage over “candidate gene” studies (Tabor et al., 2002; Wang et al., 2005): they do not rely on prior knowledge of biological pathways. This allows GWAS to avoid the bias that incomplete prior knowledge introduces into “candidate gene” studies. GWAS were also expected to have higher power and finer resolution than family-based linkage studies for identifying genetic variants of modest effect (Risch & Merikangas, 1996).

The success in identifying genes for age-related macular degeneration (AMD) under the GWAS paradigm (Klein et al., 2005) convinced the genetics community of the efficiency and feasibility of the GWAS approach for identifying unknown disease-associated variants. This study used a commercial genotyping array and assayed about 100,000 SNPs throughout the human genome. It identified the association of complement factor H (CFH) with AMD. Finding a common risk allele with an odds ratio (OR) of 4.6 in a small sample of 96 cases and 50 controls generated considerable excitement in the genetics community. The p-value of the strongest SNP association surpassed the genome-wide significance threshold after Bonferroni correction. More importantly, this finding was replicated in follow-up studies (Donoso et al., 2010). Undoubtedly, this encouraging finding raised researchers' confidence that GWAS could detect the genetic variants underlying various complex diseases.

In 2007, the Wellcome Trust Case Control Consortium (WTCCC) published the results of seven GWAS, covering bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes, and type 2 diabetes (The Wellcome Trust Case Control Consortium, 2007). The WTCCC study is considered the starting point of large-scale GWAS (Visscher et al., 2012). Since then, an increasing number of GWAS have been conducted, and over 10,000 loci have been reported to be significantly associated with at least one complex trait (see the GWAS catalog web resource (Hindorff et al., 2009), http://www.genome.gov/gwastudies/).
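The two quantities mentioned above, a Bonferroni-corrected significance threshold and an odds ratio, can be illustrated with a minimal Python sketch. The family-wise error rate of 0.05 and the 2×2 allele counts below are assumptions chosen for illustration; they are not taken from Klein et al. (2005), and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def odds_ratio(case_risk: int, case_other: int,
               control_risk: int, control_other: int) -> float:
    """Odds ratio from a 2x2 table of allele counts (cases vs. controls)."""
    if case_other == 0 or control_risk == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (case_risk * control_other) / (case_other * control_risk)


# Assumed family-wise error rate of 0.05 spread over about 100,000 SNP tests.
threshold = bonferroni_threshold(0.05, 100_000)
print(f"Per-SNP significance threshold: {threshold:.1e}")

# Hypothetical allele counts, for illustration only:
# cases carry 60 risk / 40 other alleles, controls carry 30 risk / 70 other.
example_or = odds_ratio(60, 40, 30, 70)
print(f"Odds ratio: {example_or:.2f}")
```

Under these assumptions, an association counts as genome-wide significant only if its p-value falls below the per-SNP threshold, which is why a strong effect was needed for detection in a sample as small as the one described above.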

Type: Chapter
Information: Genome-Wide Association Studies: From Polymorphism to Personalized Medicine, pp. 26–52
Publisher: Cambridge University Press
Print publication year: 2016


References

Allen, H.L., Estrada, K., Lettre, G., et al. (2010). Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature, 467(7317), 832–838.
Andreassen, O.A., Djurovic, S., Thompson, W.K., et al. (2013). Improved detection of common variants associated with schizophrenia by leveraging pleiotropy with cardiovascular-disease risk factors. Am. J. Hum. Genet., 92(2), 97–109.
Asimit, J. and Zeggini, E. (2010). Rare variant association analysis methods for complex traits. Annu. Rev. Genet., 44, 293–308.
Balding, D. (2006). A tutorial on statistical methods for population association studies. Nature Rev. Genet., 7(10), 781–791.
Bansal, V., Libiger, O., Torkamani, A. and Schork, N.J. (2010). Statistical analysis strategies for association studies involving rare variants. Nature Rev. Genet., 11(11), 773–785.
Bishop, C.M. and Nasrabadi, N.M. (2006). Pattern Recognition and Machine Learning (Vol. 1). Springer, New York.
Boyle, A., Hong, E., Hariharan, M., et al. (2012). Annotation of functional variation in personal genomes using RegulomeDB. Genome Res., 22(9), 1790–1797.
Browning, S.R. and Browning, B.L. (2013). Identity-by-descent-based heritability analysis in the northern Finland birth cohort. Hum. Genet., 132(2), 129–138.
Cantor, R., Lange, K. and Sinsheimer, J. (2010). Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am. J. Hum. Genet., 86(1), 6–22.
Chatterjee, N., Wheeler, B., Sampson, J., et al. (2013). Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nature Genet., 45(4), 400–405.
Chen, G. and Witte, J. (2007). Enriching the analysis of genomewide association studies with hierarchical modeling. Am. J. Hum. Genet., 81(2), 397–404.
Cordell, H.J. (2009). Detecting gene–gene interactions that underlie human diseases. Nature Rev. Genet., 10, 392–404.
Cowper-Sal-lari, R., Zhang, X., Wright, J., et al. (2012). Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nature Genet., 44(11), 1191–1200.
Cross-Disorder Group of the Psychiatric Genomics Consortium. (2013a). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature Genet., 45(9), 984–994.
Cross-Disorder Group of the Psychiatric Genomics Consortium. (2013b). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet, 381(9875), 1371–1379.
de los Campos, G., Gianola, D. and Allison, D.B. (2010). Predicting genetic predisposition in humans: the promise of whole-genome markers. Nature Rev. Genet., 11(12), 880–886.
de los Campos, G., Vazquez, A.I., Fernando, R., Klimentidis, Y.C. and Sorensen, D. (2013). Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet., 9(7), e1003608.
Devlin, B. and Roeder, K. (1999). Genomic control for association studies. Biometrics, 55(4), 997–1004.
Donoso, L.A., Vrabec, T. and Kuivaniemi, H. (2010). The role of complement Factor H in age-related macular degeneration: a review. Surv. Ophthalmol., 55(3), 227–246.
Falconer, D.S. (1996). Introduction to Quantitative Genetics (ed.). Longman, London.
Fisher, R.A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb., 52(2), 399–433.
Goddard, M.E., Wray, N.R., Verbyla, K. and Visscher, P.M. (2009). Estimating effects and making predictions from genome-wide marker data. Statist. Sci., 24(4), 517–529.
Golan, D. and Rosset, S. (2013). Narrowing the gap on heritability of common disease by direct estimation in case-control GWAS. arXiv preprint arXiv:1305.5363.
Hartley, S.W. and Sebastiani, P. (2013). PleioGRiP: genetic risk prediction with pleiotropy. Bioinformatics, 29(8), 1086–1088.
Hartley, S.W., Monti, S., Liu, C.-T., Steinberg, M.H. and Sebastiani, P. (2012). Bayesian methods for multivariate modeling of pleiotropic SNP associations and genetic risk prediction. Front. Genet., 3, 176.
Hastie, T. and Tibshirani, R. (2004). Efficient quadratic regularization for expression arrays. Biostatistics, 5(3), 329–340.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (ed.). Springer, New York.
Hindorff, L., Sethupathy, P., Junkins, H., et al. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA, 106(23), 9362–9367.
Hopper, J. (1993). Variance components for statistical genetics: applications in medical research to characteristics related to human diseases and health. Statist. Meth. Med. Res., 2(3), 199–223.
Hopper, J. and Mathews, J.D. (1982). Extensions to multivariate normal models for pedigree analysis. Ann. Hum. Genet., 46(4), 373–383.
Hopper, J. and Mathews, J.D. (1983). Extensions to multivariate normal models for pedigree analysis: II. Modeling the effect of shared environment in the analysis of variation in blood lead levels. Am. J. Epidemiol., 117(3), 344–355.
Jiang, J., Li, C., Debashis, P., Yang, C. and Zhao, H. (2013). High dimensional genome-wide association study and mis-specified mixed model analysis. arXiv preprint arXiv:1404.2355 [math.ST].
Kang, H.M., Sul, J.H., Zaitlen, N.A., et al. (2010). Variance component model to account for sample structure in genome-wide association studies. Nature Genet., 42(4), 348–354.
Klein, R., Zeiss, C., Chew, E., et al. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308(5720), 385–389.
Korte, A., Vilhjálmsson, B.J., Segura, V., et al. (2012). A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nature Genet., 44(9), 1066–1071.
Lander, E., Linton, L., Birren, B., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921.
Lange, K., Westlake, J. and Spence, M. (1976). Extensions to pedigree analysis III. Variance components by the scoring method. Ann. Hum. Genet., 39(4), 485–491.
Lee, S.H., Wray, N.R., Goddard, M.E. and Visscher, P.M. (2011). Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet., 88(3), 294–305.
Lee, S.H., DeCandia, T.R., Ripke, S., et al. (2012). Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nature Genet., 44(3), 247–250.
Lee, S.-I., Dudley, A., Drubin, D., et al. (2009). Learning a prior on regulatory potential from eQTL data. PLoS Genet., 5(1), e1000358.
Li, C., Yang, C., Gelernter, J. and Zhao, H. (2013). Improving genetic risk prediction by leveraging pleiotropy. arXiv preprint arXiv:1304.7417.
Li, M., Wang, L., Xia, Z., Sham, P. and Wang, J. (2013). GWAS3D: detecting human regulatory variants by integrative analysis of genome-wide associations, chromosome interactions and histone modifications. Nucl. Acids Res., 41, W150–W158.
Lippert, C., Listgarten, J., Liu, Y., et al. (2011). Fast linear mixed models for genome-wide association studies. Nature Meth., 8(10), 833–835.
Lippert, C., Quon, G., Kang, E.Y., et al. (2013). The benefits of selecting phenotype-specific variants for applications of mixed models in genomics. Sci. Rep., 3.
Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Maher, B. (2008). Personal genomes: the case of the missing heritability. Nature, 456(7218), 18–21.
Manolio, T. (2010). Genomewide association studies and assessment of the risk of disease. New Engl. J. Med., 363(2), 166–176.
Manolio, T.A., Collins, F.S., Cox, N.J., et al. (2009). Finding the missing heritability of complex diseases. Nature, 461(7265), 747–753.
Marchini, J., Cardon, L.R., Phillips, M.S. and Donnelly, P. (2004). The effects of human population structure on large genetic association studies. Nature Genet., 36(5), 512–517.
Morris, A.P., Voight, B.F., Teslovich, T.M., et al. (2012). Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genet., 44(9), 981–990.
Price, A.L., Patterson, N.J., Plenge, R.M., et al. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet., 38(8), 904–909.
Price, A.L., Zaitlen, N.A., Reich, D. and Patterson, N. (2010). New approaches to population stratification in genome-wide association studies. Nature Rev. Genet., 11(7), 459–463.
Purcell, S., Neale, B., Todd-Brown, K., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet., 81(3), 559–575.
Risch, N. and Merikangas, K. (1996). The future of genetic studies of complex human diseases. Science, 273(5281), 1516–1517.
Sachidanandam, R., Weissman, D., Schmidt, S., et al. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409(6822), 928–933.
Sakoda, L.C., Jorgenson, E. and Witte, J.S. (2013). Turning of COGS moves forward findings for hormonally mediated cancers. Nature Genet., 45(4), 345–348.
Schaub, M., Boyle, A., Kundaje, A., Batzoglou, S. and Snyder, M. (2012). Linking disease associations with regulatory information in the human genome. Genome Res., 22, 1748–1759.
Searle, S.R., Casella, G. and McCulloch, C.E. (2006). Variance Components. Wiley-Interscience, New York, NY.
Sivakumaran, S., Agakov, F., Theodoratou, E., et al. (2011). Abundant pleiotropy in human complex diseases and traits. Am. J. Hum. Genet., 89(5), 607–618.
Speed, D., Hemani, G., Johnson, M.R. and Balding, D.J. (2012). Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet., 91(6), 1011–1021.
Stahl, E.A., Wegmann, D., Trynka, G., et al. (2012). Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nature Genet., 44(5), 483–489.
Sul, J.H. and Eskin, E. (2013). Mixed models can correct for population structure for genomic regions under selection. Nature Rev. Genet., 14(4), 300.
Svishcheva, G.R., Axenovich, T.I., Belonogova, N.M., Duijn, C.M. and Aulchenko, Y.S. (2012). Rapid variance components-based method for whole-genome association analysis. Nature Genet., 44(10), 1166–1170.
Tabor, H., Risch, N. and Myers, R. (2002). Candidate-gene approaches for studying complex genetic traits: practical considerations. Nature Rev. Genet., 3(5), 391–397.
The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
The Wellcome Trust Case Control Consortium. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145), 661–678.
Thompson, R. (1977a). The estimation of heritability with unbalanced data: II. Data available on more than two generations. Biometrics, 33(3), 497–504.
Thompson, R. (1977b). The estimation of heritability with unbalanced data: I. Observations available on parents and offspring. Biometrics, 33(3), 485–495.
Tipping, M.E. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res., 1, 211–244.
Tipping, M.E. and Faul, A.C. (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In Bishop, C.M. and Frey, M. (Eds), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (Vol. 1), Jan 3–6, Key West, FL.
Vattikuti, S., Guo, J. and Chow, C.C. (2012). Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet., 8(3), e1002637.
Venter, J., Adams, M., Myers, E., et al. (2001). The sequence of the human genome. Science, 291(5507), 1304–1351.
Veyrieras, J.-B., Kudaravalli, S., Kim, S., et al. (2008). High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet., 4(10), e1000214.
Visscher, P.M. (2008). Sizing up human height variation. Nature Genet., 40(5), 489–490.
Visscher, P.M., Hill, W.G. and Wray, N.R. (2008). Heritability in the genomics era – concepts and misconceptions. Nature Rev. Genet., 9(4), 255–266.
Visscher, P.M., Brown, M.A., McCarthy, M.I. and Yang, J. (2012). Five years of GWAS discovery. Am. J. Hum. Genet., 90(1), 7–24.
Wan, X., Yang, C., Yang, Q., et al. (2010). Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics, 26, 30–37.
Wang, W., Barratt, B., Clayton, D. and Todd, J. (2005). Genome-wide association studies: theoretical and practical concerns. Nature Rev. Genet., 6(2), 109–118.
Ward, L. and Kellis, M. (2012a). HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucl. Acids Res., 40(D1), D930–D934.
Ward, L. and Kellis, M. (2012b). Interpreting noncoding genetic variation in complex traits and human disease. Nature Biotechnol., 30, 1095–1106.
Wray, N.R., Yang, J., Hayes, B.J., et al. (2013). Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet., 14(7), 507–515.
Yang, J., Benyamin, B., McEvoy, B.P., et al. (2010). Common SNPs explain a large proportion of the heritability for human height. Nature Genet., 42(7), 565–569.
Yang, J., Manolio, T.A., Pasquale, L.R., et al. (2011a). Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genet., 43(6), 519–525.
Yang, J., Weedon, M.N., Purcell, S., et al. (2011b). Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet., 19(7), 807–812.
Zaitlen, N., Kraft, P., Patterson, N., et al. (2013). Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet., 9(5), e1003520.
Zhou, X. and Stephens, M. (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genet., 44(7), 821–824.
Zhu, X., Zhang, S., Zhao, H. and Cooper, R.S. (2002). Association mapping, using a mixture model for complex traits. Genet. Epidemiol., 23(2), 181–196.
