  • Review Article

Genome-wide association studies for complex traits: consensus, uncertainty and challenges

Key Points

  • Genome-wide association studies are systematic, well-powered surveys to explore the relationships between sites of common genome sequence variation and disease predisposition on a genome-wide scale.

  • The capacity to undertake genome-wide association studies has resulted in spectacular advances in the understanding of the genetic basis of common phenotypes of biomedical importance, such as diabetes, asthma and some cancers.

  • Application of this approach to large, well-characterized data sets has revealed over 50 disease-susceptibility loci and has provided valuable insights into the allelic architecture of multifactorial traits.

  • The implementation of such studies requires meticulous attention to all stages of the experimental process, from the ascertainment of the samples through to analysis and interpretation of the findings. There is considerable potential for a wide variety of errors and biases to result in spurious associations if precautions are not taken.

  • Extensive replication of positive findings remains the best guarantee against erroneous claims of association. The demand for large-scale replication is leading to extensive international collaborations between groups.

  • Nonetheless, substantial challenges remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of traits of interest and to translate the information gathered into improvements in clinical management.
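Pooling evidence from discovery and replication samples, as in the international collaborations described in these key points, is commonly done by meta-analysis. The sketch below shows one standard choice, a fixed-effect, inverse-variance weighted combination of per-study log odds ratios. The function name `fixed_effect_meta` and the example estimates are hypothetical, and the sketch ignores between-study heterogeneity, which real consortium analyses also need to assess.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance weighted fixed-effect meta-analysis.

    log_ors:  per-study log odds ratios for the same allele
    std_errs: their standard errors
    Returns (pooled log OR, pooled standard error, two-sided p-value).
    """
    if not log_ors or len(log_ors) != len(std_errs):
        raise ValueError("need one standard error per log odds ratio")
    # Weight each study by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided normal p-value; erfc keeps precision for very small p.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Hypothetical discovery and replication estimates for a single SNP.
beta, se, p = fixed_effect_meta([math.log(1.25), math.log(1.15)], [0.05, 0.04])
print(f"pooled OR = {math.exp(beta):.3f} (SE of log OR = {se:.4f}), p = {p:.1e}")
```

Because larger studies carry smaller standard errors, they dominate the pooled estimate, which is one reason large replication samples add so much evidential weight.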

Abstract

The past year has witnessed substantial advances in understanding the genetic basis of many common phenotypes of biomedical importance. These advances have been the result of systematic, well-powered, genome-wide surveys exploring the relationships between common sequence variation and disease predisposition. This approach has revealed over 50 disease-susceptibility loci and has provided insights into the allelic architecture of multifactorial traits. At the same time, much has been learned about the successful prosecution of association studies on such a scale. This Review highlights the knowledge gained, defines areas of emerging consensus, and describes the challenges that remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of biomedical traits of interest and to translate the information gathered into improvements in clinical management.

Acknowledgements

Preparation of this article was supported by funding from the European Commission to the MolPAGE Consortium (LSHG-CT-2004-512066: MMcC) and by research grants from the National Institutes of Health (NHGRI and NHLBI; GRA). We thank our colleagues, particularly P. Donnelly, J. Marchini, J. Barrett, E. Zeggini, C. Lindgren, M. Boehnke, F. Collins, C. Spencer and D. Altshuler, for discussions, and the reviewers for their comments.

Author information

Corresponding author

Correspondence to Mark I. McCarthy.

Ethics declarations

Competing interests

After the completion of this work, Professor Cardon accepted the post of Head of Genetics at GlaxoSmithKline. Although the move occurred after the paper was completed, it might nonetheless be perceived as a conflict of interest involving industry, and we therefore declare it explicitly.

Related links

FURTHER INFORMATION

The McCarthy Group homepage

Catalog of published genome-wide association studies

Consolidated standards of reporting trials (CONSORT)

dbGaP

ENCODE project

European Genotyping Archive (EGA)

Genetic Association Information Network (GAIN)

Human Genome Epidemiology Network (HuGeNet)

International HapMap Consortium

National Cancer Institute's cancer genetic markers of susceptibility (CGEMS) study

Policy for sharing of data obtained in NIH supported or conducted GWA studies

Strengthening the reporting of observational studies in epidemiology (STROBE)

Wellcome Trust Case Control Consortium

WGAViewer

Glossary

Genome-wide association (GWA) studies

Studies in which a dense array of genetic markers, which captures a substantial proportion of common variation in genome sequence, is typed in a set of DNA samples that are informative for a trait of interest. The aim is to map susceptibility effects through the detection of associations between genotype frequency and trait status.

Case–control design

An association study design in which the primary comparison is between a group of individuals (cases) who are ascertained for the phenotype of interest and presumed to have a high prevalence of susceptibility alleles for that trait, and a second group (controls) who are not ascertained for the phenotype and are considered likely to have a lower prevalence of such alleles.
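In a case–control design, the most basic per-marker comparison contrasts allele counts in cases with those in controls. The following is a minimal sketch, assuming a biallelic marker with genotype counts supplied as (AA, Aa, aa) tuples; the function name and input layout are hypothetical. It builds the 2×2 allele-count table and computes a Pearson chi-square statistic with 1 degree of freedom. Counting alleles rather than people treats the two chromosomes of each individual as independent, which is reasonable only under Hardy–Weinberg equilibrium.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square (1 df) comparing allele counts in cases and controls.

    Each argument is a tuple of genotype counts (AA, Aa, aa) for a biallelic
    marker (hypothetical input layout). Returns (chi2, p_value).
    """
    def allele_counts(genotypes):
        hom_A, het, hom_a = genotypes
        # Each individual contributes two alleles.
        return 2 * hom_A + het, 2 * hom_a + het

    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = allelic_chi_square((30, 50, 20), (20, 50, 30))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

A monomorphic marker produces zero expected counts, so in practice such markers are filtered out before testing.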

Selection bias

Bias arising from the fact that the samples ascertained for the study (particularly controls) might not be representative of the wider population that they are purported to represent.

Misclassification bias

Bias resulting from the failure to correctly assign individuals to the relevant group in a case–control study; for example, the presence of some individuals who meet the criteria for being cases in a population-based control sample.

Population stratification

The presence in study samples of individuals with different ancestral and demographic histories: if cases and controls differ with respect to these features, markers that are informative for them might be confounded with disease status and lead to spurious associations.

Cryptic relatedness

Evidence, typically gained from analysis of GWA data, that individuals in the study sample have residual, non-trivial degrees of relatedness despite allowance for known family relationships. Such relatedness can violate the independence assumptions of standard statistical techniques.

Family-based association methods

A suite of analytical approaches in which association testing is performed within families: such approaches offer protection from population substructure effects but at the price of reduced power.
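The transmission/disequilibrium test (TDT) is one well-known family-based method; the glossary does not single it out, so it is offered here only as an illustration. Among heterozygous parents in parent–offspring trios, let b be the number who transmit allele A to an affected child and c the number who transmit the other allele. Under the null hypothesis b and c are expected to be equal, and (b − c)²/(b + c) is approximately chi-square with 1 degree of freedom. Because each comparison is made within a parent, allele-frequency differences between subpopulations do not inflate the statistic. The function name is hypothetical.

```python
import math


def tdt(b, c):
    """Transmission/disequilibrium test from heterozygous-parent transmissions.

    b: transmissions of allele A from heterozygous parents to affected offspring.
    c: transmissions of the other allele from heterozygous parents.
    Returns (chi2, p_value) using the McNemar-type statistic with 1 df.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.1f}, p = {p:.4f}")  # chi2 = 4.0, p = 0.0455
```

Only heterozygous parents are informative, which is one reason family-based designs need more genotyping per informative observation than case–control designs.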

Pleiotropy

The phenomenon whereby a single allele can affect several distinct aspects of the phenotype of an organism, often traits not previously thought to be mechanistically related.

Linkage disequilibrium

(LD). The nonrandom allocation of alleles at nearby variants to individual chromosomes as a result of recent mutation, genetic drift or selection, manifest as correlations between genotypes at closely linked markers.
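LD between two biallelic markers is usually summarized by D, the normalized D′ and the squared correlation r². A minimal sketch, assuming the four two-marker haplotype frequencies are already known (in real data they are typically estimated from genotypes, for example by an expectation–maximization algorithm); the function name is hypothetical:

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) from the four haplotype frequencies of two markers."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at marker 1
    p_B = p_AB + p_aB  # frequency of allele B at marker 2
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both markers must be polymorphic")

    d = p_AB - p_A * p_B  # departure from independence of the two alleles
    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Haplotype Ab is absent, so D' = 1, yet r^2 is only 0.25.
d, d_prime, r2 = ld_measures(0.2, 0.0, 0.3, 0.5)
print(f"D = {d:.2f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")  # D = 0.10, D' = 1.00, r2 = 0.25
```

The example shows why the two measures are reported separately: D′ = 1 indicates no evidence of historical recombination between the markers, whereas r² = 0.25 shows that one marker is only a modest proxy for the other, which is what matters when an unmeasured causal variant is detected indirectly through a typed marker.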

Copy number variant

(CNV). A class of DNA sequence variant (including deletions and duplications) in which the result is a departure from the expected diploid representation of DNA sequence.

DNA pooling approaches

Association studies that are conducted using estimates of allele frequencies derived from pools of DNA compiled from multiple subjects rather than individual DNA samples.

Informative missingness

Patterns of missing data that are nonrandom with respect to both genotype and trait status. In such cases, analysis of the available genotypes can produce misleading associations where none truly exist.

Signal intensity (cluster) plots

Plots of raw intensity data for individual variants that are generated by the genotyping platform and represent the extent to which the various genotypes can be discriminated: these provide a useful visual diagnostic of genotyping data quality.

Hardy–Weinberg equilibrium

(HWE). A theoretical description of the relationship between genotype and allele frequencies that is based on expectation in a stable population undergoing random mating in the absence of selection, new mutations and gene flow: in the context of genetic studies, departures from equilibrium can be used to highlight genotyping errors.
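A common quality-control step compares the observed genotype counts at each marker (usually in controls) with those expected under HWE. The sketch below uses the simple chi-square goodness-of-fit test with 1 degree of freedom; it is an illustration only, and for rare variants or small samples an exact test is generally preferred because the chi-square approximation becomes unreliable. The function name is hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test (1 df) for departure from Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    if not 0 < p < 1:
        raise ValueError("marker is monomorphic; HWE test is uninformative")
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# A marker with no heterozygotes is a classic sign of a genotype-calling problem.
chi2, p = hwe_chi_square(50, 0, 50)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```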

Quantile-quantile plot

(Q-Q plot). In the context of GWA studies, a Q-Q plot is a diagnostic plot that compares the distribution of observed test statistics with the distribution expected under the null hypothesis.
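The points of a Q-Q plot, and the genomic inflation factor λ that is often reported alongside it, can be computed directly from per-marker p values. The sketch below assumes 1-degree-of-freedom tests, p values strictly between 0 and 1 (a p value of exactly 1 is also accepted), and (i − 0.5)/n as the expected quantiles; these choices and the function names are illustrative. λ is the median observed chi-square statistic divided by its null median (about 0.455); values well above 1 point to systematic inflation, for example from population stratification or cryptic relatedness, rather than to a handful of true signals.

```python
import math
from statistics import NormalDist, median


def qq_points(p_values):
    """Pair expected and observed -log10 p values, both sorted largest first."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def genomic_inflation(p_values):
    """Median-based inflation factor for 1-df chi-square tests."""
    z = NormalDist()
    # A two-sided p value maps to chi2 = z^2; using p / 2 avoids 1 - p/2 rounding to 1.
    chi2 = [z.inv_cdf(p / 2) ** 2 for p in p_values]
    null_median = z.inv_cdf(0.75) ** 2  # median of chi-square with 1 df
    return median(chi2) / null_median


# Perfectly uniform p values: points lie on the diagonal and lambda is 1.
n = 999
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
print(round(genomic_inflation(null_p), 3))  # 1.0
```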

Cochran–Armitage test

A genotype-based contingency-table test for association that is well suited to the detection of trends across ordinal categories (in this case, genotypes).
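For a biallelic marker, the trend test scores the three genotypes by the number of copies of one allele (weights 0, 1 and 2, corresponding to an additive model) and asks whether the proportion of cases changes linearly across them. Unlike the allelic test, it does not assume Hardy–Weinberg equilibrium. A minimal sketch using the standard closed form of the statistic with 1 degree of freedom; the function name and argument layout are hypothetical.

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test across ordered genotype categories.

    cases, controls: genotype counts ordered by copies of the tested allele.
    Returns (chi2, p_value) with 1 df.
    """
    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * n for t, n in zip(weights, col_totals))
    sum_t2n = sum(t * t * n for t, n in zip(weights, col_totals))

    numerator = n_total * (n_total * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (n_total * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = cochran_armitage(cases=(20, 50, 30), controls=(30, 50, 20))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Here the genotype counts happen to be in Hardy–Weinberg proportions, so the trend and allelic statistics coincide; they diverge when that assumption fails.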

Frequentist

A school of statistics that uses p values and combines them with hypothesis testing to make inferences.

Bayes' factors

The Bayesian alternative to classical frequentist approaches to hypothesis testing, essentially equivalent to likelihood ratio tests: prior and posterior information are combined in a ratio that measures the strength of the evidence in favour of one model rather than the other.
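One widely used way to obtain a Bayes factor from standard GWA output is Wakefield's approximate Bayes factor, which needs only an effect estimate (for example a log odds ratio), its standard error and a prior variance W for the true effect under the alternative. It is offered here as one example rather than as the method the article uses, and the prior standard deviation is an assumption the analyst must choose. This sketch returns the factor in favour of association over the null; multiplying it by the prior odds of association gives the posterior odds.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor (association vs null) after Wakefield.

    beta_hat: estimated effect (e.g. log odds ratio); se: its standard error;
    prior_sd: assumed prior standard deviation of the true effect under H1.
    """
    v = se ** 2  # sampling variance of the estimate
    w = prior_sd ** 2  # prior variance of the effect under the alternative
    z = beta_hat / se
    shrink = w / (v + w)
    return math.sqrt(v / (v + w)) * math.exp(z * z * shrink / 2)


# Hypothetical values: log OR of 0.5 with SE 0.1 and a prior SD of 0.2.
bf = approx_bayes_factor(0.5, 0.1, 0.2)
prior_odds = 1e-4 / (1 - 1e-4)  # assumed prior probability of association: 1 in 10,000
print(f"BF = {bf:.0f}, posterior odds = {bf * prior_odds:.2f}")
```

Even a Bayes factor near 10,000 leaves the posterior odds close to even when the assumed prior probability is 1 in 10,000, which mirrors the reasoning behind stringent genome-wide significance thresholds.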

False-positive report probability

The probability that a reported association between a genetic variant and a trait of interest is not true.
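Following the formulation of Wacholder and colleagues (not spelled out in this glossary), the FPRP depends on the significance threshold α, the statistical power (1 − β) to detect the assumed effect, and the prior probability π that the association is real: FPRP = α(1 − π) / [α(1 − π) + (1 − β)π]. The sketch below, with hypothetical input values, shows why a nominal p < 0.05 is weak evidence when the prior probability for any single marker is low.

```python
def fprp(alpha, power, prior):
    """False-positive report probability for a result significant at level alpha."""
    false_pos = alpha * (1 - prior)  # expected rate of significant null markers
    true_pos = power * prior  # expected rate of significant true associations
    return false_pos / (false_pos + true_pos)


# Hypothetical: 80% power and a prior probability of 1 in 10,000 per marker.
for alpha in (0.05, 5e-8):
    print(alpha, round(fprp(alpha, power=0.8, prior=1e-4), 4))
```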

Haplotype-based methods

Association methods that rely on the relationship between the distribution of estimated haplotype frequencies and trait status, rather than analysing each individual variant in turn.

Imputation methods

A set of approaches for filling in missing genotype data using a sparse set of genotypes (for example, from a GWA scan) and a scaffold of linkage disequilibrium relationships (as provided by the HapMap).

Mendelian randomization

An analytical approach that allows one to test for a causal relationship between two phenotypes that show observational associations, but are subject to confounding: Mendelian randomization makes use of the random segregation of susceptibility alleles at meiosis to explore causality in a model that is freed from most sources of confounding.
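In its simplest form, a Mendelian randomization analysis uses a single variant as an instrument, and the causal effect of the exposure on the outcome is estimated by the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below is illustrative; the standard error is the common first-order approximation, which ignores uncertainty in the variant–exposure estimate, and the estimate is valid only under the usual instrumental-variable assumptions (for example, that the variant affects the outcome only through the exposure). The function name and input values are hypothetical.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Mendelian randomization estimate.

    beta_exposure: per-allele effect of the variant on the exposure.
    beta_outcome: per-allele effect of the variant on the outcome.
    se_outcome: standard error of beta_outcome.
    Returns (causal_estimate, first_order_se).
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure; not a valid instrument")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)  # ignores uncertainty in beta_exposure
    return estimate, se


# Hypothetical per-allele effects.
est, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
print(f"causal estimate = {est:.2f} (SE {se:.2f})")  # causal estimate = 0.20 (SE 0.04)
```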


About this article

Cite this article

McCarthy, M., Abecasis, G., Cardon, L. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356–369 (2008). https://doi.org/10.1038/nrg2344



  • DOI: https://doi.org/10.1038/nrg2344
