Elsevier

The Lancet

Volume 361, Issue 9357, 15 February 2003, Pages 598-604
Review
Population stratification and spurious allelic association

https://doi.org/10.1016/S0140-6736(03)12520-2

Summary

Great effort and expense have gone into attempts to detect genetic polymorphisms contributing to susceptibility to complex human disease. Concomitantly, technology for detection and scoring of single nucleotide polymorphisms (SNPs) has undergone rapid development, extensive catalogues of SNPs across the genome have been constructed, and SNPs have been increasingly used as a means for investigation of the genetic causes of complex human diseases. For many diseases, population-based studies of unrelated individuals, in which case-control and cohort studies serve as standard designs for genetic association analysis, can be the most practical and powerful approach. However, extensive debate has arisen about optimum study design, and considerable concern has been expressed that these approaches are prone to population stratification, which can lead to biased or spurious results. Over the past decade, a great shift has been noted, away from case-control and cohort studies, towards family-based association designs. These designs have fewer problems with population stratification but have greater genotyping and sampling requirements, and data can be difficult or impossible to gather. We discuss past evidence for the effect of population stratification on genotype-phenotype association studies, review methods to detect and account for it, and present suggestions for future study design and analysis.

Section snippets

Genetic association studies

Statistical evidence for an association between an allele and a phenotype comes from one of three situations.39 First, the allele itself might be functional and directly affect expression of the phenotype. Second, the allele might be correlated with, or be in linkage disequilibrium with, a causative allele located nearby. Third, the association could be attributable to chance or artifact—eg, confounding or selection bias.

Many study designs are available for association analyses, which can be

Treatment of population stratification in association studies

The problem of population stratification can be viewed essentially as one of sample matching. In general, for any well-designed epidemiological case-control study, the source population from which controls are sampled should be that from which cases are also sampled.21 Population stratification can arise when the genetic background of the source populations differs between cases and controls.
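The sample-matching view can be illustrated with a small numerical sketch. It assumes two hypothetical subpopulations in which the allele has no effect on disease, so that cases and controls within each subpopulation share the same allele frequency. The allele frequencies and sampling proportions below are illustrative assumptions, not values from this review.

```python
def odds_ratio(p_case, p_control):
    """Allelic odds ratio from allele frequencies in cases and controls."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def pooled_frequency(mix, freqs):
    """Allele frequency in a sample drawn from subpopulations in proportions `mix`."""
    return sum(weight * freq for weight, freq in zip(mix, freqs))


# Hypothetical allele frequencies in subpopulations A and B (assumed values).
allele_freq = [0.2, 0.6]

# Cases and controls are drawn from different mixtures of A and B, i.e. the
# source populations are not matched (assumed sampling proportions).
case_mix = [0.3, 0.7]
control_mix = [0.7, 0.3]

p_case = pooled_frequency(case_mix, allele_freq)
p_control = pooled_frequency(control_mix, allele_freq)

# Ignoring subpopulation, the allele appears associated with disease.
pooled_or = odds_ratio(p_case, p_control)

# Within each subpopulation, cases and controls share the same frequency,
# so the stratum-specific odds ratios are all 1.
stratum_ors = [odds_ratio(freq, freq) for freq in allele_freq]

print(f"pooled odds ratio: {pooled_or:.2f}")
print("stratum odds ratios:", stratum_ors)
```

Under these assumptions the pooled odds ratio is about 1.96 although every within-subpopulation odds ratio is exactly 1. Setting `case_mix` equal to `control_mix` brings the pooled odds ratio back to 1, which is the matching argument in numerical form.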

One obvious solution to the difficulty of stratification is to carefully match cases and controls on the

Controlling for stratification with families

The most widespread study design for genetic matching includes use of relatives as controls. There are many family-based matching designs and corresponding statistical methods for discrete and continuous traits.46, 62, 63 The most popular method, and that from which most others are derived, is the transmission-disequilibrium test (TDT).64, 65 The TDT design requires an affected individual and his or her parents, and uses the mendelian principle that for any polymorphic marker, each parent
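As a minimal sketch of how the TDT is commonly computed, the statistic can be written in its McNemar form: among heterozygous parents of affected offspring, count how often the allele of interest was transmitted and how often it was not, then compare the two counts. The function name and the counts in the example are hypothetical, and the sketch assumes a biallelic marker with complete parental genotypes.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-form TDT for one allele at a biallelic marker.

    `transmitted` and `untransmitted` count transmissions from heterozygous
    parents to affected offspring. Returns (chi-square, p-value) with 1 df.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(60, 40)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

With these made-up counts the statistic is 4.00 and the p-value is about 0.0455. Because each comparison is made within a parent, differences in allele frequency between subpopulations do not enter the statistic.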

Controlling for stratification with anonymous genetic markers

There are several methods that protect against population stratification-related drawbacks but do not need family samples. Pritchard and Rosenberg51 popularised the notion of using anonymous genetic markers scattered throughout the genome as indicators of the amount of background diversity in cases and controls. They reasoned that as long as the markers were independent of those affecting the disease of interest, and largely did not correlate with each other, they should reflect baseline
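One related way to use anonymous markers is genomic control, which appears among the search terms of this review and in the reference list (Devlin et al., Bacanu et al.). The sketch below shows the usual median-based form of that technique, not necessarily the procedure described in the paragraph above: an inflation factor is estimated from association statistics at presumed null markers and used to rescale the statistic at a candidate marker. All numbers and function names are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549


def inflation_factor(null_chi2):
    """Estimate the genomic-control inflation factor from null-marker statistics.

    The factor is floored at 1 so that a statistic is never inflated.
    """
    lam = statistics.median(null_chi2) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)


def corrected_statistic(chi2, lam):
    """Rescale a 1-df chi-square statistic by the inflation factor."""
    return chi2 / lam


# Hypothetical 1-df chi-square statistics at anonymous, unlinked markers.
null_chi2 = [0.2, 0.5, 0.9, 1.3, 2.0]
lam = inflation_factor(null_chi2)

# Hypothetical statistic at the candidate marker.
candidate = 6.0
adjusted = corrected_statistic(candidate, lam)
print(f"lambda = {lam:.2f}, adjusted chi-square = {adjusted:.2f}")
```

In practice many more null markers would be needed for a stable estimate; five are used here only to keep the example readable.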

Implications for pharmacogenetic studies

An expanding area of interest in application of SNPs to investigations of disease pathophysiology is stratification of populations by their genetically determined response to therapeutic drugs (pharmacogenetics).83 Ideally, one would like to be able to stratify a population needing treatment into those likely, or unlikely, to respond to treatment and those likely, or unlikely, to have adverse side-effects. One of the primary goals of pharmacogenetics is to understand the role that sequence

Conclusions

Failure to replicate genetic association studies is a genuine concern,9, 34, 44 yet more often it involves poor study design and execution—in particular an absence of appreciation for the sample sizes needed to detect modest genetic effects and overinterpretation of marginal results—than undetected population stratification. For most complex human diseases, the reality of multiple disease-predisposing genes of modest individual effect, gene-gene interactions, gene-environment interactions,

Search strategy and selection criteria

Reference material for this review was selected on the basis of its relevance for specifically addressing the effects or outcomes of population stratification on allelic association studies. We used our own reference compilations and PubMed to identify the references cited in this work. Beyond our own material, our search terms included “population stratification”, “admixture”, “spurious association”, “genomic control”, and “pharmacogenetic association”. For inclusion, recent

References (92)

  • Pritchard JK et al. Case-control studies of association in structured or admixed populations. Theor Popul Biol (2001)
  • Weeks D et al. Polygenic disease: methods for mapping complex disease traits. Trends Genet (1995)
  • Nielsen DM et al. Association studies under general disease models. Theor Popul Biol (2001)
  • Gordon D et al. A transmission/disequilibrium test that allows for genotyping errors in the analysis of single-nucleotide polymorphism data. Am J Hum Genet (2001)
  • Bacanu SA et al. The power of genomic control. Am J Hum Genet (2000)
  • Devlin B et al. Genomic control, a new approach to genetic-based association studies. Theor Popul Biol (2001)
  • Pritchard JK et al. Association mapping in structured populations. Am J Hum Genet (2000)
  • Satten GA et al. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet (2001)
  • Roses AD. Pharmacogenetics and future drug development and delivery. Lancet (2000)
  • Drews J et al. The role of innovation in drug development. Nat Biotechnol (1997)
  • Roses AD. Pharmacogenetics and the practice of medicine. Nature (2000)
  • Risch NJ. Searching for genetic determinants in the new millennium. Nature (2000)
  • Palmer LJ et al. Atopy and asthma
  • Palmer LJ et al. Genomic approaches to understanding asthma. Genome Res (2000)
  • McCarthy MI. Susceptibility gene discovery for common metabolic and endocrine traits. J Mol Endocrinol (2002)
  • Cardon LR et al. Association study designs for complex diseases. Nat Rev Genet (2001)
  • Lander ES et al. Initial sequencing and analysis of the human genome. Nature (2001)
  • Collins FS et al. Variations on a theme: cataloging human DNA sequence variation. Science (1997)
  • Sachidanandam R et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature (2001)
  • Randomised trial of intravenous atenolol among 16 027 cases of suspected acute myocardial infarction. Lancet (1986)
  • Design and characteristics of the study population. Breast Cancer Res (1999)
  • Wallace H. The need for independent scientific peer review of Biobank UK. Lancet (2002)
  • Wright AF et al. Gene-environment interactions: the BioBank UK study. Pharm J (2002)
  • Wang DG et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science (1998)
  • Schlesselman JJ. Case-control studies: design, conduct, analysis (1982)
  • Cavalli-Sforza LL et al. History and geography of human genes (1994)
  • Slatkin M. Inbreeding coefficients and coalescence times. Genet Res (1991)
  • Stephens JC et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science (2001)
  • Wacholder S et al. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. J Natl Cancer Inst (2000)
  • Chakraborty R et al. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci USA (1988)
  • Stephens JC et al. Mapping by admixture linkage disequilibrium in human populations: limits and guidelines. Am J Hum Genet (1994)
  • McKeigue PM. Mapping genes underlying ethnic differences in disease risk by linkage disequilibrium in recently admixed populations. Am J Hum Genet (1997)
  • McKeigue PM et al. Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann Hum Genet (2000)
  • Tabor HK et al. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet (2002)
  • Weiss KM et al. How many diseases does it take to map a gene with SNPs? Nat Genet (2000)
  • Terwilliger JD et al. Gene mapping in the 20th and 21st centuries: statistical methods, data analysis, and experimental design. Hum Biol (2000)