Original article
Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema
  1. Peter J Castaldi1,2,
  2. Jennifer Dy3,
  3. James Ross4,
  4. Yale Chang3,
  5. George R Washko5,
  6. Douglas Curran-Everett6,
  7. Andre Williams6,
  8. David A Lynch7,
  9. Barry J Make8,
  10. James D Crapo8,
  11. Russ P Bowler8,
  12. Elizabeth A Regan8,
  13. John E Hokanson9,
  14. Greg L Kinney9,
  15. Meilan K Han10,
  16. Xavier Soler11,
  17. Joseph W Ramsdell11,
  18. R Graham Barr12,
  19. Marilyn Foreman13,
  20. Edwin van Beek14,
  21. Richard Casaburi15,
  22. Gerald J Criner16,
  23. Sharon M Lutz17,
  24. Steven I Rennard18,19,
  25. Stephanie Santorico20,
  26. Frank C Sciurba21,
  27. Dawn L DeMeo1,5,
  28. Craig P Hersh1,5,
  29. Edwin K Silverman1,5,
  30. Michael H Cho1,5
  1. 1Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  2. 2Division of General Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  3. 3Department of Computer Science, Northeastern University, Boston, Massachusetts, USA
  4. 4Surgical Planning Laboratory and Laboratory of Mathematics in Imaging, Brigham and Women's Hospital, Boston, Massachusetts, USA
  5. 5Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
  6. 6Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, Colorado, USA
  7. 7Department of Radiology, National Jewish Health, Denver, Colorado, USA
  8. 8Department of Medicine, National Jewish Health, Denver, Colorado, USA
  9. 9Department of Epidemiology, Colorado School of Public Health, University of Colorado Denver, Denver, Colorado, USA
  10. 10Department of Internal Medicine, Division of Pulmonary and Critical Care, University of Michigan Health System, Ann Arbor, Michigan, USA
  11. 11Department of Medicine, Division of Pulmonary and Critical Care Medicine, University of California, San Diego, USA
  12. 12Departments of Medicine and Epidemiology, Columbia University Medical Center, New York, USA
  13. 13Morehouse School of Medicine, Atlanta, USA
  14. 14Clinical Research Imaging Centre, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
  15. 15Rehabilitation Clinical Trials Center, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, USA
  16. 16Temple University, Philadelphia, Pennsylvania, USA
  17. 17Department of Biostatistics, University of Colorado, Denver, USA
  18. 18Anschutz Medical Campus, Aurora, USA
  19. 19University of Nebraska Medical Center, Omaha, USA
  20. 20Department of Mathematics and Statistical Sciences, University of Colorado, Denver, USA
  21. 21Division of Pulmonary, Allergy, and Critical Care Medicine, University of Pittsburgh, Pittsburgh, USA
  1. Correspondence to Dr Peter Castaldi, Channing Division of Network Medicine, Brigham and Women's Hospital, 181 Longwood Ave., Boston, MA 02115, USA; peter.castaldi{at}


Background There is notable heterogeneity in the clinical presentation of patients with COPD. To characterise this heterogeneity, we sought to identify subgroups of smokers by applying cluster analysis to data from the COPDGene study.

Methods We applied k-means clustering to data from 10 192 smokers in the COPDGene study. After splitting the sample into training and validation sets, we evaluated three sets of input features across a range of k (the user-specified number of clusters). Stable solutions were tested for association with four COPD-related measures and five genetic variants previously associated with COPD at genome-wide significance. The results were confirmed in the validation set.
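
The general workflow described in the Methods (split the sample, fit k-means on the training half across a range of k, then apply the fitted centroids to the validation half) can be illustrated with a minimal sketch. This is not the study's code: the data are synthetic two-dimensional points, the function names (`kmeans`, `nearest`, `inertia`) are hypothetical, and within-cluster sum of squares is used here only as a simple illustrative summary, not as the stability criterion the authors applied.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids

def inertia(points, centroids):
    """Within-cluster sum of squared distances to the nearest centroid."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)

# Synthetic stand-in data (assumption): four Gaussian blobs in two dimensions.
rng = random.Random(42)
true_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.0)]
data = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in true_centers
    for _ in range(50)
]
rng.shuffle(data)

# Split into training and validation halves.
half = len(data) // 2
train, valid = data[:half], data[half:]

# Fit on the training half for each k, then score both halves.
results = {}
for k in range(2, 7):
    centroids = kmeans(train, k)
    results[k] = (inertia(train, centroids), inertia(valid, centroids))
    print(k, round(results[k][0], 2), round(results[k][1], 2))
```

In this sketch the validation half is never used to fit centroids; it is only assigned to the nearest training centroid, which mirrors the idea of confirming a solution in held-out data.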

Findings We identified four clusters that can be characterised as (1) relatively resistant smokers (ie, no/mild obstruction and minimal emphysema despite heavy smoking), (2) mild upper zone emphysema-predominant, (3) airway disease-predominant and (4) severe emphysema. All clusters are strongly associated with COPD-related clinical characteristics, including exacerbations and dyspnoea (p<0.001). We found strong genetic associations between the mild upper zone emphysema group and rs1980057 near HHIP, and between the severe emphysema group and rs8034191 in the chromosome 15q region (p<0.001). All significant associations were replicated at p<0.05 in the validation sample (12/12 associations with clinical measures and 2/2 genetic associations).

Interpretation Cluster analysis identifies four subgroups of smokers that show robust associations with clinical characteristics of COPD and known COPD-associated genetic variants.

  • COPD epidemiology
  • Emphysema

