PT - JOURNAL ARTICLE
AU - Peter J Castaldi
AU - Jennifer Dy
AU - James Ross
AU - Yale Chang
AU - George R Washko
AU - Douglas Curran-Everett
AU - Andre Williams
AU - David A Lynch
AU - Barry J Make
AU - James D Crapo
AU - Russ P Bowler
AU - Elizabeth A Regan
AU - John E Hokanson
AU - Greg L Kinney
AU - Meilan K Han
AU - Xavier Soler
AU - Joseph W Ramsdell
AU - R Graham Barr
AU - Marilyn Foreman
AU - Edwin van Beek
AU - Richard Casaburi
AU - Gerald J Criner
AU - Sharon M Lutz
AU - Steven I Rennard
AU - Stephanie Santorico
AU - Frank C Sciurba
AU - Dawn L DeMeo
AU - Craig P Hersh
AU - Edwin K Silverman
AU - Michael H Cho
TI - Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema
AID - 10.1136/thoraxjnl-2013-203601
DP - 2014 May 01
TA - Thorax
PG - 416--423
VI - 69
IP - 5
4099 - http://thorax.bmj.com/content/69/5/416.short
4100 - http://thorax.bmj.com/content/69/5/416.full
SO - Thorax 2014 May 01; 69
AB - Background: There is notable heterogeneity in the clinical presentation of patients with COPD. To characterise this heterogeneity, we sought to identify subgroups of smokers by applying cluster analysis to data from the COPDGene study.
Methods: We applied a clustering method, k-means, to data from 10 192 smokers in the COPDGene study. After splitting the sample into a training and validation set, we evaluated three sets of input features across a range of k (user-specified number of clusters). Stable solutions were tested for association with four COPD-related measures and five genetic variants previously associated with COPD at genome-wide significance. The results were confirmed in the validation set.
Findings: We identified four clusters that can be characterised as (1) relatively resistant smokers (ie, no/mild obstruction and minimal emphysema despite heavy smoking), (2) mild upper zone emphysema-predominant, (3) airway disease-predominant and (4) severe emphysema. All clusters are strongly associated with COPD-related clinical characteristics, including exacerbations and dyspnoea (p<0.001). We found strong genetic associations between the mild upper zone emphysema group and rs1980057 near HHIP, and between the severe emphysema group and rs8034191 in the chromosome 15q region (p<0.001). All significant associations were replicated at p<0.05 in the validation sample (12/12 associations with clinical measures and 2/2 genetic associations).
Interpretation: Cluster analysis identifies four subgroups of smokers that show robust associations with clinical characteristics of COPD and known COPD-associated genetic variants.
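
The workflow described in the Methods (split the sample, fit k-means across a range of k on the training set, then carry the solution over to the validation set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic two-dimensional points, the farthest-first initialisation and the within-cluster sum of squares are choices made here for a small deterministic example, and all function and variable names are hypothetical. The abstract does not state the input features or the stability criterion that were used, so none are reproduced.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def init_farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch, not from the paper)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns the final centroids."""
    centroids = init_farthest_first(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids

def inertia(points, centroids):
    """Within-cluster sum of squared distances."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)

# Synthetic data: four well-separated groups (purely illustrative).
rng = random.Random(0)
group_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
data = [((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)), g)
        for g, (cx, cy) in enumerate(group_centres) for _ in range(50)]
rng.shuffle(data)

# Split into training and validation halves.
training, validation = data[:100], data[100:]
train_points = [p for p, _ in training]

# Fit on the training set across a range of k.
solutions = {k: kmeans(train_points, k) for k in range(2, 7)}
train_inertia = {k: inertia(train_points, c) for k, c in solutions.items()}

# Carry the k = 4 solution over: assign validation points to training centroids.
val_labels = [nearest(p, solutions[4]) for p, _ in validation]
```

In a real analysis, `train_inertia` (or a stability measure across resampled fits) would inform the choice of k, and the validation assignments in `val_labels` would then be tested for the same clinical and genetic associations found in the training set.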