Turning subtypes into disease axes to improve prediction of COPD progression
  1. Junxiang Chen1,
  2. Michael Cho2,3,
  3. Edwin K Silverman2,3,
  4. John E Hokanson4,
  5. Greg L Kinney4,
  6. James D Crapo5,
  7. Stephen Rennard6,7,
  8. Jennifer Dy1,
  9. Peter Castaldi2,8
  1. 1 Department of Electrical and Computer Engineering, Northeastern University, Boston, United States
  2. 2 Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, United States
  3. 3 Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, United States
  4. 4 Colorado School of Public Health, University of Colorado Denver, Aurora, Colorado, USA
  5. 5 Department of Medicine, National Jewish Health, Denver, Colorado, USA
  6. 6 Division of Pulmonary, Critical Care, Sleep and Allergy, University of Nebraska Medical Center, Omaha, United States
  7. 7 IMED Biotech Unit, AstraZeneca, Cambridge, United Kingdom
  8. 8 Division of Primary Care and Internal Medicine, Brigham and Women's Hospital, Boston, United States
  1. Correspondence to Dr Peter Castaldi, Brigham and Women's Hospital Department of Medicine, Boston, MA 02115, USA; peter.castaldi{at}


Chronic obstructive pulmonary disease (COPD) is an umbrella definition encompassing multiple disease processes. COPD heterogeneity has been described as distinct subgroups of individuals (subtypes) or as continuous measures of COPD variability (disease axes). There is little consensus on whether subtypes or disease axes are preferred, and the relative value of disease axes and subtypes for predicting COPD progression is unknown. Using a propensity score approach to learn disease axes from pairs of subtypes, we demonstrate that these disease axes predict prospective forced expiratory volume in 1 s decline and emphysema progression more accurately than the subtype pairs from which they were derived.

  • copd epidemiology
  • emphysema



The heterogeneity of chronic obstructive pulmonary disease (COPD) obscures our understanding of its natural history and molecular mechanisms. COPD heterogeneity is often represented as distinct subgroups of subjects (subtypes), but it can also be represented as continuous axes of variability, that is, disease axes.1 A multicohort study demonstrated that subtypes identified by clustering were not reproducible across cohorts, whereas disease axes from the same cohorts were more consistent.2 There is currently no consensus on the best approach to characterise COPD heterogeneity.

We define a subtype as a single subgroup of subjects and a COPD disease axis as any continuous representation of COPD heterogeneity. We describe a method, similar in concept to propensity scores,3 in which a pair of COPD subtypes defines a single disease axis: the subtype pair serves as the response in a logistic regression model that predicts the likelihood of subtype membership, and these predictions constitute a subtype-defined disease axis. For example, the chronic bronchitis (CB) subtype is a binary yes/no classification based on patient symptoms, whereas the CB disease axis is a continuous measure, derived from a predictive model, that describes each subject's propensity to have CB. Using longitudinal data from the Genetic Epidemiology of COPD (COPDGene) Study, we demonstrate that subtype-defined disease axes predict prospective COPD progression better than the original subtype pairs from which they were derived.
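The construction for a single subtype pair can be written as a worked equation. This is a sketch under stated assumptions: the notation is ours (z_i = 1 if subject i belongs to the first subtype of the pair and 0 if to the second, x_ik is the k-th of the 27 baseline predictors described in the methods, a_i is the axis value), and we assume the predicted probability, rather than the linear predictor, is reported as the axis value.

```latex
\Pr(z_i = 1 \mid \mathbf{x}_i) = \sigma\Bigl(\beta_0 + \sum_{k=1}^{27} \beta_k x_{ik}\Bigr),
\qquad \sigma(t) = \frac{1}{1 + e^{-t}},
\qquad a_i = \sigma\Bigl(\hat\beta_0 + \sum_{k=1}^{27} \hat\beta_k x_{ik}\Bigr) \in (0, 1)
```

Because sigma is strictly increasing, subjects are ranked identically whether the axis is expressed as a probability or on the log-odds scale. Every subject with complete predictor data receives a value of a_i, including subjects who were not members of either subtype in the pair used to fit the model.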


Subjects in COPDGene with complete 5-year follow-up data were analysed (n=4726). Four general subtype classes were selected for study: CB per the American Thoracic Society and Division of Lung Diseases (ATS-DLD) definition,4 the pink puffer (PP)/blue bloater (BB) subtype,5 frequent exacerbators (≥2 COPD exacerbations over the previous 12 months)6 and upper-lobe or lower-lobe emphysema-predominant subjects, defined by a log upper/lower lobe emphysema (U/L) ratio >1.5 for upper-lobe or <-1.5 for lower-lobe predominance. We define a subtype pair as two conceptually related subtypes that are used together to construct a disease axis. For example, the CB subtype class yields a single subtype pair (CB present vs absent), whereas the PP/BB subtype class yields two pairs (PP vs neither and BB vs neither).
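The subtype-pair bookkeeping can be sketched in Python. This is a hypothetical illustration: the function names are ours, PP and BB membership flags are taken as given inputs because their definitions are in the cited reference, and we assume the emphysema class is split into upper vs neither and lower vs neither in the same way as PP/BB, with members of the opposite extreme left out of each pair.

```python
from typing import Optional, Tuple

# 1 = member of the subtype, 0 = "neither", None = excluded from this pair
Label = Optional[int]


def three_way_pairs(is_a: bool, is_b: bool) -> Tuple[Label, Label]:
    """Split a three-group class (A, B, neither) into A-vs-neither and B-vs-neither."""
    if is_a and is_b:
        raise ValueError("subtypes A and B are assumed to be mutually exclusive")
    a_vs_neither = 1 if is_a else (None if is_b else 0)
    b_vs_neither = 1 if is_b else (None if is_a else 0)
    return a_vs_neither, b_vs_neither


def emphysema_pairs(log_ul_ratio: float) -> Tuple[Label, Label]:
    # Upper-lobe predominant: log U/L > 1.5; lower-lobe predominant: log U/L < -1.5.
    return three_way_pairs(log_ul_ratio > 1.5, log_ul_ratio < -1.5)


def frequent_exacerbator(exacerbations_12m: int) -> int:
    # Frequent exacerbator: >= 2 COPD exacerbations over the previous 12 months.
    return 1 if exacerbations_12m >= 2 else 0
```

For example, a subject with log U/L = 2.0 is labelled 1 in the upper-vs-neither pair and is excluded from the lower-vs-neither pair, while a subject with log U/L = 0.3 is labelled 0 ("neither") in both. The same `three_way_pairs` helper would apply to PP/BB given per-subject PP and BB flags.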

For each subtype pair, we used weighted logistic regression to identify the linear combination of predictors that provides optimal classification for that pair. The beta coefficients of this regression were used to calculate the disease axis value for each analysed subject. This software is available at We selected the baseline values of 27 variables to serve as the predictors in the regression models (see online supplementary materials for the variables used). Disease axes were generated only from visit 1 (baseline) data.
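A weighted logistic regression of this kind can be sketched with the standard library alone. This is a minimal sketch, not the authors' software: the weighting scheme is not stated here, so we assume inverse class-frequency ("balanced") weights, and we fit by plain batch gradient descent. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def balanced_weights(y: Sequence[int]) -> List[float]:
    # ASSUMPTION: inverse class-frequency weights so both subtypes contribute equally.
    n, n1 = len(y), sum(y)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both subtypes of the pair must be present")
    return [n / (2.0 * n1) if yi == 1 else n / (2.0 * n0) for yi in y]


def fit_weighted_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lr: float = 0.1, n_iter: int = 5000,
) -> Tuple[float, List[float]]:
    """Fit intercept and betas by batch gradient descent on the weighted log-loss."""
    w = balanced_weights(y)
    p = len(X[0])
    b0, beta = 0.0, [0.0] * p
    total_w = sum(w)
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi, wi in zip(X, y, w):
            # Gradient of the weighted log-loss: w_i * (p_i - y_i) * x_i.
            r = wi * (_sigmoid(b0 + sum(bk * xk for bk, xk in zip(beta, xi))) - yi)
            g0 += r
            for k in range(p):
                g[k] += r * xi[k]
        b0 -= lr * g0 / total_w
        beta = [bk - lr * gk / total_w for bk, gk in zip(beta, g)]
    return b0, beta


def disease_axis(X: Sequence[Sequence[float]], b0: float, beta: Sequence[float]) -> List[float]:
    # Axis value = predicted probability of belonging to the subtype coded 1.
    return [_sigmoid(b0 + sum(bk * xk for bk, xk in zip(beta, xi))) for xi in X]
```

With a toy single predictor such as `X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]` and `y = [0, 0, 1, 0, 1, 1]`, the fitted beta is positive and the axis increases with the predictor. For 27 predictors and thousands of subjects, an established library (e.g. scikit-learn's `LogisticRegression` with `class_weight="balanced"`) would be the practical choice.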


For the analyses of COPD progression, separate regression models were used to relate subtypes or disease axis scores to either the 5-year change in forced expiratory volume in 1 s (FEV1) % predicted or the 5-year change in emphysema. To formally test whether disease axes provide incremental improvement in prediction beyond that provided by a subtype pair, we constructed nested regression models in which a disease axis was added to a base model containing the original subtype pair. Additional information is included in online supplementary file 1.
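One common way to compare nested linear models is a partial F-test on the residual sums of squares. The exact test used by the authors is described in the supplement, not here, so this is an illustrative sketch under that assumption; it uses ordinary least squares via the normal equations, and the helper names are hypothetical.

```python
from typing import List, Sequence


def ols_rss(X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an OLS fit with intercept, via the normal equations."""
    rows = [[1.0, *xi] for xi in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A: List[List[float]] = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting (assumes full column rank).
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def nested_f_statistic(
    X_base: Sequence[Sequence[float]], X_full: Sequence[Sequence[float]], y: Sequence[float]
) -> float:
    """F statistic for the columns added to the base model (full model contains the base)."""
    n = len(y)
    p_base, p_full = len(X_base[0]), len(X_full[0])
    q = p_full - p_base        # number of added predictors, e.g. one disease axis
    df_resid = n - p_full - 1  # residual df of the full model (intercept included)
    rss_base, rss_full = ols_rss(X_base, y), ols_rss(X_full, y)
    return ((rss_base - rss_full) / q) / (rss_full / df_resid)
```

Here `X_base` would hold the subtype indicator and `X_full` the subtype indicator plus its disease axis. A p-value follows from the F distribution with `(q, df_resid)` degrees of freedom, for example with `scipy.stats.f.sf`.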


A conceptual overview of our approach is shown in figure 1. Subtype definitions and subject characteristics are shown in online supplementary tables 1 and 2. One disease axis was identified for each subtype pair, resulting in a total of six disease axes (one each for frequent/non-frequent exacerbators and presence/absence of CB, and two each for the PP/BB and upper/lower emphysema subtypes). To determine how well the disease axes classified their original subtypes, we examined their discrimination performance, which was excellent for the PP/BB and upper/lower emphysema subgroups (area under the receiver operating characteristic curve (AUC) >0.98) and reasonable for the frequent exacerbator (AUC=0.79) and CB subgroups (AUC=0.67). The predictors and beta coefficients from these models are shown in online supplementary tables 3 and 4.
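The AUC used to judge discrimination has a simple pairwise interpretation: it is the probability that a randomly chosen member of one subtype has a higher disease axis value than a randomly chosen member of the other, with ties counted as one half. A minimal sketch of that Mann-Whitney formulation (the function name is ours; whether the authors computed AUC in-sample or with cross-validation is not stated here):

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random subtype-1 subject outscores a random subtype-0 subject."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each subtype")
    # Count pairwise wins; ties contribute 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75, because three of the four subtype-1/subtype-0 pairs are ordered correctly. The pairwise loop is quadratic; a rank-based computation scales better for large cohorts.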

Figure 1

Overview of subtype-oriented disease axis approach. Chronic bronchitis (CB) subtypes are used as the response variable for a predictive model that uses 27 predictors (P1–P27) to classify subjects into the proper CB subtype group. The resulting predicted values constitute a continuous CB disease axis. Both subtype assignment and disease axis values are used as predictors in separate regression models in which 5-year change in FEV1 or emphysema serve as the response variables. FEV1, forced expiratory volume in 1 s.

We then studied how well each subtype pair and its respective disease axis predicted two measures of COPD progression: change in FEV1 and quantitative CT emphysema progression. For both outcomes, regression models containing disease axes as predictors explained a greater proportion of the variance in COPD progression than comparable models containing the subtype pair, with particularly marked improvement for emphysema progression (table 1). To formally test whether disease axes significantly improved prediction, we added the corresponding disease axis to each subtype-containing model and compared the two models (table 2).
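The "proportion of variance explained" comparison can be illustrated for the simplest case, a model with one predictor and an intercept, where R² equals the squared Pearson correlation between predictor and outcome. This sketch assumes no covariates; the authors' models may include adjustments that are not described here, and the function name is hypothetical.

```python
from typing import Sequence


def r_squared_single(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a one-predictor OLS model with intercept (= squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Passing the binary subtype indicator as `x` gives the subtype model's R², and passing the continuous axis gives the disease axis model's R². When the outcome varies with the underlying continuum within each subtype, the axis captures that within-group variation and the binary indicator cannot.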

Table 1

Regression models using either subtypes or disease axes to predict change in emphysema or change in FEV1 % predicted

Table 2

Regression models using both subtypes and disease axes to predict change in emphysema or change in FEV1

We also examined how well baseline disease axis values predicted the consistency of subtype assignment over time, which is an important issue for the CB and frequent exacerbator subtypes. We classified subjects as persistent or intermittent members of these subtypes according to their status at both COPDGene Study visits, and we observed that persistent subjects had higher disease axis values than intermittent subjects (online supplementary figures 1 and 2, p<0.001 for CB and p=0.007 for frequent exacerbators).
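The persistent/intermittent comparison can be sketched as follows. Two assumptions are labelled here: persistent means subtype membership at both visits and intermittent means membership at exactly one visit; and because the statistical test behind the reported p values is not stated in this section, we substitute a two-sided permutation test on the difference in mean baseline axis values. Function names are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple


def persistence(visit1: bool, visit2: bool) -> Optional[str]:
    # ASSUMPTION: persistent = in the subtype at both visits; intermittent = at exactly one.
    if visit1 and visit2:
        return "persistent"
    if visit1 or visit2:
        return "intermittent"
    return None  # never in the subtype: outside this comparison


def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def permutation_test(
    group_a: Sequence[float], group_b: Sequence[float],
    n_perm: int = 10000, seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation test of a difference in mean disease axis values."""
    observed = _mean(group_a) - _mean(group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point differences in summation order.
        if abs(_mean(pooled[:k]) - _mean(pooled[k:])) >= abs(observed) - 1e-12:
            extreme += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

A positive returned difference means persistent subjects had higher baseline axis values on average. A fixed seed makes the Monte Carlo p-value reproducible.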


Previous work has shown that COPD variability typically occurs along a continuum.2 Thus, although subtypes have intuitive appeal, disease axes represent this variability more accurately. The method presented here turns subtypes into disease axes, yielding continuous representations of COPD heterogeneity anchored by two COPD subtypes. These disease axes were more predictive of COPD progression than the subtypes from which they were derived, because (1) disease axes ‘expand’ subtype information to all subjects in a dataset and (2) disease axes extract subtype-related information from a large number of input variables and thus contain more COPD-related information than subtypes alone.

Because this method uses predefined subtypes to guide a data-driven analysis, its strengths are the interpretability of the resulting disease axes and their improved prediction of disease progression. However, when the sole goal is prediction, purely data-driven methods may yield superior performance. These disease axes were generated in a single cohort, so independent assessment of their generalisability is needed. These results provide proof of concept that subtype-defined disease axes predict COPD progression more powerfully than the subtypes themselves. In the future, it would be useful to define disease axes that can be produced from readily available variables, which would allow disease axes to be generated in a larger set of COPD studies.

In summary, relative to subtypes, disease axes provide more accurate clinical predictions, and in the future, disease axes may improve our clinical characterisation of COPD and enable more powerful biological discovery.



  • Collaborators Administrative Center: James D. Crapo; Edwin K. Silverman; Barry J. Make; Elizabeth A. Regan; Genetic Analysis Center: Terri Beaty; Ferdouse Begum; Peter J. Castaldi; Michael Cho; Dawn L. DeMeo; Adel R. Boueiz; Marilyn G. Foreman; Eitan Halper-Stromberg; Lystra P. Hayden; Craig P. Hersh; Jacqueline Hetmanski; Brian D. Hobbs; John E. Hokanson; Nan Laird; Christoph Lange; Sharon M. Lutz; Merry-Lynn McDonald; Margaret M. Parker; Dandi Qiao; Elizabeth A. Regan; Edwin K. Silverman; Emily S. Wan; Sungho Won; Phuwanat Sakornsakolpat; Dmitry Prokopenko; Imaging Center: Mustafa Al Qaisi; Harvey O. Coxson; Teresa Gray; MeiLan K. Han; Eric A. Hoffman; Stephen Humphries; Francine L. Jacobson; Philip F. Judy; Ella A. Kazerooni; Alex Kluiber; David A. Lynch; John D. Newell; Elizabeth A. Regan; James C. Ross; Raul San Jose Estepar; Joyce Schroeder; Jered Sieren; Douglas Stinson; Berend C. Stoel; Juerg Tschirren; Edwin Van Beek; Bram van Ginneken; Eva van Rikxoort; George Washko; Carla G. Wilson; PFT QA Center, Salt Lake City, UT: Robert Jensen; Data Coordinating Center and Biostatistics, National Jewish Health, Denver, CO: Douglas Everett; Jim Crooks; Camille Moore; Matt Strand; Carla G. Wilson; Epidemiology Core, University of Colorado Anschutz Medical Campus, Aurora, CO: John E. Hokanson; John Hughes; Gregory Kinney; Sharon M. Lutz; Katherine Pratte; Kendra A. Young; Mortality Adjudication Core: Surya Bhatt; Jessica Bon; MeiLan K. Han; Barry Make; Carlos Martinez; Susan Murray; Elizabeth Regan; Xavier Soler; Carla G. Wilson; Biomarker Core: Russell P. Bowler; Katerina Kechris; Farnoush Banaei-Kashani; COPDGene® Investigators – Clinical Centers: Ann Arbor VA: Jeffrey L. Curtis; Carlos H. Martinez; Perry G. Pernicano; Baylor College of Medicine, Houston, TX: Nicola Hanania; Philip Alapat; Mustafa Atik; Venkata Bandi; Aladin Boriek; Kalpatha Guntupalli; Elizabeth Guy; Arun Nachiappan; Amit Parulekar; Brigham and Women’s Hospital, Boston, MA: Dawn L. DeMeo; Craig Hersh; Francine L. Jacobson; George Washko; Columbia University, New York, NY: R. Graham Barr; John Austin; Belinda D’Souza; Gregory D.N. Pearson; Anna Rozenshtein; Byron Thomashow; Duke University Medical Center, Durham, NC: Neil MacIntyre; H. Page McAdams; Lacey Washington; Health Partners Research Institute, Minneapolis: Charlene McEvoy; Joseph Tashjian; Johns Hopkins University, Baltimore: Robert Wise; Robert Brown; Nadia N. Hansel; Karen Horton; Allison Lambert; Nirupama Putcha; Los Angeles Biomedical Research Institute at Harbor UCLA Medical Center, Torrance, CA: Richard Casaburi; Alessandra Adami; Matthew Budoff; Hans Fischer; Janos Porszasz; Harry Rossiter; William Stringer; Michael E. DeBakey VAMC, Houston, TX: Amir Sharafkhaneh; Charlie Lan; Minneapolis VA: Christine Wendt; Brian Bell; Morehouse School of Medicine, Atlanta, GA: Marilyn G. Foreman; Eugene Berkowitz; Gloria Westney; National Jewish Health, Denver, CO: Russell Bowler; David A. Lynch; Reliant Medical Group, Worcester, MA: Richard Rosiello; David Pace; Temple University, Philadelphia, PA: Gerard Criner; David Ciccolella; Francis Cordova; Chandra Dass; Gilbert D’Alonzo; Parag Desai; Michael Jacobs; Steven Kelsen; Victor Kim; A. James Mamary; Nathaniel Marchetti; Aditi Satti; Kartik Shenoy; Robert M. Steiner; Alex Swift; Irene Swift; Maria Elena Vega-Sanchez; University of Alabama, Birmingham, AL: Mark Dransfield; William Bailey; Surya Bhatt; Anand Iyer; Hrudaya Nath; J. Michael Wells; University of California, San Diego, CA: Joe Ramsdell; Paul Friedman; Xavier Soler; Andrew Yen; University of Iowa, Iowa City, IA: Alejandro P. Comellas; Karin F. Hoth; John Newell; Brad Thompson; University of Michigan, Ann Arbor, MI: MeiLan K. Han; Ella Kazerooni; Carlos H. Martinez; University of Minnesota, Minneapolis, MN: Joanne Billings; Abbie Begnaud; Tadashi Allen; University of Pittsburgh, Pittsburgh, PA: Frank Sciurba; Jessica Bon; Divay Chandra; Carl Fuhrman; Joel Weissfeld; University of Texas Health Science Center at San Antonio, San Antonio, TX: Antonio Anzueto; Sandra Adams; Diego Maselli-Caceres; Mario E. Ruiz.

  • Contributors Study concept and design: JC, GLK, JEH and PC. Acquisition, analysis or interpretation of data: JC, MC, EKS, JDC, SR, JD and PC. Drafting of the manuscript: JC and PC. Critical revision of the manuscript for important intellectual content: All authors. Statistical analysis: JC, JD and PC. Obtained funding: EKS, JDC and PC.

  • Funding The project described was supported by award number U01 HL089897 and award number U01 HL089856 from the National Heart, Lung and Blood Institute. The COPDGene project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board composed of AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens and Sunovion.

  • Disclaimer The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.

  • Competing interests JC reports consulting fees and grant support from GSK and Novartis outside the submitted work. MC has received grant support from GSK. In the past 3 years, EKS received honoraria from Novartis for Continuing Medical Education Seminars and grant and travel support from GlaxoSmithKline.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.
