Original research
Pulmonary emphysema subtypes defined by unsupervised machine learning on CT scans
  1. Elsa D Angelini (1, 2, 3)
  2. Jie Yang (1)
  3. Pallavi P Balte (4)
  4. Eric A Hoffman (5)
  5. Ani W Manichaikul (6)
  6. Yifei Sun (7)
  7. Wei Shen (8, 9)
  8. John H M Austin (10)
  9. Norrina B Allen (11)
  10. Eugene R Bleecker (12)
  11. Russell Bowler (13)
  12. Michael H Cho (14, 15)
  13. Christopher S Cooper (16)
  14. David Couper (17)
  15. Mark T Dransfield (18)
  16. Christine Kim Garcia (4)
  17. MeiLan K Han (19)
  18. Nadia N Hansel (20)
  19. Emlyn Hughes (21)
  20. David R Jacobs (22)
  21. Silva Kasela (23, 24)
  22. Joel Daniel Kaufman (25)
  23. John Shinn Kim (4, 26)
  24. Tuuli Lappalainen (23)
  25. Joao Lima (20)
  26. Daniel Malinsky (7)
  27. Fernando J Martinez (27)
  28. Elizabeth C Oelsner (4)
  29. Victor E Ortega (28)
  30. Robert Paine (29)
  31. Wendy Post (20)
  32. Tess D Pottinger (4)
  33. Martin R Prince (30)
  34. Stephen S Rich (6)
  35. Edwin K Silverman (14)
  36. Benjamin M Smith (4, 31)
  37. Andrew J Swift (4, 32)
  38. Karol E Watson (16)
  39. Prescott G Woodruff (33)
  40. Andrew F Laine (1, 9, 10)
  41. R Graham Barr (4, 34)
  1. Department of Biomedical Engineering, Columbia University, New York, New York, USA
  2. LTCI, Institut Polytechnique de Paris, Telecom Paris, Palaiseau, France
  3. NIHR Imperial Biomedical Research Centre, ITMAT Data Science Group, Imperial College, London, UK
  4. Department of Medicine, Columbia University Irving Medical Center, New York, New York, USA
  5. Departments of Radiology, Medicine and Biomedical Engineering, University of Iowa, Iowa City, Iowa, USA
  6. Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
  7. Department of Biostatistics, Columbia University Irving Medical Center, New York, New York, USA
  8. Department of Pediatrics, Institute of Human Nutrition, Columbia University Irving Medical Center, New York, New York, USA
  9. Columbia Magnetic Resonance Research Center (CMRRC), Columbia University Irving Medical Center, New York, New York, USA
  10. Department of Radiology, Columbia University Irving Medical Center, New York, New York, USA
  11. Institute for Public Health and Medicine (IPHAM) - Center for Epidemiology and Population Health, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  12. Department of Medicine, University of Arizona Health Sciences, Tucson, Arizona, USA
  13. Department of Medicine, National Jewish Health, Denver, Colorado, USA
  14. Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  15. Harvard Medical School, Boston, Massachusetts, USA
  16. Department of Medicine, University of California, Los Angeles, California, USA
  17. Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA
  18. Lung Health Center, University of Alabama, Birmingham, Alabama, USA
  19. Department of Medicine, University of Michigan, Ann Arbor, Michigan, USA
  20. Department of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
  21. Department of Physics, Columbia University, New York, New York, USA
  22. Division of Epidemiology and Community Public Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
  23. Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
  24. New York Genome Center, New York, New York, USA
  25. Departments of Environmental & Occupational Health Sciences, Medicine, and Epidemiology, University of Washington, Seattle, Washington, USA
  26. Department of Medicine, University of Virginia School of Medicine, Charlottesville, Virginia, USA
  27. Department of Medicine, Cornell University Joan and Sanford I Weill Medical College, New York, New York, USA
  28. Department of Pulmonary Medicine, Mayo Clinic, Phoenix, Arizona, USA
  29. Department of Medicine, University of Utah, Salt Lake City, Utah, USA
  30. Department of Radiology, Cornell University Joan and Sanford I Weill Medical College, New York, New York, USA
  31. Department of Medicine, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
  32. Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, UK
  33. Department of Medicine, University of California, San Francisco, California, USA
  34. Department of Epidemiology, Columbia University Irving Medical Center, New York, New York, USA
  Correspondence to Dr R Graham Barr, Department of Medicine, Columbia University Irving Medical Center, New York, NY 10032, USA; rgb9@columbia.edu

Abstract

Background Advances in the treatment and prevention of chronic obstructive pulmonary disease (COPD) have been slow, due in part to limited subphenotyping. We tested whether unsupervised machine learning on CT images would discover CT emphysema subtypes with distinct characteristics, prognoses and genetic associations.

Methods New CT emphysema subtypes were identified by applying unsupervised machine learning, followed by data reduction, to only the texture and location of emphysematous regions on CT scans from 2853 participants in the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS), a COPD case–control study. Subtypes were compared with symptoms and physiology among 2949 participants in the population-based Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study and with prognosis among 6658 MESA participants. Associations with genome-wide single-nucleotide polymorphisms were examined.
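As a purely illustrative sketch of what unsupervised learning on the texture and location of emphysematous regions can look like, the Python below groups per-region feature vectors with plain k-means (Lloyd's algorithm). The abstract does not state which algorithm the study used, so k-means here is a generic stand-in, not the authors' method. The two features per region (a hypothetical texture score and a hypothetical normalised apical-to-basal position) are invented for illustration, and the subsequent data-reduction step is not reproduced.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-region features: (texture_score, apical_to_basal_position).
Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy data: two well-separated groups of hypothetical regions.
regions = [(0.10, 0.90), (0.15, 0.85), (0.12, 0.95),
           (0.90, 0.10), (0.85, 0.15), (0.95, 0.12)]
print(kmeans(regions, k=2))  # → [0, 0, 0, 1, 1, 1]
```

Farthest-point seeding is used instead of random initialisation only so that this toy example is deterministic; on real imaging data, multiple restarts or k-means++ seeding would be more usual, and the number of clusters would itself need to be chosen and checked for reproducibility.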

Results The algorithm discovered six reproducible CT emphysema subtypes (interlearner intraclass correlation coefficient, 0.91–1.00). The most common subtype in SPIROMICS, the combined bronchitis-apical subtype, was associated with chronic bronchitis, accelerated lung function decline, hospitalisations, deaths, incident airflow limitation and a gene variant near DRD1, which is implicated in mucin hypersecretion (p=1.1×10⁻⁸). The second, the diffuse subtype, was associated with lower weight, respiratory hospitalisations and deaths, and incident airflow limitation. The third was associated with age only. The fourth and fifth visually resembled combined pulmonary fibrosis and emphysema and had distinct symptoms, physiology, prognosis and genetic associations. The sixth visually resembled vanishing lung syndrome.
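The abstract reports reproducibility as an interlearner intraclass correlation coefficient (ICC) but does not say which ICC form was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), with participants as rows and independently trained learners as columns (for example, each learner's estimate of the extent of one subtype per participant); both the choice of form and this table layout are assumptions made for illustration.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC(2,1).

    ratings[i][j] is the value assigned to subject i by rater j
    (here, hypothetically, by independently trained learner j).
    """
    n = len(ratings)  # subjects, e.g. participants
    k = len(ratings[0]) if n else 0  # raters, e.g. learners
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and the total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual (interaction) term

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # → 1.0 (perfect agreement)
```

Identical columns give 1.0. For the six-subject, four-rater table `[[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]` the function returns approximately 0.29: the raters rank subjects similarly, but systematic offsets between them lower absolute agreement.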

Conclusion Large-scale unsupervised machine learning on CT scans defined six reproducible, familiar CT emphysema subtypes that suggest paths to specific diagnosis and personalised therapies in COPD and pre-COPD.

  • Emphysema
  • Imaging/CT MRI etc
  • COPD epidemiology

Data availability statement

SPIROMICS and MESA data are available to the scientific community as described in the Acknowledgements section and on the study websites.

Footnotes

  • Twitter @jskim8223

  • EDA and JY contributed equally.

  • AFL and RGB contributed equally.

  • Contributors EA, JY, WS, AL and RGB contributed to the machine learning; PB, YS, DC, JSK, DM and RGB contributed to the epidemiological analyses; EAH, NA, CSC, MTD, CKG, EH, MKH, NNH, DRJ, JDK, JL, FJM, EO, RP, MRP, WP, BS, KEW, PGW and RGB contributed to data collection or funding; AWM, ERB, RB, MC, SK, TL, VEO, TP, SSR and EKS contributed to the genomic analyses; JHMA and AJS provided radiologist interpretations; EA and JY drafted the manuscript; all authors contributed to revisions and provided final approval.

  • Funding This work was supported by NIH/NHLBI R01-HL121270, R01-HL077612, R01-HL093081, R01-HL142028, R01-HL130506, R01-HL131565, R01-HL103676 and T32-HL144442. MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159-69, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1-TR-001881 and DK063491. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. SPIROMICS was supported by contracts from NIH/NHLBI (HHSN268200900013C-20C), which were supplemented by contributions made through the Foundation for the NIH and COPD Foundation from AstraZeneca; Bellerophon Pharmaceuticals; Boehringer-Ingelheim Pharmaceuticals; Chiesi Farmaceutici SpA; Forest Research Institute; GSK; Grifols Therapeutics; Ikaria Nycomed; Takeda Pharmaceutical Company; Novartis Pharmaceuticals Corporation; Regeneron Pharmaceuticals and Sanofi. The COPD Gene Study was supported by NIH grants K12HL120004, R01HL113264, U01HL089856 and P01HL105339. The COPD Gene Study is also supported by the COPD Foundation through contributions made to an Industry Advisory Board composed of AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens and Sunovion.

  • Competing interests EDA, PPB, AM, YS, WS, JHMA, MHC, DC, EH, DRJ, SK, JDK, TL, JL, ECO, WP, MRP, SSR, EKS, KEW and AFL report receiving grants from the National Institutes of Health (NIH). JY performed the work at Columbia University but is now an employee of Google. EAH reports receiving grants from the NIH; being a founder and shareholder of VIDA Diagnostics; and holding patents for an apparatus for analysing CT images to determine the presence of pulmonary tissue pathology, an apparatus for image display and analysis, and a method for multiscale meshing of branching biological structures. EBA reports receiving grants from the American Heart Association and the NIH. CBC reports receiving personal fees from GlaxoSmithKline. MTD reports receiving a grant from the NHLBI and personal fees from AstraZeneca, GlaxoSmithKline, Pulmonx, PneumRx/BTG and Quark. MKH reports consulting for GlaxoSmithKline, AstraZeneca and Boehringer Ingelheim, and receiving research support from Novartis and Sunovion. NNH reports receiving grants from the NIH, Boehringer Ingelheim and the COPD Foundation. JDK reports receiving grants from the US Environmental Protection Agency and the NIH. FJM reports serving on COPD advisory boards for AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Sunovion and Teva; serving as a consultant for ProterixBio and Verona; serving on the steering committees of studies sponsored by the NHLBI, AstraZeneca and GlaxoSmithKline; and having served on data safety and monitoring boards of COPD studies supported by Genentech and GlaxoSmithKline. BMS reports receiving grants from the NIH, Canadian Institutes of Health Research (CIHR), Fonds de la recherche en santé du Québec (FRQS), the Research Institute of the McGill University Health Centre, the Quebec Lung Association and AstraZeneca. PGW reports receiving personal fees for consultancy from Theravance, AstraZeneca, Regeneron, Sanofi, Genentech, Roche and Janssen. RGB reports receiving grants from the COPD Foundation, the US Environmental Protection Agency (EPA), the American Lung Association and the NIH.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.