Background Chronic obstructive pulmonary disease (COPD) is increasingly considered a heterogeneous condition. It was hypothesised that COPD, as currently defined, includes different clinically relevant subtypes.
Methods To identify and validate COPD subtypes, 342 subjects hospitalised for the first time because of a COPD exacerbation were recruited. Three months after discharge, when clinically stable, symptoms and quality of life, lung function, exercise capacity, nutritional status, biomarkers of systemic and bronchial inflammation, sputum microbiology, CT of the thorax and echocardiography were assessed. COPD groups were identified by partitioning cluster analysis and validated prospectively against cause-specific hospitalisations and all-cause mortality during a 4 year follow-up.
Results Three COPD groups were identified: group 1 (n=126, 67 years) was characterised by severe airflow limitation (postbronchodilator forced expiratory volume in 1 s (FEV1) 38% predicted) and worse performance in most of the respiratory domains of the disease; group 2 (n=125, 69 years) showed milder airflow limitation (FEV1 63% predicted); and group 3 (n=91, 67 years) combined a similarly milder airflow limitation (FEV1 58% predicted) with a high proportion of obesity, cardiovascular disorders, diabetes and systemic inflammation. During follow-up, group 1 had more frequent hospitalisations due to COPD (HR 3.28, p<0.001) and higher all-cause mortality (HR 2.36, p=0.018) than the other two groups, whereas group 3 had more admissions due to cardiovascular disease (HR 2.87, p=0.014).
Conclusions In patients with COPD recruited at their first hospitalisation, three different COPD subtypes were identified and prospectively validated: ‘severe respiratory COPD’, ‘moderate respiratory COPD’, and ‘systemic COPD’.
- Pulmonary disease, chronic obstructive
- cluster analysis
- COPD epidemiology
Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease with pulmonary and extrapulmonary manifestations involving a complex array of cellular, organic, functional and clinical events.1–3 Understanding the phenotypic heterogeneity of COPD and resolving it in well defined subgroups might result in improved prognostic tools and better mechanistic studies,4–6 and may prevent potentially useful treatments from being discarded on the basis of trials that mixed up various types of COPD.7
The interest in understanding COPD heterogeneity is not new. In the past, several COPD subtypes were described based on clinical8 and epidemiological9 observations. More recent studies used factor analysis to identify independent factors in COPD, such as exercise capacity and dyspnoea, airflow limitation, hyperinflation, airway inflammation or asthma features,10–16 or cluster analysis to search for subgroups of COPD.17 18 These studies, however, have important limitations. First, most of them used a very limited range of variables to describe COPD heterogeneity, and this carries the risk of providing spurious results.6 Secondly, only one of them18 included information on the extrapulmonary manifestations of COPD, a domain of the disease with a well established clinical relevance today.2 3 Finally, and more importantly, none of them assessed the predictive validity of the new subtypes, so their clinical relevance could not be shown.
To address all these limitations, the ‘Phenotype and Course of COPD (PAC-COPD)’ study sought to identify clinically and epidemiologically meaningful COPD subtypes and to validate them by assessing their relationship with clinically relevant outcomes (hospitalisation and death) during a 4 year follow-up.19 To this end, the PAC-COPD study adopted a wide multidimensional approach with a comprehensive clinical, functional, biological and imaging characterisation of a well defined cohort of patients with COPD at the time of their first hospital admission because of an exacerbation of the disease.20
The present study includes a cross-sectional analysis to identify COPD groups and a 4 year prospective assessment of their relationship with cause-specific admissions and all-cause mortality.
We recruited all subjects hospitalised for the first time because of a COPD exacerbation in nine teaching hospitals in Spain between January 2004 and March 2006. The diagnosis of COPD was confirmed by spirometry when the patient had reached clinical stability (see below), according to the criteria of the American Thoracic Society (ATS) and the European Respiratory Society (ERS) (postbronchodilator forced expiratory volume in 1 s (FEV1) to forced vital capacity (FVC) ratio ≤0.7).3 The protocol was approved by the Ethics Committees of all the participating hospitals, and written informed consent was obtained from all subjects. Detailed information about recruitment methods has been previously published.21
Detailed information about the methods, questionnaires, standardisation of the tests and fieldwork supervision is provided in the Supplement material and has been previously reported.19 Briefly, 3 months after discharge, when clinically stable, patients underwent a comprehensive characterisation that required from two to four visits, on separate days, to the participating hospitals, and included: (1) questionnaires covering sociodemographic and environmental exposure data, smoking habits, dietary habits, self-reported co-morbidities, previous treatments and diagnoses, respiratory symptoms, health-related quality of life, activities of daily living, sleepiness and psychological status; (2) detailed physical examination and Charlson co-morbidity index obtained by a respiratory physician participating in the study; (3) body composition determination by bioelectric impedance; (4) complete lung function tests including forced spirometry, bronchodilator test, body plethysmography, carbon monoxide diffusing capacity, resting arterial blood gases, respiratory and peripheral muscle strength, and night-time pulse oximetry; (5) 6-minute walk test (6MWT); (6) chest radiographs; (7) skin prick tests; (8) induced sputum; (9) blood laboratory analyses, including peripheral blood cell counts, cholesterol, triglycerides and total immunoglobulin E (IgE); (10) measurements of serum inflammatory and oxidative stress markers, centralised in a single laboratory; and (11) Doppler echocardiography evaluation. All these measurements were done in the total sample of patients (n=342). 
Additionally, due to their technical and logistical requirements, the following measurements could be performed only in some of the participating hospitals: (12) lung density and emphysema quantification from CT by a centralised evaluation using PulmoCT (Siemens, Munich, Germany) (subsample 1: three hospitals, 102 subjects); (13) semiquantitative evaluation of bronchial wall thickness on CT scans (subsample 2: four hospitals, 148 subjects); (14) microbiological culture of sputum (subsample 3: 8 hospitals, 224 subjects); (15) centralised measurement of sputum inflammatory markers and differential cell counts (subsample 4: 7 hospitals, 181 subjects); and (16) a cardiopulmonary incremental exercise test (CPET) with cycloergometer (subsample 5: 6 hospitals, 200 subjects). The overlap between patients in subsamples is shown in Supplementary table 1.
COPD and cardiovascular admissions up to 31 December 2007 were obtained from national administrative databases. Survival status until 31 December 2008 was obtained from direct interview of the patients or their relatives.
A detailed version of the statistical analysis and power estimation is available in the Supplement material.
From a total of 536 variables obtained, 224 were considered after excluding those with additive relationships or resulting from categorisations. Since missing values made up a small proportion and were considered to be missing either completely at random or at random,22 multiple imputation through chained equations23 was used to avoid losing data. Partitioning cluster analysis to group subjects according to the distribution of variables (standardised using Z-scores) was done by the k-means method,24 thus sorting participants into groups in a way that maximises differences between groups and minimises differences within groups. Six cluster analyses were run, one for the total sample and one for each of the five subsamples. Cluster analyses in the subsamples included the 150 variables of the total sample in addition to the specific variables of each subsample. We compared the profile of the distribution of the 150 variables common to cluster analyses in the total sample and in each subsample across the three groups identified separately in each of the six cluster analyses, and tested the individual agreement in the assigned cluster group using Kappa statistics.
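As an illustration of the standardisation and partitioning steps described above, the following is a minimal sketch in Python rather than the Stata/R code used in the study. The two toy variables (FEV1 % predicted and BMI), their values, and the function names are hypothetical. The farthest-point initialisation is an assumption chosen only to make the example deterministic; the source does not state how the k-means starting centroids were chosen.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardise each column to mean 0 and SD 1 (population SD)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(rows, k):
    """Assumed initialisation: start at the first row, then repeatedly add
    the row farthest from the centroids chosen so far."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: [FEV1 % predicted, BMI]
toy = [
    [35, 24], [38, 25], [40, 23],  # severe airflow limitation
    [62, 25], [64, 24], [66, 26],  # milder airflow limitation
    [57, 33], [59, 35], [60, 34],  # milder airflow limitation, high BMI
]
labels, centroids = kmeans(zscore_columns(toy), k=3)
print(labels)
```

Standardising first matters because k-means relies on Euclidean distance, so a variable measured on a wider scale would otherwise dominate the grouping.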
To display the COPD groups identified by cluster analysis graphically, we used standardised values of all variables, keeping or changing (multiplying by −1) their sign so that values ran from +1 (less impairment) to −1 (more impairment), and plotted these values with a colour intensity scale spanning from yellow to red, respectively. To present the relative relevance of each variable to the separation into cluster groups, F values were computed as the ratio of the variance of the group means (between-group variance) over the overall variance of the variable (higher values meaning higher relevance); they were obtained by means of linear regression models using the variable under study as the outcome and the cluster group as the exposure. Sociodemographic characteristics, lifestyle factors and environmental exposures were tested as potential determinants of cluster groups.
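To make the F value concrete, here is a hedged sketch. It assumes the standard one-way ANOVA F statistic, which is what a linear regression with the cluster group as a categorical exposure yields: the between-group mean square divided by the within-group (residual) mean square. The text describes the denominator more loosely as the overall variance, so the exact quantity used in the study may differ. The function name and the toy values are hypothetical.

```python
from statistics import mean

def anova_f(values, groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    grand = mean(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    k, n = len(by_group), len(values)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2
                     for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2
                    for vs in by_group.values() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
separating = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # group means 2, 5, 8
non_separating = [1, 5, 9, 2, 5, 8, 3, 5, 7]  # all group means equal 5

print(anova_f(separating, groups))      # large F: variable separates the groups
print(anova_f(non_separating, groups))  # F of zero: variable does not separate them
```

A variable whose group means differ widely relative to its within-group spread gets a high F and therefore ranks as more relevant to the cluster separation.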
To validate the clinical relevance of the groups identified by cluster analysis, admissions and mortality during follow-up were compared across groups using Kaplan–Meier curves and log-rank tests. The influence of COPD severity on the association between cluster groups and longitudinal outcomes was assessed by including either ATS/ERS severity stages3 or the U-BODE (updated body mass index, airflow obstruction, dyspnoea and exercise capacity) index20 in multivariate models. As sensitivity analyses, we repeated the cluster analyses excluding subjects with pneumonia as the concomitant cause for their first COPD hospitalisation, as well as restricting them to males. We also implemented a previously published variable selection procedure that allows dropping noisy, non-informative or redundant variables, thus keeping a smaller number of variables that were used to repeat the cluster analysis.25 Analyses were performed with Stata release 10.0 (2008, StataCorp LP), and R 2.6.2 (2008, R Foundation for Statistical Computing, Vienna, Austria).
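The Kaplan–Meier curves mentioned above rest on the product-limit estimator, which can be sketched in a few lines. This is an illustrative Python version, not the Stata or R routines used in the study; the function name and the toy follow-up times are hypothetical, and the log-rank test and multivariate models are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times[i]  : follow-up time for subject i
    events[i] : 1 if the outcome occurred at times[i], 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censored subjects leave the risk set
    return curve

# Hypothetical follow-up (years) for five subjects; 0 marks censoring
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve per cluster group and comparing them is what the log-rank test formalises.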
Among 604 eligible patients with COPD, the PAC-COPD cohort included a total of 342 (57%) participants. There were no relevant differences between participants and non-participants. Supplementary figure 1 shows details about exclusions and non-participation. PAC-COPD subjects were mainly men (93%), with a mean age of 68 years and a mean postbronchodilator FEV1 during clinical stability of 52% predicted (table 1).
The number of cluster groups when using k-means needs to be prespecified. Since we had no a priori information for a rational choice, we looked at the results of two and three cluster groups. Figure 1 presents the distribution of variables according to both options. Classifying participants in three groups maximised the differences for systemic inflammation, nutritional status, cardiovascular disorders and lung density, while keeping a similar contrast for the remaining dimensions. Classification in four or more groups did not result in meaningful patterns or statistical advantage. Thus, separation of the study subjects into three different COPD groups, including 126 (37%), 125 (36%) and 91 (27%) patients, was considered the most clinically meaningful option.
Group 1 was characterised by a higher prevalence and severity of respiratory symptoms, poorer quality of life, worse lung function, lower exercise capacity, lower lung density and thickened bronchial walls (table 2). Group 2 exhibited less overall respiratory impairment than group 1 except for lung density, which was similar in both groups. Group 3 showed a respiratory profile similar to group 2 but with a greater prevalence of overweight, systemic inflammation, cardiovascular diseases and diabetes. The complete distribution of values for the 224 variables is provided in Supplementary table 3. Sensitivity analyses showed very similar results after excluding patients with pneumonia as the concomitant cause of their first COPD hospitalisation, or after restricting the analysis to males. The variable selection procedure identified that 10 variables (dyspnoea (modified Medical Research Council scale), St George's Respiratory Questionnaire-Activity, prebronchodilator and postbronchodilator FEV1 (% predicted), thoracic gas volume (TGV; % predicted), inspiratory to total lung capacity (IC/TLC) ratio, Pao2, peripheral blood neutrophil count, body weight and body mass index (BMI)) could be used to obtain the same cluster groups with a mean concordance of 81% (table 2).
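Agreement between two cluster assignments, such as the full-variable solution and the reduced-variable solution, can be quantified as follows. This is a sketch under stated assumptions: the source does not specify how concordance was computed, the label-alignment step by exhaustive permutation is an assumption (cluster numbers are arbitrary, so they must be matched before comparison), and the function names and toy labels are hypothetical.

```python
from itertools import permutations

def percent_agreement(a, b):
    """Fraction of subjects given the same label in both assignments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def align_labels(reference, other, k):
    """Relabel `other` with the permutation of 0..k-1 that best matches `reference`."""
    best = max(permutations(range(k)),
               key=lambda p: percent_agreement(reference, [p[x] for x in other]))
    return [best[x] for x in other]

def cohen_kappa(a, b, k):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    p_exp = sum((a.count(j) / n) * (b.count(j) / n) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical assignments for ten subjects
full_model = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
reduced_model = [1, 1, 1, 2, 2, 0, 0, 0, 0, 0]  # same structure, other numbering

aligned = align_labels(full_model, reduced_model, k=3)
print(percent_agreement(full_model, aligned))
print(cohen_kappa(full_model, aligned, k=3))
```

Exhaustive permutation is practical here because three groups give only six possible relabellings.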
No relevant differences in potential risk factors were observed between the three COPD groups (table 3). Remarkably, all groups were of similar age. Smoking status was very similar among them, but smoking history showed that group 2 quit smoking at a younger age (a mean of 3 years earlier), and that group 3 smoked more pack years.
During follow-up, group 1 had more frequent hospitalisations due to COPD, and the highest all-cause mortality, whereas group 3 had more admissions due to cardiovascular disease (figure 2). After adjusting for disease severity, the observed differences in admissions were unchanged (risk of COPD admission in group 1 compared with group 2: unadjusted HR=3.28, ATS/ERS stages adjusted HR=2.89 and U-BODE adjusted HR=2.45; risk of cardiovascular admission in group 3 compared with group 2: unadjusted HR=2.87, ATS/ERS stages adjusted HR=2.88 and U-BODE adjusted HR=2.27). The differences in mortality remained similar but lost statistical significance (risk of mortality in group 1 compared with group 2: unadjusted HR=2.36, ATS/ERS stages adjusted HR=2.01 and U-BODE adjusted HR=1.41) (Supplementary table 3).
In this cohort of 342 patients recruited at the time of their first COPD admission, three different groups of patients with COPD were identified. Because each group included a large proportion of subjects, exhibited marked clinical differences and was associated with different longitudinal outcomes, we propose that they are clinically relevant COPD subtypes. Subtype 1 showed the worst status in most of the respiratory domains of the disease and exercise capacity. Subtype 2 was characterised by a milder respiratory status than subtype 1, and subtype 3 also by a milder respiratory status but a higher prevalence of obesity, cardiovascular disease and diabetes, and higher levels of systemic inflammatory markers.
Previous literature has suggested that part of the phenotypic heterogeneity in COPD is due to a divergent distribution of bronchial airway (chronic bronchitis) and parenchymal disease (emphysema).1 In our study this was observed only in subtype 2, with a substantial degree of emphysema in the absence of increased wall thickness, both according to CT measures. We also found that asthma-like variables did not contribute to phenotypic heterogeneity, contrasting with other publications.9 Our results cannot be directly compared with the studies applying factor analysis in COPD,10–17 since these studies identified a relatively large number of independent disease components from a reduced number of variables. Cluster analysis, instead, allows segregating subjects in different groups.27 Its recent use18 in patients with COPD identified four groups, one of them including obesity and chronic heart failure, in keeping with our subtype 3.
Most of the features of the so-called metabolic syndrome, including obesity, higher levels of triglycerides, diabetes, ischaemic heart disease, arterial hypertension and elevated serum levels of C-reactive protein and fibrinogen, were clustered in subtype 3. Fabbri and Rabe28 proposed that a ‘chronic systemic inflammatory syndrome’ may contribute to the co-morbidities that frequently co-exist in patients with COPD. Our results give preliminary support to this concept because, first, we found that subtype 3 was characterised by both elevated markers of systemic inflammation and a high prevalence of co-morbidities; of note, 17% of patients in subtype 3 had congestive heart failure, a figure that contrasts markedly with that seen in subtypes 1 (5%) and 2 (1%). Secondly, obesity might have contributed to systemic inflammation in subtype 3, as previously suggested.29 Finally, consistent with the hypothesis of the ‘chronic systemic inflammatory syndrome’,28 subtype 3 had a higher risk of COPD and cardiovascular admissions than subtype 2. In contrast, the fact that we did not identify significant differences among subtypes in bronchial inflammatory markers while subtype 3 nonetheless showed more systemic inflammation than the other two does not support previous claims that pulmonary inflammation could ‘spill-over’ from the lungs.30
Despite the fact that airflow limitation in subtype 3 was moderate and similar to that seen in subtype 2, patients in the former subtype showed more dyspnoea and poorer health-related quality of life and exercise capacity (actually quite similar to those seen in subtype 1 with much more severe airflow limitation). This is in agreement with the currently accepted apparent paradox that lung function and other clinical parameters are weakly correlated in COPD,31 32 and might be explained by the combined cardiorespiratory morbidity in subtype 3. As a consequence, it is possible that in clinical trials, this subtype may respond poorly to interventions targeting airflow limitation.7 Equally important is that, in clinical practice, these patients may benefit from an integrated cardiorespiratory assessment and therapeutic plan.
One salient feature of our study is the large differences in airflow limitation seen across the three identified subtypes, since all patients were hospitalised for the first time due to a COPD exacerbation and had similar age and smoking trajectories. The possibility that patients in subtype 1 might be poor perceivers of dyspnoea, a hypothesis that has not been formally explored in COPD but is well established in asthma,33 should not be excluded. The fact that patients in subtype 2 require hospitalisation despite much milder airflow limitation may be explained by a remarkable degree of emphysema—according to CT scan—which could result in a different clinical presentation at the emergency room, leading to an admission. This subtype reported the lowest proportion of respiratory diagnosis and treatment before recruitment (data not shown) and could represent the largest window of opportunity for improving early diagnosis and management of COPD.
Although smoking is the main risk factor for COPD,2 we found no evidence that it may be an important source of COPD heterogeneity, though the larger number of pack-years reported by subtype 3 may have contributed to their higher prevalence of cardiovascular and metabolic disease. Likewise, other exposures to occupational or environmental factors were not significantly different among COPD phenotypes although, admittedly, limited information was available. Finally, it is worth noting that patients in subtype 1 were, on average, 4 cm shorter than those in subtype 3. Since height is considered a marker of in utero and childhood lung growth, we hypothesise that subjects in subtype 1 might have had impaired lungs since early life and, for the same amount of smoking, developed a more severe respiratory status at the time of their first hospital admission. The latter observation is consistent with recent reports on the early origin of COPD.34 As height was included in the cluster analysis, its interpretation as a potential risk factor can be argued.
Some potential limitations of our study also need discussion. The proportion of women included in the study was low, so its results should be extrapolated to females with caution. The intensive characterisation of this study was not feasible outside a hospital setting, so identification and validation of COPD subtypes in different COPD populations is needed. Because of such intensive characterisation, we did not include patients with severe co-morbidities, although we consider it unlikely that this would have distorted the observed subtypes given that they were of similar ages. Our study was restricted to patients who were hospitalised for COPD and thus our findings may be considered to apply only to relatively severe COPD. On the other hand, the selection of patients at their first hospital admission due to COPD should be seen as a strength that allows for a more valid study of the phenotypic heterogeneity of COPD.35 Finally, our cluster analysis was cross-sectional, and assessing the temporal stability of the identified COPD subtypes remains an important issue.
The most obvious clinical implication of our study is that the first COPD hospitalisation offers an opportunity for a wider than usual characterisation of patients with COPD. Indeed, the fact that the longitudinal association between subtype 1 and COPD admissions remained after adjusting for the two currently used criteria to assess COPD severity3 32 suggests that our subtype classification captures relevant prognostic information that is not accounted for in these criteria. Importantly, only 10 variables that are usually collected in clinical practice would be enough to separate patients into the described cluster groups. Further research aiming to develop an allocation algorithm is needed. Finally, experimental studies will be needed to assess whether therapeutic strategies tailored to these subtypes can modify the natural history of the disease.
In conclusion, in patients with COPD recruited at their first hospitalisation, three different COPD subtypes have been identified and prospectively validated, which we propose to label as ‘severe respiratory COPD’, ‘moderate respiratory COPD’, and ‘systemic COPD’.
The authors wish to thank the Conjunto Mínimo Básico de Datos de Altas Hospitalarias (CMBDAH) from Catalunya, Euskadi and Illes Balears for providing the information on hospitalisation data, Professor Patrick Royston for his help with multiple imputation commands in Stata, Jose Barrera-Gómez for his help with the variable selection procedure, and Dr Francine Kauffmann and Dr Milo Puhan for their helpful comments on a previous version of the manuscript.
FPG and MB contributed equally to this work.
The ‘Phenotype and Course of COPD (PAC-COPD)’ Study Group: Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Josep M Antó (Principal Investigator), Judith Garcia-Aymerich (project coordinator), Marta Benet, Jordi de Batlle, Ignasi Serra, David Donaire-Gonzalez, Stefano Guerra; Hospital del Mar-IMIM, Barcelona, Joaquim Gea (centre coordinator), Eva Balcells, Àngel Gayete, Mauricio Orozco-Levi, Ivan Vollmer; Hospital Clínic-Institut D'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Joan Albert Barberà (centre coordinator), Federico P Gómez, Carles Paré, Josep Roca, Robert Rodriguez-Roisin, Àlvar Agustí, Xavier Freixa, Diego A Rodriguez, Elena Gimeno, Karina Portillo; Hospital General Universitari Vall D'Hebron, Barcelona, Jaume Ferrer (centre coordinator), Jordi Andreu, Esther Pallissa, Esther Rodríguez; Hospital de la Santa Creu i Sant Pau, Barcelona, Pere Casan (centre coordinator), Rosa Güell, Ana Giménez; Hospital Universitari Germans Trias i Pujol, Badalona, Eduard Monsó (centre coordinator), Alicia Marín, Josep Morera; Hospital Universitari de Bellvitge, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Eva Farrero (centre coordinator), Joan Escarrabill; Hospital de Sabadell, Corporació Parc Taulí, Institut Universitari Parc Taulí (Universitat Autònoma de Barcelona), Sabadell, Antoni Ferrer (centre coordinator); Hospital Universitari Son Dureta, Palma de Mallorca, Jaume Sauleda (centre coordinator), Bernat Togores; Hospital Universitario de Cruces, UPV, Barakaldo, Juan Bautista Gáldiz (centre coordinator), Lorena López; Instituto Nacional de Silicosis, Oviedo, José Belda.
Funding The PAC-COPD Study is funded by grants from Fondo de Investigación Sanitaria (FIS PI020541), Ministry of Health, Spain; Agència d'Avaluació de Tecnologia i Recerca Mèdiques (AATRM 035/20/02), Catalonia government; Spanish Society of Pneumology and Thoracic Surgery (SEPAR 2002/137); Catalan Foundation of Pneumology (FUCAP 2003 Beca Marià Ravà); Red RESPIRA (RTIC C03/11); Red RCESP (RTIC C03/09); Fondo de Investigación Sanitaria (PI052486); Fondo de Investigación Sanitaria (PI052302); Fondo de Investigación Sanitaria (PI060684); Fundació La Marató de TV3 (no. 041110); and Novartis Farmacèutica, Spain. CIBERESP and CIBERES are funded by the Instituto de Salud Carlos III, Ministry of Health, Spain. JG-A has a researcher contract from the Instituto de Salud Carlos III (CP05/00118), Ministry of Health, Spain. FPG was supported by a Programme of Established Career Scientist Award from the Generalitat de Catalunya. No involvement of funding sources in study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the article for publication. Researchers are independent of funders.
Competing interests JMA reports that his institution received a €12 000 grant from Novartis Farmacèutica, Spain. No other potential conflict of interest relevant to this article was reported.
Ethics approval This study was conducted with the approval of the Ethics Committees of the participating hospitals and coordinating centre.
Provenance and peer review Not commissioned; externally peer reviewed.