The heterogeneous nature of chronic obstructive pulmonary disease (COPD) was recognised long before the term was popularised by Briscoe and Nash,1 and the original classifications of chronic bronchitis and emphysema remain more recognisable to the general public than the term COPD. The challenge in recent years has been to better characterise the different phenotypes that make up the syndrome of COPD and so develop a new classification and terminology for COPD. This is not an esoteric pursuit but a worthwhile endeavour which has the potential to shed light on the underlying pathophysiology, risk factors, natural history and treatment responses of the specific phenotypes. Ultimately, this has the potential to enable tailoring of treatment regimes to individual patients. Currently this is not possible as the treatment guidelines for COPD do not differ according to phenotype, other than by severity. Furthermore, the randomised controlled trials on which the guidelines are based study highly selected COPD subgroups, for which only a minority of people with COPD would have been eligible for inclusion.2 As a result, the findings have limited external validity and are poorly generalisable to patients with COPD managed in the community. This means not only that there is an inadequate evidence base for the majority of patients with COPD, but also that treatments which may provide benefit only to certain phenotypes are unlikely to be identified.
A variety of methods have been used to explore the different phenotypes of COPD. Early studies constructed groups based on recognised clinical patterns, informed by those variables which were significantly associated with outcome. For example, in 1987 Burrows et al3 described three groups of patients with chronic airways obstruction, one considered to have features most characteristic of chronic asthma, a second comprising non-atopic smokers without known asthma and an intermediate third group. Subsequently the classic Venn diagram was proposed and incorporated into the American Thoracic Society (ATS) guidelines.4 5 Later refinements of the Venn diagram described at least 15 phenotypes, the response to treatment and pathogenesis of which are not well understood.6
Defining phenotypic groups through subjective interpretation of data tends to describe groups which match existing beliefs about the patterns of disease. The validity of identified groups can then be tested, but their origins remain vulnerable to personal bias. More recently there have been attempts to explore phenotypes with methods which are less reliant on a priori assumptions. Due to the large number of potentially important but overlapping variables, the focus has turned to multivariate statistical approaches such as principal component analysis, factor analysis and, more recently, cluster analysis.
Principal component analysis and factor analysis are statistical methods which can be applied to large data sets containing multiple variables. Components and factors are derived from a combination of scaled original variables which each describe a proportion of the variability within the study population. The combination of particular variables within the same component or factor may indicate a relationship due to a common underlying pathophysiological process and hence may support the description of particular phenotypes based on the different factors. The allocation of variables to factors is determined by the statistical procedure used rather than according to existing hypotheses, and so is less susceptible to bias. The choice of variables can however affect the outcome and so it is not immune to a priori assumptions. Use of factor analysis has provided evidence for measures of obstruction,7–9 hyperinflation,7 exercise tolerance,7 10 airway hyper-responsiveness or bronchodilator reversibility,9 inflammation,9 11 dyspnoea and health-related quality of life7 8 12 as independent components of COPD phenotypes.
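To make the idea of a component concrete, the following is a minimal sketch in Python of how a first principal component might be extracted. It is not the method of any study cited here. The six patients, the four variables (FEV1 % predicted, FEV1/FVC, 6 min walk distance and body mass index) and all of the values are invented for illustration, and the function names are assumptions. The variables are scaled, their correlation matrix is formed, and simple power iteration finds the leading component and the proportion of variability it describes.

```python
import math

def standardise(rows):
    """Scale each variable (column) to mean 0 and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardised data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]

def first_component(corr, iterations=200):
    """Power iteration: eigenvalue and loadings of the leading component."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Invented data: (FEV1 % predicted, FEV1/FVC %, 6 min walk in m, BMI)
patients = [
    (35, 40, 250, 22),
    (42, 45, 300, 29),
    (50, 52, 340, 24),
    (58, 55, 380, 31),
    (65, 60, 420, 23),
    (72, 66, 450, 28),
]

corr = correlation_matrix(standardise(patients))
eigenvalue, loadings = first_component(corr)
# With scaled variables the total variance equals the number of variables
proportion = eigenvalue / len(corr)

print("loadings:", [round(x, 2) for x in loadings])
print("proportion of variability described:", round(proportion, 2))
```

In this invented data set the three variables that rise and fall together load heavily on the first component while body mass index does not. A real analysis would use an established statistical package, many more patients and further components.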
Cluster analysis methods aim to group individuals by measured characteristics, such that differences between groups are maximised and those within groups minimised. The major strength of cluster analysis methodology is that it minimises a priori assumptions about the groups contained within the data and it may therefore be less susceptible to bias. Key elements in the design of any cluster analysis are the methods of recruitment, choice of variables and number of variables.13 If study participants are too similar, then clusters described may not reflect true phenotypes; conversely very heterogeneous groups can lead to multiple very small clusters of doubtful significance. The selection and number of variables involve compromises. Selecting a smaller number of variables considered to be clinically important risks bias towards preconceived phenotypes. However, uncritical inclusion of a large number of variables risks reducing the ability to detect clinically meaningful phenotypes among the noise.
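The grouping principle can be illustrated with a minimal Python sketch of k-means, one of several clustering algorithms. It is offered only as an example and is not claimed to be the algorithm used in any study discussed here. The nine patients, the two pre-scaled variables (an airflow obstruction score and a systemic inflammation score) and the starting centroids are all invented assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two patients' variable vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Mean of each variable across the members of one cluster."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    """Alternate between allocating patients to the nearest centroid and
    recomputing centroids, until the allocation stops changing."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids

# Invented, already scaled scores: (airflow obstruction, systemic inflammation)
points = [
    (2.0, 0.1), (1.8, -0.2), (2.2, 0.0),
    (-0.8, -0.9), (-1.0, -1.1), (-0.9, -0.8),
    (-0.9, 1.2), (-1.1, 1.0), (-1.0, 1.3),
]

# Fixed starting centroids keep the example reproducible; in practice the
# result can depend on the starting values and on the number of clusters chosen.
labels, centroids = k_means(points, [points[0], points[3], points[6]])
print(labels)
```

The choice of variables, their scaling, the distance measure and the number of clusters are all decisions made by the analyst, which is why the design issues described above matter.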
In this issue Garcia-Aymerich and colleagues report a cluster analysis on behalf of the PAC-COPD Study Group (see page 430).14 They collected detailed data on 342 patients presenting to nine teaching hospitals in Spain with a first hospitalisation due to an exacerbation of COPD. Data collected included measures of lung function, exercise tolerance, inflammation, atopy, symptoms, quality of life, nutritional status and arterial blood gases. In addition, subsets had data from CT evaluation of lung density and bronchial wall thickness, sputum inflammometry, sputum microbiology and cardiopulmonary exercise testing. Subjects were followed for up to 4 years to obtain morbidity and mortality data. Three clinically relevant COPD phenotypes were identified: ‘severe respiratory COPD’ which showed the worst status in most of the respiratory domains and exercise capacity, ‘moderate respiratory COPD’, which was characterised by a milder respiratory status, and ‘systemic COPD’, which also had a milder respiratory status but had a higher prevalence of obesity and cardiovascular disease, and higher levels of systemic inflammatory markers. The natural history of the groups differed, with the ‘severe respiratory COPD’ group having more frequent hospitalisations due to COPD and increased mortality, whereas the ‘systemic COPD’ group had more admissions due to cardiovascular disease. The authors suggest that their phenotype of systemic COPD may be consistent with the concept of a ‘chronic systemic inflammatory syndrome’.15 Intriguingly, their observation that this group did not demonstrate greater measures of bronchial inflammation suggests that systemic inflammation may be due to co-morbidities rather than ‘spill-over’ from the lungs. The potential of cluster analysis to inform on risk factors was shown by their observation that whereas smoking, occupational and environmental factors were similar between the phenotypic groups, those with ‘severe respiratory disease’ were shorter. As height can be considered a marker of in utero and childhood lung growth, this observation suggests that early life events may be important in the pathogenesis of this phenotype.
Cluster analyses reported so far have studied very different populations, including a random community-based sample,16 patients enrolled from secondary or tertiary hospital clinics17–20 and a cohort with a first hospital admission for COPD.14 These studies have also used different methods, including variations in clustering algorithms and distance measures, and the number of variables used has varied from 4 (reference 20) to 224 (reference 14). While these studies provide complementary information, the substantial differences in baseline characteristics and methodology make it hard to compare the phenotypes produced directly. However, there are areas of agreement, such as the identification of the ‘overlap’ group described by both Wardlaw et al17 and Weatherall et al.16 This group is characterised by severe and markedly variable airflow obstruction with features of atopic asthma, chronic bronchitis and emphysema in smokers. Pistolesi et al19 and Cho et al20 both describe groups approximating to classical emphysema and chronic bronchitis phenotypes, with Weatherall et al16 also describing an emphysema-predominant group. Encompassing the chronic systemic inflammatory syndrome, Burgel et al18 and Garcia-Aymerich et al14 describe a group characterised by obesity and chronic heart failure. It is unclear where the concept of a frequent exacerbator phenotype21–23 fits in the patterns described so far.
However, methods such as factor analysis and cluster analysis ultimately describe associations within data but do not prove that they represent distinct disorders that are clinically meaningful. It has therefore been suggested that these methods are best viewed as hypothesis generating.17 Indeed it has recently been proposed that the term phenotype be reserved for patterns of disease attributes that describe differences between individuals with COPD as they relate to clinically meaningful outcomes such as symptoms, exacerbations, response to treatment, rate of disease progression or death.24
Finally, once the phenotypic groups have been defined, it is necessary to generate robust allocation rules which allow diagnosis of the specific disorder in clinical practice. Discriminant function analysis has been used to determine the variables responsible for most of the differences between groups and thereby generate allocation rules which, using only a few variables, can correctly allocate a subject in the majority of cases.25 An ideal allocation rule would be simple to administer, using only variables which could be collected in routine clinical care, and yet accurately predict the phenotype for a particular patient. The difficulty inherent in this approach is illustrated by the study of Garcia-Aymerich et al,14 for although it was possible to allocate 80% of patients into described clusters by utilising 10 of the 224 variables measured, not all of the 10 measures would be available in routine primary care practice.
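The trade-off between simplicity and accuracy in an allocation rule can be sketched in Python. The example below deliberately substitutes a nearest-centroid rule for discriminant function analysis, which additionally weights variables by their within-group covariance, so it should be read as a simplified stand-in rather than the published method. The nine patients, their values, the phenotype labels and the function names are all invented for illustration.

```python
def fit_centroids(patients, variables):
    """Mean of the chosen variables within each phenotype group."""
    groups = {}
    for values, label in patients:
        groups.setdefault(label, []).append([values[i] for i in variables])
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

def allocate(values, centroids, variables):
    """Allocate a patient to the phenotype with the nearest centroid."""
    x = [values[i] for i in variables]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )

def allocation_accuracy(patients, variables):
    """Proportion allocated to their original cluster. The rule is tested on
    the same patients used to build it, so the figure is optimistic."""
    centroids = fit_centroids(patients, variables)
    correct = sum(allocate(v, centroids, variables) == label for v, label in patients)
    return correct / len(patients)

# Invented data: (FEV1 % predicted, BMI, CRP in mg/l, 6 min walk in m), phenotype
patients = [
    ((30, 24, 3, 250), "severe"),
    ((35, 26, 4, 280), "severe"),
    ((38, 23, 2, 300), "severe"),
    ((60, 25, 3, 420), "moderate"),
    ((65, 24, 2, 450), "moderate"),
    ((58, 27, 4, 400), "moderate"),
    ((62, 33, 9, 380), "systemic"),
    ((59, 35, 10, 360), "systemic"),
    ((66, 31, 8, 390), "systemic"),
]

two_variable_rule = allocation_accuracy(patients, [0, 1])  # FEV1 and BMI
one_variable_rule = allocation_accuracy(patients, [0])     # FEV1 alone
print(two_variable_rule, one_variable_rule)
```

In this toy data set, dropping body mass index from the rule causes some patients in the invented ‘moderate’ and ‘systemic’ groups to be misallocated, which mirrors the tension between a rule that is simple to administer and one that allocates accurately. Any real rule would need validation in patients other than those used to derive it.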
The ongoing challenge is to determine the distinct phenotypes which represent the disorders that make up the syndrome of COPD. If the phenotypes are shown to differ in their response to pharmacological and non-pharmacological interventions, and simple validated allocation rules become available, clinicians would potentially be able to target treatments specifically to individual patients.