Original article
Induced sputum genes associated with spirometric and radiological disease severity in COPD ex-smokers
  1. Dave Singh1,
  2. Steven M Fox2,
  3. Ruth Tal-Singer3,
  4. Jonathan Plumb1,
  5. Stewart Bates2,
  6. Peter Broad2,
  7. John H Riley4,
  8. Bartolome Celli5 On behalf of the ECLIPSE Investigators
  1. 1University of Manchester, Medicines Evaluation Unit, University Hospital Of South Manchester Foundation Trust, UK
  2. 2GlaxoSmithKline, Medicines Research Centre, Stevenage, UK
  3. 3GlaxoSmithKline, Respiratory Therapy Area, Philadelphia, USA
  4. 4GlaxoSmithKline, Respiratory Medicines Development Centre,Stockley Park, Uxbridge, UK
  5. 5Pulmonary and Critical Care Division, Brigham and Women's Hospital, Harvard University, Boston, USA
  1. Correspondence to Dave Singh, University of Manchester, Medicines Evaluation Unit, University Hospital of South Manchester Foundation Trust, Manchester M23 9LT, UK; dsingh{at}


Background Induced sputum is used to sample inflammatory cells, predominantly neutrophils and macrophages, from the airways of COPD patients. The authors' aim was to identify candidate genes associated with the degree of airflow obstruction and the extent of emphysema by expression profiling, and then to confirm these findings for selected candidates using PCR and protein analysis.

Methods Two sputum studies were performed in Global Initiative for Chronic Obstructive Lung Disease (GOLD) stage 2–4 COPD ex-smokers from the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) cohort. First, gene array profiling was performed at baseline in samples from 148 patients. The findings were then replicated in a separate population of 176 patients using real-time PCR. The findings for one selected gene, IL-18R, were further analysed using immunohistochemistry in lung tissue and induced sputum from patients outside the ECLIPSE cohort.

Results Gene expression profiling revealed changes in 277 genes associated with GOLD stage 2 versus 3 and 4, and 198 genes with changes associated with the degree of emphysema (p<0.01 for each gene). Twelve of these candidate genes were analysed by PCR in the replication cohort, with significant changes (p<0.05) observed for 11 genes. IL-18R protein expression was higher on alveolar macrophages in lung tissue of COPD patients (mean 23.2%) compared to controls (mean ex-smokers 2% and non-smokers 2.5%).

Conclusion Gene expression profiling in sputum cells identified candidate genes that may play roles in molecular mechanisms associated with COPD. The replication by PCR and protein in different studies confirms these findings, and highlights a potential role for IL-18R upregulation in severe COPD.


Key messages

What is the key question?

Are there gene expression changes in induced sputum cells that are associated with worse lung function or more emphysema in COPD patients?

What is the bottom line?

The authors have identified changes in gene expression levels in induced sputum cells that are associated with COPD severity.

Why read on?

Potential molecular mechanisms that are implicated in the progression of COPD are described.


Chronic obstructive pulmonary disease (COPD) is characterised by airflow obstruction associated with an abnormal inflammatory response to inhaled particles,1 most commonly from cigarette smoking. Interestingly, airway inflammation in COPD patients persists after smoking cessation.2 COPD is a complex disease where emphysema resulting from parenchymal destruction and abnormal repair is usually present alongside inflammation.3 4 Furthermore, the degree, extent and location of emphysema vary between patients. The molecular mechanisms underlying persistent inflammation in the absence of continued cigarette smoking and the variable development of emphysema are not fully understood.

COPD patients have increased numbers of neutrophils and macrophages in the airways.3 5–7 These cells release a range of inflammatory mediators and proteases, and are thought to be key players in the pathogenesis of airway inflammation and parenchymal destruction.4 8 These cells can be sampled using induced sputum, which is safe, practical and non-invasive.9

Gene expression profiling using a ‘hypothesis free’ approach may be helpful in COPD to further understanding of the molecular basis of this complex disease. In COPD, genome-wide expression studies have been performed using whole tissue, alveolar macrophages and bronchial epithelial cells.10–15 These studies have identified molecular patterns associated with characteristics of COPD. However, there have been no gene expression profiling studies using inflammatory cells in induced sputum of COPD patients.

The authors studied the gene expression profile of induced sputum cells in ex-smoking COPD patients to investigate genes associated with the degree of airflow obstruction and the extent of emphysema in patients enrolled in the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) cohort.16 The authors first studied a group of patients using gene arrays to identify candidate genes by expression profiling. These findings were then validated in a separate group of patients using specific PCR. This strategy was used to confirm the reproducibility of these findings in different populations, which increases the probability of results being applicable to the wider population of COPD patients. Finally, using a third group of patients not in the ECLIPSE study, protein studies were performed on lung cells to further confirm the findings targeting one specific protein thought to be involved in the pathogenesis of COPD, the IL-18 receptor (IL-18R).



ECLIPSE is a three-year multicentre longitudinal study to identify novel endpoints in COPD.16 The inclusion criteria are in the online supplement (OLS). Sputum induction was performed in a subset of patients at 14 sites. Samples from ex-smokers only were included in this analysis. At the baseline visit, the authors selected 74 patients categorised by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) as stage 2 and 74 patients with GOLD stage 3 or 4, matched for age and gender. At year one, the authors studied a separate population of 94 GOLD stage 2 patients and 82 GOLD stage 3 or 4 patients for real-time PCR analysis. These patients were not matched for age or gender, but both factors were accounted for in the statistical analysis. ECLIPSE is ethically approved and all participants provided written informed consent (ClinicalTrials.gov identifier NCT00292552; GSK Study Identifier SCO104960).

For immunohistochemistry studies, the authors obtained lung tissue from 20 patients at the University Hospital of South Manchester undergoing surgical resection for suspected or confirmed lung cancer, and sputum samples from six COPD patients and six non-smoking controls. Ethical approval was obtained for immunohistochemistry studies and all subjects gave written informed consent.

CT scan

All ECLIPSE subjects underwent a low-dose CT scan of the chest to quantitate the degree of emphysema as described in the OLS. A cut-off of <10% low attenuation area (LAA) was utilised for minimal emphysema (LAAL) and ≥20% for emphysema (LAAH).17
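The two LAA cut-offs can be written as a small classification rule. The following is a minimal sketch, not the study's actual procedure: the function name is hypothetical, the example values are invented, and the handling of scans between 10% and 20% (left unlabelled here) is an assumption, since the text only defines the two comparison groups.

```python
def classify_emphysema(laa_percent):
    """Label a CT low attenuation area (LAA) percentage using the cut-offs in the text.

    Returns "LAAL" for LAA < 10%, "LAAH" for LAA >= 20%, and None for values
    in between (assumption: such scans belong to neither comparison group).
    """
    if laa_percent < 0 or laa_percent > 100:
        raise ValueError("LAA must be a percentage between 0 and 100")
    if laa_percent < 10:
        return "LAAL"  # minimal emphysema
    if laa_percent >= 20:
        return "LAAH"  # emphysema
    return None  # 10% <= LAA < 20%: in neither group


# Hypothetical LAA values, for illustration only
examples = [4.2, 9.9, 10.0, 19.9, 20.0, 35.5]
labels = [classify_emphysema(value) for value in examples]
print(list(zip(examples, labels)))
```

Note that the boundaries are asymmetric: exactly 10% is not LAAL, while exactly 20% is LAAH.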

Sputum induction and processing

Sputum induction and processing with dithiothreitol (DTT) was performed using standard methods18 described in the OLS. The cell pellet was suspended in TRIzol Reagent (Invitrogen, Paisley, UK) to lyse the cells and stored at −70°C.

RNA isolation and microarray analysis

TRIzol lysates were thawed and RNA was extracted before being sent to GeneLogic (Maryland) for microarray analysis using HG_Plus_2.0 GeneChips (Affymetrix, Santa Clara, California, USA). Full details are described in the OLS.

Real-time PCR analysis

Taqman real time PCR (RT-PCR) was performed using the ABI 7900HT Sequence Detection System (Applied Biosystems, Foster City, California, USA) as described in the OLS.


Immunohistochemical analysis of IL-18R from tissue blocks and sputum cytospins is fully described in the OLS.

Statistical analysis

Full details of the statistical analysis are in the OLS. A linear model analysis of variance adjusted for age, gender and batch was used to compare microarray gene expression in GOLD stage 3 and 4 against GOLD stage 2, and in LAAH against LAAL. A significant difference between groups was defined as a fold change (FC) of more than 2 in either direction (FC >±2, expressed as severe COPD relative to moderate COPD) and a p value <0.01. This p value is used in gene array studies to reduce the chance of false positive results.12 The most highly differentially expressed genes (FC ±2.3) were subjected to pathway analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Individual genes were mapped to gene ontology (GO) processes (GeneGo, St. Joseph, MI) and manually by literature searches. The gene array data are accessible through the Gene Expression Omnibus (GEO ID GSE22148). Real-time PCR data were similarly analysed using a linear model analysis of variance; for these data, p<0.05 was considered statistically significant. All gene expression analysis was performed in SAS version 9.1. Lung tissue immunohistochemistry data were normally distributed, and differences between groups were assessed using unpaired t-tests. Sputum immunohistochemistry data were non-parametric, so differences between groups were assessed using the Mann–Whitney U test. The relationship between LAA score and FEV1% predicted was assessed using Pearson's correlation coefficient.
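The microarray significance rule combines a fold-change threshold with a p value threshold. The sketch below illustrates one way to apply it, under stated assumptions: the signed fold-change convention (a ratio below 1 is reported as the negative reciprocal), the function names and the example probe sets are all hypothetical, and the p values are taken as given rather than derived from the adjusted linear model, which was fitted in SAS.

```python
def signed_fold_change(mean_severe, mean_moderate):
    """Signed fold change of severe relative to moderate COPD.

    Inputs are linear-scale mean expression values. Ratios below 1 are
    reported as negative reciprocals (assumed convention), so 0.4 becomes -2.5.
    """
    if mean_severe <= 0 or mean_moderate <= 0:
        raise ValueError("expression means must be positive")
    ratio = mean_severe / mean_moderate
    return ratio if ratio >= 1 else -(mean_moderate / mean_severe)


def is_differentially_expressed(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.01):
    """True when a probe set passes both thresholds described in the text."""
    return abs(fold_change) > fc_cutoff and p_value < p_cutoff


# Hypothetical probe sets: (name, mean in severe, mean in moderate, p value)
probe_sets = [
    ("probe_up", 10.0, 4.0, 0.001),     # 2.5-fold higher, significant
    ("probe_down", 4.0, 10.0, 0.005),   # 2.5-fold lower, significant
    ("probe_small", 6.0, 4.0, 0.0001),  # only 1.5-fold: fails the FC cut-off
    ("probe_noisy", 12.0, 4.0, 0.03),   # 3-fold, but p is not below 0.01
]

selected = [
    name
    for name, severe, moderate, p in probe_sets
    if is_differentially_expressed(signed_fold_change(severe, moderate), p)
]
print(selected)
```

Passing `fc_cutoff=2.3` would mimic the stricter threshold used to pick genes for pathway analysis.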


The clinical characteristics of the patients studied at baseline and year one are shown in table 1. Inhaled medication use was higher in GOLD stage 3 and 4 patients. There was a small increase in the sputum neutrophil percentage in GOLD stage 3 and 4 patients. Within the baseline cohort there were 42 subjects with minimal emphysema defined by LAA <10% (LAAL) and 44 subjects with LAA ≥20% (LAAH). There was a significant association between LAA score and FEV1% predicted (r=−0.049, p<0.0001).

Table 1

Demographics of baseline (Array) and year one (PCR) cohorts

Gene expression profiling

Gene arrays on samples from the 148 patients in the baseline cohort were performed, with 140 passing quality control criteria (69 GOLD stage 2, and 71 GOLD stage 3 & 4). There was a >2 fold difference in the expression levels of 277 probe sets between GOLD stage 2 patients compared to GOLD stages 3 and 4; 120 probe sets were up-regulated and 157 probe sets were down-regulated in GOLD stages 3 and 4. The most highly regulated genes are shown in table 2, with a full list in the OLS.

Table 2

Genes showing significantly different expression according to GOLD stage

There was a >2 fold difference in the expression levels of 198 probe sets between more emphysema (LAAH) patients compared to less emphysema (LAAL) patients; 121 probe sets were up-regulated and 77 probe sets were down-regulated in LAAL patients. The most highly regulated genes are shown in table 3, with a full list in the OLS. The OLS also shows 57 genes differentially expressed due to the degree of emphysema and not airflow obstruction, and 104 genes differentially expressed due to airflow obstruction and not emphysema.

Table 3

Genes showing significantly different expression according to the degree of emphysema

A heatmap was constructed with the most highly regulated genes, using an arbitrary threshold value of >2.3 fold difference in either the comparison of GOLD stages (71 genes) or the comparison of emphysema (51 genes); see figure 1. The fold change of 2.3 was chosen to clearly represent the most highly regulated genes on a heatmap. Lowering the fold change to two would have created a heatmap with too many genes to provide a clear view. The heatmap shows that the majority of the genes that were different in airflow obstruction analysis were also different in the emphysema analysis, with the same direction of change (up- or down-regulation).

Figure 1

Heatmap representation of genes with ±2.3 fold change. Positive fold change values indicate increase in GOLD stage 3/4 compared to GOLD stage 2, or LAAH compared to LAAL. The genes are as follows: 1 CPE, 2 FABP4, 3 CADPS2, 4 TPD52L1, 5 VGL-3, 6 ADPN, 7 PRSS21, 8 IGFBP2, 9 SCD, 10 GREB1, 11 BACE1, 12 TCEA3, 13 ITIH5, 14 CD5L, 15 PPIC, 16 EPHX1, 17 ADAMTS15, 18 RBP4, 19 NUDT6, 20 HOXB7, 21 LSAMP, 22 STAC, 23 FLJ10847, 24 ENPP3, 25 TMEM16E, 26 NFIA, 27 SLC19A3, 28 APM-1, 29 CXCL11, 30 CTSW, 31 KCNAB1, 32 ABCB5, 33 FLJ14627, 34 PCOLCE2, 35 EPB41L4A, 36 PTE2B, 37 GCHFR, 38 ACVRL1, 39 SIGLEC11, 40 TRHDE, 41 MGC9564, 42 TRPC6, 43 ELF5, 44 FOLR1, 45 SYTL4, 46 DKFZp762A217, 47 GAB2, 48 SEMG1, 49 TKTL1, 50 EGR3, 51 IL12B, 52 DKFZP564D166, 53 ASPH, 54 LOC284751, 55 KIAA1727, 56 LOC285758, 57 IRAK3, 58 ZHX2, 59 LOC284475, 60 IL18R1, 61 ANKRD33, 62 THBS1, 63 LOC64744, 64 TULP2, 65 CCNA1, 66 C20ORF121, 67 IL1RL1, 68 KIF27, 69 SYNJ2, 70 UPB1, 71 ZNF165, 72 DAAM2, 73 LOC440896, 74 TANK, 75 ZBTB16, 76 TPST1, 77 C5ORF4, 78 SPRR2G, 79 UNQ5782.
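The heatmap counts allow a rough inclusion-exclusion check of how many genes crossed the 2.3-fold threshold in both comparisons. This is a sketch under an explicit assumption: that the 79 genes listed in the figure legend are exactly the union of the 71 GOLD-stage genes and the 51 emphysema genes. The text does not state this directly, so the derived numbers are illustrative only.

```python
# Counts taken from the text and the figure 1 legend
gold_genes = 71       # genes with >2.3-fold difference in the GOLD stage comparison
emphysema_genes = 51  # genes with >2.3-fold difference in the emphysema comparison
heatmap_genes = 79    # genes listed in the legend (assumed to be the union)

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
shared = gold_genes + emphysema_genes - heatmap_genes
gold_only = gold_genes - shared
emphysema_only = emphysema_genes - shared

print(f"shared: {shared}, GOLD only: {gold_only}, emphysema only: {emphysema_only}")
print(f"share of GOLD genes also above threshold for emphysema: {shared / gold_genes:.0%}")
```

Under that assumption, a gene counted as "GOLD only" here may still change in the same direction in the emphysema comparison while falling short of the 2.3-fold threshold, which is what the heatmap itself displays.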

The known functions of the genes in the heatmap were determined using DAVID, Gene-Go and literature searches, allowing the identification of groups of genes involved in similar biological processes. Groups of genes with enzymatic functions (hydrolase and zymogen activity), and immune response, signal transduction and cell metabolism were found. These gene groupings are shown in the OLS.

Replication using PCR

Twelve candidate genes were selected from the microarray data set based on a twofold difference in expression between GOLD stage 2 compared to GOLD stages 3 and 4: ADPN, ANKRD33, DAAM2, EDN1, EPHX1, FABP4, IL18R1, IL1R1, PRSS21, SCD, TPST and ZNF165. Eleven of these 12 genes were similarly different between GOLD stage 2 and GOLD stage 3 and 4 in the 176 patients of the validating cohort using real-time PCR (p<0.05 in all cases), as shown in table 4.

Table 4

Summary of baseline (gene array) and year one (PCR) results

IL-18R immunohistochemistry

Of the candidate genes with significantly changed expression in both the array profiling and PCR analysis, IL-18R was chosen for protein analysis in a cohort distinct from ECLIPSE. This gene was selected based on existing literature suggesting that IL-18 signalling may be involved in the pathophysiology of COPD.19–22

This cohort comprised eight COPD patients, seven ex-smokers without airflow obstruction and five never smokers undergoing lung surgery (demography shown in table 5). No patients were using inhaled corticosteroids and none were current smokers. IL-18R expression was observed as a fine punctate pattern on the luminal surface of small airway epithelium in all three groups (figure 2A,B). IL-18R expression was significantly increased on alveolar macrophages within COPD lung tissue (mean 23%) compared to controls (means ex-smokers 2%, p=0.01, and non-smokers 2.5%, p=0.03) (figure 2C,D).

Table 5

Characteristics of the lung surgery subjects

Figure 2

Representative images from peripheral lung tissue from a COPD patient (A and C) and a non-smoker control (B and D). A and B show IL-18R expression on small airway epithelial cells. C shows IL-18R expression on COPD alveolar macrophages. D shows no IL-18R expression on control alveolar macrophages (see arrows). E shows the mean number (error bars are SEM) of alveolar macrophages expressing IL-18R in COPD patients (n=8), ES (ex-smokers, n=7) and NS (non-smokers, n=5).

The demography of subjects who provided sputum samples is shown in table 6. IL-18R expression was observed within macrophages from COPD sputum samples (figure 3A,B), with increased expression levels in COPD patients compared to non-smoking controls (p=0.002). Sputum samples from non-smokers were almost devoid of IL-18R expression, even though these samples contained a higher percentage of macrophages than COPD patients (figure 3C).

Table 6

Characteristics of the subjects contributing to sputum immunohistochemistry

Figure 3

Representative images from induced sputum cytospins from a COPD patient (A) and a non-smoker control (B). A shows IL-18R expression on COPD macrophages. B shows no IL-18R expression on control macrophages (see arrows). C shows the median, IQR (boxes) and range (bars) of macrophages expressing IL-18R in COPD patients (n=6) and NS (non-smokers, n=6).
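With only six subjects per group, the Mann–Whitney U test has a limited set of attainable p values. The sketch below computes the smallest two-sided p value an exact test could return for these group sizes, which occurs when the two groups do not overlap at all. It assumes an exact test with no tied values; the text does not say whether an exact or an approximate version was used, and the function name is hypothetical.

```python
from math import comb


def min_two_sided_exact_p(n1, n2):
    """Smallest two-sided exact Mann-Whitney p value for group sizes n1 and n2.

    Assumes no ties. Under the null hypothesis every assignment of ranks to
    the first group is equally likely, and only 2 of the comb(n1 + n2, n1)
    assignments give complete separation (one in each direction).
    """
    return 2 / comb(n1 + n2, n1)


p_floor = min_two_sided_exact_p(6, 6)  # 2 / 924
print(f"{p_floor:.4f}")
```

For six versus six subjects the floor is 2/924, roughly 0.0022, so a reported p value near 0.002 is about as small as this design allows.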


This study of induced sputum gene expression profiling in patients with COPD had three main findings. First, it revealed a large number of candidate genes associated with more severe COPD, whether defined on the basis of airflow obstruction or the degree of emphysema. Second, 11 out of 12 of these candidates observed in a derivative cohort were also differentially expressed in a validating cohort of COPD patients. This replication in two discrete populations using different techniques provides confidence that the results are not false positives. Third, protein studies with the candidate gene IL-18R from a third group of patients showed increased expression in airway macrophages in COPD patients compared to controls. These observations offer potentially important insights into the molecular mechanisms associated with COPD severity.

The authors deliberately used different cohorts of patients for the gene array, PCR and immunohistochemistry studies, because the replication of gene expression findings in distinct cohorts increases the likelihood of these results being applicable to wider populations of COPD patients. The sample sizes for the gene array and PCR cohorts were 148 and 176 respectively, which are large for lung gene expression studies.10–15

Previous array studies using human lung tissue to identify COPD gene expression patterns have produced differing results, which are likely to be due at least in part to differences in the types of patients enrolled.10–15 23 24 For example, two studies have shown that early growth response protein 1 (EGR1) is over-expressed in patients with COPD,23 24 but this has not subsequently been reproduced.12 The authors did not find over-expression of EGR1. This may be because sputum samples and lung tissue have different expression profiles. Serpin peptidase inhibitor clade E, member 2 (SERPINE2) has been identified as a COPD susceptibility gene.11 25 In lung tissue samples from COPD patients, there were 65 probe sets with expression levels that correlated with the degree of airflow obstruction, including SERPINE2.12 The authors compared these 65 probe sets with the present results and did not observe any common probe sets. This underscores the differences that exist in expression profiles between tissue and sputum.

An alternative way to compare the current and previous gene array data is to focus on biological processes rather than individual genes. Using two classification systems (Gene-Go and DAVID), the authors observed common biological processes relating gene expression to more severe airflow obstruction and emphysema. These included cell metabolism, transcription, enzymes/proteolysis and the immune response. An integrative analysis of previous whole lung tissue gene array results has also shown cell metabolism gene expression to be altered in COPD.26 The strong representation of transcription, enzymes/proteolysis and immune response genes in sputum is not surprising, as the predominant cell types, macrophages and neutrophils, are involved in airway immunity.

The authors validated the findings of the baseline cohort in a second replication cohort using PCR for 12 genes, based on the strength of association with degree of airflow limitation. The strength of statistical association for 11 of these genes in both cohorts suggests possible mechanistic roles for these genes in the pathophysiology of COPD. Tyrosylprotein sulfotransferases (TPSTs) transfer sulphate moieties to tyrosine residues on proteins such as adhesion molecules, G-protein-coupled receptors and extracellular matrix proteins.27 TPSTs alter the sulfation status of the chemokine receptor CCR5,28 suggesting a possible role for TPST1 in inflammatory cell trafficking. Fatty acid binding protein 4 (FABP4) is involved in glucose and lipid homeostasis, particularly in adipocytes and macrophages.29 This protein binds to long-chain fatty acids, coordinating the metabolic and inflammatory response of macrophages to such ligands.30 In the context of COPD, it is possible that FABP4 senses lipid mediators within the lungs. The transcription factor gene ZNF165 has increased expression in some cancer types,31 and TRAF (tumour necrosis factor receptor associated factor) family member-associated nuclear factor kappaB activator (TANK) is a negative regulator of Toll receptor signalling.32 For all of these candidate genes, protein studies to confirm gene expression results would be of value.

To validate this concept, the authors performed immunohistochemistry in a third cohort of patients using lung tissue and sputum. The expression levels of IL-18R were increased in COPD airway macrophages. IL-18 promotes T helper 1 development and regulates macrophage and neutrophil chemotaxis and activation.33 Animal and human studies implicate IL-18 in the pathophysiology of COPD. IL-18 causes pulmonary inflammation and emphysema in mice.21 22 The number of IL-18 producing macrophages is increased in COPD patients,20 and IL-18 over-production from pulmonary CD8 and epithelial cells is present in severe COPD.20 Furthermore, IL-18 levels are increased in the induced sputum supernatant19 and serum20 21 of COPD patients compared to controls. IL-18Rα is the extracellular signalling domain of the IL-18R complex.33 IL-18Rα knockout mice do not develop emphysema after exposure to cigarette smoke.21 In healthy humans, IL-18Rα is expressed in the broncho-alveolar epithelium and alveolar macrophages.34 There are no previous reports of IL-18Rα expression in COPD lungs, although increased IL-18Rα has been previously demonstrated in the lungs of patients with idiopathic pulmonary fibrosis.34 IL-18 and IL-18Rα are known therapeutic targets in rheumatoid arthritis,35 and these findings support the case for targeting this signalling axis in COPD as well.

There were some limitations to this study. First, it could be argued that the increased proportion of neutrophils in GOLD stages 3 and 4 relative to GOLD stage 2 in the current study contributed to some of the differences in gene expression observed. However, the difference in neutrophil percentage was small between the two groups (less than 10%) and therefore unlikely to be responsible for the twofold gene expression changes observed. Importantly, the increase in IL-18R expression was driven by airway macrophages, so these findings are clearly not simply dependent on the small increase in neutrophil percentage in GOLD stage 3 and 4 compared to GOLD stage 2. Second, the authors could not match patients for medication use, as GOLD stage 3 and 4 patients are expected to take more medication. These results do not suggest that these medications had an effect on gene expression, as there were no clear signals for inflammatory genes known to be regulated by corticosteroids. The heatmap (figure 1) shows that the majority of the genes that were different in the severity analysis using the GOLD classification were also different in the emphysema analysis, with the same direction of change (up- or down-regulation). This is compatible with the LAA score being significantly associated with FEV1. However, the authors did find a number of genes that were only associated with either the degree of airflow obstruction or emphysema (104 and 57 respectively, listed in the OLS). Third, the authors only used airflow obstruction and emphysema to categorise disease severity and did not evaluate other factors known to predict outcome, such as body composition or exercise capacity. More studies are needed to test whether differences in gene expression could be related to differences in those phenotypic expressions of the disease. Fourth, the patients, selected from ECLIPSE centres where sputum could be collected, may not be representative of all patients with COPD. However, the study was multicentre, providing some assurance that the subjects are not from a single centre. Finally, the authors did not study current smokers, where findings may be different. The reason for excluding current smokers was to recruit a more homogeneous population, making the data easier to analyse and interpret.

In summary, the authors have shown that there is a shared differential gene expression profile in the sputum samples of ex-smoking patients with different severity of COPD, as quantified by the degree of airflow obstruction and radiologic emphysema. The findings for genes implicated in the degree of airflow obstruction in a derivative cohort were replicated by PCR in a separate cohort of patients. In a third cohort, the up-regulation of IL-18R protein expression on airway macrophages provided validation of gene expression profiling. These observations allow mechanistic insights into biological processes involved in the progression of COPD.


The ECLIPSE Study is funded by GlaxoSmithKline. The authors would like to acknowledge the work of Chris Clayton (GSK) on the gene expression analysis.


  • Web Only Data thx.2010.153767

    Files in this Data Supplement:


  • Funding This study was funded by GlaxoSmithKline.

  • Competing interests Dave Singh has worked on a consultancy basis for GSK, Chiesi Pharmaceuticals, AstraZeneca, CIPLA, Almirall, Roche and Forest. He has received lecture fees from GSK, Chiesi, Boehringer Ingelheim and AstraZeneca. He receives industry sponsored grants from Chiesi Pharmaceuticals, GSK, UCB, AstraZeneca and Novartis. Bartolome Celli conflicts: Grants to the Division I head to complete research studies from GlaxoSmithKline, Boehringer Ingelheim, Forrest Medical, AstraZeneca, advisory board payments from GSK, Boehringer Ingelheim, Almirall, AstraZeneca, and lecture fees from GSK, Boehringer Ingelheim, AstraZeneca and Almirall. Steven Fox, Ruth Tal-Singer, Stewart Bates, Peter Broad and John Riley are employees of GSK.

  • Ethics approval This study was conducted with the approval of the local ethics committee for each of the participating sites.

  • Provenance and peer review Not commissioned; externally peer reviewed.
