
Original article
Measuring respiratory symptoms in clinical trials of COPD: reliability and validity of a daily diary
N K Leidy (1), C C Sexton (1), P W Jones (2), S M Notte (1), B U Monz (3), L Nelsen (4), M Goldman (5), L T Murray (1), S Sethi (6)

(1) Evidera, Bethesda, Maryland, USA
(2) St. George's, University of London, London, UK
(3) Boehringer Ingelheim GmbH, Ingelheim, Germany
(4) Merck & Co Inc, Whitehouse Station, New Jersey, USA
(5) AstraZeneca, Wilmington, Delaware, USA
(6) University of Buffalo, Buffalo, New York, USA

Correspondence to Dr Nancy Kline Leidy, Sr. Vice President, Scientific Affairs, Evidera, 7101 Wisconsin Ave, Suite 600, Bethesda, MD 20814, USA; nancy.leidy{at}


Background Although respiratory symptoms are characteristic features of COPD, there is no standardised method for quantifying their severity in stable disease.

Objective To evaluate the EXACT-Respiratory Symptom (E-RS) measure, a daily diary comprising 11 of the 14 items in the Exacerbations of Chronic Pulmonary Disease Tool (EXACT).

Methods Qualitative: patient focus groups and interviews to address content validity. Quantitative: secondary analyses of existing data to test reliability and validity.

Results Qualitative: n=84; mean (SD) age 65 (10) years, FEV1 1.2 (0.4) L; 44% male. Subjects' descriptions of their respiratory symptoms were consistent with E-RS content and structure. Quantitative: n=188; mean (SD) age 66 (10) years, FEV1 1.2 (0.5) L; 50% male. Factor analysis (FA) showed three subscales: RS-Breathlessness, RS-Cough & Sputum, and RS-Chest Symptoms; second-order FA supported a general factor and total score. Internal consistency (Cronbach's α; total and subscales, respectively): 0.88, 0.86, 0.73, 0.81; 2-day test-retest ICC: 0.90, 0.86, 0.87, 0.82. Validity: Total scores correlated significantly (p < 0.0001) with SGRQ Total (r=0.75), Symptoms (r=0.66), Activity (r=0.57) and Impact (r=0.70) scores; subscale correlations were also significant, ranging from r=0.26 (p < 0.05; RS-Chest Symptoms with Activity) to r=0.69 (p < 0.0001; RS-Cough & Sputum with Symptoms). RS-Breathlessness correlated with rescue medication use (r=0.32, p < 0.0001), clinician-reported mMRC (r=0.33, p < 0.0001) and FEV1% predicted (r=-0.17, p < 0.05). E-RS scores differentiated groups based on chronic bronchitis diagnosis (p < 0.01–0.001), smoking status (p < 0.05–0.001) and rescue medication use (p < 0.05–0.0001).

Conclusions Results suggest that the E-RS is a reliable and valid instrument for evaluating respiratory symptom severity in stable COPD. Further study of sensitivity to change is warranted.

  • COPD Exacerbations

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Key messages

  • What is the key question?

  • Is there a standardised, reliable and valid diary to evaluate daily respiratory symptoms in clinical studies of stable COPD?

  • What is the bottom line?

  • Yes, the E-RS.

  • Why read on?

  • This paper presents methods and results of two studies showing that RS-Total and subscale scores (RS-Breathlessness, RS-Cough & Sputum, RS-Chest Symptoms) have content validity and are reliable and valid in stable patients with COPD.

Introduction
COPD is a progressive disease characterised by persistent airflow limitation with varying degrees of airway wall narrowing, inflammation and emphysema. Respiratory symptoms, including breathlessness, cough and sputum production, are characteristic features of the disease and have significant adverse effects on patient functioning and quality of life.1–4 Although spirometric measures are useful for diagnosis and evaluating change in lung function, they do not capture symptom severity or variability; weak correlations between lung function and symptoms show they cannot be used as proxies for one another.5–9 Because respiratory symptoms are patient-experienced, it is important to measure this outcome directly, through a patient-reported outcome (PRO) instrument.

To date, there is no standardised method for evaluating respiratory symptoms of stable COPD with a documented development programme consistent with good research practices10,11 and the Food and Drug Administration (FDA) PRO Guidance.12 Dyspnoea is the most frequently assessed symptom, often measured with the Baseline/Transition Dyspnea Index13 or the modified Medical Research Council Dyspnea Scale (mMRC). Health-status measures with symptom components include the St George's Respiratory Questionnaire (SGRQ),14 the Chronic Respiratory Disease Questionnaire,15 the Clinical COPD Questionnaire16 and the COPD Assessment Test.17 These questionnaires are completed intermittently, with varied recall periods (eg, since the last visit, past month, previous week, or now).

An alternative to periodic symptom assessment is a patient-completed diary capturing symptoms each day. This approach is necessary for studies examining the temporal and dynamic nature of respiratory symptoms, including their relationship to other variables such as activity, stress, environmental conditions and rescue medication use. In clinical trials, diaries can be used to evaluate time to symptomatic improvement and/or the magnitude and persistence of change over the treatment period. Regulatory agencies have expressed interest in electronic daily symptom assessments in pharmaceutical trials, given their reduced recall bias and technology-enabled compliance monitoring.12,18 One diary, the Breathlessness, Cough and Sputum Scale, has shown evidence of reliability, validity and responsiveness.19,20 However, it is limited to three items, and its development was not based on qualitative data from the target population, raising concerns about content validity.10–12

The Exacerbations of Chronic Pulmonary Disease Tool (EXACT)-Respiratory Symptoms (RS) (E-RS) was designed to meet the need for a standardised respiratory symptom diary with a development history consistent with good research practices and FDA PRO requirements. The E-RS uses 11 respiratory symptom items from the 14-item EXACT,21–24 offering efficiencies for investigators and subjects by permitting two validated uses for one diary: (1) assessment of COPD exacerbations using the EXACT total score21–24; (2) quantification of respiratory symptoms in stable COPD using RS-Total and subscale scores.

This paper describes the methods and results of research to assess content validity, reliability and validity of the E-RS in patients with stable COPD.

Materials and methods

The research was conducted in two phases. Phase I addressed content validity through qualitative research methods. Phase II tested score reliability and validity using an existing dataset.23 All data were gathered in accordance with the amended Declaration of Helsinki, with study protocols approved by an independent institutional review board (IRB) and all participants providing written informed consent prior to data collection (Essex IRB, ID# A2-3864, A2-3864B; Ethical Review Committee, ID# 472-05-09).

In each phase, study participants were recruited from pulmonary and primary care clinics across the USA using inclusion criteria similar to those used in pharmaceutical trials: >40 years of age; diagnosis of stable COPD; emphysema or chronic bronchitis; and ≥10 pack-year smoking history. Exclusion criteria: medical diagnosis of asthma without postbronchodilator airway obstruction; acute congestive heart failure or unstable angina, bronchiectasis, lung cancer, or tuberculosis; or treatment for respiratory infection or pneumonia within the past 60 days. Phase-specific inclusion/exclusion criteria and methods are described below. Following enrolment and consent, each site provided clinical information related to the participant's diagnosis, pulmonary function and clinician rating of disease severity.

Phase I: qualitative: content validity

A two-step process was used to assess and document the extent to which the items comprising the E-RS adequately and accurately reflect respiratory symptoms of COPD in a stable state. Methods are outlined below, with details provided in the online supplementary appendix.

Stage 1: secondary analyses of existing qualitative data

Qualitative analyses were performed on data gathered during the development of the EXACT,21 that is, data from focus groups and interviews with patients with COPD and a history of a clinic visit or hospitalisation for exacerbation in the previous 6 months (n=63). The purpose of the original study was to characterise COPD exacerbations from the patient's perspective. The data included participant descriptions of the nature and severity of symptoms during a stable state presented as part of their characterisations of COPD and/or to facilitate descriptions of exacerbations, that is, relative to their stable state.

Stage 2: new focus groups

Additional focus groups were conducted in a new sample of clinically stable patients (n=21), that is, exacerbation-free for 12 months. The purpose of Stage 2 was to determine if there were new insights or information related to respiratory symptoms in stable COPD not discussed by patients with an exacerbation history and to ensure that saturation had been reached.

Each focus group was led by an experienced study team member using a semistructured discussion guide to elicit information on patient perspectives of respiratory symptoms. To characterise the sample, participants completed the mMRC Dyspnea Scale,25 the SGRQ for COPD (SGRQ-C),26 and a sociodemographic questionnaire.


Descriptive statistics were used to characterise the sample. Atlas.ti 5.0 facilitated thematic analyses of the qualitative data. For each stage of analysis, symptomatic themes were summarised in a saturation grid and mapped to the respiratory symptom items in the EXACT to determine the extent to which this subset of items, named the E-RS, would adequately capture respiratory symptoms in stable disease. Patient understanding of instructions, items, and response categories was addressed using cognitive interviewing methodology during EXACT development.21
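To make the saturation-grid idea concrete, the sketch below records, for each successive focus group, which themes appear for the first time; saturation is commonly judged to have been reached when later groups add nothing new. This is a minimal illustration under stated assumptions: the group identifiers and theme labels are hypothetical, and the thematic coding itself was performed in Atlas.ti, not in code.

```python
"""Sketch of a concept-saturation grid (hypothetical groups and theme labels)."""


def saturation_grid(groups: list[tuple[str, set[str]]]) -> list[tuple[str, list[str]]]:
    """Return, for each group in order, the themes it raised for the first time.

    `groups` is an ordered list of (group_id, themes_raised) pairs.
    """
    seen: set[str] = set()
    grid = []
    for group_id, themes in groups:
        new = sorted(themes - seen)  # themes not raised by any earlier group
        seen |= themes
        grid.append((group_id, new))
    return grid


if __name__ == "__main__":
    # HYPOTHETICAL coded transcripts; labels are illustrative only.
    coded = [
        ("FG1", {"breathlessness", "cough", "sputum"}),
        ("FG2", {"breathlessness", "chest tightness", "sputum"}),
        ("FG3", {"cough", "chest congestion"}),
        ("FG4", {"breathlessness", "cough"}),
    ]
    for group_id, new in saturation_grid(coded):
        print(group_id, new or "no new themes")
```

In this toy example FG4 contributes no new themes, which is the pattern that would suggest saturation.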

Phase II: quantitative: score reliability and validity

Secondary analyses were performed on a subset of data from a previously published prospective observational study used to develop the EXACT.23 Specifically, data from the stable control sample (n=188) were used. These patients had no history of treatment for exacerbation in the preceding 60 days and were considered clinically stable on enrolment. Subjects were recruited through clinical sites in over 20 states across the USA.


All participants completed the SGRQ-C and sociodemographic forms during the enrolment clinic visit; site staff provided clinician ratings of patient dyspnoea (mMRC) and disease severity, and results of the most recent stable-state spirometry. Each subject completed an eDiary for the ensuing 7 days that included EXACT candidate items, rescue medication use and global ratings of change.23

Statistical analyses

Analyses were prespecified in a statistical analysis plan completed following Phase I. Because the intent was to evaluate the performance properties of the E-RS as a daily diary, analyses were conducted with data from Day 1, the same day clinic data were gathered, unless otherwise specified. Item-level analyses included measures of central tendency, floor and ceiling effects, item-total correlations, item response frequencies and interitem correlations. Confirmatory (CFA), exploratory (EFA), and second-order factor analyses were performed to evaluate the structure of the measure and develop the scoring algorithms.
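Two of the item-level statistics named above can be sketched in a few lines of Python. These are hypothetical helpers, not the SAS code used in the study. Floor and ceiling effects are taken as the percentage of respondents at the lowest and highest possible score, and the item-total correlation is assumed to be the "corrected" form (each item correlated with the sum of the remaining items); the paper does not state which form was used.

```python
"""Item-level checks: floor/ceiling effects and corrected item-total correlations."""
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # ZeroDivisionError if an input is constant


def floor_ceiling(scores: list[int], lo: int, hi: int) -> tuple[float, float]:
    """Percentage of respondents at the lowest and at the highest possible score."""
    n = len(scores)
    return 100 * scores.count(lo) / n, 100 * scores.count(hi) / n


def corrected_item_total(items: list[list[int]]) -> list[float]:
    """Correlate each item with the sum of the *other* items.

    items[j][i] is respondent i's response to item j. Removing the item from
    the total avoids inflating the correlation by correlating the item with itself.
    """
    totals = [sum(resp) for resp in zip(*items)]
    return [pearson(item, [t - r for t, r in zip(totals, item)]) for item in items]
```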

Internal consistency was assessed using Cronbach's α, with a target value of >0.70, the level generally considered acceptable for instruments used in clinical trials.27 Test-retest reliability was evaluated in two ways: (1) for each pair of consecutive days (Days 1–2, 2–3, 3–4, etc.), using data from patients who reported no change in their lung condition on the daily global assessment question; and (2) for Days 1 and 7 in all subjects, assuming symptomatic stability across these two observations (no confirmatory global assessment was used). Intraclass correlation coefficients (ICC), paired t tests and effect sizes were computed.
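A minimal Python sketch of the two reliability statistics follows, assuming complete data. Cronbach's α uses the usual formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The paper does not specify which ICC form was computed; the sketch assumes ICC(2,1) in Shrout and Fleiss notation (two-way random effects, absolute agreement, single measure), which is one common choice for test-retest data.

```python
"""Cronbach's alpha and a two-way ICC for test-retest data (illustrative sketch)."""
from statistics import mean, variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """items[j][i] = respondent i's score on item j (k items, n respondents)."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(variance(item) for item in items)  # sample variances
    return k / (k - 1) * (1 - item_var / variance(totals))


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] = subject i's score on occasion j (eg, Day 1 and Day 2).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

A one-point shift between days that is identical for every subject lowers ICC(2,1) (to 8/9 for the three-subject data [[1, 2], [3, 4], [5, 6]]) because absolute agreement penalises systematic change; a consistency-type ICC would ignore such a shift.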

Validity was examined by correlating E-RS scores with SGRQ total and domain scores, rescue medication use, clinician rating of dyspnoea on the mMRC, and forced expiratory volume in 1 s (FEV1) % predicted. Known-groups validity was tested using Student's t tests, comparing E-RS scores between those with and without a clinician-reported diagnosis of chronic bronchitis, current and former smokers (self-report), and those using no rescue medication versus three or more puffs on Day 1 (self-report). Scores were also compared across levels of clinician-rated disease severity; only weak relationships were hypothesised, given the multidimensional nature of the clinician's assessment. Because that assessment may reflect factors other than symptoms, these analyses were performed with and without controlling for age, comorbidity status (≤1 or ≥2) and FEV1.
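The known-groups comparison can be sketched as a pooled-variance (Student's) t statistic plus a standardised mean difference. Cohen's d with the pooled SD is assumed here as the effect size; the paper does not give its effect-size formula. The two inputs would be E-RS scores for the groups being compared, for example patients with and without a chronic bronchitis diagnosis.

```python
"""Known-groups comparison: Student's t (pooled variance) and Cohen's d."""
import math
from statistics import mean, variance


def pooled_sd(a: list[float], b: list[float]) -> float:
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample t statistic assuming equal variances, with its degrees of freedom."""
    na, nb = len(a), len(b)
    se = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (mean(a) - mean(b)) / se, na + nb - 2


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardised mean difference using the pooled SD (one common definition)."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)
```

For a p value, `scipy.stats.ttest_ind(a, b)`, which assumes equal variances by default, returns the same t statistic together with a two-sided p value.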

Statistical analyses were performed using SAS V.9.1 (SAS Institute, Cary, North Carolina, USA).

Results

Sample demographic and clinical characteristics by phase and stage are shown in table 1.

Table 1

Sample demographic and clinical characteristics by phase

Phase I: content validity

Qualitative analyses identified three categories of respiratory symptoms that patients experience when stable: breathlessness, cough and sputum, and chest symptoms. Representative quotations for each symptom category, descriptions of the interrelationship and co-occurrence of symptoms, and the saturation grid are provided in online supplementary appendix tables S1 and S2.

Participants spoke of being ‘breathless’, ‘short of breath’, and having difficulty breathing, with severity levels that varied day to day. Cough and sputum were generally discussed together; most patients were unable to make it through the day without coughing. Sputum was characterised in terms of quantity and thickness. Most participants cited difficulty coughing up sputum or phlegm. Chest symptoms included congestion, tightness and discomfort, which also varied day to day. Participants often presented their respiratory symptom descriptions as inter-related experiences, describing how they related to and affected one another.

Content of the final instrument, in the context of the 14-item EXACT, is shown in online supplementary appendix table S3.

Phase II: reliability and validity

Item and factor analyses and scoring algorithm

Participants used the full range of response options, with no missing data and minimal floor and ceiling effects. The CFA comparative fit index (CFI) for a unidimensional model was 0.75, below the 0.95 prespecified as indicating good fit, which precluded the use of Rasch analysis for the total score. EFA showed a three-factor solution (table 2), indicating that the measure comprises three respiratory symptom subscales. The second-order factor model fit the data very well (CFI=0.96), with standardised coefficients between the items and the respiratory symptom factors, and between the respiratory symptom factors and the general factor, all greater than 0.60 (range 0.68 to 0.85 and 0.65 to 0.95, respectively) (see figure 1). The 0.94 correlation between the general factor and RS-Total scores provided further support for an empirical general factor governing the three E-RS factors.
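For readers less familiar with the fit indices reported here, the sketch below computes the comparative fit index (CFI) and the RMSEA point estimate from model and baseline (independence-model) chi-square statistics using the standard formulas: CFI = 1 − max(χ²M − dfM, 0) / max(χ²B − dfB, χ²M − dfM, 0) and RMSEA = √(max(χ²M − dfM, 0) / (dfM × (N − 1))). Any chi-square values used with it are hypothetical; the paper reports only the resulting indices.

```python
"""Comparative fit index (CFI) and RMSEA from chi-square statistics (sketch)."""
import math


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Bentler's CFI: model non-centrality relative to the baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)  # d_model >= 0, so d_base >= 0
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def rmsea(chi2_model: float, df_model: int, n: int) -> float:
    """RMSEA point estimate using N - 1 (some software uses N instead)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
```

With hypothetical values χ²M=50 on 20 df, χ²B=500 on 30 df and N=188, `cfi` gives about 0.936 and `rmsea` about 0.090.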

Table 2

Exploratory factor analysis: promax factor loading*

Figure 1

Higher-order factor model for the E-RS. Comparative Fit Index (CFI)=0.958; Root Mean Square Error of Approximation (RMSEA)=0.073 (90% CI 0.050 to 0.096); Standardised Root Mean Square Residual (SRMR)=0.043. The E-RS is a derivative instrument, using the 11 respiratory symptom items from the 14-item EXACT. SOB, shortness of breath.

The scoring algorithms for the E-RS yield a total score and three subscale scores, with higher scores on these ordinal-level scales indicating more severe symptoms. Each item has four or five response options, scored 0 to 3 or 0 to 4, and item scores are summed to yield total and subscale scores. E-RS scores are calculated for each day the diary is completed and may be aggregated or summarised in a manner consistent with the study purpose and design.
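The summation rule described above can be sketched as follows. The item identifiers and the item-to-subscale assignment in the code are placeholders invented for illustration, not the published E-RS mapping (the real assignment follows the factor structure in table 2), and the weekly mean is only one of the aggregations a study might choose.

```python
"""Daily scoring sketch: sum item scores into subscale and total scores."""
from statistics import mean

# HYPOTHETICAL item IDs and subscale assignment, for illustration only;
# this is NOT the published E-RS item mapping.
SUBSCALES = {
    "RS-Breathlessness": ["item_a", "item_b", "item_c", "item_d", "item_e"],
    "RS-Cough & Sputum": ["item_f", "item_g", "item_h"],
    "RS-Chest Symptoms": ["item_i", "item_j", "item_k"],
}


def score_day(responses: dict[str, int]) -> dict[str, int]:
    """Return subscale and RS-Total scores for one completed diary day.

    Higher scores indicate more severe symptoms. A missing item raises
    KeyError rather than being imputed.
    """
    scores = {name: sum(responses[i] for i in items) for name, items in SUBSCALES.items()}
    scores["RS-Total"] = sum(scores.values())  # total = sum of the three subscales
    return scores


def weekly_mean(days: list[dict[str, int]], scale: str = "RS-Total") -> float:
    """One possible aggregation: the mean daily score over the days completed."""
    return mean(score_day(d)[scale] for d in days)
```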

Descriptive statistics, floor and ceiling effects, total and subscale intercorrelations, and example patient-level plots of RS-Total scores over 7 days are provided in online supplementary appendix tables S4a and S4b and appendix figure S1.

Reliability
Reliability parameters are presented in table 3; reproducibility estimates for Days 1 to 2 are shown, with the remaining consecutive-day analyses appearing in online supplementary appendix table S5.

Table 3

E-RS reliability: internal consistency and test-retest reproducibility

Validity
Correlations between E-RS scores and alternative assessments of respiratory symptom severity (concurrent validity), related constructs (convergent validity), and weakly related constructs (divergent validity) are provided in table 4.

Table 4

Validity: correlation with related constructs and alternative measures

RS-Total and subscale scores (RS-Breathlessness, RS-Cough & Sputum, and RS-Chest Symptoms) differentiated those with and without a diagnosis of chronic bronchitis, current and ex-smokers, and those using no rescue medication versus ≥3 puffs (table 5). Univariate models of clinician-rated disease severity were not significant, but multivariate models for RS-Total (p < 0.05), RS-Breathlessness (p < 0.05) and RS-Cough & Sputum scores (p < 0.01) were significant (see online supplementary appendix table S6).

Table 5

Known-groups validity: E-RS Mean (SD) scores by chronic bronchitis diagnosis, smoking status, and rescue medication use

Discussion
Although respiratory symptoms play a key role in the diagnosis, assessment and management of patients with COPD, and symptom relief is an important target of therapy, there is no standardised, reliable and valid daily diary for evaluating this outcome in natural history studies and clinical trials. This paper presents the first evidence of the reliability and validity of the E-RS, which was designed to meet this need.

Phase I of this work addressed content validity.10,28 Participant descriptions of their breathlessness, cough and sputum were consistent with the literature3,5,8,9,29 and the content and structure of the E-RS. Of particular note were descriptions of chest symptoms (chest congestion, discomfort and tightness), a symptom set not measured with existing questionnaires.14–17,19

Participant descriptions of symptom variability suggest an unstable component to ‘stable’ COPD, consistent with findings reported by Kessler et al29 in patients with severe disease. Of the symptomatic patients in this pan-European observational study (70% of 2441), most (63%) experienced symptom variability, with over half reporting variability over the course of the week (54%) or across seasons (60%). The most variable symptoms were breathlessness and chest tightness; variability in breathlessness was associated with a history of two or more exacerbations in the prior year and greater adverse impact on daily activity.29 These results suggest that respiratory symptoms in COPD may not be as stable as previously believed,30 and that further research on day-to-day variability is needed.

It is important to note that the E-RS is completed in the evening before bedtime, with respondents rating their symptoms as they reflect back on the day. This method is efficient and less burdensome than twice-daily assessment; however, it may be less precise for investigators specifically interested in characterising and tracking night-time or morning symptoms.29,31 Future studies could evaluate the added precision of administering the E-RS twice daily, with a corresponding adjustment in recall period, or of using a separate morning diary for this purpose.

Participants described breathlessness, cough, sputum and chest congestion as co-occurring and interacting, suggesting a respiratory symptom complex in COPD that can be captured through a total score, representing the overall severity of this symptom complex, and subscale scores capturing the three types of respiratory symptoms. This structure was supported quantitatively by the second-order factor model, strong interscale correlations, internal consistency reliability and validity metrics. This measurement structure permits step-down hypothesis testing, with overall respiratory symptom severity tested first, followed by tests for breathlessness, cough and sputum, and chest symptoms.
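One common way to implement step-down testing is a fixed-sequence procedure: each hypothesis is tested at the full α, but only if every hypothesis earlier in the sequence was rejected. The sketch below assumes that procedure and the order given in the text; the paper does not specify the exact multiplicity rule, so treat it as illustrative rather than as the authors' analysis plan.

```python
"""Step-down (fixed-sequence) testing sketch: RS-Total first, then subscales."""

# ASSUMED testing order, taken from the order in which the text lists the scales.
ORDER = ["RS-Total", "RS-Breathlessness", "RS-Cough & Sputum", "RS-Chest Symptoms"]


def step_down(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Return which hypotheses are rejected under a fixed testing sequence.

    Each test uses the full alpha but is performed only if all earlier tests in
    ORDER were significant; testing stops at the first non-significant result.
    """
    rejected: dict[str, bool] = {}
    for scale in ORDER:
        if p_values[scale] >= alpha:
            break
        rejected[scale] = True
    return {scale: rejected.get(scale, False) for scale in ORDER}
```

With p values of 0.01 (RS-Total), 0.02 (RS-Breathlessness), 0.20 (RS-Cough & Sputum) and 0.001 (RS-Chest Symptoms), testing stops at RS-Cough & Sputum, so RS-Chest Symptoms is not declared significant despite its small p value.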

Internal consistency estimates for RS-Total, RS-Breathlessness and RS-Chest Symptoms scores exceeded the conservative 0.80 standard, indicating a high degree of precision with low measurement error. Score reproducibility over two consecutive days in patients reporting no change was very high. The lower estimates over a six-day interval (Days 1 and 7) are consistent with day-to-day symptom variability in stable patients. This finding is pertinent given regulatory interest in daily symptom assessments12 and the need for further study of the temporal dynamics of symptom severity.

The magnitude and pattern of correlations between E-RS subscales and health status (SGRQ), dyspnoea (mMRC), rescue medication use and FEV1% predicted were consistent with score validity. Patients with the chronic bronchitic phenotype and current smokers were more symptomatic,32–34 while subjects who reported no rescue medication use on Day 1 also reported significantly less severe symptoms, with the strongest effect observed for the RS-Breathlessness scale. Although one might expect a stronger relationship between symptom severity and the clinician's global assessment of COPD severity, the modest relationship reached statistical significance for RS-Total, RS-Breathlessness and RS-Cough & Sputum only after controlling for potential confounders (age, comorbidity status and FEV1). This supports the differential and complementary roles played by direct symptom assessment from the patient and the clinician's integrated assessment of the patient's COPD, with the latter including a clinical appraisal of symptoms, spirometry, physical assessment, exacerbation history, treatment history, comorbidity and general health, among other factors.

Although the dataset did not permit an evaluation of sensitivity to change, results of known-groups analyses offer preliminary insight into score interpretation. For example, the difference in RS-Total scores between more and less symptomatic patients (table 5) was approximately 4 points on the 40-point scale. Effect sizes were large (>0.60), indicating that 4 points may be substantially greater than a minimal clinically important difference (MCID). Applying a commonly used distribution-based method for estimating the MCID (half the sample SD; online supplementary appendix table S4a), estimates would be: RS-Total: 3.35; RS-Breathlessness: 1.85; RS-Cough & Sputum: 1.15; RS-Chest Symptoms: 1.05. Until anchor-based estimates across multiple samples are available, these values should be considered preliminary, since they are probably higher than the true MCID.
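The half-SD rule is simple arithmetic: MCID ≈ 0.5 × sample SD. The sketch below computes it from raw scores. The SDs listed in the code are back-calculated by doubling the estimates quoted above rather than copied from the supplementary table, so they are approximations.

```python
"""Distribution-based MCID estimate: half the sample standard deviation."""
from statistics import stdev


def half_sd_mcid(scores: list[float]) -> float:
    """0.5 x sample SD; a distribution-based, not anchor-based, estimate."""
    return 0.5 * stdev(scores)


# APPROXIMATE SDs implied by the reported estimates (each estimate doubled).
IMPLIED_SD = {
    "RS-Total": 6.7,
    "RS-Breathlessness": 3.7,
    "RS-Cough & Sputum": 2.3,
    "RS-Chest Symptoms": 2.1,
}
```

Because it depends only on score spread, this estimate says nothing about what change patients perceive as meaningful, which is why anchor-based estimates are still needed.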

Although respiratory symptoms may be assessed periodically as part of existing COPD-specific health status questionnaires, daily assessment has several potential advantages. First, daily scores reduce recall bias and provide a prospective, day-by-day accounting of symptom severity. Second, this approach yields information on day-to-day variability. Third, daily data offer analytical flexibility, with methods for evaluation over time determined by the study purpose. A specific advantage of the E-RS is its position within the EXACT, which provides data on exacerbations and respiratory symptoms simultaneously with no additional patient burden. The EXACT was developed as an eDiary, with short, easy-to-read questions and recommendations for formatting and reminders to enhance compliance. Compliance rates were high (94%) in this short-term validation study and have exceeded 88% across several 3–6 month clinical trials,24 suggesting that eDiaries are feasible in this patient population. Widespread smartphone access may facilitate future testing and use of eDiaries in natural history studies or clinical practice.35

An important question is the suitability of the E-RS for international studies. During development of the parent instrument, international content and translation experts served on the advisory panel. To date, the EXACT has been translated into more than 50 languages, with cognitive interviews conducted with COPD patients in the target countries to ensure cultural and linguistic equivalence. Although the EXACT has performed well in international trials,24 an evaluation of E-RS score reliability, validity and responsiveness in international settings is warranted.

Conclusions
The E-RS was designed to assess daily respiratory symptoms in clinical studies of COPD. Results suggest that the instrument is content valid, with quantitative evidence of score reliability and validity. Further research on the performance properties of E-RS scores in new samples and on their sensitivity to change, including MCID estimation and development of responder definitions, is warranted.

Acknowledgements
The authors wish to thank Jennifer Petrillo and Kellee Howard for their assistance with data collected for Phase I, Stage 1 analyses; Laurie Roberts for assistance with the Phase II data collection; Ren Yu and Ray Hsieh for their SAS programming; and Wen-Hung Chen for statistical and analytic support.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Contributors NKL takes full responsibility for the content of this manuscript, including the data analysis. Each author contributed substantially to the research described in this paper, including contributions to the concept and design, analyses, and interpretation of the data. NKL, CCS, and SMN also contributed to acquisition of data, drafting the article, and revising it in response to research team comments. All authors reviewed and approved this version of the paper prior to submission.

  • Funding Funding for this work was provided by unrestricted funds from AstraZeneca, Boehringer Ingelheim International GmbH, and Merck & Company, Inc.

  • Competing interests NK Leidy, CC Sexton, and LT Murray are employed and SM Notte was employed by Evidera (formerly United BioSource Corporation), which provides consulting and other research services to pharmaceutical, device, government and non-government organisations. As Evidera employees, they work with a variety of companies and organisations and are expressly prohibited from receiving any payment or honoraria directly from these organisations for services rendered. BU Monz was an employee of Boehringer Ingelheim at the time this research was conducted; M Goldman is an employee of AstraZeneca; and L Nelsen is an employee of Merck & Company. BI, AstraZeneca, and Merck develop and market respiratory products. PW Jones and S Sethi consult with various companies on topics related to COPD and its treatment.

  • Ethics approval Essex IRB, Inc., ID# A2-3864, A2-3864B; Ethical Review Committee, Inc., ID# 472-05-09.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Readers are asked to contact the first author to discuss access to data used in this study.
