Rationale: Dyspnoea is a debilitating and distressing symptom that is reflected in different verbal descriptors. Evidence suggests that dyspnoea, like pain perception, consists of sensory quality and affective components. The objective of this study was to develop an instrument that measures overall dyspnoea severity using descriptors that reflect its different aspects.
Methods: 81 dyspnoea descriptors were administered to 123 patients with chronic obstructive pulmonary disease (COPD), 129 with interstitial lung disease and 106 with chronic heart failure. These were reduced to 34 items using hierarchical methods. Rasch analysis informed decisions regarding further item removal and fit to the unidimensional model. Principal component analysis (PCA) explored the underlying structure of the final item set. Validity and reliability of the new instrument were further assessed in a separate group of 53 patients with COPD.
Results: After removal of items with hierarchical methods (n = 47) and items that failed to fit the Rasch model (n = 22), 12 were retained. The “Dyspnoea-12” had good internal reliability (Cronbach’s alpha = 0.9) and fit to the Rasch model (χ2 p = 0.08). Items patterned into two groups called “physical”(n = 7) and “affective”(n = 5). In the separate validation study, Dyspnoea-12 correlated with the Hospital Anxiety and Depression Scale (anxiety r = 0.51; depression r = 0.44, p<0.001, respectively), 6-minute walk distance (r = −0.38, p<0.01) and MRC (Medical Research Council) grade (r = 0.48, p<0.01), and had good stability over time (intraclass correlation coefficient = 0.9, p<0.001).
Conclusion: Dyspnoea-12 fulfills modern psychometric requirements for measurement. It provides a global score of breathlessness severity that incorporates both “physical” and “affective” aspects, and can measure dyspnoea in a variety of diseases.
Dyspnoea is a perceptual experience that is complex and highly subjective. It is a common and distressing symptom in cardiorespiratory disease. Progress has been made in identifying various aspects of dyspnoea that arise from multiple factors including environmental, psychological and physiological.1 Other evidence indicates that dyspnoea, like pain perception, consists of “sensory quality” and “affective” components,2 3 yet no currently available dyspnoea instrument encompasses these.
Dyspnoea has been measured in two ways: (1) directly, using modified Borg or visual analogue scales, in persons experiencing dyspnoea at rest or in response to various stimuli,4 5 6 although single-item scales may not be adequate to capture the complexity of dyspnoea; and (2) indirectly, by asking the respondent to report the level of physical activity they are not able to accomplish because of dyspnoea,7 8 or within scales assessing the impact of disease on quality of life.9 10 Whilst indirect methods provide useful information, they do not quantify dyspnoea; they measure its effects on activity. This also means that they cannot measure different aspects of dyspnoea.
Patients with cardiorespiratory disease use a variety of terms to describe the experience of being breathless, and it has been proposed that dyspnoea descriptors may provide a direct route for its quantification.11 12 Studies have explored the semantics of dyspnoea, principally from a diagnostic perspective, or to understand mechanisms.13 14 15 16 That work utilised descriptor lists that focused primarily on sensory quality, but evidence suggests that sensory quality descriptors are not sufficiently robust to aid differential diagnosis, since they reflect a variety of sensations which are shared by a range of conditions.13 16 17 Whilst they may have limited diagnostic utility, verbal descriptors have been found to be related to the severity level of dyspnoea in patients with cardiorespiratory disease, providing a mechanism for its assessment.18 19
The affective aspect of dyspnoea evokes distress and motivates behaviour, but it is often overlooked in clinical and laboratory settings. This may be, in part, because no measure of this dimension currently exists. Evidence for a range of descriptors representative of affect has evolved from studies in which patients were asked to describe the experience of being breathless in their own words.20 21 22 23 Affect-laden descriptors were relatively consistent between studies, yet it is not known which of these may possess reliable measurement properties.
Taken together, this work has created a body of language that reflects multiple aspects of dyspnoea. A core list of dyspnoea descriptors that patients consider most relevant has not previously been defined. Our objective was to develop a concise and valid questionnaire of overall dyspnoea severity using descriptors that were relevant across different cardiorespiratory diseases. Our underlying hypothesis was that the words patients use can be applied to form a reliable scale that reflects dyspnoea severity. This paper presents two studies: the development of a new questionnaire for the quantification of dyspnoea and its initial validation.
Materials and methods
Development of the initial item pool
A pool of 81 items was generated from published literature reporting the language used to describe breathlessness. Papers were identified by a systematic search of Medline, CINAHL and PsycINFO (up to October 2005) using the terms: dyspnoea/dyspnea or breathless/breathlessness, and language/descriptor/questionnaire/qualitative. One patient with chronic obstructive pulmonary disease (COPD) and one with interstitial lung disease (ILD) assisted with the removal of duplicate items and advised on the structure of each phrase and the severity response scale used. The 81 items, each with response options of “none”, “mild”, “moderate” or “severe”, were arranged as a questionnaire list that asked patients to select, for each item, the response that “best matched” their current experience of being breathless. Two studies were performed; both were approved by the local ethics research committees and written consent was obtained from all participants.
Study one: item reduction and preliminary testing
Study participants and measures
Breathless patients (n = 358) with a primary diagnosis of COPD, ILD or chronic heart failure (CHF), and able to read English, were recruited through outpatient clinics from three hospitals in England. Each participant completed one of three randomly allocated 81-item descriptor lists, the Medical Research Council (MRC) dyspnoea scale,8 and two modified Borg scales6 that assessed “current level of breathlessness” and “current level of distress caused by breathlessness”.
Item reduction using hierarchical methods24: to remove the least discriminative descriptors, items were excluded if ⩾50% of patients gave a response of “none”. Items influenced by age at p<0.05 (Pearson correlation r) were removed.
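The two hierarchical exclusion criteria above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: responses are assumed to be coded 0 ("none") to 3 ("severe"), all function names are hypothetical, and the p-value for the Pearson correlation uses the Fisher z approximation, which is an assumption of this sketch (the source does not state how the p-value was computed).

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def floor_effect(responses, threshold=0.5):
    """True if at least `threshold` of the responses are 0 ("none")."""
    return sum(1 for r in responses if r == 0) / len(responses) >= threshold


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either sequence is constant


def age_p_value(responses, ages):
    """Two-sided p-value for r, using the Fisher z approximation (assumed)."""
    r = pearson_r(responses, ages)
    z = atanh(r) * sqrt(len(ages) - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def items_to_remove(item_responses, ages, alpha=0.05):
    """Names of items that fail either criterion, in input order."""
    return [
        name
        for name, responses in item_responses.items()
        if floor_effect(responses) or age_p_value(responses, ages) < alpha
    ]
```

The floor-effect check runs first, so an item answered "none" by every patient is flagged before its (undefined) correlation with age would be computed.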
Item reduction using Rasch analysis: items that survived hierarchical reduction were analysed with the Rasch model (RUMM2020 software).25 Rasch models provide a template for testing how well each item contributes to the concept being measured (breathlessness severity).26 This is an iterative process whereby the poorest fitting item is removed and the effect on the fit of the remaining items is then retested. This process is continued until each item demonstrates a good fit to the model and the overall item set is stable enough to meet the requirement for a reliable unidimensional measure. In a Rasch model, the severity associated with any given item is measured in “logits” (log odds units): an item’s location on this scale is the level of breathlessness severity, as assessed by the patient’s responses to all the items combined, at which a patient has a 50% chance of responding positively to that item.27
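For readers unfamiliar with the logit scale, the simplest (dichotomous) form of the Rasch model is shown below as an illustration. The symbols are assumptions introduced here and do not appear in the source: \theta_n is the severity of patient n and b_i is the location of item i, both in logits. Because the descriptors in this study have four response options, the analysis presumably used a polytomous extension of this form; the source does not specify which.

```latex
\begin{align*}
P(X_{ni} = 1) &= \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}, \\
\ln \frac{P(X_{ni} = 1)}{1 - P(X_{ni} = 1)} &= \theta_n - b_i, \\
\theta_n = b_i \;&\Rightarrow\; P(X_{ni} = 1) = \tfrac{1}{2}.
\end{align*}
```

The last line is the "50% chance" property: a patient whose severity equals the item location is equally likely to endorse or not endorse that item.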
Individual item fit was assessed using a χ2 statistic to compare the difference between the observed responses and those expected by the model.28 Analysis of variance (ANOVA) was used to test each item for the influence of gender or diagnosis on the patients’ response to that item, an effect known as differential item functioning (DIF). Further details regarding individual item tests of fit and DIF are provided in the online supplement (see supplementary figs 1–4).
The presence of any item–trait interaction was tested using a χ2 test to assess whether all items perform consistently, regardless of overall breathlessness severity (determined by p>0.05). Internal reliability was assessed using Cronbach’s reliability (α).29
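As an illustration of the internal reliability statistic, Cronbach’s α can be computed from a patients-by-items table of scores. The sketch below uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores); the function name and data layout are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-patient rows of item scores.

    Each row holds one patient's scores on the same k items (k >= 2).
    """
    k = len(rows[0])
    # Sample variance of each item across patients
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the patients' total scores
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values close to 1 indicate that the items vary together across patients; the formula is undefined when every patient has the same total score.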
Exploratory principal component analysis (PCA): PCA using varimax rotation was used to assess the underlying structure of the final item set. The number of components extracted was based on eigenvalues, and allocation of an item to a component was determined by a factor loading which by convention is set at >0.5.29
Preliminary validity testing of the final 12-item set: the final item set consists of 12 items. The score is calculated using simple addition of the responses for each item. It ranges from 0 to 36, where 0 represents no breathlessness and 36 represents maximal severity.
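The scoring rule described above can be sketched as follows, assuming the four response options are coded none = 0, mild = 1, moderate = 2 and severe = 3 (the coding implied by the 0 to 36 range). The function name is hypothetical, and the handling of missing items, which the appendix describes separately, is deliberately not implemented here.

```python
# Assumed coding of the four response options (implied by the 0-36 range)
RESPONSE_SCORES = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def dyspnoea12_total(responses):
    """Sum the scores of 12 item responses; returns a value from 0 to 36."""
    if len(responses) != 12:
        raise ValueError("expected responses to all 12 items")
    return sum(RESPONSE_SCORES[r] for r in responses)
```

For example, a patient who answers "moderate" to every item would score 12 × 2 = 24.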
Pearson (r) correlations examined relationships between the 12-item set total score, MRC dyspnoea grade, Borg-intensity and Borg-distress scales. The effect of diagnosis on differences in the 12-item set score across MRC dyspnoea grades was tested using ANOVA. The average Dyspnoea-12 score and Borg-intensity and Borg-distress scores were computed for each category of MRC dyspnoea grade for the entire group (n = 358).
Study two: Dyspnoea-12 validation
The final version is called Dyspnoea-12 (Appendix 1). The Dyspnoea-12 asks patients to respond to each item in relation to their breathlessness “these days” and is designed to capture the patient’s general perception about their current state, rather than a record specific to the day of testing. This time reference frame was used because it has been found to be reliable in other instruments such as the St George’s Respiratory Questionnaire.9
Study participants and measures
In the separate study, we tested Dyspnoea-12 reliability and validity with a further group of 53 patients with COPD. During a routine clinic visit participants completed the Dyspnoea-12, MRC dyspnoea scale,8 Hospital Anxiety and Depression Scale (HADS),30 lung function tests and 6-minute walk distance (6MWD). All patients repeated the Dyspnoea-12 after a median of 16 days (range 10–20). Dyspnoea-12 internal consistency was assessed using Cronbach α, and test–retest reliability was tested using the intraclass correlation coefficient (ICCC). Correlations using Pearson (r) were examined between Dyspnoea-12, MRC grade, HADS, forced expiratory volume in 1 s (FEV1) and 6MWD.
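The test–retest statistic can be illustrated with a small sketch. The source does not state which form of the intraclass correlation coefficient was used; the code below assumes the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)), and the function name is hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a list of per-patient rows, one score per occasion.

    Assumed form: two-way random effects, absolute agreement,
    single measurement. Needs at least 2 patients and 2 occasions.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Under this form a systematic shift between the two occasions lowers the coefficient even when patients keep the same rank order, which is why it is a stricter test of stability than a Pearson correlation.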
Face validity was explored using qualitative methods (6 patients with COPD, 2 with ILD and 4 with CHF). These patients also rated the Dyspnoea-12 for ease of completion (0 = not easy to 10 = extremely easy); helpfulness in expressing their experience (0 = not helpful to 10 = extremely helpful); and ease of understanding (0 = not easy to 10 = extremely easy).
Results
A total of 358 patients completed the questionnaires; most (n = 275, 77%) completed these during a clinic visit, and the remainder were asked to return them within 2 weeks. The majority were Caucasian (n = 337, 94%). Their baseline characteristics are provided in table 1.
Forty-six items were initially removed because ⩾50% of participants rated them as “none”; one further item (“breathing more”) was removed because it was associated with age (p = 0.02) (see supplementary table 1 online).
Rasch analysis was applied to the remaining 34 items. Twenty-two items were removed due to lack of fit to the Rasch model (see supplementary table 1 online). Twelve items survived the item reduction process, and make up the final item set. The 12-item set had very good internal reliability (α = 0.9). The item–trait χ2 probability was p = 0.08, supporting the conclusion that the items formed a unidimensional measure of overall breathlessness severity.
The DIF tests for two items (“distressing” and “difficulty catching breath”) suggested that they behaved differently between diseases (p<0.002); however, the effect on the overall score for dyspnoea severity was negligible. With these items included, the overall severity of the patients was −0.627 (1.4) logits (mean (SD)); with the items removed it was −0.625 (1.4) logits, so the items were retained.
Exploratory PCA of the 12 items identified one component with an eigenvalue >1, explaining 58% of the variance (see supplementary table 2 online), which supports the conclusion from the Rasch item–trait test that the items form a unidimensional model. The second component had an eigenvalue >0.9 and explained 8% of the variance, so a two-component structure was forced to explore the patterning of items further. The first component, called “physical”, contained seven items and the second, called “affective”, contained five items (table 2). One item, “irritating”, loaded >0.5 on both components. The apparent paradox of a unidimensional structure with perhaps two components is explained by the severity associated with the groups of items. The “physical” items were associated with a lower overall level of dyspnoea severity (mean = −0.17 logits) whereas the “affective” items were associated with more severe breathlessness (mean = 0.24 logits) (fig 1).
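The reported percentages of variance can be related to the eigenvalues with a short worked calculation. This assumes the PCA was run on the correlation matrix of the 12 items, so that the eigenvalues sum to the number of items; the source does not state this, and the symbols \lambda_j and p are introduced here for illustration only.

```latex
\begin{align*}
\text{proportion of variance explained by component } j &= \frac{\lambda_j}{p}, \qquad p = 12, \\
\lambda_1 &\approx 0.58 \times 12 = 6.96, \\
\lambda_2 &\approx 0.08 \times 12 = 0.96.
\end{align*}
```

Because the percentages are rounded, these values are approximate, but they agree with the reported thresholds: the first eigenvalue is well above 1 and the second lies just above 0.9.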
For the entire sample (n = 358) the 12-item set total score correlated significantly with Borg-intensity (r = 0.47, p<0.001), Borg-distress (r = 0.59, p<0.001) and MRC grade (r = 0.39, p<0.001). The relationship between the 12-item total score and MRC grade was not dependent on diagnosis (F = 1.3, p>0.05 for the interaction). The average 12-item total score, Borg-distress ratings and Borg-intensity ratings increased progressively with similar gradients of between 8% and 10% of full scale per increment in MRC grade (fig 2).
Fifty-three patients with COPD (32 male) participated in the separate validation study. The group’s mean age was 69 (16); FEV1 (litres) 1.43 (0.7); FEV1/FVC % predicted 55 (16); and 6MWD (metres) 181 (175). The mean Dyspnoea-12 score for the sample was 18 (8); MRC grade 2.6 (1.3); HADS anxiety 8.7 (1.3); and HADS depression 7.7 (4). Dyspnoea-12 demonstrated good internal reliability (α = 0.9) and good test–retest reliability (ICCC = 0.90, p<0.001). Mean Dyspnoea-12 score was strongly associated with HADS scores (anxiety r = 0.51 and depression r = 0.44, p<0.001). Dyspnoea-12 correlated significantly with FEV1 (r = −0.30, p = 0.03), 6MWD (r = −0.38, p<0.01) and MRC grade (r = 0.48, p<0.001).
Interview participants rated the Dyspnoea-12 as easy to complete (median = 9; range = 5–10), easy to understand (median = 9.5; range = 5–10) and helpful (median = 9; range = 5–10). One man with COPD commented that the questionnaire was “superb” and enabled him to say “right this is how I feel”.
Discussion
We have developed and conducted initial validity testing of the Dyspnoea-12. It was derived from the largest pool of breathlessness descriptors that has been assembled. This is a unique instrument since it quantifies breathlessness using descriptions by patients of its qualities and its affective sequelae. We have demonstrated the concurrent validity of Dyspnoea-12 with other relevant measures. Uniquely, this instrument was developed in three disease populations. Its relationship to MRC grade was independent of diagnosis, which allows direct comparisons of dyspnoea severity between patients with different diseases. Patients found the Dyspnoea-12 easy to complete and understand, and helpful in expressing their experience of being breathless.
Unlike other self-report breathlessness instruments, Dyspnoea-12 does not depend on a reference level of activity or any specific type of activity. The reference frame “these days” is designed not to be situation specific but to be temporally specific—in terms of how patients currently experience breathlessness in their daily lives, as opposed to specifically on the day of the test or in response to a specific activity. When using this reference, patients’ responses may be determined in part by the nature and intensity of their day-to-day activities, but this possibility requires testing.
In developing this instrument, we used advanced methodology to ensure reliable measurement properties, and it represents the first application of Rasch methodology to the measurement of breathlessness. Our use of Rasch modelling facilitated the development of a questionnaire that provides measurement of dyspnoea using a parsimonious collection of items that form a unidimensional measure. Rasch analysis determined the minimum number of items required to represent the underlying construct being measured (ie, breathlessness severity). There are no rules that determine the number of items in a questionnaire; the optimum number is determined by achieving a balance between economy, precision and reliability. This is different for every instrument. Rasch methodology indicates when internal reliability begins to deteriorate with further item removal, and we found that 12 items provided the best compromise.
The 12 items contribute reliably to the measurement of overall dyspnoea severity, even though some address the affective consequences of breathlessness (eg, “makes me feel miserable”) and some measure physical aspects (eg, “more work”). “Physical” items tended to be associated with milder severity, whereas the “affective” items were associated with more severe breathlessness. Differences in breathlessness severity associated with physical and affective items explain the patterning of items observed with PCA. These findings demonstrate the benefit of combining both physical and affective aspects in a dyspnoea scale if it is to cover a broad range of overall breathlessness severities.
We are not the first to hypothesise a multiple component dyspnoea model, since it has been suggested that a multidimensional instrument used to measure pain could be adapted to measure dyspnoea.31 Our findings are compatible with this hypothesis, since we found that a range of different descriptors from the physical to affective components of dyspnoea could be combined to form a single overall unidimensional scale. Our approach created a pragmatic, usable scale using rigorous methodology. Our primary objective was to develop an overall measure of breathlessness severity. To do this, we required all items to fit a unidimensional model. Thus, we only included “physical” and “affective” items that reflected overall severity in a similar way. This process may have excluded “physical” or “affective” items that may have behaved differently, with different measurement properties. The “physical” and “affective” components may be referred to as “components” in questionnaire parlance, but should be used only for exploratory analyses.
Patient-reported outcomes such as the Dyspnoea-12 should be derived from the words used by patients to describe disease effects and their clinical state. We are indebted to workers who collected those descriptors in previous studies. We developed our initial 81-item pool after a systematic review of existing literature and were struck by the similarity of items identified by different research groups. Moreover, during interview sessions patients were invited to volunteer any other terms; no new terms were provided. We are confident that the Dyspnoea-12 items are relevant, appropriate and truly capture patients’ perceptions.
The terms “work” and “effort” are often used interchangeably, in part because they tended to form into one cluster.13 14 15 We found that the term “more work” had more reliable patient responses than “hard work” and “more effort”. Likewise, the term “panic” is often associated with the perception of breathlessness,21 22 32 yet in this large patient cohort responses to this item were erratic, so it was removed. Two of the final 12 items did demonstrate DIF associated with diagnosis. We chose to retain these because the effect of this bias on the overall score was negligible. Furthermore, the descriptors “distressing” and “difficulty catching breath” have not previously been noted as being specific to any particular disease, so this may have been a chance finding.
Whilst we have demonstrated concurrent validity and repeatability of the Dyspnoea-12 in a different sample from that used in its development, a larger prospective study is needed to confirm these findings. The Dyspnoea-12 correlated well with Borg-distress and HADS scores. The direction of causality in these relationships is yet to be determined, but one hypothesis is that a person’s level of psychological distress has an impact on their perceived breathlessness severity,32 although distress due to dyspnoea could also increase anxiety. The Dyspnoea-12 provides a method of breathlessness measurement that may allow this relationship to be explored further. It should have utility in clinical trials of new treatments targeted at the emotional impact of breathlessness, since it incorporates items related to the affective aspect of breathlessness severity. Dyspnoea-12 is currently validated for the English-speaking population; versions in other languages will require careful translation, back-translation, and cultural and linguistic validation to ensure that they perform in the same way.
In summary, the Dyspnoea-12 forms a unidimensional measure that reflects both the physical and affective aspects of dyspnoea. It addresses the need for a comprehensive dyspnoea instrument and is based on the language used by patients to describe the experience. It can measure breathlessness across several disease groups, is simple and quick to use and should find utility in routine clinical monitoring, clinical research and trials of interventions designed to ameliorate the impact of breathlessness.
The authors thank all patients and staff from Royal Brompton and Harefield NHS Foundation Trust, St George’s Hospital and North Manchester General Hospital who gave their time for this project. We are indebted to Ashi Firouzi, Professor Martin Cowie, Professor Athol Wells, Michelle Rajab, Ross Ellis, Dr Margret Lau-Walker, Dr David Weir, Dr Georges Ng Man Kwong, Janet Mills and Sue Mason for their support with patient recruitment and administration of this project. We are also grateful to Professor Peter Barnes for his support and advice when designing this study.
Appendix: Dyspnoea-12 questionnaire
This questionnaire is designed to help us learn more about how your breathing is troubling you.
Please read each item and then tick in the box that best matches your breathing these days. If you do not experience an item tick the “none” box. Please respond to all items.
We do not recommend use of the instrument with data from more than three items missing. The method for scoring the Dyspnoea-12 with up to three missing items is detailed in the table below.
Funding The questionnaire development study was funded by the Clinical Research Committee Royal Brompton and Harefield NHS Foundation Trust. The Dyspnoea-12 validation study was funded by Action Medical Research.
Competing interests None.
Ethics approval The two studies performed were approved by the local ethics research committees.
Provenance and Peer review Not commissioned; externally peer reviewed.