Rationale The St George's Respiratory Questionnaire (SGRQ) is often applied to assess health-related quality of life in patients with idiopathic pulmonary fibrosis (IPF). Some SGRQ items will inevitably have weaker measurement properties than others when applied to this population. This study was conducted to develop an IPF-specific version of the SGRQ.
Methods Data from a recently completed trial that enrolled subjects with IPF (n=158) who completed the SGRQ and other measures were analysed at baseline and 6 months. There were four phases to the study: (1) removing items with missing responses and using Rasch analysis on retained items to identify fit and refine item response categories; (2) development of a new scoring scheme; (3) testing agreement between original and revised versions and testing construct validity of the revised SGRQ; and (4) rewording to finalise the IPF-specific version (SGRQ-I).
Results Items were removed due to missing responses (6 items) and misfit to the Rasch model (10 items); 34 items from the original 50 were retained. For certain items, disordered response thresholds were identified and corrected by collapsing response categories. A scoring algorithm was developed to place SGRQ-I scores on a scale with SGRQ scores. For any given outcome measure (eg, forced vital capacity (% predicted) and lung carbon monoxide transfer factor (% predicted), 6-min walk distance and patient-reported questionnaires), Pearson correlations were similar between pairs that included original SGRQ scores and corresponding pairs that included SGRQ-I scores. Internal reliability (Cronbach α) for each SGRQ-I component was comparable to the original SGRQ (Symptoms 0.62; Activities 0.80; Impacts 0.85).
Conclusions The SGRQ-I contains items from the original SGRQ that are the most reliable for measuring health-related quality of life in patients with IPF.
- Interstitial fibrosis
Idiopathic pulmonary fibrosis (IPF) is a progressive fibrotic interstitial lung disease (ILD) that induces shortness of breath,1 resulting in poor quality of life2 3 and significantly shortened survival for most patients.4 Available treatments have no reliable effect on prolonging life. In patients with IPF, health-related quality of life (HRQL) is increasingly viewed as an important outcome used to assess the effectiveness of treatments and to monitor disease trajectory. HRQL refers specifically to a person's satisfaction with life domains that either affect or are affected by his or her health status.5 A plethora of generic and condition-specific instruments exist to measure HRQL. Condition-specific instruments are tailored to patients with the disease of interest, a quality that makes them more sensitive to underlying change than generic instruments.5 For IPF there is currently no disease-specific measure of HRQL available so investigators have used generic or non-IPF respiratory-specific instruments to measure HRQL in multicentre trials.6 7 The potential problem is that these instruments may not capture many of the effects of IPF on patients' lives,8 thus calling into question whether HRQL results from these trials are valid.
The St George's Respiratory Questionnaire (SGRQ) was originally designed and validated for use in patients with chronic obstructive pulmonary disease (COPD)9 and has been in existence for nearly two decades. It has been validated for use in other chronic respiratory diseases and it also appears to possess acceptable validity and reliability for use in patients with IPF.10–12 However, it is inevitable that some SGRQ items have weaker measurement properties than others when applied to patient populations other than the one for which it was developed. Removing such items and modifying response options for retained items would generate a version of the SGRQ that is more appropriate for patients with IPF. This study was conducted to refine the content of the SGRQ to develop an IPF-specific version. It used a methodology that was developed and proven for reducing the item content of the original SGRQ, creating a version specific for COPD (the SGRQ-C).13
Overview and study sample
For this study we used data from a recently completed multicentre placebo-controlled trial (the Bosentan Use in Interstitial Lung Disease-1; BUILD-1).6 In BUILD-1, subjects were randomised to receive either bosentan or placebo; for the current study, data from these two groups were pooled. Subjects underwent assessments of pulmonary physiology and completed a 6-minute walk test (6MWT), the SGRQ, Short Form-36 (SF-36) and baseline/transition dyspnoea index (BDI/TDI) at baseline, 6 months and 12 months. For this analysis we used data collected at baseline and 6 months. Subjects in BUILD-1 had very well-defined IPF according to international guidelines.1
The SGRQ (12-month version) is a self-administered HRQL instrument for asthma and COPD that contains 50 items divided into three components: Symptoms (8 items), Activity (16 items) and Impacts (26 items).9 Each item has an empirically derived weight, and scores ranging from 0 to 100 are calculated for each component, as well as a total score. Higher scores indicate greater impairment in HRQL.
The SF-3614 is a generic health status instrument that contains 36 items tapping eight domains that can be separated into two psychometrically-derived summary components—mental and physical. Domain and summary component scores range from 0 to 100; lower scores correspond to worse health status. For the summary components we used scoring algorithms to generate linear T-score transformations that place scores on scales with means of 50 and standard deviations of 10.
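The linear T-score transformation mentioned above can be sketched as follows. This is only an illustration of the general form (50 plus 10 times a standardised score); the function name and the reference mean and standard deviation are hypothetical, and the actual SF-36 summary algorithms combine standardised domain scores using published coefficients that are not reproduced here.

```python
def t_score(raw_score: float, reference_mean: float, reference_sd: float) -> float:
    """Linear T-score: 50 + 10 * z, with z taken relative to a reference population."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    z = (raw_score - reference_mean) / reference_sd
    return 50.0 + 10.0 * z


# Hypothetical reference values, for illustration only (not SF-36 norms).
example = t_score(raw_score=40.0, reference_mean=50.0, reference_sd=20.0)
```

A score equal to the reference mean maps to 50, and each reference standard deviation above or below the mean shifts the T-score by 10 points.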
Baseline dyspnoea index (BDI)
The BDI15 is a dyspnoea questionnaire that evaluates three dimensions: functional impairment, magnitude of effort and magnitude of task at a single time point. It rates the patient's breathlessness in each of these domains on a scale from 0 (severe) to 4 (no impairment), so total scores range from 0 to 12.
Other outcome measures
Forced vital capacity (FVC) and transfer factor of the lung for carbon monoxide (Tlco) were measured in accordance with American Thoracic Society (ATS) guidelines16 17 and expressed as percentages of the gender, age and height-adjusted predicted values (ie, FVC%, Tlco%). The 6MWT was performed according to ATS guidelines.18 The average length of time between physiological testing and completion of the self-report questionnaires was 1 day.
Study phases and statistical analyses
The methodology was based on that developed and validated for creating the SGRQ-C13 and follows very similar processes. Descriptive statistics were generated for baseline data.
Phase 1: Reducing and refining the SGRQ
Items with more than 25% missing responses at baseline were excluded. The performance of retained items was tested using Rasch analysis (RUMM2020; http://www.rummlab.com).13 19 Each of the three SGRQ components was tested separately. The Rasch model provides a template for testing how well each item contributes to the single construct being measured.19 Here, a construct refers to a SGRQ component (Symptoms: respiratory symptoms; Activity: activities that either cause or are limited by shortness of breath; Impacts: the impact of the disease on factors such as emotional health and sense of control). Rasch analysis uses an iterative process whereby the poorest fitting item is removed and the fit of the remaining items is then retested. Item fit was assessed by examining the residual and χ2 fit statistic for each item. The item residual is a summation of the difference between the observed score and the score expected by the model for a particular item and persons. Item residuals between ±2.5 indicate adequate fit to the model. The χ2 test compares the observed values with the values expected by the model across different levels of health for each item. These groups are defined by ordering all patients' responses and then splitting them into groups of approximately equivalent size across the sample (this is done automatically within RUMM2020). A non-significant item χ2 (p>0.05) indicates good fit to the model. Item fit is also assessed graphically using the item characteristic curve (ICC). Items with the worst model fit were removed while ensuring that the balance of items and content validity for each component was retained. The overall fit of each component to the Rasch model was determined by examining the person separation index (PSI), which is analogous to Cronbach α, and the item–trait interaction χ2 statistic (a non-significant p value (>0.05) indicates fit to the model).
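The numeric screening rules in this phase can be sketched in a few lines. The block below is an illustration only, not the RUMM2020 implementation: it shows the dichotomous form of the Rasch model (which would apply to true/false items) and the three thresholds quoted in the text. All function and constant names are hypothetical, and the sketch leaves out the content-validity judgement that also informed item removal.

```python
import math

MAX_MISSING_FRACTION = 0.25  # items with more than 25% missing responses were excluded
MAX_ABS_RESIDUAL = 2.5       # residuals within +/-2.5 indicate adequate fit
MIN_CHI2_P = 0.05            # a non-significant chi-square (p > 0.05) indicates good fit


def rasch_probability(person_location: float, item_location: float) -> float:
    """Dichotomous Rasch model: probability of endorsing an item (both on the logit scale)."""
    return 1.0 / (1.0 + math.exp(-(person_location - item_location)))


def missing_fraction(responses) -> float:
    """Fraction of responses recorded as None."""
    return sum(r is None for r in responses) / len(responses)


def too_many_missing(responses) -> bool:
    """True when an item exceeds the missing-response threshold."""
    return missing_fraction(responses) > MAX_MISSING_FRACTION


def item_fits(residual: float, chi2_p: float) -> bool:
    """True when both the residual and the chi-square p value meet the stated criteria."""
    return abs(residual) <= MAX_ABS_RESIDUAL and chi2_p > MIN_CHI2_P
```

When the person and item locations are equal the model gives a probability of 0.5; an item located several logits above most respondents is rarely endorsed, which is the targeting concern raised for the high-severity items.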
Phase 2: Development of a new scoring scheme
Rasch analysis revealed that response options for some items needed to be collapsed. As published previously, for such items, weights from collapsed response categories were averaged to produce new weights (see online supplement for new weights).13 Linear regression was then used to create scoring algorithms to rescale component scores for the refined version of the SGRQ and place them on a scale compatible with the original SGRQ. Specifically, for each component we regressed SGRQ scores (dependent variable) on revised SGRQ scores (independent variable). We next assessed agreement between the original and revised SGRQ scores using intraclass correlation coefficients and Bland–Altman plots.20 For each component, the difference between scores was plotted against the average of the two scores.
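The three calculations described in this phase (averaging weights across collapsed categories, regressing one score on another, and preparing a Bland–Altman comparison) can be sketched as below. This is a minimal illustration under stated assumptions: the function names and all example values are hypothetical, and the 1.96 SD limits of agreement are the conventional Bland–Altman choice rather than something stated in the text. The actual item weights are in the online supplement.

```python
from statistics import mean, stdev


def collapse_weights(weights, groups):
    """Average the weights of response categories that are merged into one category.

    groups is a list of lists of indices into weights, one inner list per new category.
    """
    return [mean(weights[i] for i in group) for group in groups]


def fit_line(x, y):
    """Ordinary least squares regression of y on x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def bland_altman(a, b):
    """Return (averages, differences, bias, lower_limit, upper_limit) for paired scores."""
    averages = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    differences = [ai - bi for ai, bi in zip(a, b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return averages, differences, bias, bias - spread, bias + spread


# Hypothetical weights for a four-option item whose middle two options are merged.
new_weights = collapse_weights([0.0, 40.0, 60.0, 100.0], [[0], [1, 2], [3]])
```

Plotting the differences against the averages, with horizontal lines at the bias and the two limits, gives the Bland–Altman plot for one component.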
Phase 3: Cross-sectional and longitudinal validity
Correlations between scores from both the original and refined versions of the SGRQ (implementing the scoring as described in phase 2) and other outcome measures were assessed using Pearson correlation coefficients. We analysed change in original or revised SGRQ component scores from baseline to 6 months using a mixed-effects model (PROC MIXED in SAS) for each component. Each model considered assessment number (baseline or 6-month) as a categorical factor and used an unstructured variance–covariance matrix to model the covariance structure among the repeated measures by subject. All available data were included in the analyses. Apart from the Rasch analyses, all statistics were run using SAS Version 9.1.3 (SAS Institute Inc, Cary, North Carolina, USA).
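For readers unfamiliar with the statistic, a Pearson correlation coefficient can be computed as in the sketch below. The analyses reported here were run in SAS; this Python version is only an illustration, the function name is hypothetical, and dropping pairs with a missing value is an assumption about how incomplete data might be handled rather than a description of the study's procedure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation over pairs in which both values are present (not None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)
```

Because higher SGRQ scores indicate greater impairment while higher values on most comparator measures indicate better status, negative coefficients are what one would generally expect for such pairs.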
Phase 4: SGRQ-I
The refined SGRQ is called the SGRQ-I. Because development of the SGRQ-I involved the removal of some items and collapsing response categories for other items, we also revised the wording and recall period of several items to make their content more practical.13 We examined internal consistency reliability (Cronbach α21) for each domain of the SGRQ-I and, for comparison purposes, for each domain of the original SGRQ.
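Cronbach α can be computed from item-level scores as sketched below. This is a generic illustration of the standard formula, k/(k−1) × (1 − sum of item variances / variance of total scores), using sample variances; the function name and data layout are hypothetical and the study's own calculation was run in SAS.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach alpha for a set of items.

    item_scores is a list with one inner list per item, each holding one score
    per respondent (same respondent order in every inner list, no missing values).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(per_person) for per_person in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))
```

Removing items generally lowers α somewhat even when the retained items are sound, which is worth bearing in mind when comparing a shortened component with the original.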
Baseline demographic and disease characteristics of the study sample are presented in table 1.
Two of the eight Symptoms items were removed due to missing data: item 6 (53% missing data) and item 8 (33% missing data). The remaining six items were examined for fit to the Rasch model. The PSI for this component was 0.71 (‘good’).13 The response options for all six items had disordered thresholds. The disordered thresholds were corrected by combining two or more response categories (details are provided in the online supplement). Following these changes the PSI was 0.63 (acceptable) and the item–trait interaction was χ2=10.4, p=0.58, indicating good fit to the Rasch model (see figure 1).
All 16 Activity items were examined with Rasch analysis. The initial PSI was 0.88. Three items (1, 5 and 10) were removed due to a significant χ2 indicating lack of good fit to the Rasch model. A further three items were removed because their location on the severity scale was high (item 1=6.14 logits; item 9=5.7 logits; and item 8=4.2 logits) (see details in the online supplement). After removal of these items the PSI was 0.83 and the final set demonstrated good fit to the model (item–trait interaction χ2=26, p=0.14).
Four of the 26 Impacts items were initially deleted due to a high proportion of missing data: item 17 (45%), item 18 (44%), item 19 (44%) and item 20 (44%). The initial PSI for the remaining 22 items was 0.85. Items 1 and 2 displayed disordered thresholds. These were rectified by combining adjacent response options (see details in online supplement). Three items (1, 13 and 22) demonstrated the worst fit (p<0.0001) and were removed. Item 22 demonstrated good fit to the model but it was removed to improve the overall targeting of this component (item logit=4.6) (see details in the online supplement). Following these adjustments, the component demonstrated good fit to the model (PSI=0.83 and item–trait interaction χ2=50, p=0.06).
We adjusted SGRQ-I scores to achieve equivalence with the scale used for SGRQ scoring. Specifically, linear regression equations yielded the following: SGRQ-I Symptoms score=−1.08+(0.94×SGRQ Symptoms score), R2=0.88; SGRQ-I Activity score=4.2+(0.82×SGRQ Activity score), R2=0.95; SGRQ-I Impacts score=3+(0.82×SGRQ Impacts score), R2=0.95; and SGRQ-I Total score=1.62+(0.87×SGRQ Total score), R2=0.97. The intraclass correlation coefficients showed excellent reliability: symptoms 0.97; activities 0.99; impacts 0.98; and total 0.99. Bland-Altman plots indicated excellent agreement between scores from the SGRQ and SGRQ-I for each component (figure 2).
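As a worked illustration, the sketch below evaluates the right-hand side of each regression equation exactly as printed above (intercept plus slope times a score). It makes no claim about which score is the predictor in practice, since Phase 2 describes regressing SGRQ scores on revised scores; the dictionary and function names are hypothetical, and the online supplement remains the authoritative source for SGRQ-I scoring.

```python
# (intercept, slope) pairs copied from the equations as printed in the text.
PRINTED_EQUATIONS = {
    "Symptoms": (-1.08, 0.94),
    "Activity": (4.2, 0.82),
    "Impacts": (3.0, 0.82),
    "Total": (1.62, 0.87),
}


def evaluate_printed_equation(component: str, score: float) -> float:
    """Return intercept + slope * score for the named component."""
    intercept, slope = PRINTED_EQUATIONS[component]
    return intercept + slope * score


# Example: a Total score of 50 evaluates to 1.62 + 0.87 * 50.
example_total = evaluate_printed_equation("Total", 50.0)
```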
Table 2 shows correlations between SGRQ or SGRQ-I scores and measures of pulmonary physiology, dyspnoea and health status. For any given outcome measure, correlations were similar between pairs that included SGRQ scores and the corresponding pair that included SGRQ-I scores. The weakest correlations were between SGRQ or SGRQ-I scores and measures of either pulmonary physiology or functional capacity and, among these, pairs that included Symptoms component scores (whether from the SGRQ or SGRQ-I) were weakest of all. Considering all outcomes in the study, the only two correlations that did not reach statistical significance were for SGRQ Symptoms and 6MWD and for SGRQ-I Symptoms and 6MWD. Correlations with dyspnoea and health status (assessed with the SF-36) were moderate to strong, and the strongest were between SGRQ or SGRQ-I Activity scores and SF-36 Physical Functioning domain scores. Table 3 shows the similarity in change scores from components of the SGRQ or SGRQ-I from baseline to 6 months.
The SGRQ-I comprises 34 items, 6 in the Symptoms component, 10 in the Activity component, and 18 in the Impacts component. Response categories for each item in the Symptoms component were collapsed, such that items in this component from the SGRQ-I contain three response options compared with five for the original SGRQ. All items in the Activity component are true/false, so no collapsing was performed—each of the 10 items in this component on the SGRQ-I remains true/false. Response categories for one item in the Impacts component were collapsed, such that this item on the SGRQ-I contains two response options compared with three for the original SGRQ. Internal consistency reliability for each component of the SGRQ versus SGRQ-I was as follows: Symptoms 0.66 vs 0.62; Activity 0.85 vs 0.80; Impacts 0.86 vs 0.85.
A previously validated methodology based on Rasch analyses using HRQL data from a recently completed randomised placebo-controlled trial was used to develop a new IPF-specific version of the SGRQ, the SGRQ-I. We developed new item response categories for items with disordered response thresholds, new weights for the new response categories, and a scoring algorithm that places SGRQ-I scores on a scale with the original SGRQ.
Rasch analysis and psychometric testing confirmed that the SGRQ-I has acceptable measurement properties. Specifically, each component of the SGRQ-I had good fit to the Rasch model, verifying that items within each component tap a single construct. The internal consistency reliability of each component of the SGRQ-I was similar to the reliabilities for the corresponding components of the original SGRQ. The construct validity of the SGRQ-I was supported by the numerous significant correlations between component scores and measures of pulmonary physiology, functional capacity, dyspnoea or health status. It was reassuring to see that coefficients for these correlations were on par with coefficients for corresponding assessments for the original SGRQ. This was not unexpected, given that these correlations were assessed using the same cohort. It will be interesting to examine such correlations in future studies using data from a different sample.
Changes in SGRQ-I component scores over time matched changes in original SGRQ component scores quite well. Rasch analysis has gained momentum as an advanced psychometric technique for the robust development of health-related instruments22 and refinement of others.13 An important feature of Rasch analysis is the ability to examine the hierarchical ordering of response options for polytomous items (ie, items with Likert-type response options). We found that the response options for a number of items did not follow a logical order. This was resolved by combining response options and recalculating weights for such items. After incorporating these weights, a new scoring algorithm was developed to ensure that scores from the SGRQ-I are equivalent to scores from the original SGRQ. Our preliminary correlational analyses demonstrate that the new scoring system is accurate and reliable.
An advantage of Rasch analysis is the ability to gain information about how items are working, both individually and as a scale. Using Rasch methodology here enabled us to select items that generate the most precise measurement of HRQL (in domains tapped by the SGRQ) for patients with IPF. If data fit Rasch model expectations, then a fundamental assumption—that each item contributes reliably to the measurement of the single underlying construct—is met. There are no set rules as to whether mis-fitting items should be retained or removed but, like others,13 we based our decisions on a combination of item-fit statistics and a requirement to maintain the internal consistency reliability and construct validity of the three components.
Our study has limitations. First, the Symptoms component of the SGRQ-I contains a couple of items that would seem not to be entirely relevant to patients with IPF—specifically, item 4 which asks about ‘attacks of wheezing’ and item 5 which asks about ‘attacks of chest trouble’. Despite the apparent lack of face validity of these items, subjects responded to them, unlike items 6 and 8 which were removed because of excessive missing data. Furthermore, they did show a good fit to the other items in the model that have greater face validity as possible IPF questionnaire items. Although removing items 4 and 5 would seem to have been a reasonable step, doing so detracted substantially from the internal consistency reliability and the PSI, so we decided to retain them. Second, patients with coexisting COPD were not excluded outright. Any potential subject with a residual volume (RV) >120% predicted or with a ratio of forced expiratory volume in 1 s to FVC (FEV1/FVC) <65 was excluded. This may have led to the inclusion of some, but not all, subjects with COPD. However, the goal was to produce an instrument that captures HRQL in patients with IPF, including those with comorbidities known to occur in this patient group. The SGRQ-I is not designed to assess the independent impact of IPF on patients' HRQL but, rather, to better (than generic or non-IPF specific instruments) assess HRQL in patients with IPF.
It is likely that many subjects would have responded differently to certain items whose response category structures were modified for the SGRQ-I. Likert scales are often adopted for questionnaires without ever evaluating the ordering of responses. Disordered options violate the meaning implicit in the employed grading system (eg, for the SGRQ, ‘Not at all’ represents the least amount, ‘Most days a week’ represents the greatest amount, and there are ordered levels of increasing amount between the two). Ignoring disordering yields invalid unreliable HRQL response data. Thus, prospective studies will be needed to assess the responsiveness of the SGRQ-I and to determine the minimum change in scores over time that is clinically meaningful. Based on the rigorous statistics and thoughtful approach used here, we would expect the SGRQ-I to be better targeted to aspects of HRQL and even more responsive to underlying change than the original SGRQ. Whether an instrument developed from the ground up specifically for patients with IPF will be even better targeted and more responsive to change is unknown.
In conclusion, we used a systematic, statistically based method to revise the original SGRQ and develop an IPF-specific version, the SGRQ-I. Both the reliability and validity of the SGRQ-I are acceptable and comparable to those of the original SGRQ. Prospective studies will determine whether the SGRQ-I, by virtue of its specificity, is more responsive to underlying change than the SGRQ in patients with IPF.
Funding Actelion Pharmaceuticals funded the performance of the underlying BUILD-1 trial which investigated the efficacy of bosentan in the treatment of idiopathic pulmonary fibrosis. No additional external funding was obtained. JJS is supported in part by a Career Development Award from the NIH (K23 HL092227).
Competing interests None.
Ethics approval This study was conducted with the approval of the ethical committees (or Institutional Review Boards) of each of the 29 participating centres. All patients gave written informed consent to participate in the study.
Provenance and peer review Not commissioned; externally peer reviewed.