Background: A short, standardised, self-administered quality of life questionnaire would be a useful addition to the outcome measures in obstructive sleep apnoea (OSA) research. A study was therefore undertaken to validate a new OSA specific self-administered questionnaire (the Quebec Sleep Questionnaire, QSQ) for use in clinical trials.
Methods: This study followed a description of health related quality of life in patients with OSA. Construct validity and responsiveness were tested by comparing the baseline and changes in domain scores (daytime sleepiness, diurnal symptoms, nocturnal symptoms, emotions, social interactions) with those of questionnaires measuring related constructs (SF-36, Epworth Sleepiness Scale, Beck Depression Inventory, SCL-90, and Functional Outcomes in Sleep Questionnaire).
Results: Sixty patients (48 men) of mean (SD) age 55 (10) years participated in the study. In the analysis of the discriminative function of the questionnaire, moderate to high correlations were found between the scores in each domain of the QSQ and the corresponding questionnaires. In the analysis of its evaluative function significant differences were found in score changes between patients who were treated and those who were not, and moderate to high correlations were seen between changes in scores in the QSQ and changes in the corresponding questionnaires. Most of these correlations met the a priori predictions made regarding their magnitude.
Conclusion: The QSQ is a valid measure of health related quality of life in patients with OSA and is sensitive to treatment induced changes.
- obstructive sleep apnoea
- quality of life
- Quebec Sleep Questionnaire
Obstructive sleep apnoea (OSA) clearly affects important domains of quality of life which remain unexplored by the nocturnal recording of physiological variables in the sleep laboratory.1 The identification of the areas of patients’ health related quality of life most likely to be specifically affected by OSA represents an important initial step in the full evaluation of the impact of the disease and its treatment modalities. In our descriptive study of the impact of OSA on patients’ quality of life we were reassured to find items that were remarkably similar to those of the Sleep Apnoea Quality of Life Index (SAQLI), an OSA specific questionnaire developed independently of ours and used as an evaluative instrument (that is, as a clinical outcome in clinical trials).2 In our view, this similarity represented a strong argument in support of the comprehensiveness of our description of quality of life in OSA.
We recently expanded our study of the validation of the SAQLI3 but we are concerned about some redundancy in the items that form most of its domains. The SAQLI is interviewer administered and its “symptoms” domain is individualised—that is, patients are asked to select from a list of items the most important symptoms they have experienced. A wide spectrum of symptoms may therefore be chosen by the patients so that each respondent will answer a different set of questions. Individualised questionnaires offer the potential of enhanced responsiveness when the instrument is used in clinical trials.4 The administration of the SAQLI by an interviewer ensures a high completion rate. However, these interesting properties of the SAQLI are at the expense of it being rather sophisticated and time consuming. Also, in long term or large clinical trials the ease and convenience of standardised items (as opposed to individualised items) may outweigh the benefits of individualised items.5
We reasoned that a short, standardised, self-administered quality of life questionnaire would be a useful addition to the outcomes measured in patient orientated research in OSA. The objective of this study was to examine the validity, reliability, responsiveness, and interpretability of a new short self-administered OSA specific quality of life questionnaire to be used in clinical trials.
The Quebec Sleep Questionnaire
The Quebec Sleep Questionnaire (QSQ) is a 32-item OSA specific questionnaire developed for use as an evaluative instrument (that is, as a clinical outcome in clinical trials)6 according to standard methodology described elsewhere.1 In brief, we first constructed a comprehensive list of items potentially related to quality of life of patients with OSA. From this list, consecutive patients were asked, at the time of the diagnosis, to identify the most significant items and to grade their importance. The item impact was determined from the proportion of patients who identified the item as important and the mean importance score attributed to this item (impact score = frequency × importance). One hundred patients were interviewed, and the items with the greatest impact on quality of life were clustered into five domains: (1) hypersomnolence; (2) diurnal symptoms; (3) nocturnal symptoms; (4) emotions; and (5) social interactions. The QSQ is standardised, that is, all respondents answer the same set of questions. Each domain includes 4–10 items and each item is scored on a 7-point scale.
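The impact score described above (frequency × importance) can be illustrated with a minimal Python sketch. The item labels, the 1 to 5 importance scale, and the use of None for "not identified as important" are assumptions made for illustration only; they are not taken from the study.

```python
def impact_scores(ratings):
    """Compute impact = frequency x mean importance for each item.

    ratings maps an item label to one entry per patient: an importance
    grade, or None if the patient did not identify the item as important.
    """
    scores = {}
    for item, values in ratings.items():
        chosen = [v for v in values if v is not None]
        frequency = len(chosen) / len(values)  # proportion of patients identifying the item
        importance = sum(chosen) / len(chosen) if chosen else 0.0  # mean grade among them
        scores[item] = frequency * importance
    return scores


# Hypothetical data: four patients, importance graded 1 to 5 (assumed scale).
example = {
    "item_A": [5, 4, None, 3],
    "item_B": [None, 2, None, None],
}
scores = impact_scores(example)
# Items sorted from highest to lowest impact, as one might do before clustering into domains.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this toy data item_A is identified by three of four patients with a mean grade of 4, giving an impact of 3.0, whereas item_B scores 0.5.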
Consecutive adult patients in whom OSA had recently been diagnosed and who were still untreated were eligible for the study. The study population was different from that of our previous studies.1,3 Diagnostic criteria for OSA included: (1) apnoea plus hypopnoea index ⩾15 (an apnoeic event being defined as a cessation of the oronasal flow for at least 10 seconds, and hypopnoea as a 50% decrease in the nasal pressure signal associated with a desaturation of >3%7 and/or arousal); or (2) typical nocturnal home oximetry recording showing repetitive short duration fluctuations in arterial haemoglobin saturation in patients with excessive daytime sleepiness.8 After the initial evaluation, therapeutic decisions were left to the patient and the treating physician.
The study of the construct validity of the QSQ followed the same methodology as that used in our validation study of the SAQLI.3 Briefly, once the diagnosis of OSA had been made and before the initiation of OSA specific therapy, the QSQ was administered to 60 consecutive patients (time 1). At the same time the patients completed the following five other questionnaires measuring constructs related to those measured by the QSQ:
Medical Outcome Survey—Short Form 36 (SF-36)9,10: a generic self-completed questionnaire that measures eight dimensions of health (physical functioning, role limitation due to physical problems, role limitation due to emotional problems, social functioning, mental health, energy/vitality, bodily pain, and general health perceptions).
Symptom Checklist—90 (SCL-90)11: contains 90 items relating to nine different domains (anxiety, depression, hostility, obsessive-compulsiveness, sensitivity, sleeping disturbances, agoraphobia, somatisation and psychoticism). We limited our use of the SCL-90 to the depression and hostility domains.
Beck Depression Inventory (BDI)12: a 21 item commonly used traditional instrument developed specifically to identify depression. It also has been extensively used as an evaluative instrument (that is, to monitor response to treatment in clinical trials).
Epworth Sleepiness Scale (ESS)13: a simple self-administered eight item questionnaire measuring the risk of falling asleep in eight specific situations that are commonly met.
Functional Outcomes in Sleep Questionnaire (FOSQ)14: a 30 item self-report questionnaire designed to measure the impact of excessive sleepiness on multiple activities of daily living. It comprises five dimensions: activity level, vigilance, intimacy and sexual relationships, general productivity and social outcome.
Because of long waiting lists, several months often elapse between the diagnosis of OSA and the initiation of treatment. We used this period to examine the test-retest reliability of the questionnaire before the initiation of any treatment, assuming clinical stability over this period, by administering the same set of questionnaires to a subgroup of 19 patients on the day preceding their CPAP titration (time 2). At their 3 month follow up visit (time 3) 36 patients completed the same set of questionnaires whether or not they had received any treatment for OSA over this period. The respondents were then unaware of their previous responses. In addition, patients were asked to make a global rating of changes in their OSA related symptoms, daily life activities, social interactions, and emotions over the study period. For instance, they were asked: “Overall, has there been any change in your social life since the last time you saw us?”. Changes were scored on a 15-point scale, from −7 (a very great deal worse) to 0 (no change) to +7 (a very great deal better). The administration of these five questionnaires was not supervised and took on average 45 minutes.
Baseline and sample size
Descriptive statistics (proportions, means and standard deviations) were used to describe the study population at baseline. We computed that at least 45 patients were needed if moderate (r = 0.50) but statistically significant correlations were to be detected in the baseline discriminative analyses at the 0.01 level (β error 0.15).15 Individual items were equally weighted and the results were expressed as the mean score per item (ranging from 1 to 7) within each domain. The other questionnaires were analysed as advocated by their respective authors.
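As a rough illustration of the sample size reasoning, the sketch below uses the Fisher z approximation for detecting a correlation. This approximation is an assumption; the method of reference 15 is not described here, so the result should be expected to fall in the same range as the figure reported above rather than to reproduce it exactly. The function name and the two_sided flag are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha, beta, two_sided=True):
    """Approximate number of patients needed to detect a correlation r.

    Uses the Fisher z transformation: n = ((z_alpha + z_beta) / C)^2 + 3,
    where C = 0.5 * ln((1 + r) / (1 - r)).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# Values taken from the text: r = 0.50, significance level 0.01, beta error 0.15.
n_two_sided = n_for_correlation(0.50, alpha=0.01, beta=0.15)
n_one_sided = n_for_correlation(0.50, alpha=0.01, beta=0.15, two_sided=False)
```

Whether the test is one sided or two sided changes the answer by a few patients, which is one plausible source of small differences between published figures and this approximation.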
Distribution in scores at baseline, reliability, and internal consistency
We first plotted the distribution in scores at baseline in order to investigate the potential for “ceiling effect” (the situation in which the patients with the best score may nevertheless have significant quality of life impairment) and “floor effect” (the situation in which patients with the worst score may deteriorate further).16 Test-retest reliability was determined by correlating the baseline results (time 1) with those obtained before the initiation of nasal CPAP therapy (time 2). It was calculated using the intraclass correlation coefficient, an index that corrects the correlation for the systematic bias that may exist if all the patients score higher (or lower) after a period of observation.17 In addition, we illustrated the test-retest reliability (repeatability) of each domain of the questionnaire by plotting the difference in scores against the mean for each patient.18 Internal consistency (the extent to which different items in an instrument are measuring the same construct) was determined for each domain using Cronbach’s alpha.19
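The two reliability statistics named above can be sketched as follows. The text does not state which form of the intraclass correlation coefficient was used; the two-way, absolute agreement, single measurement form is assumed here because it penalises a systematic shift between time 1 and time 2. Function names and data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per patient (same items, no missing values)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_agreement(pairs):
    """pairs: one (time 1, time 2) domain score per patient.

    Two-way ANOVA decomposition; absolute agreement, single measurement (assumed form).
    """
    n, k = len(pairs), 2
    grand = mean(x for p in pairs for x in p)
    row_means = [mean(p) for p in pairs]
    col_means = [mean(p[j] for p in pairs) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical scores: every patient scores exactly one point higher at time 2.
shifted = [(1, 2), (3, 4), (5, 6)]
icc_shifted = icc_agreement(shifted)
alpha_identical_items = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

With the shifted data a Pearson correlation would equal 1, whereas this coefficient falls below 1, which is the correction for systematic bias referred to in the text.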
In the analysis of the discriminative function, we measured the extent to which the QSQ can distinguish among groups of patients.20 Cross sectional construct validity was evaluated by correlating baseline quality of life scores with other related measures. Throughout the correlation analyses, given the multitude of comparisons involved, statistical significance was set at the 0.01 level.
In the analysis of the evaluative function, we examined the extent to which the QSQ can capture changes in quality of life over time, that is, the responsiveness of the questionnaire.20 This was primarily tested as the ability of the questionnaires to detect statistically significant differences in scores in the patients who were treated over the study period (time 3–time 1) using paired t tests. In addition, we computed the standardised response mean, which compares the magnitude of change with the standard deviation of change,21 and also examined the ability of the questionnaire to distinguish between groups of patients (treated v untreated) in terms of a change in quality of life during the study period (time 3–time 1) using unpaired t tests. Longitudinal construct validity was then assessed by correlating within-subject changes in quality of life scores over the study period with within-subject changes in other quality of life indices, and by showing that correlations of changes in different measures conform with what one would expect if the questionnaire is measuring what it is supposed to measure.
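The standardised response mean and the paired t statistic mentioned above are closely related, as the following minimal sketch shows. The scores are hypothetical and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def standardised_response_mean(before, after):
    """Mean within-patient change divided by the standard deviation of change."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


def paired_t_statistic(before, after):
    """Paired t statistic (n - 1 degrees of freedom): mean change over its standard error."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / (stdev(changes) / math.sqrt(len(changes)))


# Hypothetical domain scores for four treated patients at time 1 and time 3.
time1 = [2, 3, 4, 3]
time3 = [4, 4, 6, 6]
srm = standardised_response_mean(time1, time3)
t_stat = paired_t_statistic(time1, time3)
```

The paired t statistic equals the standardised response mean multiplied by the square root of the number of patients, so the former grows with sample size while the latter does not.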
For an evaluative instrument, a score is interpretable when it tells the reader whether a particular change in score represents a significant clinical improvement or deterioration.22 We compared the results of the global rating of change questions with the within-domain changes in scores. Those who scored −3, −2, −1, 1, 2, or 3 on the global rating of change question were classified as having experienced a “small change” in quality of life. The mean absolute change in score in the OSA questionnaire was considered as the minimal clinically important difference—that is, the smallest difference perceived by the average patient.22
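The calculation described above can be sketched in a few lines of Python. The data layout (one pair of global rating and change in domain score per patient) and the function name are assumptions made for illustration.

```python
from statistics import mean


def minimal_important_difference(records):
    """records: one (global rating of change, change in domain score) pair per patient.

    Keeps patients whose global rating is -3 to -1 or +1 to +3 ("small change")
    and returns the mean absolute change in their domain score.
    """
    small = [abs(change) for rating, change in records if 1 <= abs(rating) <= 3]
    return mean(small) if small else None


# Hypothetical data: ratings on the -7 to +7 scale, score changes on the 1 to 7 item scale.
records = [(2, 1.5), (-1, -0.5), (0, 0.2), (6, 3.0), (3, 1.0)]
mid = minimal_important_difference(records)  # mean of 1.5, 0.5 and 1.0
```

Patients reporting no change (rating 0) or a large change (rating beyond ±3) are excluded from the estimate.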
A priori predictions
We formulated the following a priori predictions regarding the direction and magnitude of the correlations. At baseline we anticipated moderate to high correlations (0.4⩽r⩽0.7) between scores in each domain of the QSQ and the corresponding instruments. Also, given the expected inability of generic questionnaires to detect change over time, we anticipated weak to moderate correlations (0.2⩽r<0.4) between changes in scores in the QSQ and changes in the corresponding instruments. If the actual correlations met these a priori predictions, this would strengthen inferences regarding the validity of the OSA specific questionnaires.20
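A minimal sketch of how an observed correlation could be checked against these predictions is given below. Taking the absolute value of r is an assumption, made because some instruments may be scored in the opposite direction to the QSQ; the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def meets_baseline_prediction(r):
    """Moderate to high correlation predicted at baseline: 0.4 <= |r| <= 0.7."""
    return 0.4 <= abs(r) <= 0.7


def meets_change_prediction(r):
    """Weak to moderate correlation predicted for changes: 0.2 <= |r| < 0.4."""
    return 0.2 <= abs(r) < 0.4


# Hypothetical baseline scores on a QSQ domain and a related instrument.
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
```

A correlation stronger than the predicted band, as in this toy example, fails the check as written; whether such a result should count against the instrument is a matter of judgement that the sketch does not settle.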
The demographic and clinical characteristics of the 60 consecutive patients who agreed to participate in the study are summarised in table 1. Of the 36 individuals who were available at 3 month follow up, 27 received nasal CPAP during this period and nine remained untreated throughout the study period. The baseline clinical characteristics of the 27 treated patients were not statistically different from those of the nine untreated patients.
Distribution in scores at baseline, reliability, and internal consistency
The scores in each of the QSQ domains covered the whole range (from 1 to 7), indicating no obvious floor or ceiling effect (fig 1). Test-retest reliability was determined from the 19 patients who completed the questionnaires before the initiation of nasal CPAP at time 2. The median time between times 1 and 2 was 7.6 months. Test-retest reliability was excellent, as indicated by the following intraclass correlation coefficients: daytime sleepiness, r = 0.91; diurnal symptoms, r = 0.89; nocturnal symptoms, r = 0.87; emotions, r = 0.82; social interactions, r = 0.86. A typical Bland-Altman diagram is shown in fig 2. Cronbach’s alphas were as follows: daytime sleepiness (6 items), 0.83; diurnal symptoms (10 items), 0.94; nocturnal symptoms (7 items), 0.76; emotions (5 items), 0.78; and social interactions (4 items), 0.68.
The observed cross sectional correlations supporting the discriminative validity of the questionnaires are shown in table 2. We observed moderate to high correlations between the QSQ and all the other related measures. The magnitude of these correlations met our a priori predictions.
The ability of the QSQ and the FOSQ to detect changes is summarised in table 3. Results are presented as within-group differences in the treated group. The ability to detect change in the treated group was higher for the QSQ than the FOSQ. Also, in examining the ability of the questionnaire to distinguish between treated and untreated patients during the study period, we did not find any significant difference in scores between the treated and untreated groups at baseline (data not shown). However, at follow up, statistically significant differences were observed (table 4). Longitudinal construct validity correlations are shown in table 5. Overall, there were moderate correlations between the changes in the QSQ and the related instruments. The weakest correlations were in the social interactions domain. As expected, these correlations were smaller than those obtained in the cross sectional analysis. Again, the magnitude of most of these correlations met our a priori predictions.
Across the domains the differences in score that represented a small change were as follows: daytime sleepiness, 1.8; diurnal symptoms, 2.0; nocturnal symptoms, 1.5; emotions, 1.1; social interactions, 2.5. These differences may be regarded as the “minimal clinically important differences” for each domain.
This validation study indicated that the QSQ represents a valid measure of health related quality of life in patients with OSA. Also, the QSQ is sensitive to treatment induced change, a property required for any questionnaire to be used in clinical trials to evaluate the effect of new treatment. Because the SAQLI is also a valid and sensitive alternative, several of our findings deserve further comments and discussion.
In our descriptive study of the impact of OSA on patients’ quality of life we identified items that were similar to those of the SAQLI.1 Accordingly, the items composing the QSQ overlap, to some extent, with those of the SAQLI. In our view, this is in favour of the face validity, content validity, and cross-cultural adaptability of both questionnaires. However, we organised the items of the QSQ into different domains using the impact method, a method that uses clinical judgement in the composition of the domains of a new questionnaire.23 We preferred the “clinical impact method” over the “factor analysis method” in which mathematical linkage between items is explored. Although the two methods may lead to the selection of different items into different domains, significant overlap usually exists when they are compared.23 Neither method has proved superior to the other in selecting items to describe quality of life in specific health conditions. As excessive daytime sleepiness is the cardinal symptom in OSA, we felt that it should be in itself a full domain of the questionnaire. The “diurnal symptoms” domain of the questionnaire measures symptoms related to lack of energy, difficulties with concentration and memory, and performance at work.
We believe that the time lag between assessments to examine the stability (test-retest reliability) of an instrument depends on the health condition under study. Too short a period might allow patients to recall their previous responses, and too long a period might allow a true change in their status.23 Sleep apnoea is most often a condition from which patients have suffered for years, and in this study our patients had experienced OSA related symptoms for an average of more than 8 years (table 1). We therefore believe that several months is not excessive for measuring reproducibility in an OSA population. This was verified by the high intraclass correlation coefficients observed.
We used Cronbach’s alpha as a standard measure of internal consistency. Investigators often see high Cronbach’s alpha values as an indication of reliability of a questionnaire. Coefficients above 0.7 are generally regarded as acceptable, although it is often recommended that values should be above 0.8 (good) or even 0.9 (excellent).24 We consider that high Cronbach’s alphas indicate that there is some redundancy in the items that form the domain. In such circumstances, reducing the number of items in a given domain is likely to provide the same information while shortening the time of administration and enhancing the completion rate of the questionnaire. Reporting on Cronbach’s alpha should also specify the number of items in the scale. We did not reduce the number of items in the “diurnal symptoms” domain despite a Cronbach’s alpha of 0.94 because some of the items may not apply to individual patients. In such circumstances, provision is made for the exclusion of one or more of the 10 items that compose the domain, so that the score is given by the mean score of the remaining items.
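The scoring rule described above (mean score per item, with non-applicable items excluded) might be implemented along the following lines. Representing an excluded item as None is an assumption made for illustration, as is the function name.

```python
def domain_score(item_scores):
    """Mean score per item within a domain, on the 1 to 7 scale.

    Items marked None (not applicable to this patient) are excluded, so the
    score is the mean of the remaining items. Returns None if no item applies.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        return None
    for s in answered:
        if not 1 <= s <= 7:
            raise ValueError(f"item score {s} is outside the 1 to 7 range")
    return sum(answered) / len(answered)


# Hypothetical responses to a four item domain; the third item does not apply.
score = domain_score([7, 5, None, 6])  # mean of 7, 5 and 6
```

Because the score is a mean rather than a sum, it stays on the same 1 to 7 scale regardless of how many items a given patient answers.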
The QSQ proved sensitive to change in quality of life in several ways. Statistically significant differences were observed in those who were treated with nasal CPAP (table 3). In addition, the change in scores observed in treated patients was real because the QSQ was able to distinguish between treated and untreated patients (table 4). There is no consensus on the preferred statistical method for assessing an instrument’s responsiveness. The existing approaches to the evaluation of responsiveness have been summarised by Liang.21 We adopted the standardised response mean, which compares the magnitude of change with the standard deviation of change, for several reasons. It represents an intuitive estimate of the signal to noise ratio defining responsiveness.20 In addition, it has direct implications for sample size determination for those planning clinical trials. The larger the standardised response mean, the smaller the sample size required to demonstrate a treatment effect.
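The link between the standardised response mean and sample size can be illustrated with the usual normal approximation for a within-group (paired) comparison, n = ((z_alpha + z_beta) / SRM)^2. This formula, the default significance level and power, and the function name are assumptions made for illustration; they are not taken from the study.

```python
import math
from statistics import NormalDist


def n_for_paired_change(srm, alpha=0.05, power=0.80):
    """Approximate number of patients needed to detect a within-group change.

    Normal approximation, two sided test: n = ((z_alpha + z_beta) / SRM)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / srm) ** 2)


n_moderate = n_for_paired_change(0.5)  # hypothetical SRM of 0.5
n_large = n_for_paired_change(1.0)     # hypothetical SRM of 1.0
```

Doubling the standardised response mean cuts the required number of patients roughly fourfold, here from 32 to 8 under the stated assumptions.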
The lack of significant correlations in rating of change between the “social interactions” domain of the QSQ and related measures (especially the “social outcome” domain of the FOSQ) is of concern. Because both the QSQ and the FOSQ were able to detect change over time (table 3), this cannot be attributed to lack of responsiveness of either of the two questionnaires. A more likely explanation is that the two questionnaires measure different constructs. For instance, the QSQ asks patients about their lack of will to do things together with their partner, children or friends, or their guilt about their relationship with family members or a close personal friend. In a two-item domain, the FOSQ asks about difficulties visiting family or friends either at the patient’s or host’s home.
Among the considerations in the selection of any quality of life questionnaire in clinical trials, its mode of administration is of primary importance. The QSQ is administered without supervision and may even be mailed to patients. Such considerations were important in the development of a standardised version of an asthma specific quality of life questionnaire that retained most of the measurement properties of its individualised counterpart.5 Only a direct comparison of both the SAQLI and the QSQ would provide evidence of the strengths and weaknesses—in terms of acceptability, completion rate and measurement properties—of the two instruments in different clinical settings.
Finally, we wish to comment on the issue of cross-cultural adaptability, a problem that relates to the development, translation, or utilisation of questionnaires in languages other than English. There is growing evidence, including this study, that careful translation and back translation of quality of life questionnaires can produce non-English language versions that appear to behave in a very similar manner to their originals.25 We translated the QSQ from French to English using this method. Both versions are available on request.
We conclude that the QSQ is a valid measure of health related quality of life in patients with OSA and is sensitive to treatment induced changes. We determined the differences in score that may be regarded as the “minimal clinically important differences” for each domain. We believe that the QSQ is a useful instrument for evaluating the impact of OSA and its treatment modalities.
This study was supported by the Quebec Lung Association. Yves Lacasse is a clinician scientist and Frédéric Sériès is a research scholar of the Fonds de la recherche en santé du Québec.