Predictive value of automated oxygen saturation analysis for the diagnosis and treatment of obstructive sleep apnoea in a home-based setting
  1. V Jobin,
  2. P Mayer,
  3. F Bellemare
  1. Laboratoire du sommeil, Centre hospitalier de l’Université de Montréal (CHUM)−Hôtel-Dieu, Montréal, Québec, Canada
  1. Correspondence to:
    François Bellemare PhD
    Laboratoire du sommeil, CHUM−Hôtel-Dieu, 3840 St-Urbain, Montréal, Québec, Canada H2W 1T8; bellemare{at}


Background: A portable monitor for the automated analysis of episodic nocturnal oxygen saturation or Spo2 (the Remmers Sleep Recorder, RSR) has been proposed for the diagnosis of obstructive sleep apnoea-hypopnoea (OSAH). A study was undertaken to compare the diagnostic performance of automated analysis with the manual scoring of polygraphic data by a more comprehensive respiratory monitor (the Suzanne recorder) used simultaneously in their intended home environment.

Methods: The respiratory disturbance indexes of the two monitors were compared in 94 consecutive adult patients suspected of having OSAH and who were deemed eligible for home-based investigation.

Results: The RSR overestimated the number of respiratory events associated with a ⩾4% fall in Spo2 by 13% (p<0.005) but underestimated the number of apnoeas and hypopnoeas defined on the basis of respiratory variables alone or their association with a ⩾4% fall in Spo2 or autonomic arousals by 38–48% (p<0.0001). In addition to these significant biases, the limits of agreement in all instances were wide, indicating a poor concurrence between the two monitors.

Conclusion: The automated analysis of Spo2 with the RSR cannot be substituted for the manual scoring of polygraphic data with the more comprehensive respiratory monitor in the diagnosis of OSAH in an ambulatory home-based setting.

  • AHI, apnoea-hypopnoea index
  • ESS, Epworth Sleepiness Scale
  • FEV1, forced expiratory volume in 1 s
  • OSAH, obstructive sleep apnoea-hypopnoea
  • PSG, polysomnography
  • RDI, respiratory disturbance index
  • RSR, Remmers Sleep Recorder
  • Spo2, oxygen saturation

Obstructive sleep apnoea-hypopnoea (OSAH) is a highly prevalent disorder affecting 2–4% of adult men and women.1 However, the standard diagnostic test for OSAH, that is, supervised in-laboratory overnight polysomnography (PSG), is time consuming and costly. Because of insufficient human, physical and financial resources, access to this form of testing is limited in some places, and the diagnosis of OSAH is consequently delayed.2 To overcome such difficulties, several portable monitors have been proposed that range in complexity from full PSG to oximetry alone.3 Although portable monitoring does not constitute a standard of care, in specialised centres such as ours it can help to reduce the demand for full PSG studies. Monitors that record respiratory variables together with oximetry but without electroencephalography (EEG) and electromyography (EMG) are particularly attractive because the same definition of OSAH as in standard PSG can be employed, and the test can be self-administered by patients in their home, thereby increasing accessibility. This type of monitoring offers high diagnostic sensitivity and specificity (>90%).4 However, given the number of signals being recorded, and because these studies are unattended, the risk of technical failures is higher. Furthermore, the scoring of such studies remains labour intensive, as it requires expertise and manual editing.

Although a diagnosis of OSAH can also be made by oximetry alone,5 wide variability in sensitivity and specificity (range 31–100%) has been reported for this type of monitoring.3,4,6,7,8,9,10,11 Furthermore, the reliability of the automatic scoring algorithms present on some of these devices has been questioned.3,4 However, in a large prospective study of consecutive patients, Vasquez et al12 reported excellent agreement between the respiratory disturbance index (RDI) derived from the automated analysis of oxygen saturation (Spo2) by the Remmers Sleep Recorder or RSR (RDIRem) and the apnoea-hypopnoea index (AHI) derived from the manual scoring of standard PSG; this agreement was minimally altered when EEG-based arousals were incorporated in the definition of AHI. Compared with previous reports on oximetry, RDIRem offered excellent diagnostic sensitivity (>95%) and specificity (>85%). Because the analysis is automated and does not require a trained technician, it should be more widely applicable than more comprehensive respiratory monitors. The purpose of the present study was to assess the performance of this simpler monitor in comparison with the more comprehensive portable respiratory monitor currently employed in our ambulatory setting for the diagnosis of OSAH.


Patients were recruited after referral to the sleep clinic of a university affiliated tertiary care centre. All patients who were prescribed an ambulatory trial by a sleep specialist were eligible for enrolment. In our centre, patients are generally considered eligible for an ambulatory investigation unless they meet one or more of the following exclusion criteria: inability to perform an ambulatory PSG, history of neuromuscular disease, severe lung disease (forced expiratory volume in 1 s (FEV1) <50% predicted), unstable coronary artery disease or referral for parasomnia. Of the 513 consecutive patients eligible for ambulatory investigation between March 2003 and May 2004, 104 were enrolled in the study. The limit for enrolment was set by the number of available monitors (six Suzanne recorders and one RSR; see below). The RSR was offered to patients on a “first come first served” basis. The protocol was approved by the Institutional Human Ethics Committee of CHUM-Hôtel Dieu and all participants provided written informed consent. The number of patients recruited was sufficient to detect a difference in RDI of >5 events/h between the two monitors with a significance level (α) of 0.01 and a power of 90%.
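The sample-size statement can be illustrated with the usual normal-approximation formula for a paired comparison, n = ((z(1−α/2) + z(1−β)) × σd / δ)², where δ is the difference to detect and σd the standard deviation of the paired differences. This is only a sketch under stated assumptions: the text reports neither σd nor whether the calculation was one- or two-sided, so the value `sd_diff=10` events/h and the two-sided default below are hypothetical, and the helper name `paired_sample_size` is ours.

```python
import math
from statistics import NormalDist

def paired_sample_size(delta, sd_diff, alpha=0.01, power=0.90, two_sided=True):
    """Normal-approximation sample size for detecting a mean paired difference `delta`.

    delta   -- smallest between-monitor RDI difference to detect (events/h)
    sd_diff -- assumed SD of the paired differences (events/h); hypothetical here
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    n = ((z_alpha + z_beta) * sd_diff / delta) ** 2
    return math.ceil(n)  # round up to whole patients

# Hypothetical inputs: detect 5 events/h, SD of differences assumed to be 10 events/h
print(paired_sample_size(delta=5, sd_diff=10))
```

Under these assumptions about 60 paired recordings would suffice. The requirement grows with the square of the assumed σd, and a rank-based test such as the Wilcoxon signed rank test typically needs a few percent more patients than this normal approximation.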

Data acquisition devices

A computerised polysomnographic monitor (Suzanne Polysomnographic Recording System, Nellcor Puritan Bennett (Melville) Ltd, Ottawa, Ontario, Canada), hereafter referred to as the Suzanne recorder, and a standardised montage served as the reference for the study. Nasal pressure was recorded by means of a nasal cannula, breathing movements by means of piezoelectric respiratory effort belts (Ultima Respiratory Effort Sensor, Model 0522S, Braedon Medical Corp, Carp, Ontario, Canada) placed around the thorax and abdomen, body position by a sensor placed over the thorax and held in place by a thoracic belt, Spo2 by a finger probe (Oximax Durasensor, Model DS-100A, Tyco Healthcare, Pleasanton, California, USA) and snoring by means of a miniature microphone attached to the garments (Suzanne Microphone, Model Y-70615, Nellcor Puritan Bennett (Melville) Ltd). With the RSR (SnoreSat, SagaTech Electronics Inc, Calgary, Alberta, Canada), Spo2 was recorded by means of a second finger probe (Oximax Durasensor, Model DS-100A, Tyco Healthcare) attached to the free hand, and snoring and body position changes were recorded by a specially designed sensor fixed with adhesive tape over the sternal notch.

Study protocol

After medical assessment the patients were asked to complete a sleep quality questionnaire incorporating the Epworth Sleepiness Scale (ESS).13 They were then instructed how to use the two portable devices simultaneously at night. They were asked to record in a log the time they went to bed, how long it took them to fall asleep, the wake time and any problems that occurred during recording. The following morning the data from the two recorders were downloaded to a desktop computer for analysis (see below). At study completion a follow-up questionnaire was sent to all treated patients attending the sleep clinic. Specifically, patients were asked to report the number of months they had been on treatment, whether they were still using their device and, on average, for how many hours per night. They were also asked to report whether they were satisfied with their treatment and to complete the ESS questionnaire.

Data analysis

The data from the two monitors were analysed separately and in a blinded fashion by one of three trained sleep technicians. Data from the Suzanne recorder were scored manually using Sandman software (Nellcor Puritan Bennett (Melville) Ltd).14 An RDI was calculated as the number of apnoeas and hypopnoeas per hour of recording time (RDISuz). Apnoea was scored when no breathing could be detected on the nasal pressure trace for at least 10 s. Hypopnoea was scored when the amplitude of respiratory movements detected by the nasal pressure sensor or the thoracoabdominal effort belts decreased by more than 50% from baseline, or when the amplitude of respiration was reduced by less than 50% but the reduction was followed by a fall in Spo2 of ⩾4%.14 An extended RDI (RDISuz+) was also calculated that incorporated, in addition to apnoeas and hypopnoeas, respiratory events associated with probable autonomic arousals,15 defined as increases in pulse rate of >5 beats/min immediately after a respiratory event characterised by the clear presence of inspiratory flow limitation, a decrease in flow of <30% of the baseline value and no change in Spo2. Finally, for direct comparison with RDIRem, a restricted RDI (RDISuz−) was defined as the number of respiratory events per hour associated with a fall in Spo2 of ⩾4%. The Suzanne recorder samples the Spo2 signal at 12 Hz and, every second, reports the average value over the preceding second.
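The three Suzanne indices are nested (RDISuz− ⊆ RDISuz ⊆ RDISuz+), and the counting rules above can be summarised in a short sketch. It assumes that each candidate event has already been scored into a hypothetical `RespEvent` record; the field names, and the reading of "no change in Spo2" as a desaturation of zero, are illustrative assumptions rather than the Sandman software's data model.

```python
from dataclasses import dataclass

@dataclass
class RespEvent:
    """Hypothetical summary of one candidate respiratory event."""
    apnoea: bool            # no breathing on the nasal pressure trace for >= 10 s
    amplitude_drop: float   # fractional fall in breathing amplitude from baseline (0-1)
    desat_pct: float        # subsequent fall in Spo2, in percentage points
    flow_limited: bool      # clear inspiratory flow limitation
    pulse_rise_bpm: float   # rise in pulse rate immediately after the event

def is_apnoea_hypopnoea(e: RespEvent) -> bool:
    if e.apnoea:
        return True
    if e.amplitude_drop > 0.50:   # a >50% fall in amplitude suffices on its own
        return True
    # a smaller fall counts only if followed by a >=4% desaturation
    return e.amplitude_drop > 0 and e.desat_pct >= 4.0

def is_autonomic_arousal_event(e: RespEvent) -> bool:
    # flow-limited breathing, flow decrease <30%, no Spo2 change, pulse rise >5 beats/min
    return (e.flow_limited and e.amplitude_drop < 0.30
            and e.desat_pct <= 0.0 and e.pulse_rise_bpm > 5)

def rdi_variants(events, recording_hours):
    ah = [e for e in events if is_apnoea_hypopnoea(e)]
    restricted = [e for e in ah if e.desat_pct >= 4.0]   # events with >=4% desaturation
    arousal_only = [e for e in events
                    if not is_apnoea_hypopnoea(e) and is_autonomic_arousal_event(e)]
    return {
        "RDISuz-": len(restricted) / recording_hours,
        "RDISuz": len(ah) / recording_hours,
        "RDISuz+": (len(ah) + len(arousal_only)) / recording_hours,
    }

events = [
    RespEvent(True, 1.0, 5.0, False, 0),    # apnoea with desaturation
    RespEvent(False, 0.6, 2.0, False, 0),   # >50% hypopnoea, little desaturation
    RespEvent(False, 0.3, 4.5, False, 0),   # <50% reduction plus >=4% desaturation
    RespEvent(False, 0.2, 0.0, True, 8),    # flow limitation with autonomic arousal
]
print(rdi_variants(events, recording_hours=2.0))
```

The toy night shows how the same four events yield three different indices (1.0, 1.5 and 2.0 events/h), which is the mechanism behind the biases reported below.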

For the RSR, no specific scoring was done since the recordings are automatically analysed by the software supplied with the instrument. With this software the raw data can be viewed but no editing is possible. RDIRem is calculated automatically as the number of episodes per hour of oximeter probe-on time with falls in Spo2 ⩾4% below baseline. The RSR samples the Spo2 signal at 1 Hz. Details concerning the detection algorithm of this monitor can be found elsewhere.12 RDIRem appears automatically in the report sheet along with a graphic display of Spo2, snoring and body position as well as a series of related information including recording time, oxygen sensor probe-on time, minimum, maximum and mean Spo2. This report can be visualised and printed.
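For contrast, a fully automated oximetry index of this kind can be sketched as a count of falls of ⩾4 percentage points below a running baseline, divided by hours of probe-on time. This is not the RSR detection algorithm (which is described elsewhere12): the running-maximum baseline over the preceding 120 s and the rule that an event ends once Spo2 returns to within 4 points of baseline are hypothetical choices made only for illustration.

```python
from collections import deque

def desaturation_index(spo2, drop=4.0, baseline_window_s=120, fs_hz=1.0):
    """Desaturation events per hour of probe-on time (illustrative, not the RSR algorithm).

    spo2 -- Spo2 samples (%) at fs_hz; None marks probe-off samples, which are ignored
    """
    history = deque(maxlen=int(baseline_window_s * fs_hz))  # recent valid samples
    events = 0
    in_event = False
    valid = 0
    for s in spo2:
        if s is None:            # probe off: excluded from detection and from the denominator
            continue
        valid += 1
        baseline = max(history) if history else s   # assumed baseline: recent maximum
        if s <= baseline - drop:
            if not in_event:     # count each desaturation only once
                events += 1
                in_event = True
        else:
            in_event = False     # Spo2 back within `drop` points of the baseline
        history.append(s)
    probe_on_hours = valid / fs_hz / 3600
    return events / probe_on_hours if probe_on_hours else float("nan")

# One hour of 1 Hz data at 96%: two 20 s dips to 91% (counted) and one to 93% (not counted),
# followed by 10 min with the probe off
signal = [96.0] * 3600
for start, level in [(600, 91.0), (1800, 91.0), (2500, 93.0)]:
    signal[start:start + 20] = [level] * 20
signal += [None] * 600
print(desaturation_index(signal))
```

Because such an index sees only the Spo2 signal, any respiratory event that does not produce a ⩾4% fall is invisible to it, whatever happens to flow or effort.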

Statistical analysis

Differences between the two devices were compared using non-parametric statistics, namely the Wilcoxon signed rank test. The bias and limits of agreement between the two instruments were calculated,16 and receiver operating characteristic (ROC) curves were generated for selected diagnostic RDI cut-off values. Values are expressed as group mean (SEM) or median (interquartile range, IQR), depending on their distribution. The statistical level of significance was set at p<0.05.
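The bias and limits of agreement can be sketched in a few lines. The sketch takes the bias as the mean of the paired differences (test minus reference) and the limits as bias ±2 SD, matching the figure 1 caption; `k=1.96` is the more common convention. The function name and toy values are hypothetical. For the paired test itself, `scipy.stats.wilcoxon(x, y)` implements the Wilcoxon signed rank test.

```python
import math
import statistics

def bland_altman(test, reference, k=2.0):
    """Bias and limits of agreement (bias +/- k*SD) for paired measurements."""
    diffs = [t - r for t, r in zip(test, reference)]   # test minus reference
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                        # sample SD of the differences
    return {
        "bias": bias,
        "se": sd / math.sqrt(len(diffs)),               # SE of the mean difference
        "sd": sd,
        "lower": bias - k * sd,
        "upper": bias + k * sd,
    }

# Toy RDI values (events/h), not study data
rdi_rem = [10, 4, 22, 7, 15]
rdi_suz = [12, 9, 25, 6, 20]
print(bland_altman(rdi_rem, rdi_suz))
```

A small SE can make a modest bias statistically significant even when the limits of agreement, which scale with the SD rather than the SE, remain clinically wide.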


Patient characteristics

Of the 104 patients studied, 10 returned technically inadequate studies, leaving a total of 94 patients (62 men) for final analysis (table 1).

Table 1

 Characteristics of study subjects

Technical considerations

Differences in recording time between the two monitors, although significant, were small (table 2). By contrast, wide variations were found in the time spent in different body positions (data not shown), presumably because the body position sensors of the two monitors differed in design or shifted during the night; this prevented meaningful comparisons in different body postures. Mean and minimum Spo2 from the two monitors differed significantly, the RSR yielding consistently lower readings. The difference between mean and minimum Spo2 was also significantly greater for the RSR, suggesting greater sensitivity of its oxygen sensor.

Table 2

 Comparative results between the Suzanne and Remmers monitors

Measures of agreement

The mean (SE) of the differences between RDIRem and RDISuz− was small but statistically significant (1.58 (0.76); p<0.0001), indicating a positive instrument bias. The distribution of these differences is shown in fig 1 in the form of a Bland-Altman plot. Visual inspection indicated that systematic differences occurred near the diagnostic threshold (ie, for RDI values between 5 and 20 events/h, where the impact of the discrepancy was most significant).

Figure 1

 Bland-Altman plots showing the distribution of the differences between the respiratory disturbance index (RDI) of the Remmers Sleep Recorder (RDIRem) and the RDI of the Suzanne recorder calculated according to several definitions of respiratory events (RDISuz+, extended RDI; RDISuz−, restricted RDI). For a definition of the Suzanne recorder RDIs, see footnote to table 3. In each panel the horizontal solid line represents the mean difference and the broken lines the mean difference ±2 SD.

The mean (SE) values of the differences between RDIRem and RDISuz (−6.04 (0.94); p<0.0001) and between RDIRem and RDISuz+ (−10.16 (1.03); p<0.0001) were larger, reflecting the addition of respiratory events with <4% falls in Spo2 and with autonomic arousals, respectively. These negative biases are substantial, representing approximately 38% and 48% of all respiratory events included in RDISuz and RDISuz+, respectively. The Bland-Altman plots in fig 1 illustrate the distribution of these differences. Again, visual inspection indicated that large differences between the two recorders occurred near the diagnostic threshold, where these differences mattered most. Furthermore, the limits of agreement were wide, indicating poor agreement between the two monitors.

Figure 2

 Receiver operating characteristic (ROC) curves of the respiratory disturbance index (RDI) of the Remmers Sleep Recorder for two diagnostic cut-off values of the Suzanne recorder RDI calculated according to several definitions of respiratory events: Solid lines represent events defined on the basis of their association with a ⩾4% fall in Spo2 (RDISuz−). Dotted lines represent events defined on the basis of a ⩾50% reduction in mechanical variables or ⩾4% fall in Spo2 (RDISuz). Dash-dotted lines represent events defined on the basis of their association with a ⩾50% reduction in mechanical variables or a ⩾4% fall in Spo2 or autonomic arousals (RDISuz+).

Predictive value of the automated analysis

The sensitivity and specificity of RDIRem varied with the diagnostic RDI cut-off value selected as well as with the RDISuz definition (table 3, fig 2). Compared with RDISuz−, RDIRem had reasonable sensitivity at the same cut-off points but specificity (74%) and likelihood ratio for a positive test result (3.68) near the diagnostic threshold were low. When compared with RDISuz and RDISuz+, the sensitivity of RDIRem was markedly reduced at all cut-off points, but specificity was minimally altered. Because of this lower sensitivity, the likelihood ratio for a negative test result increased markedly at all cut-off points. At the diagnostic cut-off value of ⩾5 events/h, only between 76% and 77% of the patients were correctly classified by RDIRem. At this cut-off point, a diagnosis of OSAH could be made with confidence (ie, 100% specificity) when RDIRem was ⩾20 events/h (RDISuz ⩾5) or ⩾10 events/h (RDISuz+ ⩾5). With these cut-off points, however, only between 29% and 53% of OSAH patients were identified by RDIRem.
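The quantities in table 3 follow from a 2×2 cross-classification of RDIRem against each Suzanne definition at a given pair of cut-offs, and ROC curves trace sensitivity against 1 − specificity as the RDIRem cut-off varies. A minimal sketch with hypothetical function names and toy data (not study data):

```python
import math

def diagnostic_stats(test_values, ref_values, test_cutoff, ref_cutoff):
    """2x2 classification of an index (e.g. RDIRem) against a reference RDI.

    A value at or above its cut-off is treated as positive. Assumes the reference
    contains both positive and negative patients.
    """
    tp = fp = fn = tn = 0
    for t, r in zip(test_values, ref_values):
        test_pos, ref_pos = t >= test_cutoff, r >= ref_cutoff
        if test_pos and ref_pos:
            tp += 1
        elif test_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,   # likelihood ratio, positive test
        "lr_neg": (1 - sens) / spec if spec > 0 else math.inf,   # likelihood ratio, negative test
        "accuracy": (tp + tn) / (tp + fp + fn + tn),             # proportion correctly classified
    }

def roc_points(test_values, ref_values, ref_cutoff, test_cutoffs):
    """(1 - specificity, sensitivity) pairs for a range of test cut-offs."""
    points = []
    for c in test_cutoffs:
        s = diagnostic_stats(test_values, ref_values, c, ref_cutoff)
        points.append((1 - s["specificity"], s["sensitivity"]))
    return points

# Toy data (events/h), not study data
rdi_ref = [2, 3, 8, 12, 30, 6]
rdi_test = [1, 6, 4, 15, 25, 7]
print(diagnostic_stats(rdi_test, rdi_ref, test_cutoff=5, ref_cutoff=5))
print(roc_points(rdi_test, rdi_ref, ref_cutoff=5, test_cutoffs=[5, 10]))
```

A specificity of 100% makes the positive likelihood ratio infinite, which is why a diagnosis can be made "with confidence" only at the higher RDIRem cut-offs, at the cost of the lower sensitivity seen when the toy cut-off is raised from 5 to 10.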

Table 3

 Predictive value of automated analysis of the Remmers sleep recorder


Of the 94 patients, 49 met the minimum diagnostic criteria (⩾5 events/h) using RDISuz−, of whom two had RDIRem of <5 (table 4). Adding respiratory events with >50% reduction in respiratory movements but <4% fall in Spo2 in the definition of RDISuz led to 25 additional patients meeting this criterion, of whom 16 had RDIRem <5. Adding respiratory events associated with autonomic arousals in the definition of RDISuz+ led to 12 additional patients meeting the minimum diagnostic criteria, of whom 9 had RDIRem <5.
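The counts in this paragraph can be accumulated to show how each broader Suzanne definition enlarges the OSAH group and the number missed by RDIRem at ⩾5 events/h. The implied sensitivities printed below are derived here from these counts, assuming that RDIRem ⩾5 events/h is treated as a positive test; they are not quoted from table 3. The resulting prevalences (52%, 79% and 91%) agree to within one percentage point with the 52% and 78% cited in the Discussion.

```python
N = 94  # patients analysed

# From the text: (patients newly reaching >=5 events/h, of whom RDIRem < 5)
added = {"RDISuz-": (49, 2), "RDISuz": (25, 16), "RDISuz+": (12, 9)}

# Each definition includes all events of the previous one, so counts accumulate
cumulative = {}
total_pos = total_missed = 0
for name, (n_new, missed_new) in added.items():
    total_pos += n_new
    total_missed += missed_new
    cumulative[name] = (total_pos, total_missed)

for name, (pos, missed) in cumulative.items():
    prevalence = pos / N
    implied_sens = (pos - missed) / pos   # share of OSAH patients with RDIRem >= 5
    print(f"{name}: {pos} with OSAH ({prevalence:.0%}), {missed} with RDIRem < 5, "
          f"implied sensitivity {implied_sens:.0%}")
```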

Table 4

 Patient distribution according to minimum required diagnostic criteria

Of these 94 patients, 13 were referrals from external centres. Of the 81 patients attending our clinic, 5 were lost to follow-up and 5 were referred for a second evaluation, leaving 71 new cases. Fifty-five patients with RDISuz+ ⩾5 events/h were prescribed a treatment trial with continuous positive airway pressure or a mandibular advancement prosthesis. A follow-up questionnaire sent to all these patients 6 months to 1 year after completion of the study was returned by 41. Compliance with treatment, defined as the proportion of respondents who reported using their device regularly, was 95%, and all but 4 patients were satisfied with their treatment. Patients reported having used their device for 1.5–9 h/day over a period of 0.5–17 months. The ESS score decreased from an average of 13.0 (range 3–21) before treatment to 8.0 (range 1–14) while on treatment (p<0.0005; n = 38). The response to treatment was independent of whether the diagnosis was based on RDISuz−, RDISuz or RDISuz+ criteria (equality of medians test, p = 0.45; table 4).


This study shows that the limits of agreement of the two monitors are large, leading to low sensitivity and specificity of the RSR. As an oximeter, the automated analysis algorithm of the RSR overestimated by 13% the number of respiratory events with ⩾4% fall in Spo2. As an apnoea-hypopnoea monitor, the RSR underestimated the number of respiratory events by 38–48%. Only between 29% and 53% of OSAH patients, identified on the basis of RDISuz or RDISuz+, were identified with confidence by the RSR.

Comparison of RDIRem and RDISuz

In a previous study by Vasquez et al12 no significant bias was found between RDIRem and PSG-derived AHI, even when EEG-based arousals were included in the definition of AHI. In the present work, a significant positive instrument bias was apparent when respiratory events were defined on the basis of their association with ⩾4% fall in Spo2, RDIRem exceeding RDISuz− by 13%. By contrast, when respiratory events associated with a <4% fall in Spo2 or with autonomic arousals were included in the definition of RDISuz, a negative instrument bias was evident, RDIRem underestimating RDISuz by 38% and RDISuz+ by 48%.

RDIRem values in excess of RDISuz− may have been caused by inadvertent changes in the Spo2 signal unrelated to respiratory events, which were detected by automated analysis with the RSR but filtered out by manual scoring with the Suzanne polygraphic recorder. However, as the data in table 2 suggest, differences in the sensitivity of the oxygen sensors or analysis algorithms of the two recorders may also help to explain this finding.

On the other hand, RDIRem values smaller than RDISuz or RDISuz+ are probably explained by different definitions of respiratory events. This should also help to explain the difference between our results and those of Vasquez et al. In their study, hypopnoea would be scored if there was a fall in Spo2 of ⩾4% whereas, in our work, a fall in Spo2 of this magnitude was required only when the decrease in nasal flow or thoracoabdominal movements was <50%.14 Because apnoeas and hypopnoeas lasting longer than 10 s can occur without significant changes in Spo2,17 events associated with a fall in Spo2 of <4% would contribute to RDISuz and RDISuz+ but not to RDIRem, thereby increasing the bias in our study compared with theirs.

In our investigation the inclusion of autonomic arousals in the definition of RDISuz+ heightened instrument bias significantly whereas, in the study by Vasquez et al, no significant bias was noted when EEG-based arousals were included in the definition of hypopnoea.12 In their trial, however, the inclusion of EEG-based arousals did not change the AHI significantly, implying a negligible contribution of these events to AHI. By contrast, in the present study, using autonomic arousals as a surrogate for EEG microarousals,15 RDISuz+ significantly exceeded RDISuz, explaining the greater bias found when this criterion was included in the definition of RDI.

Although a negative bias could be anticipated in our study based on the different event definitions, its magnitude could not be predicted. In a previous investigation by Redline et al, adding a ⩾4% desaturation criterion as a prerequisite for scoring a respiratory event reduced the median RDI by 85% (from 29.3 to 4.4) and the prevalence estimate by 55 percentage points (from 100% to 45%).18 In that study, however, flow was measured with a thermocouple, a method which may not reflect flow accurately.14 The closest comparison in our investigation would be between RDISuz and RDISuz−, showing a 57% reduction in median RDI (from 12.9 to 5.6) and a 26 percentage point decrease in the prevalence estimate (from 78% to 52%). The fact that, on average, 38–48% of respiratory events in the present study could be associated with a fall in Spo2 of <4% may be understood when considering that the duration of the events, the pre-event oxygen saturation level, end-expiratory lung volume and metabolic rate all determine the magnitude of the fall in Spo2 during obstructive events.17 Because these factors can be expected to vary substantially between patients, they may also help to explain the large inter-individual variability in the difference in RDI and, hence, the poor agreement between the two monitors.

One could question the clinical significance of including respiratory events associated with <4% fall in Spo2 or with autonomic arousals in the definition of RDISuz. As shown by Guilleminault et al,19 sleep fragmentation can occur in a number of patients with OSAH in association with periodic increments of respiratory efforts but without significant oxygen desaturation. These patients can be very symptomatic and benefit from treatment. Furthermore, as demonstrated experimentally by Martin et al,20 even subcortical autonomic arousals can increase sleepiness. In the present study the inclusion of respiratory events with <4% fall in Spo2 or with autonomic arousals in the definition of RDISuz led to 37 additional patients being classified as having OSAH, of whom 50% were prescribed a treatment trial with benefits in most of them (table 4). These data suggest that a substantial number of patients with OSAH identified on the basis of these criteria can benefit from treatment, thereby justifying their inclusion in the definition of RDI.

Implications of the findings

Significant negative bias and poor agreement between the two monitors both contributed to reduce the diagnostic sensitivity and specificity of the RSR. Using a diagnostic cut-off value of ⩾15 events/h, Vasquez et al reported 98% sensitivity,12 a value comparable to that of the present study when a ⩾4% fall in Spo2 was required to define a respiratory event with the Suzanne recorder. However, when respiratory events associated with a <4% fall in Spo2 or with autonomic arousals were included in the definition of RDISuz, the sensitivity of the RSR at the same cut-off point was markedly reduced, to between 52% and 63%. These sensitivities are within the range reported previously for home oximetry (range 31–98%).4 The implication is that, for a substantial number of patients, additional investigations would have been required had the RSR been employed as the sole instrument. According to our results, a positive diagnosis of OSAH (defined as ⩾5 events/h) could be made with 100% specificity using the RSR only if RDIRem was ⩾20 events/h when autonomic arousals are excluded from the definition of RDISuz, or ⩾10 events/h when they are included. Because only between 33% and 49% of all cases met these criteria, a second study with a more comprehensive respiratory monitor would have been needed in 51% or more of patients. Although a cost-benefit analysis of the RSR under such circumstances has not been performed, it would be unlikely to show significant benefits. Furthermore, because a second study would entail additional delay in the diagnosis of OSAH in a substantial number of patients, substitution of the more comprehensive respiratory monitor by the automated analysis of the RSR is not warranted.

With the RSR, no editing of RDIRem was possible. Spo2 traces, however, could be reviewed. Other signals such as flow and respiratory effort belts may also be added and reviewed. As several studies have pointed out, visual inspection of Spo2 tracings can improve the diagnostic sensitivity of portable monitors. In an investigation by Douglas et al,8 66% of patients with OSAH were correctly identified by visual inspection of Spo2 traces alone with no false positives. Using a ⩾4% fall in Spo2, only 41% of patients with OSAH were correctly identified with 97% specificity. In a study by Sériès et al9 the visual appreciation of periodic fluctuations in Spo2 correctly identified 98% of patients with OSAH with a specificity, however, of only 48%. Although this has not been examined in the present work, it is conceivable that visual inspection of Spo2 traces and of other signals such as flow by a sleep specialist could improve the diagnostic sensitivity of the RSR.

Another limitation of our study is that the performance of the RSR in the home environment was not compared with diagnostic in-laboratory PSG. Differences between EEG-based sleep time and recording time, night to night variability and different environments are all factors that can affect RDI and, hence, the diagnostic performance of the RSR. By using the two recorders simultaneously in the home environment, we avoided these potential confounders. While this has advantages with respect to defining the performance of automated analysis, further validation comparing home-based monitoring with in-laboratory PSG is warranted. However, because the added variability associated with the home environment can be expected to reduce the diagnostic sensitivity and specificity of the RSR further, such a comparison is likely to reinforce our conclusions.

In summary, this study shows that, in spite of a significant association between the RDI of the two monitors, the low diagnostic sensitivity and specificity of RDIRem near the diagnostic threshold are such as to deter substitution of the more comprehensive respiratory monitor.


The Remmers Sleep Recorder used in this study was provided by Sagatech Electronics Inc, Calgary, Alberta, Canada. The authors acknowledge the editorial assistance of Ovid Da Silva, Research Support Office, Research Center, CHUM.



  • Published Online First 24 January 2007

  • FB was supported by Medigas, OSR Medical and the Laboratoires Biron.

  • Competing interests: None.
