What is the key question?
How should sleep portable monitor (PM) recordings be scored in the absence of EEG-defined arousals and sleep duration?
What is the bottom line?
Hypopnoea criteria using 3% oxygen desaturation level without pulse-wave amplitude (PWA) drops as EEG arousal surrogate accurately classified mild-to-moderate obstructive sleep apnoea (OSA), whereas including PWA drops may provide a higher accuracy only for diagnosis of severe OSA.
Why read on?
This study allows direct comparison between complete polysomnography and type 3 PM recorders, provides new insights on the scoring of respiratory events with PM recordings and evaluates a novel way to define respiratory events with the use of PWA drops as surrogates for EEG arousals.
Obstructive sleep apnoea (OSA) is a highly prevalent disease associated with neurocognitive1 impairment and cardiovascular morbidity.2 ,3 Despite an increased awareness, studies have shown that >75% of the patients remain undiagnosed and untreated.4 To date, attended in-laboratory polysomnography (PSG) is considered the gold standard for OSA diagnosis.5 However, considering the cost, technical complexity and human resources required for PSG, unattended portable monitor (PM) devices have been proposed as an alternative technique for the diagnosis of OSA. Recent studies suggest that the diagnosis of OSA with PM devices can be as accurate as with PSG in selected populations.6–11 Current guidelines from the American Academy of Sleep Medicine (AASM) recommend the use of unattended PMs for the diagnosis of OSA only in combination with a comprehensive clinical assessment in patients with a high pre-test probability of moderate-to-severe OSA and without comorbid sleep disorder or major comorbid medical disorders.6 ,12 The AASM recommends type 3 PM devices, which include an average of 4–7 channels with at least a measurement of airflow, respiratory effort and blood oxygenation.6 ,13
There are however concerns regarding the interpretation of PM recordings because of the lack of an EEG signal for the determination of sleep periods and arousals. Frequency of respiratory events identified with PM devices (PM-apnoea–hypopnoea index (AHI)) is calculated based on the number of apnoeas and hypopnoeas divided by the total recording time (TRT), while the PSG-AHI is calculated based on the number of apnoeas and hypopnoeas divided by the total sleep time (TST). Moreover, PM recordings cannot incorporate the criteria of arousals used for the identification of hypopnoeas as recommended by the AASM.5 ,14 Therefore, PM devices may underestimate the severity of OSA because of a longer TRT compared with TST (increased denominator) and because of a lower number of hypopnoeas due to the absence of EEG arousal detection (decreased numerator). On the other hand, possible scoring of respiratory events during nocturnal wake periods with PM devices (increased numerator) may result in an overestimation of the PM-AHI compared with the PSG-determined AHI.
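The effect of the two denominators can be illustrated with a minimal sketch. The event counts and durations below are hypothetical and chosen only to show how a longer TRT, and the loss of arousal-only hypopnoeas, each lower the index; the function name `ahi` is an illustrative helper and not part of any scoring software.

```python
def ahi(n_apnoeas: int, n_hypopnoeas: int, duration_min: float) -> float:
    """Events per hour over a duration in minutes (TST for PSG, TRT for PM)."""
    if duration_min <= 0:
        raise ValueError("duration must be positive")
    return (n_apnoeas + n_hypopnoeas) / (duration_min / 60.0)


# Hypothetical night: 20 apnoeas and 70 hypopnoeas scored on PSG.
psg_ahi = ahi(20, 70, duration_min=400)        # denominator = TST

# Same events divided by a TRT that is 80 min longer (larger denominator).
pm_ahi_same_events = ahi(20, 70, duration_min=480)

# Assume 20 of the hypopnoeas were arousal-only and are missed without EEG
# (smaller numerator as well as larger denominator).
pm_ahi_fewer_events = ahi(20, 50, duration_min=480)

print(psg_ahi, pm_ahi_same_events, pm_ahi_fewer_events)
```

In this made-up case the index falls from 13.5/h to 11.25/h through the denominator alone, and to 8.75/h once the arousal-only hypopnoeas are also lost.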
Currently, there are no specific recommendations for the scoring of respiratory events in PM recordings. Even though type 3 PMs cannot detect arousals because of the absence of EEG, indirect detection of arousals using surrogate signals has been proposed. For instance, pulse-wave amplitude (PWA) drops have been shown to be a sensitive surrogate marker for EEG arousals.15 PWA is a non-invasive measure obtained from finger photoplethysmography, which is provided by most pulse oximeters. A PWA drop is believed to be a marker of peripheral vasoconstriction induced by the autonomic response associated with arousals from sleep (autonomic arousals). PWA drops could therefore be considered as a surrogate for EEG arousals and used as a secondary criterion in the definition of hypopnoeas in PM recordings. Figure 1 shows an example of a hypopnoea followed by an arousal and a PWA drop.
Even though this marker has not been validated in the general population, PWA drop sensitivities of 89.1% and 70.9% for EEG arousals were reported in patients with non-invasive ventilation and in patients with OSA, respectively.15 ,16 Moreover, the magnitude of PWA drops is associated with EEG arousal intensity, and the use of PWA drops as a surrogate for EEG arousals was shown to improve inter-scorer reliability for PSG.17 ,18
We hypothesised that the use of PWA drops as surrogates for EEG arousals may improve the sensitivity of the PM-AHI and that the use of a higher threshold for oxygen desaturation (4% instead of 3%) could improve the specificity of the PM-AHI for the diagnosis of OSA. The objective of our study was to determine the performance of different hypopnoea scoring criteria, with or without PWA drops and using 3% or 4% oxygen desaturation thresholds, for the scoring of PM recordings in comparison with complete home PSG.
Selection and description of the participants
For the present study, 312 subjects were drawn from the HypnoLaus cohort (previously described19) and stratified according to AHI severity (<5/h, 5–15/h, 15–30/h and >30/h), age (≤60 and >60 years of age) and gender. We also identified a subpopulation with a high pre-test risk of OSA defined as an Epworth Sleepiness Scale (ESS)20 score >10 and a STOP-BANG score21 of at least 3/8. The latter score included the following items: Snoring, daytime Tiredness, Observed apnoea, high blood Pressure, Body mass index ≥35 kg/m2, Age ≥50 years, Neck circumference ≥40 cm and male Gender. Subjects on β-blocker therapy were excluded from the analysis as this treatment could potentially blunt autonomic response and therefore alter PWA drop detection.
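The 'high pre-test risk' definition above can be written down as a short sketch. The thresholds are those stated in the text (ESS >10 and STOP-BANG of at least 3/8); the `Subject` class, its field names and the example values are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    snoring: bool
    tired: bool
    observed_apnoea: bool
    high_blood_pressure: bool
    bmi: float       # body mass index, kg/m2
    age: int         # years
    neck_cm: float   # neck circumference, cm
    male: bool
    ess: int         # Epworth Sleepiness Scale score, 0-24


def stop_bang(s: Subject) -> int:
    """Count the positive STOP-BANG items (0 to 8)."""
    items = [
        s.snoring,
        s.tired,
        s.observed_apnoea,
        s.high_blood_pressure,
        s.bmi >= 35,
        s.age >= 50,
        s.neck_cm >= 40,
        s.male,
    ]
    return sum(items)


def high_pretest_risk(s: Subject) -> bool:
    """High pre-test risk as defined in the text: ESS > 10 and STOP-BANG >= 3."""
    return s.ess > 10 and stop_bang(s) >= 3


# Hypothetical subject: snorer, hypertensive, 62-year-old man, ESS 12.
example = Subject(True, False, False, True, 29.0, 62, 38.0, True, 12)
print(stop_bang(example), high_pretest_risk(example))
```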
Complete PSGs were performed unattended at home and were scored according to the AASM 2007 recommendation.14 Certified PSG technicians equipped the subjects with the PSG recording system (Titanium, Embla Flaga, Reykjavik, Iceland) in the late afternoon at the Lausanne University Sleep Centre (Centre for Investigation and Research in Sleep). The PSG recording and interpretation techniques used in HypnoLaus were previously described.19 Respiratory events were scored manually according to the updated AASM 2012 recommendations5 with hypopnoea defined as: (1) a drop of nasal pressure excursion of at least 30% from pre-event baseline, (2) lasting at least 10 s and (3) accompanied by either a 3% oxygen desaturation or an EEG arousal. Frequencies of respiratory events with PSG were reported as AHI, that is, the number of apnoeas and hypopnoeas divided by the TST in hours. The arousal index was calculated as the number of EEG arousals divided by the TST.
Type 3 portable monitor devices
The PSG recordings were reinterpreted by a trained pulmonologist (SV) after removing channels available only in PSG (EEG, electro-oculography and muscle activity) in order to simulate a type 3 PM device. The remaining channels were: nasal pressure, pulse oximetry, thoracic and abdominal respiratory effort belts and body position. Recording duration was determined according to the subject's self-reported ‘lights-off’ and ‘lights-on’ times. Respiratory events were scored manually. Hypopnoea was defined as: (1) a drop of nasal pressure excursion of at least 30% from pre-event baseline, (2) lasting at least 10 s and (3) meeting secondary criteria, for which four alternative sets were evaluated (table 1). An automated algorithm identified PWA drops on the photoplethysmography channel based on a drop of at least 30% from baseline PWA, a PWA drop duration of at least 3 s and ≤50 s and a minimum interval between two PWA drops of at least 10 s. To be associated with a respiratory event, PWA drops had to occur at least 5 s after the beginning of the respiratory event and no later than 10 s after its end. Frequencies of respiratory events with type 3 PM recordings were reported as PM-AHI values, that is, the number of apnoeas and hypopnoeas divided by the TRT. The PWA drop index corresponded to the number of PWA drops divided by the TRT.
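The PWA drop rules described above can be sketched as follows. This is not the automated algorithm used in the study, whose baseline estimation is not described here; it only encodes the stated amplitude, duration and timing thresholds. It assumes that a drop 'occurs' at its onset, and it leaves out the 10 s minimum interval between drops because the reference points for that interval are not specified. All class and function names are hypothetical.

```python
from typing import Iterable, NamedTuple


class PwaDrop(NamedTuple):
    onset_s: float      # time of drop onset, in seconds from recording start
    duration_s: float   # duration of the drop, in seconds
    rel_drop: float     # amplitude drop as a fraction of baseline (0.35 = 35%)


class RespEvent(NamedTuple):
    start_s: float
    end_s: float


def is_valid_drop(d: PwaDrop) -> bool:
    """Drop of at least 30% of baseline lasting between 3 s and 50 s."""
    return d.rel_drop >= 0.30 and 3.0 <= d.duration_s <= 50.0


def is_associated(d: PwaDrop, e: RespEvent) -> bool:
    """Onset at least 5 s after event start and at most 10 s after event end."""
    return e.start_s + 5.0 <= d.onset_s <= e.end_s + 10.0


def event_has_pwa_drop(e: RespEvent, drops: Iterable[PwaDrop]) -> bool:
    """True if any valid PWA drop falls in the association window of the event."""
    return any(is_valid_drop(d) and is_associated(d, e) for d in drops)


# Hypothetical flow reduction from 100 s to 120 s.
event = RespEvent(start_s=100.0, end_s=120.0)
drops = [
    PwaDrop(onset_s=102.0, duration_s=6.0, rel_drop=0.40),  # too early
    PwaDrop(onset_s=122.0, duration_s=5.0, rel_drop=0.40),  # within window
]
print(event_has_pwa_drop(event, drops))
```

A criteria set that accepts PWA drops as an arousal surrogate would then score the flow reduction as a hypopnoea when either the desaturation threshold is met or `event_has_pwa_drop` is true.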
Agreement between PM-AHIs and the AHI, between arousal index and PWA drop index and between TST and TRT was assessed using Bland–Altman plots in which the mean value of both measures is plotted on the abscissa and their difference on the ordinate. The limits of agreement between the two measurements were defined as the mean difference ±1.96×SD of the individual differences (see online supplementary figure S1). The accuracy of the PM-AHI was also determined for different AHI thresholds (≥5/h, ≥15/h and ≥30/h) and was expressed as sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio (+LR), negative likelihood ratio (−LR), positive post-test probability (number of true-positive results divided by all positive results), negative post-test probability (number of false-negative results divided by all negative results) and percentage of correctly classified subjects (the number of true-positive and true-negative results divided by all positive and negative results). Receiver operating characteristic (ROC) curves were used to illustrate the discriminatory abilities of the PM-AHI for the same established AHI thresholds. The statistical programmes used were Stata V.11 (StataCorp, College Station, Texas, USA) and Medcalc for Bland–Altman plots (V.18.104.22.168 Medcalc software, Ostend, Belgium).
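The quantities defined above can be computed as in the following sketch. It is not the Stata or Medcalc procedure used in the study. It assumes the sample standard deviation for the limits of agreement and a 2×2 table with no empty cells (otherwise the ratios are undefined). The function names and all input values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (means, diffs, bias, lower_limit, upper_limit) for paired values."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the individual differences (assumption)
    return means, diffs, bias, bias - 1.96 * sd, bias + 1.96 * sd


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy measures from a 2x2 table (index test versus reference)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "post_test_pos": tp / (tp + fp),   # true positives / all positives
        "post_test_neg": fn / (fn + tn),   # false negatives / all negatives
        "correctly_classified": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical paired indices (events/h): PM-AHI versus PSG-AHI.
pm = [10.0, 20.0, 30.0]
psg = [12.0, 18.0, 33.0]
_, _, bias, lower, upper = bland_altman(pm, psg)

# Hypothetical 2x2 counts for one AHI threshold.
acc = diagnostic_accuracy(tp=90, fp=10, fn=10, tn=90)
print(bias, lower, upper, acc["lr_pos"], acc["lr_neg"])
```

With these definitions the positive post-test probability equals the positive predictive value, and the negative post-test probability equals one minus the negative predictive value.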
Table 2 summarises anthropometric characteristics and main PSG data. Included subjects were evenly distributed among OSA severity categories defined by AHI thresholds (<5/h, 5–15/h, 15–30/h and ≥30/h), between age categories (≤60 and >60 years of age) and between genders. A total of 116 subjects (37.18%) were identified as having ‘high pre-test-risk’ of OSA.
We observed a significant but modest correlation between the EEG arousal index determined by PSG and the PWA drop index obtained with type 3 PM devices (r=0.20, p=0.0004). The latter tended to overestimate the arousal index by a mean of 15.61±17.53 events/h. Almost a third of hypopnoeas (median (95% CI): 31.3% (27.8 to 33.9)) scored on the PSG recordings were not associated with a 3% oxygen desaturation. There was also a correlation between TST and TRT (r=0.565, p<0.0001) but TST was overestimated by a mean of 80.74±64.11 min by TRT.
Table 3 summarises the results of the Bland–Altman analysis for the global and ‘high-risk’ populations (the four detailed Bland–Altman plots are shown in online supplementary table S1 in repository). This table shows that the evaluated PM-AHIs tended to underestimate the PSG-determined AHI (mean differences of −1.3±4.8/h, −2.8±7.4/h and −7.6±7.5/h for PM-AHI1, PM-AHI4 and PM-AHI3, respectively), except for PM-AHI2, which resulted in an overestimation (mean difference +3.5±5.4/h). Differences between the AHI and PM-AHIs and limits of agreement (1.96×SD) were the smallest with PM-AHI1 (absolute mean difference 1.3±4.8/h for the total population and 1.7±5.2/h for the ‘high-risk’ subgroup).
The discriminatory abilities of the PM-AHIs were assessed in terms of sensitivity and specificity using ROC curve analysis for AHI thresholds of ≥5/h, ≥15/h and ≥30/h. The area under the curve (AUC) values are summarised in table 4. For all subjects, the best operating characteristics were obtained mostly with PM-AHI1 and increased with higher AHI thresholds.
Tables 5 and 6 summarise the diagnostic accuracy of the PM-AHIs. These tables show that for any given AHI threshold, inclusion of PWA drops in hypopnoea criteria increased sensitivity, whereas using an oxygen desaturation of 4% increased specificity. In our population sample, the prevalence of OSA (pre-test probability) for AHI thresholds of ≥5/h and ≥15/h was 74.4% and 48.7%, respectively, according to the PSG results. In the high-risk population, the prevalence for the same AHI thresholds was 80.2% and 54.3%, respectively. For these AHI thresholds, PM-AHI1 resulted in the best diagnostic accuracy in terms of combined sensitivity (≥90.13%), specificity (≥81.25%), +LR (≥5.24) and −LR (≤0.10), and thus correctly classified 93.3% of subjects. For an AHI threshold of ≥30/h, PM-AHI2 was superior to PM-AHI1 regarding combined sensitivity (≥89.19%), specificity (≥95.24%), +LR (≥20.34) and −LR (≤0.11), and therefore correctly classified 94.2% of subjects. The accuracy of the four PM-AHIs was slightly higher in the ‘high-risk’ group compared with all subjects, probably due to a slightly higher sleep disordered breathing (SDB) prevalence, reducing the risk of false-negative results in this group.
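The link between prevalence, likelihood ratios and post-test probability can be illustrated with the standard odds form of Bayes' rule. In this sketch, the pairing of the 74.4% prevalence with the bounds +LR=5.24 and −LR=0.10 is for illustration only: those ratios are reported as limits across thresholds, not as the values for the ≥5/h threshold specifically. The function name is hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Illustrative pairing of reported figures (see the caveat in the lead-in).
p_after_positive = post_test_probability(0.744, 5.24)
p_after_negative = post_test_probability(0.744, 0.10)
print(round(p_after_positive, 3), round(p_after_negative, 3))
```

Under these illustrative inputs, a positive PM result raises the probability of OSA to about 94%, and a negative result lowers it to about 23%, which shows why a negative result carries less weight when the pre-test probability is high.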
To our knowledge, this is the first study comparing different scoring criteria for the interpretation of PM recordings using traditional and alternative signals such as PWA drops. According to our results, interpretation of PM recordings using hypopnoea criteria that include a 3% desaturation level without PWA drops showed the best diagnostic accuracy for mild and moderate OSA. Incorporating PWA drops as surrogates for EEG arousals only adds accuracy in detecting severe OSA.
Overall, all tested PM-AHIs showed a high concordance with PSG-AHI, and most of them (all except PM-AHI2) slightly underestimated the frequency of respiratory events. This seems to be mainly due to the difference between TRT, based on self-reported ‘lights off’ and ‘lights on’ times, and EEG-measured TST, which was about 80 min shorter. This increase in the denominator of the PM-AHI (ie, the TRT) systematically decreased PM-AHI values in almost all of the subjects. There is thus a need for a reliable non-EEG-based algorithm that could predict sleep time using PM physiological signals in order to improve PM accuracy.
When PWA drops were included in hypopnoea definitions along with 3% desaturation levels (PM-AHI2), the increase in the numerator (ie, the number of respiratory events) led to a mild overestimation of the index. However, including PWA drops did not improve the number of correctly classified subjects with mild-to-moderate SDB, probably due to the poor correlation between PWA drops and EEG arousals.
Most previous studies in which PM devices were used evaluated ‘high-risk’ populations referred to sleep clinics for suspicion of OSA. For that reason, AASM 2007 clinical guidelines6 recommended the use of type 3 PM devices only in patients with a ‘high pre-test probability’ for OSA. One of the strengths of our study was the inclusion of a sample of subjects drawn from the general population and stratified to obtain a sample evenly distributed in terms of age, gender and OSA severity. This allowed testing the effect of the different scoring criteria in a population with higher and lower pre-test probability of OSA. When we specifically studied a subgroup of subjects identified as ‘high-risk’ for OSA based on clinical symptoms and parameters (ESS20 >10/24 and STOP-BANG21 >2/8) to simulate the population seen in clinical practice, we obtained results comparable to our stratified general population sample. Overall, we found a higher prevalence of sleep disordered breathing in our study sample than previously reported. As discussed in another article,19 we believe that this is due to the use of more sensitive sensors (nasal pressure instead of thermistor and modern oximeters) and new scoring criteria compared with previous cohorts.
Our results show that the strongest correlation between PSG-AHI and type 3 PM was obtained using the PM-AHI1 criteria, which define hypopnoea using 3% oxygen desaturation without PWA drops. According to the ROC curve analysis, PM-AHI1 also showed the best discriminatory abilities for the diagnosis of OSA in almost all AHI categories. In the global population and in the ‘high-risk’ subgroup, PM-AHI1 correctly classified most subjects when AHI thresholds of ≥5/h and ≥15/h were used. These criteria (PM-AHI1) seem to be the best compromise between a high +LR and a low −LR and thus seem to be the most reliable criteria to score PM recordings in a clinical setting. However, we cannot exclude that a more stringent definition of PWA drops (eg, 50% instead of 30% drop) or another surrogate for arousals that shows better correlation to EEG arousals could potentially yield a better agreement between PM and full PSG scoring.
Previous studies have tried to use surrogates for EEG arousals to score PM recordings. Masa et al22 used breathing amplitude increases to estimate the presence of an arousal but found that it did not substantially increase the agreement between PM and PSG. In our study, the PWA drop signal was used as a surrogate for EEG-defined arousals in PM-AHI2 and PM-AHI4. Overall, the PWA drop index overestimated the EEG arousal index by a mean of 15.6 events/h. This can probably be explained by the large variations in PWA signal occurring during nocturnal wake periods, which are included in the PM-based PWA drop index but not in the PSG-based arousal index. Moreover, non-respiratory stimulations of the autonomic system such as arousals due to noise, pain or periodic leg movements can also generate PWA drops without an EEG arousal and without a SaO2 drop.
No single index derived from PSG recordings has been shown to be a clear predictor of incident hypertension,23 but previous studies have suggested that PWA drops are associated with increased blood pressure24 and reflect subtle cortical activation even in the absence of EEG-defined arousals.15 Even though adding PWA drops does not substantially increase the overall accuracy of PM, we cannot exclude that including this signal in the definition of respiratory events such as PM-AHI2 (or PM-AHI4) could be a more reliable predictor of incident cardiovascular and metabolic outcomes than standard AHI. However, only the long-term follow-up of this group of subjects will provide the answer to this question.
Several studies25–32 analysed the correlation between PSG-AHI and AHIs obtained from different PM devices and showed good diagnostic agreement, but comparing these studies with one another and with ours is challenging because of differences in the sensors used or the settings in which the studies took place (unattended at home or attended in-laboratory, simultaneous or separate recordings). In our study, we tried to limit the influence of these factors through a study design including a single PSG night recording at the subject's home using one single portable recorder device. We cannot exclude, however, that using a PM with a less-sensitive oximeter could yield a lower agreement rate with the PSG recording.33
There are also limitations in our study that need to be considered. First, the HypnoLaus cohort study, from which these subjects were drawn, included only subjects between 40 and 85 years of age. Our results can thus only apply to this age range and cannot be generalised to younger subjects. We believe, however, that the 40- to 85-year-old population is representative of the patients typically referred to sleep clinics for a suspicion of OSA. Second, only a subset of HypnoLaus cohort subjects (312) was re-analysed with the four sets of PM-AHI criteria. Still, since the concordance between the PM-AHI and the AHI was already highly significant, we do not believe that adding more subjects would change our conclusion. Third, our analysis is based on unattended home PSG with the recorder and sensors attached in the sleep laboratory, which may slightly differ from standard type 3 PMs that can be attached in the sleep laboratory or at home. Fourth, we did not perform an event-by-event comparison between PSG and PM. It is thus possible that the two scoring methods did not identify the same events. Despite these possible differences, the overall clinical classifications showed a good agreement. Last, we excluded subjects using β-blockers as these medications are believed to influence the PWA signal. Consequently, our results cannot be generalised to this population.
Overall this study suggests that, despite differences in the measured parameters, PM can provide a reliable evaluation of OSA severity in most patients. Recent studies further evaluated the place of PM in diagnostic algorithms9 ,34–37 and suggested that PM devices could be used in current practice.
These results show that, compared with home PSG-AHI, PM-derived AHI values using standard criteria (3% desaturation without PWA drops) and self-reported TRT can correctly classify patients with OSA with an accuracy of >93%. Criteria including PWA drops as a surrogate for EEG arousal for the scoring of hypopnoea showed a higher sensitivity only for the identification of severe OSA subjects but do not seem to substantially increase the overall accuracy of PM.
The authors would like to express their gratitude to the Lausanne population who volunteered to participate in the HypnoLaus study, to the CoLaus team, to Yannick Goy and Jérôme Toriel for their technical assistance and to the Ligue Pulmonaire Vaudoise for their support.