Background Measurement of fractional exhaled nitric oxide (FENO) might substitute for bronchial provocation in diagnosing asthma. We aimed to investigate the diagnostic accuracy of FENO measurement compared with an established reference standard.
Methods Systematic review and diagnostic meta-analysis. Data sources were Medline, Embase and Scopus up to 29 November 2015. Sensitivity and specificity were estimated using a bivariate model. Additionally, summary receiver-operating characteristic curves were estimated.
Results 26 studies with 4518 participants (median 113) were included. Risk of bias was considered low for six of seven items in five studies and for five items in seven studies. The overall sensitivity in the meta-analysis was 0.65 (95% CI 0.58 to 0.72), the overall specificity 0.82 (0.76 to 0.86), the diagnostic OR 9.23 (6.55 to 13.01) and the area under the curve 0.80 (0.77 to 0.85). In meta-regression analyses, higher cut-off values were associated with increasing specificity (OR 1.46 per 10 ppb increase in cut-off) while there was no association with sensitivity. Sensitivities varied significantly within the different FENO devices, but not specificities. Neither prevalence, age, use of bronchoprovocation in >90% of participants or as exclusive reference standard test, nor risk of bias were significantly associated with diagnostic accuracy.
Conclusions There appears to be a fair accuracy of FENO for making the diagnosis of asthma. The overall specificity was higher than sensitivity, which indicates a higher diagnostic potential for ruling in than for ruling out the diagnosis of asthma.
- Exhaled Airway Markers
What is the key question?
Is the measurement of FENO sufficiently accurate compared with established reference standard to diagnose or exclude bronchial asthma?
What is the bottom line?
The overall specificity was higher than sensitivity, which indicates a higher diagnostic potential for ruling in than for ruling out the diagnosis of asthma; thus, FENO measurement might render bronchial provocation partially superfluous.
Why read on?
This systematic review and meta-analysis summarises the methods, risk of bias and findings of all currently available diagnostic studies relevant to the question.
Bronchial asthma is one of the most frequent chronic diseases. It is characterised by a chronic inflammatory process in the respiratory tract mostly concomitant with unspecific bronchial hyper-reactivity as well as reversible airflow obstruction.1 In mild asthma in particular, airway obstruction is often not present during investigation by spirometry, thus leading to diagnostic uncertainty.2 Serial peak-flow measurement or bronchial provocation are recommended in international guidelines for these cases.1 However, the low diagnostic value of peak-flow variability has already been demonstrated.3 ,4 Thus, bronchial provocation for determining bronchial hyper-responsiveness remains a reference standard in case of inconclusive spirometric results.5 ,6 On the other hand, bronchial provocations are time-consuming, cost-intensive, often only available in specific lung function laboratories and also bear the risk of heavy bronchospasm.5
By contrast, measurement of FENO concentration in exhaled air is a non-invasive method used to estimate inflammatory processes in the lung. Patients with asthma, even in milder stages of the disease, have been shown to exhale NO in higher concentrations, in correlation with high expression levels of the inducible NO synthase in airway epithelium.7 The major pathophysiological basis is that NO modulates airway hyper-responsiveness8 and is associated with eosinophilic inflammation.9 Thus, the concentration of FENO is regarded as an indirect marker for the extent of airway inflammation.
Corresponding to this, FENO measurement is discussed as an alternative procedure to diagnose or exclude bronchial asthma. This would facilitate asthma diagnosis without referral for bronchial provocation, and, consequently, FENO measurement would be well suited for application especially in primary care. A health technology assessment (HTA) found that the inclusion of FENO measurement into the diagnostic pathway might increase the diagnostic cost-effectiveness.10 However, the overall diagnostic accuracy of FENO for asthma is still unclear. The aim of our systematic review and meta-analysis was to estimate the diagnostic accuracy of FENO measurement in patients suspected to suffer from this disease.
The review was registered in the PROSPERO international prospective register of systematic reviews (CRD42014010810; full protocol provided in online supplementary material). The ideal studies for our review would be prospective studies recruiting consecutive, undiagnosed, mainly (>90%) steroid-naive patients with symptoms suggestive of asthma in a natural setting (eg, patients referred by general practitioners for diagnostic testing of asthma), not restricted to highly specific patient groups (eg, bakers), testing FENO measurements according to current guidelines with well-defined cut-off values against adequate reference standards. However, as such studies were likely to be rare, we used more liberal inclusion criteria for our overall review. To be eligible, studies had to allow the generation of 2×2 tables for asthma diagnosis by FENO compared with a reference standard. The reference standard could be bronchial provocation, measurement of FEV1 with bronchodilation, peak-flow variability, expert's opinion or a combination of these. Study participants were patients with suspected asthma, and at least 75% had to be steroid naive. Studies on patients with prediagnosed asthma or in populations in which >25% already had undergone a trial of inhaled corticosteroids were excluded as we expected too many a priori diagnosed patients in these studies (selection bias). Measurement of FENO had to be done using a mean exhalation flow rate of 50 mL/s and instantaneous flows within the range of 45–55 mL/s according to international guidelines.11 Studies not explicitly stating the exhalation flow rate were included if the technical properties of the used instruments assured that a result could only be gained with a flow rate within the stated range.
A systematic literature search was executed over the databases Medline, Embase and Scopus (last update search 29 November 2015). Furthermore, reference lists of identified papers and systematic reviews were searched for eligible papers. For searches in Medline and Embase, the interface supplied by Ovid (Wolters Kluwer) was applied. The main Medline search from 30 September 2014 is shown in the online supplementary table S1. Search terms for asthma or asthma indication were combined with terms for the index test. For Scopus, the offered functionality generating secondary documents based on the primary list of sources was additionally applied. Methodological search filters to more specifically identify diagnostic accuracy studies were not used.12
After de-duplication, two reviewers independently screened titles and abstracts of search hits and excluded all studies that clearly did not fit the study question. The remaining references were retrieved in full text and examined for eligibility according to the selection criteria described above. The following relevant information was extracted from included studies by two reviewers independently using a pre-tested form: setting, asthma prevalence, study type and recruitment strategy, number of study centres, type, brand and model of NO analysers, adherence to measurement guidelines; number of participants, age, symptoms raising the suspicion of asthma, inclusion of smokers or patients with recent respiratory tract infection; cut-off levels; and reported findings. If the numbers of true positive, false positive, false negative and true negative subjects were reported, these values were used. Otherwise, 2×2 tables were generated from the reported diagnostic accuracy parameters, that is, sensitivity and specificity, prevalence or predictive values. Whenever possible, we used data for the cut-off point with the highest sum of sensitivity and specificity (ie, the Youden index); for two studies, we had to use data for a cut-off chosen by the authors referring to previous studies, and for three the reasons for choosing the cut-off presented were not clear. All studies included were assessed for quality using the QUADAS-2 tool13 (seven items).
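As an illustration of how a 2×2 table can be rebuilt when a study reports only summary parameters, the following minimal sketch derives approximate cell counts from sample size, prevalence, sensitivity and specificity. The function name `reconstruct_2x2` and the example study are hypothetical and are not taken from the review; rounding means the counts may differ by one from the table the original authors observed.

```python
def reconstruct_2x2(n, prevalence, sensitivity, specificity):
    """Rebuild approximate 2x2 counts from reported accuracy parameters.

    Returns (tp, fp, fn, tn). Illustrative helper, not the reviewers' code.
    """
    diseased = round(n * prevalence)      # participants with asthma
    healthy = n - diseased                # participants without asthma
    tp = round(diseased * sensitivity)    # true positives
    fn = diseased - tp                    # false negatives
    tn = round(healthy * specificity)     # true negatives
    fp = healthy - tn                     # false positives
    return tp, fp, fn, tn


# Hypothetical study: 113 participants, 39% asthma prevalence,
# reported sensitivity 0.65 and specificity 0.82.
tp, fp, fn, tn = reconstruct_2x2(113, 0.39, 0.65, 0.82)
print(tp, fp, fn, tn)
```

Deriving the second cell of each column by subtraction keeps the column totals consistent with the reported sample size.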
Based on the extracted 2×2 tables (a=true positive, b=false positive, c=false negative, d=true negative), sensitivity (a/(a+c)), specificity (d/(b+d)) and diagnostic odds ratio (DOR) ((a×d)/(b×c)) with corresponding 95% CIs were calculated for the reported cut-off of each study. For further statistical analysis, the bivariate random-effects model by Reitsma et al14 for meta-analysis of diagnostic accuracy studies was applied using the original form of normal–normal approximation.15 ,16 Sensitivity and specificity were pooled using the ‘reitsma’ function of the R package ‘mada’ (R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2015. http://www.R-project.org/; accessed 29 March 2016). Additionally, summary receiver-operating characteristic (sROC) curves were estimated. To this end, both the Rutter and Gatsonis approach16 and the Rücker and Schumacher approach17 were applied. The latter is based on the assumption that the investigators of the primary studies have chosen the optimal cut-off for each study based on maximum Youden index (sensitivity+specificity −1). The curve adjusts for the resulting cut-off optimisation. Calculation of pooled DORs was based on the random-effects model with variance estimation according to DerSimonian-Laird. The heterogeneity of the DOR was measured by Cochran's Q and Higgins's I2.18 To examine small study effects as an indicator of publication bias, a funnel plot according to Deeks et al19 was generated. We performed an additional best evidence analysis limited to ‘key studies’ meeting the criteria listed above for ideal studies.
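The per-study measures defined above can be sketched as follows, with the DOR computed as (a×d)/(b×c) and the Youden index as sensitivity+specificity−1. The 95% CI for the DOR uses the common log-scale Wald approximation, which is an assumption here because the text does not state which interval method was used. The function name and example counts are hypothetical, and tables with a zero cell would need a continuity correction that this sketch omits.

```python
import math


def accuracy_from_2x2(a, b, c, d):
    """Per-study accuracy measures from a 2x2 table.

    a = true positive, b = false positive,
    c = false negative, d = true negative.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    dor = (a * d) / (b * c)
    # Wald interval on the log scale (assumed method, not from the source).
    se_log_dor = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(dor) - 1.96 * se_log_dor)
    ci_high = math.exp(math.log(dor) + 1.96 * se_log_dor)
    youden = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "dor_ci": (ci_low, ci_high),
        "youden": youden,
    }


# Hypothetical table: 29 TP, 12 FP, 15 FN, 57 TN.
result = accuracy_from_2x2(29, 12, 15, 57)
for name, value in result.items():
    print(name, value)
```

These per-study values are only the inputs to the pooling step; the bivariate model itself is not reproduced here.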
In meta-regression analyses, we investigated the influence of asthma prevalence, type (chemoluminescence or electrochemical) and brand (Niox Mino, Niox Flex or other) of the FENO measurement device, cut-off values, age group, whether or not >90% of all study participants underwent bronchial provocation with a biochemical agent, whether or not a positive bronchial provocation test was the only means to diagnose asthma and risk of bias. Variables for meta-regression analyses were partly chosen after qualitative review of the available studies.
The literature search identified a total of 6070 references of which 287 were retrieved in full text (figure 1). In total, 234 of these full texts were clearly irrelevant while a further 53 were excluded after closer scrutiny (see online supplementary table S2). Finally, 26 studies with 4518 participants (median 113, range 30–923) were included in the review (see characteristics of studies in table 1 and additional information on participants in online supplementary table S3).20–45 Two studies only presented results for participants divided into subgroups (never smokers, ex-smokers and current smokers in one study and depending on the specific reference standard used in another); therefore, the 26 studies provided 29 contingency tables for data analysis. Studies were published between 2003 and 2015 and originated from 16 different countries. Apart from one case–control study (nested in a cohort), all were cohort studies. Three studies exclusively included children, all other studies included adults or predominantly adults (one study without information on age). Only a few studies were performed in patient groups with distinct symptoms, like cough variant asthma or wheezing children (see online supplementary table S3). In most studies, participants had been referred to the specialised centres performing the study, but details of the referral process were not always described. The reference standard used varied considerably; bronchial provocation and combinations of different procedures were used frequently. The asthma prevalence determined varied between 9% and 80% (median 39%; 25th and 75th percentile 32% and 51%). In 17 studies, all or >90% of participants underwent a bronchial provocation test. In seven of these studies, the diagnosis of asthma was exclusively based on this test while all other studies used a combination of tests or stepped approaches.
Cut-off levels used for FENO for determining the diagnosis of asthma varied between 10.5 and 64 ppb (median 30 ppb; 25th and 75th percentile 20 and 40 ppb). The most frequently used devices for measuring FENO were Niox Mino (12 studies) and Niox Flex (5 studies). Nine studies met the criteria of ‘key studies’; the most frequent reasons for excluding other studies from this group were doubts that studies were naturalistic and recruitment strictly consecutive avoiding exclusions of relevant patient groups.
Study quality was variable but no study was considered to have low risk of bias in all seven items assessed (see online supplementary table S4 for detailed assessments of all studies). This was mainly due to the fact that the cut-off level for FENO measurement was defined post hoc in almost all studies (which can lead to overestimation of diagnostic accuracy). Many studies were rated as having unclear risk of bias for items V (Could the reference standard, its conduct or its interpretation have introduced bias?) and VII (Could the patient flow have introduced bias?) due to insufficient reporting. Problems with other items were minor. Overall risk of bias was considered low for six of seven items in five studies, for five items in seven studies and for four items in eight studies. For the remaining six studies, the risk of bias was considered low in three or fewer items.
In individual studies, calculated sensitivity varied between 0.16 and 0.94, and specificity between 0.31 and 0.98 (see figure 2A, B). Pooled sensitivity was 0.65 (95% CI 0.58 to 0.72), pooled specificity was 0.82 (0.76 to 0.86) and the pooled DOR was 9.23 (6.55 to 13.01). Heterogeneity of the DOR was considerable (Q=112, df=28 (p<0.0001), I2=75% (64%, 83%)). The area under the curve according to Rutter–Gatsonis was 0.80 (0.77 to 0.85). Results are shown in an ROC scatterplot in figure 3. All included studies are plotted individually as well as the pooled estimate from the bivariate model including the corresponding 95% confidence area and the 95% prediction area, where 95% of future studies are expected. Furthermore, sROC curves according to Rutter and Gatsonis as well as Rücker and Schumacher are shown. The funnel plot over the included studies (see online supplementary figure S1) did not suggest relevant publication bias (p=0.71) for the Egger test modified according to Deeks et al.19 In the nine key studies, the pooled sensitivity was 0.68 (0.53 to 0.79), specificity 0.83 (0.74 to 0.89), the pooled DOR 10.21 (4.78 to 21.82) and the area under the curve 0.83 (0.75 to 0.92) (see online supplementary figure S2); heterogeneity remained considerable (Q=51, df=8 (p<0.0001), I2=84% (72%, 91%)).
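The reported I2 values follow from Cochran's Q and its degrees of freedom through the standard relation I2=(Q−df)/Q, truncated at zero. The short sketch below reproduces the two point estimates given above; the function name is illustrative, and the confidence limits for I2 are not recomputed.

```python
def i_squared(q, df):
    """Higgins's I2 in percent from Cochran's Q and degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


i2_all = i_squared(112, 28)  # all 29 contingency tables
i2_key = i_squared(51, 8)    # nine key studies
print(round(i2_all), round(i2_key))
```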
In meta-regression analyses, higher cut-off values were associated with increasing specificity (OR 1.46 per 10 ppb increase in cut-off) while there was no association with sensitivity (see online supplementary table S5). Neither prevalence, age, use of bronchial provocation in >90% of participants or as an exclusive reference standard test, nor risk of bias were significantly associated with diagnostic accuracy. However, sensitivities varied within the different FENO devices, with Niox devices showing lower sensitivities compared with the other chemoluminescence devices (p<0.01). The resulting sensitivities and specificities of the different FENO devices are given in table 2. Multivariable regression analysis with prevalence, choice of cut-off points and FENO devices as independent variables provided similar results.
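To give a feel for the size of an OR of 1.46 per 10 ppb, the sketch below applies that ratio to the odds of a true negative result and converts back to a specificity. This is only an interpretive aid under stated assumptions: the effect is taken to be linear on the logit scale, and the starting specificity of 0.82 is used purely as an illustrative baseline, since the pooled estimate is not tied to one specific cut-off. The function name is hypothetical.

```python
def shift_specificity(specificity, odds_ratio, steps=1):
    """Specificity after multiplying its odds by odds_ratio, `steps` times."""
    odds = specificity / (1 - specificity)
    new_odds = odds * odds_ratio ** steps
    return new_odds / (1 + new_odds)


# Illustrative baseline of 0.82; one and two 10 ppb increases in cut-off.
spec_plus_10 = shift_specificity(0.82, 1.46, steps=1)
spec_plus_20 = shift_specificity(0.82, 1.46, steps=2)
print(round(spec_plus_10, 3), round(spec_plus_20, 3))
```

Because the change is multiplicative on the odds scale, the absolute gain in specificity shrinks as specificity approaches 1.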
In the present systematic review, we found a fair diagnostic accuracy of FENO in general for discrimination of asthma in patients suspected to suffer from asthma. We obtained a good summary area under the curve. Specificities were mostly higher than sensitivities, and the observed cut-off points were predominantly in the range reported in guidelines regarding FENO interpretation.46
Because the bivariate model is based on pairs of sensitivity and specificity, not directly on the cut-offs, it does not allow identification of ideal cut-off points for medical decision-making. However, the results of our meta-analysis allow a better estimation of the diagnostic usefulness of FENO measurement. Values of specificity were observed to be often superior to those of sensitivity. This suggests that FENO measurement is more suitable for ruling in than for ruling out the disease. In line with this, if the summary sensitivity and specificity values found in the present meta-analysis are applied to the median asthma prevalence over all included studies (0.39), the resulting positive predictive value (PPV) is 0.70. Such a PPV is well comparable to those of established bronchial provocation testing procedures.5 ,47 The mediocre specificity of bronchial provocation might result from postinfectious bronchial hyper-responsiveness, gastro-oesophageal reflux disease, allergic rhinitis and many other conditions, thus leading to false positive diagnoses.5 It can therefore be assumed that FENO measurement might render bronchial provocation partially superfluous. In this context, for the guidance of therapy the prediction of steroid responsiveness might be more relevant than solely making the diagnosis of asthma. It has been shown that FENO measurement has some prognostic value for assessing steroid responsiveness.40 ,48 In line with this, the American Thoracic Society (ATS) guideline recommends that FENO values >50 ppb can be used to indicate eosinophilic inflammation and that, in symptomatic patients, responsiveness to corticosteroids is likely.46 Originally, we also wanted to evaluate whether FENO has in particular an added value in subgroups of patients, for instance, with wheezing or chronic cough. However, we identified only a few studies that did not allow pooling of data for further subgroup analyses.
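The PPV of 0.70 quoted above can be checked with Bayes' theorem from the pooled sensitivity (0.65), pooled specificity (0.82) and median prevalence (0.39). The sketch below does this and also computes the corresponding negative predictive value, which is not stated in the text for these inputs and is shown only for comparison; the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled estimates and median prevalence as reported in the review.
ppv, npv = predictive_values(0.65, 0.82, 0.39)
print(round(ppv, 2), round(npv, 2))
```

Both predictive values depend strongly on prevalence, so the same sensitivity and specificity would give different values in a low-prevalence setting such as primary care.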
The reported FENO cut-off values were generally at the lower end of the range defined as ‘intermediate’ by the ATS guideline46 for adults (25–50 ppb) or children (20–35 ppb). Even considering the typically lognormal distribution of FENO values, the lower threshold of this intermediate range might be too high for a reliable exclusion of asthma. This view is supported by findings that suggested lower cut-off values ranging between 9 and 16 ppb for exclusion of asthma.37 ,49 This lower performance for ruling out asthma might be explained by a weakness of FENO regarding non-eosinophilic inflammation.37 ,50 However, recent studies suggest that FENO is more representative of a Th2-driven local inflammation, specifically of the bronchial mucosa, rather than general eosinophilic inflammation.51–53 This might explain why exclusion of asthma could still be possible, but only at low FENO values, which might be around 9–16 ppb. However, the difficulty of ruling out non-eosinophilic asthma should be kept in mind when the presence of asthma is suggested by symptoms and clinical history but FENO is low. Very low FENO values might indicate the absence of airway inflammatory processes, with a negative predictive value of around 80%, provided this fits into the patient's medical history.54
In clinical practice, consequently, the diagnostic pathway could start with FENO measurement. Bronchial provocation would be superfluous when FENO exceeds a distinct cut-off value, which needs to provide a meaningful PPV. Referral for bronchial provocation would be necessary when FENO values are lower. A positive response during bronchial provocation would help to rule in the diagnosis of asthma in patients with low FENO values, and a negative bronchial provocation response would rule out asthma.5 ,47 In the light of previous research, it remains unclear to what extent FENO measurement is more promising than bronchial provocation to guide therapy with inhaled corticosteroids. A recent study suggests that some patients with negative bronchial provocation response might also benefit from inhaled corticosteroids (ICS) when FENO>33 ppb because FENO measurement turned out to be more useful for predicting ICS responsiveness than diagnosing asthma.55 However, there might be some overestimation of the predictive value as the authors state that the chosen response criteria were likely to be ‘oversensitive’. Therefore, further research has to compare the diagnostic usefulness of bronchial provocation and FENO measurement, and to evaluate the informative value of their combination, ideally in long-term studies.
Regression analyses showed that the diagnostic accuracy in terms of sensitivities differed between the different FENO devices. It might be speculated whether the lower sensitivities of Niox instruments compared with the other chemoluminescence devices are explainable by the different clinical settings of the diagnostic studies. However, we found no relationship between prevalence, cut-off points and FENO devices in the multivariable regression analysis, which might, on the other hand, be explainable by the low number of studies. However, this has rather low relevance as specificities were shown to be of higher importance. Altogether, the different devices showed comparatively high specificities, which increased in meta-regression analysis with increasing cut-off points. This once again indicates that FENO measurement is more suitable for ruling in than ruling out the disease.
Our systematic review and meta-analysis provides a comprehensive summary of the currently available studies investigating the diagnostic accuracy of FENO in patients with symptoms suggestive of asthma. It is based on a protocol predefining methods and was performed following the recommendations of the Cochrane Diagnostic Test Accuracy Working Group.56 Our study selection differs to some extent from two similar recent reviews.10 ,57 The careful systematic review performed as a part of a Diagnostic Assessment Report commissioned by the National Institute for Health Research (NIHR) HTA Programme on behalf of the National Institute for Health and Care Excellence (NICE) by Harnan et al10 included 24 studies, of which we excluded 6, mainly because of highly selected patient populations (eg, only patients with negative methacholine challenge), investigating prediction of steroid responsiveness rather than diagnosis of asthma, and insufficient data reported for meta-analysis (see online supplementary table S1 for details). Seven of eight additional studies included by us were either published only recently or not identified; one study32 was excluded. It should be noted that the HTA report focuses less on estimating diagnostic accuracy per se than on processes compatible with usual clinical pathways in the UK. The authors of the report decided not to conduct meta-analysis because of the strong clinical heterogeneity of the studies. While we think that the pooled estimates in our meta-analysis must be interpreted with considerable caution due to clinical and statistical heterogeneity, we consider meta-analysis desirable and justifiable for several reasons. First, some of the heterogeneity of the studies included by Harnan et al is due to their inclusion criteria, which were wider in some aspects. Second, the narrative review leaves the reader without an intuitive idea about the overall findings available.
Third, our approach did allow us to investigate potential reasons for heterogeneity between study findings empirically by meta-regression. Fourth, the very restrictive approach using solely key studies showed similar results as the main analysis. However, heterogeneity remained high also in the key study set.
The recent systematic review by Li et al57 actually also includes meta-analysis and meta-regression analyses. However, this review might have important shortcomings. In total, 6 of the 19 studies analysed were excluded by us due to highly selected populations, unusual techniques for FENO measurement or because we were unable to construct a plausible and consistent contingency table from the data presented. No less than 12 studies included by us were not mentioned. While one study published in 2015 might not have been available to the authors, the reason for this large discrepancy remains unclear as the selection process is not transparent. Pooled sensitivity in this review was higher than in our review (0.78 vs 0.65), specificity slightly lower (0.74 vs 0.82) and the DOR slightly higher (11.37 vs 9.23).
The significant association between increasing cut-off points and increasing specificities in meta-regression analysis is plausible and in line with the clinical decision rule, which allows ruling in asthma with higher certainty when higher FENO values are given.46 ,54 We are uncertain whether the differences according to type and brand of FENO measurement device are a valid or a chance finding. On the other hand, given the limited number of studies and the multiplicity of potential influencing variables, existing differences according to other factors might have been missed. Furthermore, factors like infection, smoking habits and allergy that might modify results of FENO testing58 are difficult to investigate without access to individual patient data.
It should also be taken into account that the included studies mostly defined the optimal cut-off levels based on the observations made in the particular patient populations. This probably led to an overestimation of the diagnostic accuracy per study and consequently in the meta-analysis. We used the alternative approach according to Rücker and Schumacher17 for calculating the sROC curve that implies a correction for this problem. The resulting values were well within the confidence area of the classic calculatory approach by Rutter and Gatsonis.16
In conclusion, the systematic review and meta-analysis showed promising test indices of FENO measurement with good specificity, DORs and ROC area under the curve. The high specificity indicates a diagnostic potential for ruling in asthma, although sensitivity and prevalence of disease are also important. However, sensitivity is comparatively low, suggesting that ruling out might be rather difficult with FENO measurement. Although definite cut-off points with optimal sensitivities and specificities cannot be provided, our results point towards the necessity of a confirmatory diagnostic study. In this respect, a diagnostic study in terms of a triage test in a fully paired study design with an a priori defined cut-off value would be necessary to evaluate the potential for partial replacement of bronchial provocation.59 It should be kept in mind that higher cut-off values seem to be more suitable for ruling in asthma. This means that the ideal a priori cut-off point should not be determined solely on the basis of the highest sum of sensitivity and specificity, but rather under consideration of an optimal PPV, which might correspond with higher specificity but lower sensitivity. Studies have shown a PPV > 70% when a cut-off >45 ppb is chosen.2 The ATS guideline46 suggests FENO>50 ppb to detect ICS responsiveness by referring to the study of Smith et al.40 Therefore, a cut-off point around 50 ppb might ensure a sufficient PPV for ruling in asthma and for determining ICS responsiveness at the same time, while subgroup analyses should keep possibly lower values in mind. Future diagnostic accuracy studies should optimally report their full ROC curve numerically to allow a more in-depth use in meta-analyses.
Contributors AS conceived the study and wrote the review protocol. GR, JK, RAJ, SK and KL contributed to the protocol development. SK, MK-V and KL performed the literature search, selection, data extraction, quality assessment and data entry. HS and GR planned and performed statistical analyses. AS, KL and SK drafted the manuscript. All authors contributed to data interpretation and critical revision of the draft manuscript. AS is the guarantor.
Funding The project was supported by the German Federal Ministry of Education and Research (BMBF FKZ 01KG1211).
Competing interests SK declares that he uses a NIOX Vero testing device provided by Aerocrine without charge within a study outside the submitted work.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The complete data set for the analyses performed is available as an MS Excel file in the online supplemental material.