Background Clinical roles of QuantiFERON-TB Gold (QFT-G)/Gold in-Tube (QFT-G-IT) and T-SPOT.TB in tuberculosis require clarification.
Methods MEDLINE and EMBASE were searched for relevant English papers. Summary estimates of likelihood ratios (LR) of QFT-G/QFT-G-IT and T-SPOT.TB for latent tuberculosis infection (LTBI) and tuberculosis disease in adults were obtained by bivariate and univariate random effects meta-analyses after assessing heterogeneity. Probable ranges of prevalence for LTBI and tuberculosis disease were estimated. Critical values of positive LR (PLR) and negative LR (NLR) corresponding to a 90% certainty threshold were calculated over probable prevalence ranges. It was considered reliable to rule in when the best estimated PLR exceeds the corresponding critical value and to rule out when the best estimated NLR is less than the corresponding critical value.
Results 35 studies involving predominantly immunocompetent adults were identified. Based on bivariate meta-analysis, PLR (95% CI) for LTBI were 7.9 (3.6 to 17.3) for T-SPOT.TB and 48.1 (19.7 to 117.6) and 10.8 (5.3 to 21.8) for QFT-G/QFT-G-IT based on Japanese and other studies, respectively. Corresponding NLR (95% CI) were 0.10 (0.06 to 0.18), 0.11 (0.07 to 0.18) and 0.23 (0.16 to 0.32). PLR (95% CI) for tuberculosis disease were 3.6 (2.3 to 5.6) for QFT-G, 2.1 (1.1 to 4.0) for QFT-G-IT and 4.7 (2.4 to 9.1) and 2.3 (1.3 to 4.0) for T-SPOT.TB based on studies with mean or median age >47.1 years and ≤47.1 years, respectively. Corresponding NLR (95% CI) were 0.18 (0.12 to 0.27), 0.38 (0.22 to 0.68), 0.11 (0.06 to 0.20) and 0.20 (0.10 to 0.40). Estimated prevalence ranges were 10–55% for LTBI and 40–60% for tuberculosis disease.
Conclusions At a 90% certainty threshold, LTBI is best diagnosed by QFT-G/QFT-G-IT and excluded by T-SPOT.TB or QFT-G/QFT-G-IT; none can diagnose tuberculosis disease, whereas T-SPOT.TB can exclude tuberculosis disease among middle-aged and older patients.
Tuberculosis (TB) is an ancient disease that has re-emerged as a major public health concern. While rapid diagnosis and treatment of patients with infectious TB remain the cornerstone in TB control,1 targeted screening and treatment of high-risk subjects with latent TB infection (LTBI) has been recognised as an important control measure, especially in low-burden countries.2 3 Unlike many viral diseases, the diagnosis of TB disease still relies heavily on the isolation of Mycobacterium tuberculosis complex in culture, which is often achievable in only 60% of cases.4 The tuberculin skin test (TST), which has been the gold standard for the diagnosis of LTBI until recent years, is cross-reactive to Bacille Calmette Guérin (BCG) and many non-tuberculous mycobacteria. While a high cut-off value such as 15 mm may enhance the specificity of the TST at the expense of sensitivity, the latter remains suboptimal regardless of the cut-off values among infants, young children, elderly subjects, severely malnourished subjects and those who are immunocompromised.2 5 6
With advances in mycobacteriology, M tuberculosis-specific region of difference 1 antigens such as culture filtrate protein 10 and early secretory antigen target 6 were discovered. Absent in BCG and most environmental mycobacteria, these antigens form the basis for interferon-gamma release assays (IGRA), which assess the presence of TB infection by detecting the in vitro release of interferon-gamma upon stimulation from previously sensitised T cells. Commercially available IGRA formats include QuantiFERON-TB Gold (QFT-G), QuantiFERON-TB Gold in-Tube (QFT-G-IT) (Cellestis Ltd, Carnegie, Victoria, Australia) and the T-SPOT.TB test (Oxford Immunotec, Oxford, UK).
The clinical roles of QFT-G, QFT-G-IT and T-SPOT.TB in TB require clarification. A number of systematic reviews have examined the test characteristics of QFT-G, QFT-G-IT and T-SPOT.TB with a focus on pooled estimates of sensitivity and specificity rather than the actual predictive value of a positive or negative test.7–9 The positive predictive value (PPV) or negative predictive value (NPV), defined as respective proportions of true positive results among test-positive subjects and true negative results among test-negative subjects, are highly dependent on prevalence and can be estimated only in appropriate cross-sectional studies. As such, predictive values cannot be effectively combined across different settings in meta-analyses. Summary estimates of sensitivity and specificity obtained by separate pooling cannot be used for estimating predictive values owing to lack of consideration for variations between studies. This is possible with likelihood ratios (LR) that incorporate both sensitivity and specificity of the same study. The positive likelihood ratio (PLR) tells how much the odds of a condition are increased by a positive test, while the negative likelihood ratio (NLR) tells how much they are decreased by a negative test. The post-test odds (the odds of a condition after applying the test) equal LR multiplied by the pre-test odds (the odds of a condition before applying the test). For example, a PLR of 10 gives post-test odds that are 10 times the pre-test odds, whereas an NLR of 0.1 gives post-test odds that are 0.1 times the pre-test odds. It has been proposed that LR should not be pooled directly in systematic reviews.10 The hierarchical SROC model11 and the bivariate random effects model12 have been proposed to tackle the above problem.
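The relationship between likelihood ratios and post-test odds described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the review; the function names and the 50% pre-test probability are assumptions chosen for the example, while the PLR of 10 and NLR of 0.1 are the illustrative values used in the text.

```python
def prevalence_to_odds(prevalence: float) -> float:
    """Convert a pre-test probability (prevalence) to pre-test odds."""
    return prevalence / (1.0 - prevalence)


def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Apply post-test odds = LR x pre-test odds, then convert back to a probability."""
    post_odds = likelihood_ratio * prevalence_to_odds(prevalence)
    return post_odds / (1.0 + post_odds)


# Assumed inputs for illustration: 50% pre-test probability, PLR 10, NLR 0.1.
ppv = post_test_probability(0.5, 10.0)       # probability of the condition after a positive test
npv = 1.0 - post_test_probability(0.5, 0.1)  # probability of no condition after a negative test
print(round(ppv, 3), round(npv, 3))          # prints: 0.909 0.909
```

With even pre-test odds, a PLR of 10 or an NLR of 0.1 therefore just reaches a predictive value of about 91%; lower or higher prevalence shifts these values accordingly.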
An evaluation of the clinical roles of IGRA for TB would be incomplete without considering both LTBI and active TB disease. A distinction must be made between these two conditions as active TB disease develops among only one-tenth of immunocompetent subjects infected with the tubercle bacillus after an incubation period that ranges from a few weeks to a few decades.2 The LTBI state is completely free of clinical manifestations, and there is no gold standard for its diagnosis. Bacteriologically-confirmed cases of TB are often used as surrogates for LTBI in the assessment of sensitivity of a diagnostic test for LTBI. A very low prevalence of LTBI is expected among unexposed subjects in an area with a low incidence of active TB disease (and hence low background transmission risk). Healthy controls without contact exposure or other risk factors for TB are therefore used as surrogates for the absence of LTBI to assess the specificity of such a test. For targeted screening of LTBI, the focus is to rule in LTBI with reasonable certainty among subjects at a high risk of progressing to active disease. IGRA is sometimes used as an adjunctive test for the diagnosis of active TB disease because infection must precede active disease. To assess the sensitivity and specificity of a test for active TB disease among patients with clinical manifestations, confirmed cases of active TB disease (by bacteriology, histology and/or appropriate response to treatment) are used to estimate sensitivity, whereas cases with unconfirmed or alternative diagnoses are used to estimate specificity. For TB disease a delay in treatment may be complicated by considerable morbidity or mortality. A diagnostic test with high NPV will therefore be required to rule out the condition.
The primary objective of the current review is to clarify the clinical roles of IGRA in LTBI and TB disease by focusing on summary measures of LR generated by bivariate random effects models. The secondary objective is to examine the feasibility of arriving at similar conclusions by pooling LR using univariate random effects models.
A literature search was performed through the OvidSP platform to browse MEDLINE, EMBASE and other non-indexed citations to 11 July 2009 for non-review non-animal English papers by combining (using the Boolean operator “and”) articles identified by three search phrases containing Medical Subject Headings or key words in titles or abstracts for: (1) interferon-gamma or QuantiFERON or ELISPOT or TSPOT.TB; (2) tuberculosis; and (3) cut-off values or sensitivity coexisting with specificity. The literature search was supplemented by relevant studies from a recent systematic review.9 Only studies with concurrent data on sensitivity and specificity of QFT-G, QFT-G-IT, T-SPOT.TB and pre-commercial versions of these IGRA using similar criteria for interpretations were included. All studies examined the sensitivity of IGRA among subjects with TB disease. Studies with fewer than five subjects for evaluating either sensitivity or specificity were excluded. Indeterminate results and data on IGRA applied in body fluids other than blood were excluded from analysis. One reviewer abstracted data and the other double-checked the data.
Before meta-analysis, sources of heterogeneity within data grouped by QFT-G/QFT-G-IT, T-SPOT.TB and TST at respective cut-off values were examined by unweighted meta-regression analysis using the Moses–Shapiro–Littenberg method, which regressed the log diagnostic OR against a measure of diagnostic threshold, to identify heterogeneous subgroups.13–16 Significant heterogeneity for a covariate was considered present when p≤0.05. The following covariates were considered: country of origin (for countries contributing at least two sets of data), estimated TB incidence, proportion of culture-proven cases, proportion of co-morbidity (among cases only when controls were healthy low-risk subjects), proportion of HIV infection (among cases only when controls were healthy low-risk subjects), mean or median age (separately for cases and controls when controls were healthy low-risk subjects), proportion of men (separately for cases and controls when controls were healthy low-risk subjects) and, if applicable, QFT-G versus QFT-G-IT or TST with cut-off at 10 mm versus 15 mm. Except for the country of origin, all covariates were rated as either above the median value or not.
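The core of the Moses–Shapiro–Littenberg regression can be illustrated as follows. This is a simplified sketch under stated assumptions: it fits only the unweighted line D = a + bS, where D is the log diagnostic OR and S is the threshold proxy, and omits the covariate terms used in the review. The function names and the sensitivity and specificity values are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def moses_littenberg(sens, spec):
    """Unweighted least-squares fit of D = a + b*S across studies.

    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio.
    S = logit(TPR) + logit(FPR) is a proxy for the diagnostic threshold.
    Returns (a, b). Requires at least two studies with differing S.
    """
    d_vals, s_vals = [], []
    for se, sp in zip(sens, spec):
        tpr, fpr = se, 1.0 - sp
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(d_vals)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    slope = sxy / sxx
    intercept = d_mean - slope * s_mean
    return intercept, slope


# Hypothetical per-study sensitivity and specificity (not data from the review).
example_sens = [0.80, 0.85, 0.90, 0.75]
example_spec = [0.95, 0.90, 0.85, 0.97]
a, b = moses_littenberg(example_sens, example_spec)
print(round(a, 2), round(b, 2))
```

A slope b near zero suggests that the diagnostic OR does not vary with threshold. Testing a covariate such as country of origin would add an indicator term to the same regression, which this sketch leaves out.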
Summary estimates of PLR and NLR were generated by two different methods of meta-analysis in the presence of at least four sets of data grouped by QFT-G/QFT-G-IT, T-SPOT.TB and TST at specified cut-off values: (1) the bivariate random effects model using the SAS PROC MIXED procedure12; and (2) pooling by a univariate random effects model using the DerSimonian–Laird method in the absence of significant threshold effect,17 which was assessed by the Spearman correlation coefficient between sensitivity and specificity and denoted as significant by p values ≤0.05.
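The univariate approach in (2) can be sketched as below. This is an illustrative implementation of DerSimonian–Laird pooling applied to log PLR, not the code used in the review. The 2×2 counts are hypothetical, the variance of log PLR uses the usual large-sample approximation, and all cell counts are assumed to be non-zero (no continuity correction is applied).

```python
import math


def log_plr_and_variance(tp, fn, fp, tn):
    """Log positive likelihood ratio and its approximate large-sample variance."""
    sens = tp / (tp + fn)
    fpr = fp / (fp + tn)
    log_plr = math.log(sens / fpr)
    var = 1.0 / tp - 1.0 / (tp + fn) + 1.0 / fp - 1.0 / (fp + tn)
    return log_plr, var


def dersimonian_laird(effects, variances):
    """Pool study effects with DerSimonian-Laird random-effects weights.

    Returns (pooled_effect, standard_error, tau_squared).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    return pooled, math.sqrt(1.0 / sw_star), tau2


# Hypothetical counts per study: (true pos, false neg, false pos, true neg).
studies = [(45, 5, 4, 96), (80, 20, 10, 90), (30, 10, 3, 57), (60, 15, 8, 72)]
pairs = [log_plr_and_variance(*s) for s in studies]
pooled, se, tau2 = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(round(math.exp(pooled), 2), tuple(round(x, 2) for x in ci))
```

Pooling is done on the log scale and back-transformed, which keeps the pooled PLR and its confidence limits positive. The same functions apply to NLR once the log NLR and its variance are supplied.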
The most probable range of pre-test odds was determined by reference to a recent systematic review18 and a few early studies19–21 for LTBI, and from relevant studies included in the current review for TB disease. Critical values of PLR (PLRcrit) and NLR (NLRcrit) corresponding to PPV and NPV of 90%, respectively, could be calculated over the probable range of prevalence by the following equations: (1) pre-test odds=prevalence/(1–prevalence); (2) PPV=PLRcrit × pre-test odds/(1+PLRcrit × pre-test odds), and (3) NPV=1–(NLRcrit × pre-test odds/(1+NLRcrit × pre-test odds)). At a 90% certainty threshold, it was considered reliable to rule in when the best-estimated PLR exceeds the critical value and to rule out when the best-estimated NLR is less than the corresponding critical value.
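Solving equations (2) and (3) for the critical values gives PLRcrit = [PPV/(1–PPV)]/pre-test odds and NLRcrit = [(1–NPV)/NPV]/pre-test odds. The short sketch below applies these rearrangements; the function names are illustrative rather than taken from the review, and the prevalence values are simply points within the range assumed for TB disease.

```python
def pre_test_odds(prevalence: float) -> float:
    """Equation (1): pre-test odds = prevalence / (1 - prevalence)."""
    return prevalence / (1.0 - prevalence)


def plr_crit(prevalence: float, ppv: float = 0.90) -> float:
    """Smallest PLR reaching the target PPV (equation 2 solved for PLRcrit)."""
    return (ppv / (1.0 - ppv)) / pre_test_odds(prevalence)


def nlr_crit(prevalence: float, npv: float = 0.90) -> float:
    """Largest NLR reaching the target NPV (equation 3 solved for NLRcrit)."""
    return ((1.0 - npv) / npv) / pre_test_odds(prevalence)


# Critical values at three prevalence points, 90% certainty threshold.
for prev in (0.40, 0.50, 0.60):
    print(prev, round(plr_crit(prev), 2), round(nlr_crit(prev), 3))
```

At prevalences of 40% and 60% this gives PLRcrit values of 13.5 and 6.0, respectively. Both critical values fall as prevalence rises, so ruling in becomes easier and ruling out harder at higher prevalence.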
Funnel plot asymmetry was examined by a regression of the natural log diagnostic odds ratio against standard error22 separately for QFT-G/QFT-G-IT, T-SPOT.TB and TST in the respective context of LTBI and TB disease. Significant asymmetry is denoted by p≤0.05.
A total of 218 articles were identified after adding two articles identified only by a recent systematic review9 but not by the literature search through the OvidSP platform. A total of 35 adult studies (see tables E1 and E2 in online supplement) were included in the current review after excluding 183 articles for the following reasons: irrelevant (n=95), no data on both sensitivity and specificity (n=46), no data on specificity (n=24), no data on sensitivity (n=14), no data for sensitivity for one test plus no data for specificity for another test (n=1) and paediatric study (n=3). The majority involved subjects that were predominantly BCG-vaccinated. Among 24 studies involving non-TB patients as controls (E1, E4, E5, E13, E16, E19–E37), the proportion of HIV infection exceeded 20% among either cases or controls in three (13%) studies (E21, E24, E28).
Adult studies on LTBI
Among adult studies using healthy low-risk subjects as controls, the following data were available for different tests: 16 sets of data for QFT-G (E1–E12)/QFT-G-IT (E3, E6, E13, E14), six sets for T-SPOT.TB (E7, E8, E11, E12, E15, E16), four sets for TST with cut-off at 5 mm (E4, E8, E11, E12), seven sets for TST with cut-off at 10 mm (E5, E8, E10, E11) or 15 mm (E8, E10, E11). Table E1 summarises major findings of included adult studies using healthy low-risk subjects as controls. Unweighted meta-regression analysis showed no significant heterogeneity except for Japanese versus other studies within data grouped by QFT-G/QFT-G-IT.
Adult studies on TB disease
Among adult studies using non-TB patients as controls, the following data were available for different tests: 16 sets of data for QFT-G (E1, E4, E5, E19–E26)/QFT-G-IT (E13, E27–E30), 12 sets for T-SPOT.TB (E16, E20, E21, E27, E31–E37) and six sets for TST using 10 mm as the cut-off (E5, E26, E34, E35, E37). Table E2 in the online supplement summarises the major findings of included adult studies using non-TB patients as controls. Data on TST were obtained from predominantly BCG-vaccinated subjects. Non-TB diseases that might mimic TB disease included tumours, bronchiectasis, congestive heart failure, sarcoidosis and infection due to non-tuberculous mycobacteria and other pathogens. Unweighted meta-regression analysis showed no significant heterogeneity except for QFT-G versus QFT-G-IT within data grouped by QFT-G/QFT-G-IT, and mean or median age >47.1 years versus ≤47.1 years within data grouped by T-SPOT.TB.
Summary estimates from meta-analysis
Tables 1 and 2 show summary estimates of test characteristics of IGRA and TST based on adult studies using healthy low-risk subjects and non-TB patients as controls, respectively. Both methods of meta-analysis gave similar best-estimated LR. Tables E3 and E4 in the online supplement compare different diagnostic tools by sensitivity, specificity and diagnostic odds ratios using the bivariate random effects model.
Probable range of pre-test odds
Based on a recent systematic review18 which showed that the prevalence of LTBI controlled for overdispersion among close contacts during TB contact investigation in low- and middle-income countries was 51.4% (95% CI 50.6% to 52.2%) and a few early studies,19–21 it was assumed that the prevalence of LTBI in the context of TB contact investigation probably ranged from 10% to 55%. The corresponding range of pre-test odds would be 1/9 to 11/9. Based on all studies (n=24) used in the current review (E1, E4, E5, E13, E16, E19–E37), the pooled estimate of prevalence of TB disease corrected for over-dispersion was 49% (95% CI 43% to 56%). Thus, it was assumed that the prevalence of TB disease among patients with clinical manifestations probably ranged from 40% to 60%. The corresponding range of pre-test odds would be 2/3 to 3/2.
Tests with ≥90% certainty for LTBI and TB disease
Tables 3 and 4 show that, under the most probable range of pre-test odds, LTBI among largely immunocompetent adults is best ruled in by QFT-G/QFT-G-IT (based on Japanese studies) and ruled out by T-SPOT.TB or QFT-G/QFT-G-IT (based on Japanese studies) at a 90% certainty threshold.
Over the probable prevalence range for TB disease (40–60%), critical values of PLR corresponding to PPV of at least 90% would lie in the range of 6.0–13.5. Table 2 shows that, under the probable prevalence range, neither the IGRA evaluated (QFT-G, QFT-G-IT, T-SPOT.TB) nor the TST (cut-off at 10 mm) is suitable for ruling in TB disease at a 90% certainty threshold. On the other hand, T-SPOT.TB can rule out TB disease among immunocompetent middle-aged and older adults at a 90% certainty threshold (table 4). Table E4 in the online supplement shows that better NPV of T-SPOT.TB for TB disease among older patients with clinical manifestations can be attributed to higher specificity rather than higher sensitivity.
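The rule-in comparison for TB disease can be checked with a brief sketch. The PLR values are the best estimates reported in this review from the bivariate model; the helper function and variable names are illustrative assumptions. Because the critical PLR is lowest at the top of the prevalence range, a test whose PLR falls below that value cannot rule in TB disease anywhere in the range.

```python
# Best-estimate PLR for TB disease as reported in this review (bivariate model).
reported_plr = {
    "QFT-G": 3.6,
    "QFT-G-IT": 2.1,
    "T-SPOT.TB (age >47.1 years)": 4.7,
    "T-SPOT.TB (age <=47.1 years)": 2.3,
}


def plr_crit(prevalence: float, ppv: float = 0.90) -> float:
    """Smallest PLR that yields the target PPV at a given prevalence."""
    odds = prevalence / (1.0 - prevalence)
    return (ppv / (1.0 - ppv)) / odds


# The least demanding critical value occurs at the highest prevalence (60%).
lowest_crit = plr_crit(0.60)
rule_in = {test: plr > lowest_crit for test, plr in reported_plr.items()}
for test, ok in rule_in.items():
    print(test, "can rule in" if ok else "cannot rule in")
```

All four estimates lie below the lowest critical value of 6.0, so no IGRA format reaches the 90% threshold for ruling in TB disease.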
Funnel plot asymmetry
Among studies using healthy low-risk controls, regression analysis showed significant funnel plot asymmetry for studies of QFT-G/QFT-G-IT among non-Japanese studies, but not for studies of QFT-G/QFT-G-IT among Japanese studies, studies of T-SPOT.TB and those of TST with cut-off at 5 mm, and at 10 mm or 15 mm.
Among studies using non-TB patients as controls, regression analysis showed no significant funnel plot asymmetry for studies of QFT-G/QFT-G-IT after controlling for QFT-G versus QFT-G-IT, studies of T-SPOT.TB after controlling for age groups, and those of TST with cut-off at 10 mm.
Besides updating the test performance of IGRA with recent publications, the current review is distinctive in its focus on summary estimates of LR, rather than those of sensitivity and specificity, over the most probable range of pre-test odds for LTBI and TB disease. The current review has further clarified the clinical roles of IGRA and demonstrated no significant difference between univariate random effects models using the DerSimonian–Laird method and bivariate random effects models as far as best-estimated values of LR are concerned.
Significant heterogeneity between Japanese and non-Japanese studies
The current review demonstrated significant heterogeneity in the test performance of QFT-G/QFT-G-IT for LTBI between Japanese and non-Japanese adult studies. Unweighted meta-regression analysis failed to identify any other independent confounding factor; this is corroborated by a comparison of estimates of sensitivity and specificity using the bivariate random effects model (see table E3 in online supplement). The difference might be partly explained by publication or other forms of selection bias, as significant asymmetry was found in the funnel plot for non-Japanese studies but not Japanese studies. Other sources of variations might include measurement errors, interpretation bias and genetic differences. Further investigations are warranted to explore the reason(s) underlying the observed heterogeneity.
Method of meta-analysis
It has been proposed that LR should not be directly pooled in systematic reviews because pooling may result in values of sensitivity and specificity that exceed 1 or fall below zero.10 Notwithstanding often wider CI for the bivariate random effects model, the current review showed that best-estimated LR obtained by a univariate random effects model using the DerSimonian–Laird method closely approximated those obtained by the bivariate random effects model. This implies that diagnostic test evaluation by meta-analysis with clinical focus on best-estimated LR might be accomplished by simpler pooling methods. However, a more thorough comparison of sensitivity, specificity and diagnostic odds ratio between different assays would probably require bivariate meta-analysis using software such as the SAS PROC MIXED procedure (see tables E3 and E4 in online supplement).
There are a number of limitations. First, although the literature search among English papers may have been reasonably thorough, exclusion of non-English papers from the current review could have introduced publication bias. Second, the quality of the current review is inevitably affected by intrinsic errors arising from the use of TB disease as a surrogate marker for LTBI and the assumption of no LTBI among healthy low-risk subjects. Errors due to inclusion of false TB cases might be modest as most cases of TB had been confirmed by acid-fast bacilli smear, culture, polymerase chain reaction for M tuberculosis or histopathology. Third, the use of median values in meta-regression analysis could have reduced the statistical power in identifying heterogeneity. Fourth, differences between QFT-G and QFT-G-IT for LTBI could have been missed owing to the small number of studies. Lastly, the predominance of immunocompetent subjects in the current review and the exclusion of indeterminate results from analysis may render findings of the current review less applicable to severely immunocompromised subjects. As IGRA depend on host immunity, indeterminate and false-negative results of IGRA will inevitably increase among hosts with impaired immunity, notably HIV-infected subjects and those with low CD4 count, especially <200 cells/μl.24–28 Indeterminate results may be less frequent for T-SPOT.TB than QFT-G as a result of the choice of cut-off value.24 29
In conclusion, at a 90% certainty threshold, LTBI is best diagnosed by QFT-G/QFT-G-IT (based on Japanese studies) and excluded by T-SPOT.TB or QFT-G/QFT-G-IT (based on Japanese studies); none can diagnose TB disease, whereas T-SPOT.TB can exclude TB disease among middle-aged and older patients.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.