Background Lack of a gold standard for latent TB infection has precluded direct measurement of test characteristics of the tuberculin skin test and interferon-γ release assays (QuantiFERON Gold In-Tube and T-SPOT.TB).
Objective We estimated test sensitivity/specificity and latent TB infection prevalence in a prospective, US-based cohort of 10 740 participants at high risk for latent infection.
Methods Bayesian latent class analysis was used to estimate test sensitivity/specificity and latent TB infection prevalence among subgroups based on age, foreign birth outside the USA and HIV infection.
Results Latent TB infection prevalence varied from 4.0% among foreign-born, HIV-seronegative persons aged <5 years to 34.0% among foreign-born, HIV-seronegative persons aged ≥5 years. Test sensitivity ranged from 45.8% for the T-SPOT.TB among foreign-born, HIV-seropositive persons aged ≥5 years to 80.7% for the tuberculin skin test among foreign-born, HIV-seronegative persons aged ≥5 years. The skin test was less specific than either interferon-γ release assay, particularly among foreign-born populations (eg, the skin test had 70.0% specificity among foreign-born, HIV-seronegative persons aged ≥5 years vs 98.5% and 99.3% specificity for the QuantiFERON and T-SPOT.TB, respectively). The tuberculin skin test’s positive predictive value ranged from 10.0% among foreign-born children aged <5 years to 69.2% among foreign-born, HIV-seropositive persons aged ≥5 years; the positive predictive values of the QuantiFERON (41.4%) and T-SPOT.TB (77.5%) were also low among US-born, HIV-seropositive persons aged ≥5 years.
Conclusions These data reinforce guidelines preferring interferon-γ release assays for foreign-born populations and recommending against screening populations at low risk for latent TB infection.
Trial registration number NCT01622140.
- clinical epidemiology
What is the key question?
How do the three existing tests for latent TB perform in a large cohort of persons at high risk in a low-incidence country using latent class analysis, given that there is no gold standard test for latent TB?
What is the bottom line?
Test sensitivity was moderate at best; specificity was good for the interferon-γ release assays but moderate for the tuberculin skin test, resulting in poor test positive predictive values in key populations including young children and persons infected with HIV.
Why read on?
This is the first study that directly examines the test characteristics of the three existing tests for latent TB without using imperfect surrogate markers and provides key data for important populations such as young children.
Targeted testing and treatment of latent TB infection (LTBI) is a key component of TB elimination efforts in countries with a low incidence of TB disease. However, LTBI diagnosis is hampered by the imperfect nature of the three existing diagnostic tests: the Mantoux tuberculin skin test (TST), the QuantiFERON Gold In-Tube (Qiagen, Germantown, Maryland) and the T-SPOT.TB (Oxford Immunotec, Marlborough, Massachusetts). A substantial challenge to understanding how to best use these tests in clinical practice is the absence of a gold standard for LTBI that indicates infection with certainty. This absence prevents both accurate assessment of test characteristics (sensitivity and specificity) and accurate evaluation of LTBI prevalence among populations of interest. Evaluation of test performance has employed: (A) the proportion of positive tests among patients with confirmed TB disease as a measure of sensitivity, (B) the proportion of negative results among populations at low risk as a measure of specificity or (C) the correlation between exposure to infectious TB and the likelihood of a positive test result.1–3 These surrogates are all imperfect at best and misleading at worst. For example, the ability of the immune system to respond to TB antigens, which must occur to have a positive LTBI test, might differ between someone with TB disease, who has actively replicating bacilli, and someone with LTBI, who has quiescent, dormant bacilli.4 5 Similarly, persons who are at low risk for LTBI reside in low-incidence areas and are often socioeconomically and demographically different from persons residing in high-incidence areas.
These differences might be associated with differential exposure to non-tuberculous mycobacteria and Mycobacterium leprae, both of which have potential to cross-react with tests for LTBI and affect the measured specificity of the tests.6 7 Studies that examine the association between a positive test result and subsequent risk for progression to TB disease are the best gold standard available, although even that type of study is imperfect, limited by low event rates and confounded by differential acceptance of LTBI treatment.8–10
Latent class analysis (LCA) is a statistical technique that offers an alternative method for understanding test characteristics when no gold standard is available. LCA has been used extensively in the social sciences but is now increasingly used to examine diagnostic tests in medicine.11 It uses the observed patterns of test results to calculate the prevalence of the underlying condition, which otherwise cannot be directly observed, as well as the sensitivity and specificity of the tests. We used LCA to understand the three available tests for LTBI among a cohort of persons at high risk for LTBI.
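As a rough illustration of how observed test patterns carry information about prevalence, sensitivity and specificity, the following Python sketch computes the expected frequency of every positive/negative pattern for three tests under a two-class model that assumes conditional independence. It is a simplified sketch, not the model fitted in this study. The prevalence, TST values and IGRA specificities are the point estimates quoted in this article for foreign-born, HIV-seronegative persons aged ≥5 years; the QFT-IT and TSPOT sensitivities (0.75 and 0.70) are hypothetical placeholders, and the function name is an assumption.

```python
from itertools import product


def pattern_probabilities(prevalence, sensitivities, specificities):
    """Expected probability of each test pattern (1 = positive, 0 = negative)
    under a two-class model with conditionally independent tests."""
    probs = {}
    for pattern in product((1, 0), repeat=len(sensitivities)):
        p_infected = prevalence          # contribution of the LTBI class
        p_uninfected = 1.0 - prevalence  # contribution of the no-LTBI class
        for result, se, sp in zip(pattern, sensitivities, specificities):
            p_infected *= se if result else (1.0 - se)
            p_uninfected *= (1.0 - sp) if result else sp
        probs[pattern] = p_infected + p_uninfected
    return probs


# Test order: TST, QFT-IT, TSPOT.
# The IGRA sensitivities (0.75, 0.70) are hypothetical, for illustration only.
probs = pattern_probabilities(
    prevalence=0.34,
    sensitivities=[0.807, 0.75, 0.70],
    specificities=[0.700, 0.985, 0.993],
)
for pattern, p in probs.items():
    print(pattern, round(p, 4))
```

With three binary tests there are eight patterns and therefore seven free observed proportions, which is the same as the number of parameters in this simple model (one prevalence, three sensitivities and three specificities). Fitting reverses the calculation above: it searches for the parameter values whose expected pattern frequencies best match the observed ones.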
We examined a cohort of participants enrolled in a prospective study to assess the predictive ability of available tests for LTBI. The Centers for Disease Control and Prevention (CDC) funded the study through the Tuberculosis Epidemiologic Studies Consortium (TBESC), a partnership of academic institutions and TB control programmes in 11 US states. Sixteen TBESC-affiliated clinics enrolled children and adults at high risk for LTBI and tested them concurrently with a TST, a QuantiFERON Gold In-Tube (QFT-IT) and a T-SPOT.TB (TSPOT) test; a 17th clinic used only TST and QFT-IT because TSPOT was not initially available. All participants had at least one of the following risk factors for LTBI or progression to TB disease: (1) close contact with an infectious TB patient; (2) recent immigration (≤5 years) from a country with moderate rates of TB (eg, Mexico); (3) immigration at any time from a country with high rates of TB (eg, India); (4) recent (≤5 years) residence for ≥30 days in a country with high rates of TB; (5) member of a group with high (≥25%) local prevalence of LTBI (eg, homeless persons); or (6) a diagnosis of HIV infection.
Trained study personnel collected demographic and LTBI-related risk information at participant enrolment. All participants were evaluated for TB disease at time of enrolment and will be followed up at 6-month intervals for 2 years or until diagnosis of TB; matching of participant names to state TB registries will continue through 2021. Participants found to have TB at the time of enrolment were excluded from this analysis. All participants provided written informed consent, assent or parental permission. The study was registered at clinicaltrials.gov (identifier NCT01622140).
Study enrolment began on 20 July 2012 and ended in April 2017. This LCA includes participants enrolled during 20 July 2012–8 September 2014. A flow chart of enrolled participants is included in figure 1, and comparison with excluded participants is in online supplementary table S1.
Testing for measurement invariance among groups was performed with PROC LCA12 in SAS V.9.4. On the basis of published data,1 13 14 a strong belief existed that test characteristics would differ among the following groups: (A) HIV-seropositive, compared with HIV-seronegative; (B) foreign-born persons, compared with US-born (born in US states and territories) persons; and (C) aged ≥5 years, compared with aged <5 years. We tested for measurement invariance across these groups by sequentially comparing a model that assumed measurement invariance with one that permitted the ρ parameters (ie, sensitivity and specificity) to vary freely, examining the differences in the G2 values (a measurement of goodness-of-fit) for each model. We assumed that the differences follow a χ2 distribution with the df equal to the differences in df between models. We rejected the assumption of measurement invariance across all three groups (p<0.05 for all three comparisons) and divided the cohort into analysis groups on the basis of permutations of the three grouping variables (HIV status, foreign birth and aged ≥5 years). Because the numbers of participants in certain groups were limited and unlikely to yield robust results, we focused on five groups with ≥100 participants in each: (A) foreign-born, HIV-seronegative and aged ≥5 years; (B) foreign-born, HIV-seropositive and aged ≥5 years; (C) foreign-born, HIV-seronegative and aged <5 years; (D) US-born, HIV-seronegative and aged ≥5 years; and (E) US-born, HIV-seropositive and aged ≥5 years.
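The comparison of nested models described above can be sketched in a few lines of Python. The sketch takes the difference in G2 between the constrained (measurement invariance) and unconstrained models and refers it to a χ2 distribution whose df is the difference in model df. The G2 and df values in the example are hypothetical, since the fitted values are not reported here, and the helper names are assumptions; the χ2 tail probability is computed with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 (via erfc) and df = 2.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def invariance_test(g2_constrained, df_constrained, g2_free, df_free):
    """Likelihood ratio comparison of a constrained model with a freer one."""
    delta_g2 = g2_constrained - g2_free
    delta_df = df_constrained - df_free
    return delta_g2, delta_df, chi2_sf(delta_g2, delta_df)


# Hypothetical fit statistics, for illustration only.
delta_g2, delta_df, p_value = invariance_test(41.8, 12, 9.4, 6)
print(f"difference in G2 = {delta_g2:.1f} on {delta_df} df, p = {p_value:.2g}")
```

A p value below 0.05 in such a comparison would lead to rejecting measurement invariance for that grouping variable, which is the decision rule described in the text.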
QFT-IT and TSPOT were performed using standard clinical protocols at each site; as this was a pragmatic study, no special efforts were made to standardise the laboratory procedures at each site. We included data only from participants with valid results for all three tests. We used the international cut-off for the TSPOT (six or more spots as positive and five or fewer spots as negative),15 because using the US interpretation with a borderline zone16 would have required discarding all results in the borderline zone (five, six or seven spots); we also ran the models using the US cut-off and classifying borderline results as negative. We used current CDC guidelines for TST interpretation.17 We used the standard manufacturer cut-off for the QFT-IT.18 Because all of the tests are immunologically based with overlapping antigens, we suspected that the assumption of conditional independence of tests (after taking true latent class status into account) would not be valid. Therefore, we created two latent class models for each group, one using a modification of the method of Qu et al 19 that included a random effect to account for conditional dependence and the other setting the random effect to zero (ie, assuming conditional independence). We used a Bayesian approach for both of these models20 with literature-based prior distributions for test sensitivities and broad prior distributions for specificity and prevalence (table 1). The literature-based prior distributions were used for sensitivity because the proportion of positive baseline tests among persons who subsequently develop active TB disease is a good proxy for test sensitivity, whereas broad prior distributions were used for other parameters because there is no similarly good proxy for test specificity or LTBI prevalence in the literature.
The literature-based prior distributions for test sensitivity represent a weighted average of patients who were tested for LTBI and then subsequently experienced incident TB disease (online supplementary tables S2 and S3). For example, the prior distribution for TST sensitivity for HIV-seronegative persons is based on 180 contacts, as reported in the literature, who had a TST, were prospectively followed and subsequently developed TB disease. A total of 143 of these persons had a positive TST; the point estimate for sensitivity would be 143/180, and the prior distribution representing those data would be a Beta(143, 37) distribution. Prior distributions for HIV-seronegative adults were obtained from a simple summation of studies of contacts of persons with TB disease; prior distributions for other groups (HIV-seropositive adults and children aged <5 years) were obtained from studies of those populations. To further assess the robustness of the latent class models, we split the TSPOT into two separate tests (the panel A and panel B results) with the same cut-off for a positive test applied to each (five or more spots greater than the nil plate). We then ran four test models (TST, QFT-IT, TSPOT panel A and TSPOT panel B) for each group. We used R V.3.3.0 (open-source software, the R Foundation, Vienna, Austria) and JAGS (Just Another Gibbs Sampler, available at http://mcmc-jags.sourceforge.net/) V.4.2.0 (open-source software) through the runjags package (V.3.2) to implement these models, with Markov chain Monte Carlo sampling to estimate parameter distributions. Four independent chains were used for each model. The initial 1000 and subsequent 5000 samples were used for model adaptation and burn-in, respectively, with subsequent sampling of a minimum of 20 000 iterations or enough iterations to obtain Gelman-Rubin statistics <1.05 for all sensitivity, specificity and prevalence parameters, whichever was greater.
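The TST example above (143 positive tests among 180 contacts who later developed TB disease) can be turned into a prior distribution as in the following Python sketch. It uses the counts given in the text and summarises the resulting beta distribution by its mean, SD and an approximate central 95% interval obtained by simulation. The random seed and the number of draws are arbitrary choices made for this illustration, not settings from the study.

```python
import random

# Counts from the text: contacts who developed TB disease and their TST results.
positives, tested = 143, 180
alpha, beta = positives, tested - positives  # beta distribution parameters

# Closed-form mean and standard deviation of a beta distribution.
prior_mean = alpha / (alpha + beta)
prior_sd = (alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))) ** 0.5

# Approximate central 95% interval by Monte Carlo (seed chosen arbitrarily).
rng = random.Random(2012)
draws = sorted(rng.betavariate(alpha, beta) for _ in range(20000))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]

print(f"prior mean = {prior_mean:.3f}, SD = {prior_sd:.3f}")
print(f"approximate 95% interval = ({lower:.3f}, {upper:.3f})")
```

The more observations that contribute to such a prior, the narrower the interval becomes, so priors for groups with little published follow-up data would be correspondingly less informative.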
Models that assumed the tests were conditionally independent (ie, without a random effect) provided higher sensitivity estimates for all three tests and lower prevalence estimates than the models that included a random effect to account for conditional dependence (figure 2 and online supplementary figure S1). Furthermore, the posterior means of the sensitivity estimates for the models that assumed conditional independence were often outside the 95% credible intervals of the literature-based prior estimates for sensitivity. Conversely, the sensitivity estimates for the models that accounted for conditional dependence generally fell within the credible intervals for the evidence-based prior distribution. We therefore focused on the conditional dependence (random effects) model for further reporting.
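To show how a random effect relaxes the conditional independence assumption, the Python sketch below uses a probit link with a shared standard normal random effect within one latent class. This is a common form of random effects latent class model, but the exact parameterisation used in the study is not given here, so the functional form, the intercepts and the slope are all assumptions made for illustration. The sketch compares the probability that all three tests are positive with the product of the three marginal probabilities.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def joint_positive_prob(intercepts, slope, n_grid=4001, limit=8.0):
    """P(all tests positive | latent class), where each test is positive with
    probability phi(intercept + slope * u) and u ~ N(0, 1) is shared by the
    tests. The integral over u uses the trapezoid rule on [-limit, limit]."""
    step = 2.0 * limit / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        u = -limit + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        prob_all_positive = 1.0
        for a in intercepts:
            prob_all_positive *= phi(a + slope * u)
        total += weight * prob_all_positive * density * step
    return total


def marginal_prob(intercept, slope):
    """Marginal probability of a positive test after integrating out u."""
    return phi(intercept / math.sqrt(1.0 + slope ** 2))


# Hypothetical probit intercepts for three tests and a hypothetical slope.
intercepts = [0.9, 0.6, 0.5]
slope = 1.0

joint = joint_positive_prob(intercepts, slope)
independent = math.prod(marginal_prob(a, slope) for a in intercepts)
print(f"joint probability with random effect: {joint:.4f}")
print(f"product of marginal probabilities:    {independent:.4f}")
```

With a positive slope, concordant results within a latent class are more likely than the product of the marginal probabilities, so agreement between tests is no longer attributed entirely to true infection status. Setting the slope to zero recovers the conditional independence model.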
To assess the validity of the latent class measure, we created an exposure variable for the foreign-born participants as an estimation of the opportunities for a person to be exposed to and infected with TB. This surrogate exposure variable was calculated by multiplying the age at immigration by the WHO estimate of TB incidence in the country of origin in 2012.21 We assigned each participant to a latent class (LTBI or no LTBI) with probability equal to the Bayesian posterior mean of the probability of belonging to that class derived from the model and examined the relationship between quartile of the exposure variable and estimated LTBI status (defined by latent class membership). The χ2 test for trend was used to assess this relationship.
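The validation steps described above can be sketched in Python as follows: the surrogate exposure score is the product of age at immigration and TB incidence, participants are split into quartiles of that score, and a χ2 test for trend (here the Cochran-Armitage form with scores 1 to 4) is applied to the counts per quartile. The function names and the counts in the example are hypothetical; only the definition of the exposure variable comes from the text, and the specific form of the trend test is an assumption.

```python
import math


def exposure_score(age_at_immigration, incidence_per_100k):
    """Surrogate exposure: age at immigration times TB incidence at origin."""
    return age_at_immigration * incidence_per_100k


def quartile_index(values):
    """Assign each value to a quartile (1 to 4) according to its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    quartiles = [0] * len(values)
    for rank, i in enumerate(order):
        quartiles[i] = 1 + (4 * rank) // len(values)
    return quartiles


def trend_test(positives, totals, scores=None):
    """Cochran-Armitage chi-square test for trend (1 df) across ordered groups."""
    scores = scores or list(range(1, len(totals) + 1))
    n = sum(totals)
    p_bar = sum(positives) / n
    t = sum(x * (r - m * p_bar) for x, r, m in zip(scores, positives, totals))
    sum_x = sum(m * x for x, m in zip(scores, totals))
    sum_x2 = sum(m * x * x for x, m in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (sum_x2 - sum_x ** 2 / n)
    chi2 = t * t / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical numbers of participants classified as LTBI in each quartile.
chi2, p_value = trend_test(positives=[300, 550, 750, 950], totals=[1970] * 4)
print(f"chi-square for trend = {chi2:.1f}, p = {p_value:.2g}")
```

A large test statistic indicates that the proportion classified as having LTBI rises steadily with the exposure quartile, which is the pattern expected if the latent class reflects true infection.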
A total of 12 134 participants were enrolled in the study during 20 July 2012–28 September 2014. Of these, 10 740 had valid results for all three tests and were included in our study. Table 2 provides a summary of demographic characteristics for these participants. Table 3 includes a summary of test combinations for each of the five groups evaluated, and online supplementary table S4 shows the same summary considering US borderline results as negative. Certain test combinations were absent or rarely encountered in some groups (eg, negative TST with positive QFT-IT and TSPOT among foreign born, HIV-seronegative children aged <5 years).
Table 4 lists estimates for the prevalence of LTBI as well as test characteristics among the five groups derived from the three-test random effects model. The prevalence of LTBI in the groups ranged from 4.0% among foreign-born, HIV-seronegative children aged <5 years to 34.0% among foreign-born, HIV-seronegative persons aged ≥5 years. The sensitivity of the tests varied widely across the groups, usually with lower point estimates for sensitivity of all three tests among HIV-seropositive persons, compared with HIV-seronegative persons. Specificity of the TST was higher among US-born than among foreign-born groups, whereas specificities of the QFT-IT and TSPOT did not appear to vary by birthplace. The positive predictive value of the TST was only 10.0% among foreign-born, HIV-seronegative children aged <5 years and ranged from 40.4% among US-born, HIV-seropositive persons aged ≥5 years to 69.2% among foreign-born, HIV-seropositive persons aged ≥5 years. The positive predictive value of the QFT-IT was low (41.4%) among US-born, HIV-seropositive persons aged ≥5 years and ranged from 73.1% to 96.4% among other groups. The positive predictive value of the TSPOT ranged from 77.5% to 98.2% across groups. Negative predictive values of all tests ranged from 79.1% to 98.8% across groups. Using the US cut-off for TSPOT instead of the international cut-off had minimal impact on the prevalence, sensitivity and specificity estimates for most groups with the exception of prevalence in foreign-born, HIV-seropositive persons aged ≥5 years, but the credible interval for this group was very wide due to a small number of observations (online supplementary table S5).
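The dependence of predictive values on prevalence, sensitivity and specificity follows from Bayes' theorem and can be sketched in Python as below. The example plugs in the point estimates quoted in this article for the TST among foreign-born, HIV-seronegative persons aged ≥5 years (prevalence 34.0%, sensitivity 80.7%, specificity 70.0%). Because the published predictive values are summaries of posterior distributions rather than plug-in calculations, the result is expected to be close to, but not identical with, the 57.9% quoted later in the text. The function name is an assumption.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# TST among foreign-born, HIV-seronegative persons aged >=5 years.
ppv, npv = predictive_values(prevalence=0.34, sensitivity=0.807, specificity=0.70)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Repeating the calculation with a lower prevalence and the same test characteristics shows why positive predictive values fall in low-prevalence groups even when specificity is high.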
Table 5 displays the association between our surrogate exposure variable (age at immigration multiplied by TB incidence in the country of origin) and probability of a positive test or latent class assignment among foreign-born, HIV-seronegative persons aged ≥5 years with valid data for the surrogate exposure variable (n=7880). The probability of a positive result increased substantially by increasing exposure quartile for all three tests as well as for the latent class assignment. The LTBI prevalence by LCA for each quartile of exposure was between the prevalence by interferon-γ release assay (IGRA) and TST.
Table 6 describes the positive predictive values of the various test combinations derived from the three-test, random effects model. Although credible intervals were wide for certain relatively uncommon test combinations, common themes emerged across groups. First, as expected, the positive predictive value of three positive tests was high (approximately 100%). Second, the positive predictive value of three negative tests was low but varied by underlying prevalence of LTBI within the group. Foreign-born persons aged ≥5 years, who had an underlying LTBI prevalence of 34.0% and 31.7% in HIV-seronegative and HIV-seropositive groups, respectively, both had non-zero proportions of participants (4.8% and 12.7%) with LTBI, despite having three negative tests. Third, having an isolated positive TST (with the other two tests negative) had similar positive predictive value to having all three tests negative across all groups.
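The way a combination of results shifts the probability of LTBI can be illustrated with likelihood ratios, as in the Python sketch below. It starts from the prior odds implied by the prevalence and multiplies them by one likelihood ratio per test result. This simplified calculation assumes conditional independence between tests, so it will not reproduce the values in table 6, which come from the random effects model. The prevalence, TST values and IGRA specificities are taken from this article; the QFT-IT and TSPOT sensitivities (0.75 and 0.70) are hypothetical, and the function name is an assumption.

```python
def posterior_probability(prevalence, results, sensitivities, specificities):
    """Probability of LTBI given a sequence of test results, assuming the
    tests are conditionally independent given true infection status."""
    odds = prevalence / (1.0 - prevalence)
    for positive, se, sp in zip(results, sensitivities, specificities):
        if positive:
            odds *= se / (1.0 - sp)       # positive likelihood ratio
        else:
            odds *= (1.0 - se) / sp       # negative likelihood ratio
    return odds / (1.0 + odds)


# Test order: TST, QFT-IT, TSPOT. IGRA sensitivities are hypothetical.
sens = [0.807, 0.75, 0.70]
spec = [0.700, 0.985, 0.993]

for label, results in [
    ("all positive", [True, True, True]),
    ("isolated positive TST", [True, False, False]),
    ("all negative", [False, False, False]),
]:
    p = posterior_probability(0.34, results, sens, spec)
    print(f"{label}: {p:.3f}")
```

Under these assumptions an isolated positive TST leaves the probability of LTBI far below that of concordant positive results, which is the qualitative pattern described above.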
LCA of this diverse cohort enrolled in a low-incidence setting provides insights into LTBI screening test performance. Consistent with other observations,22–24 we determined that test performance varied substantially, depending on the group tested. Specifically, we noted lower test sensitivity among HIV-seropositive persons than among HIV-seronegative persons and lower specificity for the TST among foreign-born persons (presumably because of a combination of BCG vaccination and exposure to non-tuberculous mycobacteria) than among US-born persons. More importantly, combining these test characteristics with estimated underlying LTBI prevalence enables calculation of the positive predictive value of tests and test combinations when there is no gold standard, which permits optimum use of these tests. For example, a foreign-born, HIV-seronegative adult in our cohort with a positive TST would have a calculated Bayesian probability of 57.9% of having LTBI, which is only slightly better than a coin toss. If that same person had a positive QFT-IT or TSPOT, the Bayesian probability of LTBI would be 96.4% or 98.2%, respectively. A positive TST with a negative QFT-IT or TSPOT reduces the probability of LTBI to 21%, whereas a positive TST followed by a positive QFT-IT or TSPOT is associated with a 99% probability of LTBI in a foreign-born, HIV-seronegative adult in our cohort (ignoring any potential booster effect of the TST on the QFT-IT or TSPOT). These findings support both CDC recommendations of using an IGRA for foreign-born persons who might have received the BCG vaccine and European guidelines recommending confirmation of a positive TST with an IGRA.25 26
Our data demonstrate certain interesting findings in some of the groups. First, because the estimated prevalence of LTBI was low among the US-born, HIV-seropositive study participants, the positive predictive value of the QFT-IT was low among this group (41.4% overall) despite high test specificity. This finding is consistent with other authors’ descriptions of what appear to be relatively frequent false-positive QFT-IT tests among US-born, HIV-seropositive persons.27 The TSPOT had a higher point estimate of positive predictive value among this population (77.5%), but this still implies that approximately one-fourth of positive TSPOT results among this group are false positives. Conversely, the negative predictive values of all the tests among this group were high (97%–98%); therefore, few HIV-seropositive persons with LTBI would be missed with any of the tests. These findings should be considered in the context of our study population, who had high CD4+ T-lymphocyte counts. Second, the vast majority of positive TSTs among foreign-born children aged <5 years were isolated positive tests, with concurrent negative QFT-IT and TSPOT. Our analysis demonstrates that a limited proportion of children with an isolated positive TST (1.2%) truly have LTBI, indicating that these isolated positive TSTs are almost all false positives, presumably because of recent BCG vaccination. Although concerns have been raised that IGRAs are less sensitive than TSTs among young children,28 29 our results demonstrate that the sensitivity of all three tests is suboptimal to approximately the same degree; however, the greater specificity of the IGRAs would avoid unnecessary LTBI treatment for many young foreign-born children. Furthermore, the negative predictive value of all the tests was good (>98%); therefore, few young children with LTBI would be expected to be missed because of false-negative tests.
On the basis of these results, using an IGRA for LTBI screening among foreign-born children aged <5 years would be strongly preferred over the TST.
Multiple indirect lines of evidence support our findings. First, the strong association between our surrogate exposure variable (age at immigration multiplied by TB incidence in the country of birth) and LTBI as assigned by the latent class model supports the concept that the model is measuring LTBI. Second, the estimated LTBI prevalence among foreign-born children aged <5 years is in line with the expected prevalence among this group, given other studies that have estimated an annual risk for infection (with serial testing) of 0.2%–6% in TB-endemic countries,30 31 with the majority of estimates for high-incidence countries being 1%–3%.32 In this context, the latent class LTBI prevalence (4.2%) is a more plausible estimate than the 28% prevalence (131/464) obtained among our participants if the TST were used to ascertain LTBI. The estimated LTBI prevalence among certain groups was significantly higher than what has been reported among other population-based studies (eg, 34.0% among our foreign-born, HIV-negative participants aged ≥5 years, compared with approximately 9% in the National Health and Nutrition Examination Survey data).33 However, this difference is to be expected given that (A) our study was not population based but focused on recruiting persons believed to be at high risk for LTBI and (B) the methods used to estimate LTBI prevalence in our study take into account the imperfect sensitivity of currently used tests, which has not yet been done with the National Health and Nutrition Examination Survey data.
This analysis has potential limitations. The study was conducted as a pragmatic trial, so no special efforts were made to synchronise laboratory procedures across sites; this may have increased variability in test performance across sites and contributed to uncertainty in estimates of test characteristics. However, this reflects the ‘real-world’ performance of the tests, which makes our results more generalisable. LCA by definition examines phenomena (in this case LTBI) that cannot be directly observed. Although strong conceptual and biological plausibility exists for believing that the latent class represents a dichotomy between TB infection and lack of infection, the latent class might represent a subtly different biological phenomenon that is not concordant with the traditional dichotomy of LTBI. In addition, the choice of prior probabilities has significant influence on the model results; in the current manuscript, the sensitivity estimates were chosen based on the best available gold standard in the literature (progression to active TB after a test was performed), but use of different prior probabilities may well have produced different results. Furthermore, whereas adjusting for conditional dependence clearly had important effects on the results, several methods for performing such an adjustment exist, and each method might provide slightly different results, as discussed by Albert and Dodd.34 However, alternative proposed methods (eg, latent mixture models) have been problematic for some authors, involving a failure to converge.35 We attempted a latent mixture approach with our data and encountered problems with convergence as well; therefore, we discarded that approach. 
Finally, limited sample sizes for certain groups (eg, HIV-positive, foreign-born participants aged ≥5 years) and low frequency of observing certain test patterns (eg, negative TST, positive QFT-IT, positive TSPOT among HIV-negative, foreign-born children aged <5 years) limited our ability to precisely estimate selected parameters for these groups; additional data are needed for improving parameter precision for these groups.
In conclusion, Bayesian LCA of a large, prospectively enrolled, US-based cohort highlighted the limitations of LTBI diagnostic tests. Our findings demonstrate that IGRAs should be strongly preferred over the TST for both foreign-born children and adults. Although we hope that these findings can be used to optimise existing tests, they just as clearly demonstrate that improved diagnostic tests for LTBI are urgently needed.
The authors gratefully acknowledge the assistance of the TBESC project coordinators: Katya Salcedo, Richmond, California; Laura Romo, San Francisco; Christine Kozik, San Diego; Carlos Vera, San Diego; Juanita Lovato, Denver; Laura Farrow and Colleen Traverse, Durham North Carolina; Kristian Atchley and Fernanda Maruri, Nashville, Tennessee; Kursten Lyon and Debra Turner, Raleigh, North Carolina; Nubia Flores, Charlotte, North Carolina; Jane Tapia, Atlanta; Livia Sura and Joanne C Li, Gainesville, Florida; Marie McMillan, Fort Lauderdale, Florida; Stephanie Reynolds-Bigby, Miami and Fort Lauderdale; Angela Largen and Thara Venkatappa, Honolulu; Aurimar Ayala, Phoenix, Arizona; Elizabeth Munk and Gina Maltas, Baltimore; Yoseph Sorri and Kenji Matsumoto, Seattle; Amy Board and James Akkidas, Fort Worth, Texas.
The authors would also like to thank Dr Matthew G Johnson, who assisted with the literature review for developing the prior probabilities, and Dr Nandini Dendukuri for assistance with the statistical methods. Finally, the authors would like to thank all of the study participants.
Contributors JES designed the study, performed the analysis, drafted the manuscript and approved the final version. YW, CSH, ACP, P-JF, DJK and SG participated in study design and analysis, revised the manuscript for critically important content and approved the final version. TV participated in study design, revised the manuscript for critically important content and approved the final version. RL participated in the analysis, revised the manuscript for critically important content and approved the final version.
Funding The study was funded by a contract with the Centers for Disease Control and Prevention. Support for the analysis was also provided by a voucher from Duke Research Computing. References in this manuscript to any specific commercial product, process, service, manufacturer or company do not constitute its endorsement or recommendation by the US Government or the Centers for Disease Control and Prevention.
Disclaimer The findings and conclusions are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
Competing interests None declared.
Patient consent Not required.
Ethics approval The study was approved by the Centers for Disease Control and Prevention Institutional Review Board (IRB) and by IRBs at those sites that did not defer to the CDC IRB.
Provenance and peer review Not commissioned; externally peer reviewed.
Collaborators California Department of Public Health (Richmond): Jennifer Flood, Lisa Pascopella (includes San Francisco Department of Public Health: Julie Higashi; County of San Diego Health and Human Services Agency: Marisa Moore (CDC); and University of California San Diego Antiviral Research Center: Richard Garfein and Constance Benson); Denver (CO) Health and Hospital Authority: Robert Belknap and Randall Reves; Duke University (Durham, North Carolina): Jason Stout (includes Carolinas Medical Center (Charlotte, North Carolina): Amina Ahmed; Vanderbilt University Medical Center (Nashville, Tennessee): Timothy Sterling and April Pettit; Wake County Human Services (Raleigh, North Carolina): Jason Stout); Emory University (Atlanta): Henry M. Blumberg (includes DeKalb County Board of Health: Alawode Oladele); University of Florida (Gainesville): Michael Lauzardo and Marie Nancy Séraphin; Hawaii Department of Health (Honolulu): Richard Brostrom; Maricopa County Department of Public Health (Phoenix, Arizona): Renuka Khurana; Maryland Department of Health (Baltimore): Wendy Cronin and Susan Dorman; Public Health—Seattle and King County: Masahiro Narita and David Horne; University of North Texas Health Science Center (Fort Worth): Thaddeus Miller.