Preoperative staging of non-small-cell lung cancer with 18-fluorodeoxyglucose positron-emission tomography
R M Pieterman, J W G van Putten, J J Meuzelaar, E L Mooyaart, W Vaalburg, G H Koeter, V Fidler, J Pruim, H J M Groen
Background: Determining the stage of non-small-cell lung cancer often requires multiple preoperative tests and invasive procedures. Whole body positron-emission tomography (PET) may simplify and improve the evaluation of patients with this tumor. Methods: We prospectively compared the ability of a standard approach to staging (computed tomography (CT), ultrasonography, bone scanning, and, when indicated, needle biopsies) and one involving PET to detect metastases in mediastinal lymph nodes and at distant sites in 102 patients with resectable non-small-cell lung cancer. The presence of mediastinal metastatic disease was confirmed histopathologically. Distant metastases that were detected by PET were further evaluated by standard imaging tests and biopsies. Patients were followed postoperatively for six months by standard methods to detect occult metastases. Logistic-regression analysis was used to evaluate the ability of PET and CT to identify malignant mediastinal lymph nodes. Results: The sensitivity and specificity of PET for the detection of mediastinal metastases were 91% (95% confidence interval 81 to 100%) and 86% (95% confidence interval 78 to 94%), respectively. The corresponding values for CT were 75% (95% confidence interval 60 to 90%) and 66% (95% confidence interval 55 to 77%). When the results of PET and CT were adjusted for each other, only PET results were positively correlated with the histopathological findings in mediastinal lymph nodes (P<0.001). PET identified distant metastases that had not been found by standard methods in 11 of 102 patients. The sensitivity and specificity of PET for the detection of both mediastinal and distant metastatic disease were 95% (95% confidence interval 88 to 100%) and 83% (95% confidence interval 74 to 92%), respectively. The use of PET for clinical staging resulted in a different stage from the one determined by standard methods in 62 patients: the stage was lowered in 20 and raised in 42. 
Conclusions: PET improves the rate of detection of local and distant metastases in patients with non-small-cell lung cancer. (N Engl J Med 2000;343:254–61.)
Consequences of misdiagnosis in medicine
The epidemic in lung cancer due to tobacco smoking now accounts for 18% of cancer deaths worldwide.1 Although radical resection of locoregional disease (stage IIIa or less) is associated with a 5 year survival of 40–50%, no more than one in three cases presents in time for surgery.2 Most sites of postoperative relapse lie outside the operative field,3 suggesting that the “curative” local procedure was, in hindsight, futile. Furthermore, thoracotomies and lung resections are major surgery with a mortality of 3% and an average stay in hospital of 1 month.4 What is often overlooked in the analysis of morbidity and mortality figures is the contribution of misallocation of treatment by inferior diagnostic techniques, that is, misdiagnosis. The use of more accurate diagnostic techniques promises to reduce excess mortality, morbidity, and cost associated with futile procedures on the one hand, and missed therapeutic opportunities on the other. In 1996 Gambhir et al 5 used decision tree analysis to put flesh on the case for 18-fluorodeoxyglucose positron emission tomography (FDG-PET) to stage non-small cell lung cancer (NSCLC). At that time the authors could refer to a total of three studies reporting the diagnostic accuracy of PET scanning in this indication. Their prediction that FDG-PET could cut US$1154 from the cost of care per patient, with no detriment to life expectancy, was at the nucleus of the expansion of clinical PET in the USA. In 2001 there are now 29 reported studies of FDG-PET in the staging of NSCLC, yet its application in clinical medicine in general remains piecemeal. What are the reasons for the prolonged adolescence of this technology, and can one more study with a positive result6 in a high impact journal change clinical practice?
Principles of FDG-PET scanning
Positronic scanning systems are complex, primarily because they use injected radiotracers. Whereas conventional scanners will generate X-ray beams or magnetic fields at the flick of a switch, radiotracers have to be produced in a separate facility and are subject to stringent pharmaceutical and isotopic quality controls. This entails high set up and ongoing costs and requires multidisciplinary expertise. Additionally, with radioactive decay limiting shelf life to minutes or hours, the source of supply needs to be physically close to the scanning unit. Radiotracers are, however, a uniquely advantageous diagnostic tool in that their corporeal uptake and distribution directly reflect physiological function. This is in contrast to conventional technologies which report mainly on the anatomical structure.
Tracer production starts with a high energy beam of subatomic particles produced by a cyclotron. Radioisotopes are generated when the beam collides with a target and are chemically processed into the pharmaceutical form for use as tracers. A significant reduction in the entry cost for clinical PET scanning has been made possible by the commercialisation of tracer production and supply, which is feasible in well funded, geographically central, densely populated regions. Commercial willingness to carry the overheads of PET provision has extended to the recent introduction in the UK of a mobile PET scanning bus.7
The rationale for imaging with 18-fluorodeoxyglucose (FDG) in cancer dates back to work by Warburg from the 1930s.8 Almost all cancers are now known to share features of increased glucose uptake and altered glucose handling, approximately to the extent of their malignant potential. More recent work has identified molecular substrates including the Glut 1 receptor, type II hexokinase, and ras group oncogene activation.9-11 The significance of these findings may outreach mere improvements in diagnosis, pointing as they do to a therapeutic paradigm other than the antimitotic. FDG is synthesised by substitution of the radioisotope 18-fluorine for a hydroxyl group at the 2′ carbon on the d-glucose ring. Like glucose, FDG is avidly taken up by tumour cells and phosphorylated. Beyond this point it is metabolically trapped, its persistence within the cell corresponding to the rate of isotopic decay. This biochemistry was modelled in animals in the 1970s,12 13 pilot human studies in cancer followed in the 1980s,14 and clinical work, especially with NSCLC, from the 1990s.
The main contraindication to the use of FDG is hyperglycaemia which interferes with tissue uptake to cause false negative results. The usual dose range of administered radioactivity per scan is 350–400 MBq which translates to 7 mSv.* This can be compared with a mean of 2.2 mSv annual background radiation in the UK, 0.02 mSv for a chest radiograph, up to 8 mSv for a CT scan of the chest, 2–6 mSv per annum for aircrew,15 and 7.8 mSv per annum background exposure in Cornwall.16
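The dose figure above can be checked with a short calculation. The sketch below simply multiplies the administered activity by the conversion factor given in the article's footnote (1.9 × 10⁻² mSv/MBq); the function and constant names are illustrative, and the comparison with UK background radiation is only a rough way of expressing the result.

```python
# Conversion factor for intravenous 18-FDG, as quoted in the article's footnote.
MSV_PER_MBQ = 1.9e-2
# Mean annual background radiation in the UK, as quoted in the text.
UK_BACKGROUND_MSV_PER_YEAR = 2.2


def effective_dose_msv(activity_mbq):
    """Approximate effective dose (mSv) for an administered activity (MBq)."""
    return activity_mbq * MSV_PER_MBQ


for activity in (350, 400):
    dose = effective_dose_msv(activity)
    years = dose / UK_BACKGROUND_MSV_PER_YEAR
    print(f"{activity} MBq -> {dose:.2f} mSv "
          f"(about {years:.1f} years of UK background)")
```

Both ends of the usual dose range fall close to the 7 mSv quoted in the text.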
18-Fluorine decays to the stable moiety 18-oxygen by emission of positrons with a half life of about 110 minutes. Positrons are antimatter particles having positive charge, the counterpart to electrons. PET imaging systems respond to the mutual annihilation of positrons and electrons, which releases energy in the form of two 511 keV gamma rays travelling in opposite directions. These are detected by photoluminescent crystals and a computer calculates the spatial origin of the rays between the detectors. It is the “coincidental” appearance of a 511 keV signal on opposite sides of the emitting source that is the signature of positronic decay. The spatial origin of the gamma rays, however, is not identical with the locus of radiotracer, since positrons may travel up to 3 mm in tissue before annihilating. This contributes to the theoretical best anatomical resolution of the technique of approximately 2 mm with 18-F tracers. Importantly, this is distinct from the physiological resolution whereby PET is able to identify events occurring at a molecular level. In summary, PET detects the presence of molecular events and locates them within a volume defined by its anatomical resolution. It thereby remains complementary to the imaging modalities of computed tomographic (CT) scanning and magnetic resonance imaging (MRI) which serve to define the corresponding anatomy.
The current benchmark PET scanner technology is a ring-shaped unit containing many hundreds of bismuth germanium oxide (BGO) crystals resolving down to 5 mm. An alternative lower cost, lower performance technology has been developed in the “dual headed gamma camera”, but its place in oncological imaging at the current time is not secured.17 Combination CT-PET devices that could improve image co-registration are currently in development.18 It usually takes 1 hour to conduct a whole body PET scan and a scanner running at capacity can produce up to 15 studies in a day.
Cost of clinical PET
A high ratio of cost to credibility has slowed the advance of clinical PET in cash constrained public health systems. Current representative fees per procedure from an NHS provider are £800 within the NHS and £960 without. The total annual cost of running a PET facility (including capital overheads and depreciation) is of the order of £1.2 million. Accordingly, to “break even” on these fees in the NHS, 1500 procedures would be required which translates into six procedures per day for 233 working days. The resection rate for NSCLC in the UK at 10% (3000 cases per annum) is low in international terms. Should UK practice align itself with other countries, this rate might increase to 20%.19 To service these volumes with FDG-PET would thus require the equivalent of 2–4 dedicated facilities. That implies a total annual expenditure of £2.4–4.8 million if FDG-PET were to be incorporated routinely into practice for this one indication. One would expect to be able to mitigate this expenditure by cost or morbidity benefits in other areas, principally surgical. It is clearly a technology with major fiscal implications whose wholesale introduction requires a strong evidential base.
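The break-even arithmetic in this paragraph can be reproduced as follows. This is a minimal sketch using only the figures quoted above; it assumes, as the text implicitly does, one PET scan per resected case and facilities running exactly at break-even volume. All names are illustrative.

```python
# Figures quoted in the text.
ANNUAL_RUNNING_COST_GBP = 1_200_000   # total annual cost of one PET facility
NHS_FEE_GBP = 800                     # representative fee per procedure within the NHS
WORKING_DAYS_PER_YEAR = 233


def break_even_procedures(annual_cost, fee):
    """Number of procedures per year at which fee income equals running cost."""
    return annual_cost / fee


def facilities_required(cases_per_year, procedures_per_facility):
    """Equivalent number of dedicated facilities for a given annual caseload."""
    return cases_per_year / procedures_per_facility


procedures = break_even_procedures(ANNUAL_RUNNING_COST_GBP, NHS_FEE_GBP)
per_day = procedures / WORKING_DAYS_PER_YEAR
print(f"Break-even volume: {procedures:.0f} procedures/year, {per_day:.1f} per working day")

# 3000 resections per annum at a 10% resection rate; 6000 if the rate doubled to 20%.
for cases in (3000, 6000):
    n = facilities_required(cases, procedures)
    total = n * ANNUAL_RUNNING_COST_GBP
    print(f"{cases} cases/year -> {n:.0f} facilities, about GBP {total / 1e6:.1f} million/year")
```

On these inputs the daily volume works out slightly above six, which the text rounds down.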
In the US context, health funders have found the evidence convincing. At the time of writing the US Health Care Financing Administration (HCFA)20 will approve reimbursement of the cost of FDG-PET studies in relation to diagnosis, staging, and re-staging of:
- Non-small cell lung cancer
- Lymphoma (including Hodgkin's disease)
- Melanoma (excluding regional nodal evaluation)
- Head and neck cancers (excluding CNS and thyroid)
Beyond what is funded, clinical applications of FDG-PET exist for a wide range of oncological indications.21 The demand for accurate diagnosis is strong since it links directly to the allocation of high cost, high risk procedures such as:
- Thoracotomy and lung resection for NSCLC
- Hemihepatectomy for isolated colorectal metastases
- Consolidative high dose chemotherapy and radiation in lymphoma
- Regional plastic surgical dissection and adjuvant immune therapy in melanoma
“Head to head” studies with FDG-PET
The evidence for clinical PET derives principally from “head to head” comparisons whereby patients are investigated by PET, a conventional technology, and a reference test. Study outcomes include values for sensitivity and specificity, and the effect of PET on clinical stage allocation. With respect to the staging of NSCLC, 21 groups have published 27 studies. The numbers reported in each instance are small—for example, the total of the disease positive subgroups is 600, the mean per study 17, and the range per study 1–44. This leads to difficulties with statistical significance. The mathematics of the “binomial test” are such that, in studies where either the disease positive or disease negative subgroup numbers less than 35, the lower bound of the 95% confidence interval for a reported characteristic of 100% cannot exceed 90%. Conceptually, FDG-PET studies in NSCLC reveal two categories of information: mediastinal nodal (N) status and distant metastatic (M) status. The English language published work is summarised in table 1.
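The point about small subgroups can be illustrated numerically. The sketch below assumes that the interval in question is the exact (Clopper-Pearson) two-sided 95% interval, which is an assumption about the method rather than something stated in the text. When every one of n cases is correctly classified, the lower bound of that interval has the closed form (α/2)^(1/n).

```python
def lower_bound_when_all_correct(n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided lower confidence bound for an
    observed proportion of n/n, i.e. a reported sensitivity or specificity
    of 100%. Solves p**n = alpha / 2 for p."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (alpha / 2) ** (1 / n)


# Subgroup sizes chosen for illustration: the mean and maximum per-study
# sizes quoted in the text, plus values either side of 35.
for n in (17, 34, 36, 44):
    print(f"n = {n:2d}: lower 95% bound = {lower_bound_when_all_correct(n):.3f}")
```

Under this assumption the lower bound passes 90% only once the subgroup exceeds about 35, consistent with the threshold given in the text.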
It is puzzling that seven years of publication have not translated into generalised application of the technology. This may reflect a “Catch 22” scenario whereby those with the wherewithal for pilot studies have proceeded directly to clinical practice, and those without have stood still. It is also interesting to review the publishing journals: the literature has only recently moved out of surgical and radiological departments and onto the respiratory, oncological, and general medical stage (see table 2).
Ultimately, however, the question arises as to whether this work has engaged its audience in a way conducive to progress.
The study by Pieterman et al is notable as the first publication about FDG-PET nodal status to appear in a high profile general medical journal. It stands out for its meticulous use of the reference standard surgical sampling at mediastinoscopy. Several authorities on assessment of the validity of diagnostic tests have published checklists against which this study can be assessed.47
Assessment of study quality and applicability
Pieterman et al in a tertiary referral setting evaluated 110 consecutive patients with potentially resectable NSCLC between September 1996 and December 1998. They compared the ability of a standard approach to staging (CT, ultrasound, and bone scanning) with one involving FDG-PET to detect metastases in mediastinal lymph nodes and at distant sites. The reference standards included mediastinal histology, needle biopsy, and findings on clinical follow up. An unspecified number of patients were excluded on the grounds of hyperglycaemia or prior mediastinal instrumentation. Four patients were excluded who did not have NSCLC. A further four who had inadequate mediastinal dissections were excluded from analysis. The demographic data and distributions of tumour types and stages were broadly compatible with other populations in this setting (table 3).
PET scans were conducted on an ECAT model 951/31 full ring BGO device between 1996 and 1998. Each scan encompassed the whole body (the length of the vertebral column from C1 to L5). Observers noted the locations of qualitative “hot spots” of increased FDG activity compared with the background. No clinicians involved in the care of patients were aware of the results of the PET scans. All patients underwent a standard clinical assessment including a CT scan of the chest and upper abdomen. Imaging studies in the mediastinum were analysed by two independent observers who were unaware of the patients' clinical data. In keeping with routine diagnostic practice, observers were able to compare CT and PET scans. Findings from 102 patients were reported. All underwent cervical mediastinoscopy to the subcarinal level, and the 87 with negative mediastinoscopy proceeded to thoracotomy. No other interventions were performed during this time. 516 of 534 surgically reachable mediastinal lymph nodes were dissected. Metastatic disease was diagnosed on criteria including biopsy, serial imaging, and skeletal radiology. Metastatic disease was excluded ultimately based on 6 month clinical follow up.
The temporal relationship of the imaging investigations was not stated. Given that cancers grow, any delay between investigations would favour the sensitivity of the later test. Likewise, bias could be introduced where one imaging modality was viewed with foreknowledge of the other. These problems would be simply remedied by a study design incorporating random allocation of the order of tests.
The diagnostic threshold for an FDG-PET “hot spot” was not quantified. Although a high degree of interobserver concordance was achieved (κ = 0.98), this might not be generalisable outside the institution. Quantitative criteria such as the “standardised uptake value” (SUV) are a more freely convertible currency for comparing results of PET.
The validity of the test for N3 disease is theoretically less than for N2, for reasons including a low incidence (n = 3) and a less secure reference standard distal to the carina (the contralateral mediastinum is not explored at thoracotomy). In practice the chance of unsuspected contralateral lesions would be negligible, bearing in mind the axial orientation of mediastinal lymphatics and the low incidence of skip metastasis in this illness.51
The authors were subsequently criticised for their method of mapping mediastinal node stations.52 The key issue in the study, however, is allocation of the clinical stage (N status) and the mapping was more than adequate for this purpose.
The length of follow up to confirm M0 status would ideally have been at least 12 months in order to take in a minimum disease free interval justifying surgery as a means of local control. Distant metastases occurring after this time, although technically “false negatives”, would be less relevant (in hindsight) to the decision to proceed with surgery. Published actuarial data suggest that the number of extra distant metastases expected in a 102 patient cohort during the second 6 months would be approximately five,53 equating to five extra false negatives for both conventional staging and PET scanning.
The authors calculated that a sample size of 100 would be required to give a power of 0.85, at a two-sided α level of 0.05, to detect a 35% difference in the sensitivity of PET and CT scanning for mediastinal nodal disease. The predicted sensitivity of CT scanning was 60%, which was the mean value identified in a recent meta-analysis (range 25–89%).50 The actual measured sensitivity of CT scanning, however, was 75%, leading to a negative result for the study on its primary statistical end point (table 4).
It is not clear whether the authors could have foreseen the superior sensitivity of CT scanning in their hands. One problem with predicating a study on sensitivity is that it is a dependent variable. Sensitivity can always be increased by lowering the threshold for a positive test at the expense of specificity. Inflexible as the threshold for a positive CT scan may appear (nodal short axis diameter ⩾1 cm), it is notable that the authors' measured specificity was 11% less than the literature mean of 77%. The relationship between sensitivity, 1–specificity, and threshold—in other words, the ability to discriminate between disease and non-disease—is summarised in a test's receiver operator characteristic (ROC) curve. The ROCs of different tests can be compared by means of logistic regression analysis. Using similar methods, the authors were able to identify a statistically significant finding: “when the results of PET and CT were adjusted for each other, only PET results were correlated with the histopathological findings in mediastinal lymph nodes (p<0.001)”. This finding has the additional advantage of being predicated on the entire data set, not just the information about sensitivity. It would be ideal if all work of this nature was designed from the outset on ROC criteria.
The authors also found PET to outperform CT scanning in the detection of distant metastatic disease, this time with non-overlapping confidence intervals. PET had a sensitivity of 82% (95% CI 57 to 96%) and CT scanning had a sensitivity of 18% (95% CI 4 to 43%). However, given the small number of patients with metastases, this result is vulnerable to uncertainty deriving from the exclusion criteria and the short period of follow up.
Although individual studies in this area have borderline sample sizes, by 1998 there had been sufficient published work to generate a statistically significant result in a meta-analysis. This review summarised 13 studies of FDG-PET and nodal status, including almost 200 patients with confirmed mediastinal disease49: 95% CI bounds around the sensitivity of PET were 76–82% and around CT were 58–62%. Bounds for specificity were 89–93% and 75–79%, respectively. In summary, the literature as a whole shows that FDG-PET does increase the accuracy of mediastinal diagnosis, concordant with the findings of Pieterman et al.
A caveat relates to the differing performance of FDG-PET in populations selected on the results of CT scanning. In one overview of 12 studies the sensitivity and specificity of PET was found to change from 74% and 96%, respectively, after a negative CT scan to 95% and 76% after a positive CT scan.54 This expresses a simple correlation between the geometric quantity of a physiological process and the ability of PET to detect it. The “CT conditioned” characteristics of PET are relevant to any estimation of the extra information it can provide in the context of a prior CT scan.
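One way to read the “CT conditioned” figures is to convert them into likelihood ratios, which can then be combined with a pre-test probability. The sketch below uses only the sensitivities and specificities quoted above; the pre-test probability in the final lines is a purely hypothetical value chosen to show the mechanics, not a figure from the literature.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply Bayes' theorem in odds form."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Sensitivity and specificity of PET quoted in the text, by prior CT result.
PET_AFTER_NEGATIVE_CT = (0.74, 0.96)
PET_AFTER_POSITIVE_CT = (0.95, 0.76)

for label, (sens, spec) in (("negative CT", PET_AFTER_NEGATIVE_CT),
                            ("positive CT", PET_AFTER_POSITIVE_CT)):
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"PET after {label}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")

# Hypothetical pre-test probability, for illustration only.
HYPOTHETICAL_PRE_TEST = 0.20
lr_pos, _ = likelihood_ratios(*PET_AFTER_NEGATIVE_CT)
print(f"Illustrative post-test probability: "
      f"{post_test_probability(HYPOTHETICAL_PRE_TEST, lr_pos):.2f}")
```

On these figures a positive PET scan is far more informative after a negative CT scan, whereas a negative PET scan is more informative after a positive CT scan.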
Role of FDG-PET in the diagnostic process
Studies of the accuracy of diagnostic tests are studies of populations. A source of clinician resistance to population findings is that they have not been cashed out in terms of the effect on management of individual patients. This is something that more recent work has tried to achieve by summarising the changes in stage allocation associated with imaging modalities.46 In this vein, Pieterman et al report that the use of PET for clinical staging resulted in a different stage from the one determined by standard methods in 62 patients. Unfortunately, it is a spurious end point from the clinical perspective since, although it illustrates the superiority of PET over standard methods, it does not speak to the reference tests. Although PET correctly staged 89 mediastinal metastases compared with 70 for CT scanning alone, PET was still incorrect in 13 cases and mediastinoscopy and thoracotomy were required for a final answer. No statement has been made as to the criteria on which those tests could safely be omitted. In reality, it was only the 11 patients with unsuspected metastatic disease who had their stage changed by PET.
What is required at this point is a fuller analysis of where FDG-PET stands in relation to the other staging methods and the reference standards. The clinician needs algorithms for individualised patient management in which decisions are based on the probability of disease at each point. This need can be partly addressed by analysing models of cost effectiveness. For example, Dietlein et al 54 recommended that FDG-PET should be offered routinely to patients whose thoracic CT scan was negative for N2/N3 disease (“strategy B”). Certain patients with a positive thoracic CT scan could also benefit from PET (“strategy C”). The primary role of PET would be to identify a proportion of mediastinal disease that would otherwise have only been revealed at surgery. All patients would achieve a pathological diagnosis, therefore disease related mortality would not alter. PET would reduce surgical morbidity, mortality, and cost.
For strategy B, Dietlein et al predicted an increase in mean survival from 3.308 to 3.322 years, and an increase in cost from 16 890 to 16 892 Euros. This is an incremental cost effectiveness ratio (ICER) of 143 Euros per life year gained, which falls comfortably below the threshold of 50 000 Euros per life year proposed in the literature.5 Strategy C was associated with an ICER of 36 667 Euros, “barely below the threshold”.
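The ICER for strategy B follows directly from the quoted survival and cost figures. The sketch below reproduces that division; the names are illustrative, and the inputs for strategy C are not given in the text, so only its reported ICER is compared with the threshold.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost effectiveness ratio: extra cost per extra unit of effect."""
    return (cost_new - cost_old) / (effect_new - effect_old)


THRESHOLD_EUR_PER_LIFE_YEAR = 50_000

# Strategy B figures quoted in the text (Euros, mean survival in years).
icer_b = icer(cost_new=16_892, cost_old=16_890,
              effect_new=3.322, effect_old=3.308)
print(f"Strategy B ICER: {icer_b:.0f} Euros per life year gained")
print("Below threshold:", icer_b < THRESHOLD_EUR_PER_LIFE_YEAR)

# Strategy C: only the resulting ICER is reported in the text.
ICER_C_REPORTED = 36_667
print("Strategy C below threshold:", ICER_C_REPORTED < THRESHOLD_EUR_PER_LIFE_YEAR)
```

The strategy B result rests on very small increments in both numerator and denominator (2 Euros and 0.014 years), so it is sensitive to small changes in either input.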
It is interesting to convert this model into more familiar clinical terms. For example, with a cohort of 1000 patients, strategy B predicts that there will be 659 PET scans and a reduction in surgery from 832 to 764 cases. Thus, the “number needed to treat” (NNT) to prevent one futile operation is approximately 10. In the case of strategy C, a further 341 PET scans are associated with four fewer operations. Here the NNT is 85. We think this is “comfortably above the threshold” where most clinicians would consider a test to have vanishing returns.
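The conversion to NNT is a single division: scans performed divided by operations avoided. The sketch below applies it to the cohort figures quoted above. The function name is illustrative, and strictly the quantity is a “number needed to scan”.

```python
def scans_per_operation_avoided(pet_scans, operations_avoided):
    """Number of PET scans needed to prevent one futile operation."""
    if operations_avoided <= 0:
        raise ValueError("operations_avoided must be positive")
    return pet_scans / operations_avoided


# Strategy B in a cohort of 1000: 659 scans, surgery falls from 832 to 764 cases.
nnt_b = scans_per_operation_avoided(659, 832 - 764)

# Strategy C: a further 341 scans are associated with four fewer operations.
nnt_c = scans_per_operation_avoided(341, 4)

print(f"Strategy B: about {nnt_b:.1f} scans per operation avoided")
print(f"Strategy C: about {nnt_c:.1f} scans per operation avoided")
```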
Examination of the probabilities confirms that the utility of PET is mainly sensitive to the outcome of the preceding CT scan (highlighted in table 5).
This kind of analysis can be extended to a range of clinical vignettes, with the general aim of calculating the NNT with a given test for a given probability of disease. Some of the key questions are listed below:
- What is the general relationship between NNT with FDG-PET and probability of nodal disease after CT scanning? When would PET be omitted?
- What is the relationship between NNT with mediastinoscopy and probability of nodal disease after imaging? When would mediastinoscopy be omitted?
- When is the probability of nodal disease so low that the patient can proceed directly to surgery?
- What is the relationship between NNT with FDG-PET and probability of metastatic disease? When would FDG-PET be contraindicated due to low returns and an unacceptable rate of false positive results? It is worth noting some emergent screening modalities which may in future compete with whole body PET for this role—for example, new types of tumour marker,55 56 monoclonal antibody studies of bone marrow,57 reverse transcriptase PCR tissue studies for index genes,58 and genomic screening.59
Is a randomised controlled trial required?
The questions above invite more complex analysis than is provided by simple models of effects of dichotomous tests on homogeneous populations. Although workable models can be constructed, these increasingly resemble entire clinical pathways. Can we be confident that biases would not emerge in the clinical application of such models? Head to head studies such as the work of Pieterman et al do give a high grade of evidence on sensitivity and specificity for reasons that have been described elsewhere.20 60 However, it is beyond the scope of these studies to report on the cost or survival implications of their measurements. It is not too hard to see how clinical criteria for diagnosis of metastatic disease could in practice be used to shift poor risk patients from one imaging group to the other, confounding later comparisons of survival. Can the diagnostic elements of clinical pathways be regarded as interchangeable parts whose replacement has a predictable effect on the outcome of the whole? Or does the whole exceed the sum of its parts? The sceptical position is to assume this possibility. Opinion on these issues will decide whether clinical PET can be introduced on the strength of the above reviewed literature and associated models, or whether it must await a large study reporting actual costs and survival figures within different subgroups. The benchmark criterion for measuring patient related outcomes is a randomised controlled trial. The ideal randomised controlled trial would validate a management algorithm reflecting the kind of point decisions about disease probability that are the essence of normal clinical practice. The local environment of fragmented service provision favours recruitment into a randomised controlled trial. In many ways medical science has evolved as a series of genies being let out of boxes, and it is those with uncertain access to new technologies who participate in trials.
After 7 years, 29 studies, and one statistically significant meta-analysis, we can reasonably conclude that the routine use of FDG-PET would lead to more accurate lung cancer staging on a population basis. The cost of setting up routine use in the UK would be of the order of £2–5 million per annum. Cost effectiveness models at the health funding level have laid out some financial bounds on the choice of interventions. Within these constraints there remains freedom for clinicians to determine what is actually sensible and reasonable to do. However, we presently lack algorithms for the judicious use of FDG-PET technology in individual lung cancer care. Such algorithmic approaches are well recognised in respiratory nuclear medicine—for example, the analysis of pre- and post-test probabilities of pulmonary embolus that arose from the PIOPED study.61 Bayesian probability analyses have been described in the diagnosis of solitary pulmonary nodules and compared with the results of FDG-PET for that indication.62 There may well be sufficient information already in the literature on which to construct workable estimates of pre- and post-test probabilities of NSCLC stage within different clinical scenarios. Presently, given the fragmented distribution of FDG-PET facilities in the UK, there is the opportunity to assess this type of approach in a randomised trial. Ultimately, possession of clinically validated, clinically relevant, clinical guidelines should be the key to activating a clinical lobby for the extension of FDG-PET infrastructure.
- Incorrect staging of NSCLC leads to increased mortality, morbidity, and cost, mainly when patients undergo thoracotomy in the presence of advanced disease.
- The principle behind functional imaging with FDG-PET is that malignant tumours avidly absorb and retain fluorodeoxyglucose (FDG).
- FDG-PET scanning is associated with an increase in the accuracy of nodal staging of NSCLC from 72% to 87%.
- Models predict that addition of FDG-PET to the preoperative work up of patients with negative mediastinal CT scans could prevent about one “futile” thoracotomy for every 10 scans.
- Two small randomised controlled trials on the end point of thoracotomy have had conflicting results.
- A larger randomised study is proposed for the UK (the authors can be contacted for further information).
After submission of this article, the authors became aware of two small randomised trials assessing the clinical utility of FDG-PET for lung cancer staging.63 64 Both have been presented, in abstract form only, at meetings of the American Society of Clinical Oncology (ASCO). They have reported conflicting results, and we conclude that more work will have to be done. Hence, we have proposed a larger randomised study in the UK.
* MBq = megabecquerel; the becquerel (Bq) is the unit of radioactivity (1 Bq = 1 atomic disintegration per second). mSv = millisievert; the sievert (Sv) is the unit of biological cumulative radiation exposure. 1 Sv is associated with a 5% lifetime risk of contracting fatal cancer. The conversion factor for intravenous 18-FDG = 1.9 × 10⁻² mSv/MBq.