A critical appraisal of overdiagnosis: estimates of its magnitude and implications for lung cancer screening
- Dr J M Reich, Thoracic Oncology Program, Earl A Chiles Research Institute, Portland Providence Medical Center, 7400 SW Barnes Rd, A622, Portland, OR 97225, USA
- Received 14 February 2007
- Accepted 30 April 2007
Background: The magnitude of overdiagnosis is a critical and unresolved issue in lung cancer (LC) screening: (1) its contribution to the increase in survival constitutes specious evidence of benefit; (2) overdiagnosed individuals who undergo resection will experience a reduction in life expectancy, partially or completely offsetting the benefit received by others in whom earlier intervention proves curative.
Method: Critical analysis of studies in opposition and support of the view that LC screening imposes a substantial burden of overdiagnosis.
Results: Approximately 25%, possibly more, of radiographically (chest x ray) diagnosed LC appears to be overdiagnosed. Based on the observed tumour volume doubling time of low dose CT identified small malignant pulmonary nodules, CT will markedly augment lead time, increasing exposure to competing lethal morbidities, thereby increasing overdiagnosis.
Conclusion: To reduce all-cause mortality, CT screening will need to reduce LC mortality by an amount that exceeds the increase in mortality attributable to surgery and loss of pulmonary reserve in persons who are overdiagnosed or pathologically understaged (ie, with occult micrometastases). Presently, there is no evidence that CT screening will achieve any reduction in LC mortality.
Because of its frequency and lethality, identifiable high risk population and absence of effective non-surgical treatment, lung cancer (LC) is an ideal screening candidate. However, because controlled trials combining radiographic (chest x ray (CXR)) and cytological screening achieved no reduction in mortality, screening is not recommended by recognised professional societies (summarised by Bach and colleagues1). The development of low dose, spiral CT screening has generated a resurgence of interest because it is about fourfold more sensitive than CXR at identifying small peripheral LC, and because approximately 80% of CT identified LC are stage IA, which have a highly favourable surgical prognosis2: 5 year LC survival of surgically staged IA LC is about 70% while overall 5 year LC survival is approximately 15%.3 CT screening advocates assert that improvement in LC survival, predicated on a predominance of stage IA at diagnosis, confirms its efficacy.4 Others point out that LC survival is an invalid metric of efficacy because screening entrains biases—lead time, length biased sampling and overdiagnosis—that convey a spurious appearance of benefit,5 and it fails to reflect related non-cancer deaths.
Non-cancer mortality is high in LC survivors. Brown and colleagues6 used the US National Cancer Institute sponsored Surveillance, Epidemiology and End Results (SEER) database7 to compare the mortality of LC survivors with the values from the National Center for Health Statistics.8 They reported that 10% died of causes other than LC, and that the non-cancer relative hazard of death was 2.73, nearly threefold that of colon and breast cancer (1.09 each). The age at death of 22 randomly selected persons who underwent successful (no recurrence) resection of stage IA non-small cell lung cancer (NSCLC) at Portland Providence Medical Center between 1980 and 1989 was 5 years less than the actuarial survival for healthy gender and age matched smokers (Reich JM, Asaph JW, unpublished data). The excess non-LC mortality most probably reflects the combined effects of cardiopulmonary morbidities attendant on cigarette smoking and surgical reduction of pulmonary reserve, which would be expected to foreshorten their course. The excess mortality will impact three categories of persons undergoing notionally curative resection: (1) those otherwise destined to die of LC in whom surgery proves curative; (2) those with occult micrometastatic disease (understaged), who account for an LC fatality of about 30% in stage IA NSCLC (fatality = 1 − survival); (3) those in whom LC would not have become clinically evident in their lifetime (overdiagnosed). The benefit for persons in category 1 will be partially offset by this harm. Persons in categories 2 and 3 cannot benefit from surgery; in aggregate, it will diminish their longevity.
The crucial issue in LC screening therefore is whether earlier intervention in category 1 will achieve a mortality reduction that exceeds the mortality increase attributable to invasive procedures in persons with true and false positive tests, tissue over-interpretation (eg, classifying atypical adenomatous hyperplasia as cancer9), individuals pathologically understaged because of undetected micrometastases and overdiagnosed individuals. This analysis updates and augments Parkin’s 1998 presentation to the American Cancer Society International Conference.10
It will: (1) examine the concept of overdiagnosis; (2) critically evaluate the evidence of its magnitude; and (3) estimate its magnitude and effect on survival and mortality in CT screening.
CONCEPT OF OVERDIAGNOSIS
LC is generally conceived of as an inexorably progressive, highly lethal neoplasm. The mid-seventies Connecticut experience reflects this conception: the average annual prevalence (P) was 23/100 000, and incidence (I), 46/100 000 person-years (p-y), giving a mean duration (D) = survival = P÷I of 0.5 years11 (see glossary, appendix C). Screening programmes, however, revealed a wide spectrum of aggressiveness. Peripheral adenocarcinoma and its variants (particularly bronchioloalveolar carcinoma, BAC), presenting as solitary pulmonary nodules (SPN), exhibit the slowest growth,12 and are therefore most susceptible to being overtaken by competing lethal morbidities.
Overdiagnosis denotes detection of disease that would never have become clinically evident, a pathological finding lacking clinical import, detectable only by special means (screening or autopsy). Overdiagnosis, pseudodisease, clinically irrelevant cancer, iatrogenic pseudodisease and lanthanic disease are interchangeable terms. Overdiagnosed LC may be conceptualised as a clinical–pathological false positive diagnosis because it erroneously implies a lethal outcome.
Overdiagnosis13 occurs under two circumstances: (a) some indolent cancers prove non-lethal (eg, prostate and thyroid cancers are frequent incidental autopsy findings); (b) the course of aggressive cancers may be overtaken by lethal competing morbidities. LC, due to a common cause (smoking), is frequently accompanied by chronic obstructive pulmonary disease (COPD) and coronary artery disease. Screening, by imposing length biased sampling and lead time bias, increases overdiagnosis by prolonging exposure to competing lethal morbidities.
The incidence of clinically relevant LC is numerically bounded: it is given by its annual mortality (M) plus the proportion of cured LC that is not overdiagnosed; it can be approximated as M÷0.85, assuming 15% survival and zero overdiagnosis. Diagnostic methods, in proportion to their sensitivity, can increase the aggregate number of diagnosed LC, thus exceeding this limit.13 This spurious incidence increase is due to clinically irrelevant cases. If one subscribes to the belief that minute cancers arise frequently and are held in abeyance by immunological surveillance,14 it follows that the number of diagnosed cancers (incidence) is limited only by screening frequency and tumour size resolvable by the method used.
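The bound described above can be illustrated with a minimal Python sketch. The function name and the sample mortality figure are hypothetical and chosen only for illustration; the 15% survival and zero overdiagnosis are the assumptions stated in the text.

```python
def relevant_incidence_bound(annual_mortality: float, survival: float = 0.15) -> float:
    """Approximate the incidence of clinically relevant LC as M / (1 - survival).

    Assumes zero overdiagnosis, as in the text. The result is in the same
    units as the mortality rate supplied (eg, per 100 000 person-years).
    """
    if not 0.0 <= survival < 1.0:
        raise ValueError("survival must lie in [0, 1)")
    return annual_mortality / (1.0 - survival)


# Hypothetical annual mortality of 40/100 000 p-y, for illustration only.
bound = relevant_incidence_bound(40.0)
print(f"upper bound on clinically relevant incidence: {bound:.1f}/100 000 p-y")
```

Any diagnosed incidence above this bound would, on these assumptions, consist of clinically irrelevant cases.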
EFFECT AND QUANTIFICATION OF OVERDIAGNOSIS
Overdiagnosis combines a specious improvement in outcome, as measured by cause specific survival,13 with a genuine increase in mortality. Definitive evidence for and quantification of overdiagnosis would require a study of untreated subjects with screen identified, histologically confirmed, pathologically staged LC. Decades later, a panel reviewing survival curves, pathology, medical records and autopsy findings would define LC and all-cause survival. Those dying with clinically silent LC would constitute the overdiagnosed cases. Such a study is, of course, ethically and practically unfeasible. A discussion of overdiagnosis quantification is inseparable from the measurements by which it is operationally inferred.
STUDIES ADVANCED IN SUPPORT OF A LOW RATE OF OVERDIAGNOSIS
Survival in unoperated persons
Flehinger and colleagues15 reported on the pooled results of the National Cancer Institute sponsored Early Lung Cancer studies performed at the Mayo Clinic, Johns Hopkins and Memorial Sloan Kettering, identifying all cases of stage I (T1N0M0 and T2N0M0) NSCLC, detected by symptoms, incidentally or by screening. They identified 45 clinically staged (c staged) T1 and T2 persons who either refused or were judged to have medical contraindications for surgery. Twenty-one (47%) were stage IA. Twenty (44%) received radiation therapy. Five of 45 (11%) not undergoing surgery did not die of LC within the 5 year follow-up period (fig 1 in Flehinger and colleagues15).
Overdiagnosis was estimated by 5 year surgically untreated LC survival as 11%. This estimate is fallacious because non-LC deaths (overdiagnosed, by definition, if their LCs were clinically silent) were treated as withdrawals. Furthermore, the survival figure underestimates untreated stage I LC survival because the authors conflated screen identified patients with those identified by symptoms (rare in stage IA) and incidental radiographs, because somewhat more than half were stage IB and because all were c staged (ie, understaged).
Sobue and colleagues16 reported all-cause survival of 42 persons with CXR detected c stage I LC diagnosed in 1976–1981, collected from 20 institutes comprising the Japanese Lung Cancer Screening Research Group, who either refused surgery or were judged not to be surgical candidates. In contrast with the predominance of prognostically favourable stage I adenocarcinoma12 and its variants in the Early Lung Cancer Action Project (ELCAP) prevalence screening trial,2 the authors reported a less favourable histological distribution: 25 (60%) were squamous cell and four (10%) were small cell. Radiation or chemotherapy, or a combination, was provided to 81%. All-cause 5 year survival was 14%. Medical record review showed that seven of 41 (17%) patients died of causes other than LC or its treatment in the 10 years of follow-up.
It is impossible to determine whether the 17% non-LC deaths in unresected individuals constitute evidence of overdiagnosis or of the (limited) efficacy of non-surgical therapy. It is likely that many were understaged (mediastinal lymph node involvement was assessed by linear tomography in almost all), which has the opposite effect of stage migration (ie, the outcome for the assigned stage would be less favourable (reverse Will Rogers effect)).
Henschke and colleagues17 used the 1988–1994 SEER registry data to characterise the survival of resected and unresected persons with stage IA NSCLC ascertained by any means. The authors computed an 8 year LC death rate “which adjusted for deaths from all other causes” (the phrasing implies that these values were cause specific). For those undergoing lobectomy, 8 year LC survival ranged from 60% to 75%, depending on size (0.6–3.0 cm), and for those receiving no treatment, from 6% to 13%.
Reporting LC rather than all-cause survival can be misleading, because if deaths within 30 days of surgery are excluded, or if a disproportionate number of individuals allocated to surgical intervention die of comorbidities and are excluded from the analysis—treated as withdrawals (ie, censored data)—the consequence will be a spuriously favourable “5 year survival.” In effect, people who die from adverse effects of diagnosis or treatment are not considered LC deaths, even though LC was a directly contributing cause. Survival of unoperated persons does not quantify overdiagnosis; it identifies individuals who, because their neoplasms are indolent, are at risk of overdiagnosis. Censoring non-LC deaths in persons with clinically silent LC excludes those who are, by definition, overdiagnosed.
Unlike the Sobue and Mayo Lung Programme analyses, in which the cause of death was established by the investigator’s review of medical records, the authors relied on causes of death abstracted from the National Center for Health Statistics database of consolidated death certificates from each state’s Vital Statistics Office, which are less accurate.18 19 For example, some states employ a hierarchical designation in which LC is assigned as the cause of death if it is listed as a prior diagnosis.
Yankelevitz and colleagues20 concluded that frequent overdiagnosis could be excluded in the Mayo Lung Project and Memorial Sloan Kettering studies because of the similarity of tumour volume doubling time (TVDT) of 87 cases of stage I LC to that of non-screen identified adenocarcinomas. The estimated median TVDT—101 days in the Mayo and 144 days in the Memorial study—was similar to the range of median TVDTs (61–269 days) reported for 97 non-screen identified lung adenocarcinomas compiled from several centres.21
The analysis assumes that all non-screen identified adenocarcinomas to which it compared the Mayo and Memorial screen identified adenocarcinomas proved or would have proven lethal. For example, a 1 cm SPN with a 144 day TVDT would require about 4 years to achieve a lethal size (see appendix A). Approximately one-third of healthy white male smokers in their late sixties would be expected to succumb to all causes within 4 years,22 during which time a 1 cm LC might remain clinically silent.
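The 4 year figure can be checked with a short Python sketch. It takes the doubling counts from appendix A (about 30 tumour volume doublings at 1 cm and about 40 at a lethal 10 cm) and assumes a constant TVDT; the function name is illustrative only.

```python
DAYS_PER_YEAR = 365.25


def years_to_grow(doublings_needed: float, tvdt_days: float) -> float:
    """Time for a tumour to complete a number of volume doublings at constant TVDT."""
    return doublings_needed * tvdt_days / DAYS_PER_YEAR


# From about 30 doublings (1 cm) to about 40 doublings (10 cm), per appendix A.
years = years_to_grow(40 - 30, tvdt_days=144)
print(f"{years:.1f} years from 1 cm to lethal size")
```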
In addition to methodological limitations in computing TVDT described in the paper, the authors make an implicit assumption that growth rate, because of its relationship to lethality, is the sole determinant of overdiagnosis. Other variables influencing overdiagnosis are: patient age and presence of competing lethal comorbidities, tumour size at diagnosis,12 metastatic propensity and ability to circumvent immunological surveillance.14
Mulshine and Sullivan advanced the similarity between complementary DNA (cDNA) composition of screen identified and clinically identified LC23—implying similar biological potential—to support the view that overdiagnosis is “unlikely.”24 Bianchi and colleagues23 found a similar gene expression profile in 18 CT detected and 18 stage and cell type matched “symptom detected” cases. The accompanying editorial,25 which addressed a concern that CT detected LC might be overdiagnosed if they were of limited biological potential, stated, “ . . .quantitative real-time PCR and immunohistochemistry suggests that the aggressive biological potential of the CT detected cancers is similar to that of symptom-detected cases.”
Despite the limitation imposed by overmatching (see below), Bianchi and colleagues23 reported that nine genes (see table 2 in Bianchi and colleagues23) involved in tumour growth were differentially expressed in the CT detected versus the “symptom detected” group. “All were expressed in the symptomatic tumours at a lower level than in the CT scan detected cases and loss of expression of five of these genes is known to be associated with tumour progression.” The study design may have been flawed by overmatching. It largely (11 of 18 cases) compared cDNA phenotypes in persons with screen versus “symptom detected” stage IA LC. Stage IA is virtually never “symptom detected.”12 Ascertainment typically results from an incidental radiograph taken for unrelated reasons. In effect, they are comparable to screen detected cases and might therefore be expected to have a similar gene expression profile. A cDNA comparison with advanced LC of matched cell type would be more illuminating.
That the “aggressive biological potential” of CT detected LC can be inferred from its cDNA phenotype is open to question. “Often, the gene expression signatures obtained with the use of microarray analysis are difficult to interpret with respect to the biology of the underlying disease.”26 The biological potential of LC is a function of several known variables (size, growth rate, angiogenic and metastatic potential, ability to evade immunological surveillance, mitotic rate and mutation rate), among other factors.12 Some of these variables are, presumably, subject to epigenetic influences. None is directly measurable with cDNA microarray phenotyping. Growth rate is the simplest measure of biological potential. CT identified LC presenting as SPN exhibit a vast difference, almost 35-fold, in TVDT (see below), from 46 days to 4 years. It is implausible that the biological potential of LCs possessing similar cDNA profiles and widely divergent TVDTs is similar.
STUDIES ADVANCED IN SUPPORT OF A HIGH RATE OF OVERDIAGNOSIS
Two large randomised controlled trials allocated older male smokers to a screened versus an unscreened cohort following a prevalence CXR and cytological baseline screen to exclude persons with identifiable LC.
In the incidence portion of the Mayo Lung Programme,27 206 cases of LC were identified in the screened versus 160 in the unscreened cohort. The 46 excess cases ((206−160)÷206 = 22%) were characterised as “missing cases,” as a comparable number did not appear in the usual care cohort after extended follow-up.28 Eddy postulated that they were overdiagnosed (ie, the number of LCs was approximately equal in both cohorts), the usual care arm counterparts with unidentified LC dying of competing morbidities.29
Kubík and colleagues30 undertook a similar study in Czechoslovakia. Following a baseline screen, male smokers were randomly allocated to semi-annual CXR and cytological screening or to an unscreened control cohort. At 3 years and annually for 3 years thereafter, both cohorts were CXR screened. The excess of LC cases in the screened over the initially unscreened cohort, after prolonged follow-up, was (108−82)÷108 = 24%.
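The excess case proportions quoted for the two trials follow from the same simple calculation, sketched below in Python. The function name is illustrative; the case counts are those given in the text.

```python
def excess_fraction(screened_cases: int, control_cases: int) -> float:
    """Proportion of screened-arm cases in excess of the control arm."""
    return (screened_cases - control_cases) / screened_cases


mayo = excess_fraction(206, 160)   # Mayo Lung Programme
czech = excess_fraction(108, 82)   # Czech trial
print(f"Mayo: {mayo:.0%}, Czech: {czech:.0%}")
```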
LC screening is predicated on the assumption that resection of curable LCs, premised progenitors of advanced LC, will achieve a commensurate, reciprocal reduction in the latter (stage shift). Neither trial demonstrated this essential effect: in the Mayo Lung Project, almost twice as many LCs (94, 46%) were completely resected in the intervention than in the usual care cohort (51, 32%). However, the number of advanced LCs identified during the entire study—112 in the intervention versus 109 in the usual care cohort—was not reduced. Similarly, in the Czech trial, nine persons identified in the intervention during the initial 3 year screen underwent “curative resection” versus eight in the control cohort.31 At completion of the study, however, the number of resectable cases was similar in both cohorts: intervention 25/108 (23.1%); control 19/82 (23.2%).32 Fontana and colleagues33 summarised the three National Cancer Institute sponsored CXR and cytological screening studies, reporting that there were 303 advanced LC cases (stages III, IV) in the intervention groups compared with 304 in the controls.
Plausible explanations for failure of CXR screening to interdict advanced LC are:
The majority of advanced LCs originate in proximal endobronchial sites, where radiographic screening lacks sensitivity.
Advanced LCs that originate in the periphery of the lung, and are therefore potentially susceptible to early diagnosis, have such an aggressive natural history that early detection within practicable screening intervals is not achievable.
Metastasis may occur in some cancers before they reach an actionable size.
Each explanation for the absence of stage shift implies that few of the excess screen identified, resectable LCs (principally stage I) were progenitors of advanced LC (ie, they were clinically irrelevant).
McFarlane and colleagues34 reported that the age adjusted incidence of previously unsuspected autopsy detected LC at Yale New Haven Medical Center was almost fourfold the community incidence rate in men and almost 15-fold higher in women. These “surprise” LCs neither caused nor contributed to the deaths of their patients.
In an updated review from the same site, Chan and colleagues35 reported that one in six autopsy detected LCs were unrecognised before death; that 1% of men autopsied in 1973–1982 had previously unsuspected LC; and that most (70%) were resectable (TNM stages 0 to II) and were therefore presumably asymptomatic. Two other autopsy series reported similar findings.36 37 These series demonstrate the existence of a large “reservoir” of CXR detectable, clinically irrelevant LC.
Chan’s analysis showed that the incidence of surprise LC was sixfold the estimated incidence of overdiagnosis—1.2/1000 p-y—in the Mayo study, indicating that the 22% estimate is conservative (appendix B). Autopsies are likely to underestimate the “reservoir” of CT identifiable malignant SPN: Dammas and colleagues38 reported that, among 28 persons who had a CT scan demonstrating an SPN within 2 months of death, nine (22%) were not identified in the autopsy report.
Histology and epidemiology:
If CT screening identified solely clinically relevant cancers, their histological and epidemiological features would approximate that of clinically identified cancer in the population.
Sone and colleagues39 reported on a mass population CT screening of Japanese men and women over age 40 years (median 64), smokers and non-smokers. They identified 23 histopathologically staged LCs in the initial (prevalence) screen: 21 were IA and two were IB. Sixteen were BAC or well differentiated adenocarcinomas; three poorly differentiated adenocarcinoma; four squamous carcinoma. The proportion of cases in never smokers and females (each 0.44%) exceeded that in smokers and males (each 0.40%). The detection rate in the 3 year annual screening programme—406/100 000 p-y—was 11-fold the annual LC mortality rate—37.3/100 000 p-y—in this country region of Japan.
Yang and colleagues,40 in a review of LC at the Mayo Clinic in 1997–2003, found that BAC constituted 3.1% of all cases (Ping Yang, MD, PhD, written communication, 2005). Swensen and colleagues41 reported that CT screening identified 11 cases of BAC (17% of screen identified LC), sixfold their clinical representation.
Bach and colleagues42 pointed out that “In the . . . ELCAP study, the number of cancers detected in the annual follow-up CT43 is far less than that detected during the initial scan, even though the size of the detected lesions is similar. This difference in rates of detection is inconsistent with what one would expect if all the lesions grew at similar rates (ie, all were aggressive malignancies). Instead, that the size of the prevalent lesions detected at initial scan are consistent with those found at subsequent scans but that the rates of detection are substantially higher suggests that a meaningful proportion of cancers detected during the initial scan would have behaved in an indolent manner”.
QUANTIFICATION OF OVERDIAGNOSIS
The belief that overdiagnosis is unlikely entails its biologically implausible corollary: that a screen diagnosis of LC confers virtual immunity to death from all other causes. While the magnitude of CXR screening overdiagnosis cannot be determined with certainty, the per cent of “missing cases” in the Mayo and Czech studies, 22 and 24, respectively, furnish a lower bound. In the Mayo study, 95% of the intervention cohort received CXRs every 4 months; the control group received “usual care”—more than half of whom elected to have one or more CXRs in the study period. The computation of putative overdiagnosis assumes that none of the usual care cases was overdiagnosed. This is likely to be incorrect because some cases in the usual care cohort were screen identified. If the same percentage (22) were overdiagnosed in the controls who had one or more CXRs in the last 2 years of the trial,33 there would be 125 clinically relevant LC. The percentage of overdiagnosed cases would then be (206−125)÷206 = 39%. No generally applicable figure, however, can be given for the magnitude of overdiagnosis, for it will be influenced by methodology (see below), smoking history, histological distribution, age, ethnicity and comorbidities of the screenees.
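The adjusted 39% estimate can be reproduced with the Python sketch below. The text does not state how many usual care cases were screen identified; applying the 22% to all 160 usual care cases is an assumption of this sketch, adopted only because it reproduces the figure of 125 clinically relevant LC.

```python
SCREENED_CASES = 206
USUAL_CARE_CASES = 160
OVERDIAGNOSED_SHARE = 0.22  # the "missing cases" proportion from the Mayo study

# Assumption of this sketch: the 22% applies to every usual care case.
relevant_cases = round(USUAL_CARE_CASES * (1 - OVERDIAGNOSED_SHARE))

adjusted_overdiagnosis = (SCREENED_CASES - relevant_cases) / SCREENED_CASES
print(f"clinically relevant LC: {relevant_cases}")
print(f"adjusted overdiagnosis: {adjusted_overdiagnosis:.0%}")
```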
MITIGATION OF OVERDIAGNOSIS
Overdiagnosis can be reduced by excluding individuals with clinically discernable, potentially lethal comorbidities (eg, COPD) from screening. However, this strategy will diminish screening efficiency because the incidence of LC is far higher in persons with impaired pulmonary function,44 45 and a reduction in the absolute number of LCs will increase the proportion with false positive tests. For these reasons, Petty46 advocated screening the high risk group, persons with identifiable COPD. Subset screening creates an irresolvable dilemma: the benefit of reducing overdiagnosis by screening healthy smokers is offset by its diminished efficiency. The greater efficiency conferred by screening persons with COPD (who are likely to have concurrent, latent, coronary artery disease) is offset by a reduction in benefit and an increase in overdiagnosis; both are due to their lower life expectancy. Furthermore, limited pulmonary reserve will increase their surgical morbidity and diminish their post-surgical quality of life.
TUMOUR GROWTH KINETICS AND OVERDIAGNOSIS
Based on observations of tumour growth, both Weiss47 and Geddes12 calculated that the natural history of LC could be characterised by exponential growth, encompassing, at most, 40 tumour volume doublings (TVD) (appendix A).
Aoki and colleagues48 reported that, among resectable adenocarcinomas <3 cm in diameter, TVDT ranged from 42 to 1486 days, and half had a TVDT greater than 1 year. Hasegawa and colleagues49 reported a mean TVDT of 452 days (1.24 years) in 61 malignant SPN identified by mass CT screening. Based on these TVDTs, approximately 10 years would be expected to elapse between screening diagnosis at 1 cm and death from LC (assuming constant TVDT); for 0.5 cm, the interval would be about 14 years.
Life expectancy (= median survival) in the USA at age 67 years (the median age of enrolment in the ELCAP trial2) in healthy male smokers is 16 years and in healthy female smokers 19 years.50 Based on these TVDTs, and assuming a linear inverse relationship between age and life expectancy in this age group, one would predict that, after prescreening (to exclude individuals with significant comorbidities), 31% of males and 26% of females with a 1 cm SPN would die of all causes (of which LC would constitute a small per cent) had they not been screen identified.
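The 31% and 26% figures appear to follow from a linear interpolation: if half the cohort has died by the median survival, the share dying within a shorter interval is proportionally smaller. The Python sketch below makes that reading explicit. The 10 year horizon is the interval between diagnosis at 1 cm and death from LC given in the preceding paragraph; the function name and the interpolation itself are assumptions of this sketch.

```python
def fraction_dead_by(years: float, median_survival_years: float) -> float:
    """Linear approximation: 50% dead at median survival, proportionally fewer earlier."""
    return 0.5 * years / median_survival_years


HORIZON_YEARS = 10  # approximate interval from a 1 cm nodule to death from LC

male = fraction_dead_by(HORIZON_YEARS, median_survival_years=16)
female = fraction_dead_by(HORIZON_YEARS, median_survival_years=19)
print(f"males: {male:.0%}, females: {female:.0%}")
```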
INFLUENCE OF CT IDENTIFICATION ON OVERDIAGNOSIS AND OUTCOME
Because CT augments lead time bias and introduces a favourable biological (phenotypic—see below) bias, it will increase overdiagnosis and hence both survival and mortality. Bach and colleagues51 pooled the results of three one-armed CT screening trials and compared their outcomes with LC incidence and stage expectations modelled on SEER data. They reported 144 LC cases observed versus 44.5 expected. There was no diminution in advanced stage LC compared with the predicted values, implying that some of the 99.5 excess cases were overdiagnosed (99.5÷144 = 69%).
Lead time bias
With a mean TVDT of 1 year, 3 years are required for a tumour diameter to increase from 0.5 to 1 cm. For this reason, and because 1 cm tumours are frequently obscured in CXRs, CT, in comparison with CXR screening, will increase survival of stage I adenocarcinomas by approximately 3 years because of lead time bias. Under the dual assumptions that all-cause mortality remains constant and that earlier resection does not reduce the occurrence of occult metastases, 5 year stage IA LC fatality (≈30%) would be reduced by a factor of 3/5, increasing 5 year LC survival from ≈70% to ≈90%.
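The arithmetic in this paragraph can be sketched in Python as follows. The sketch reads "reduced by a factor of 3/5" as meaning that three fifths of the deaths that would have fallen inside the 5 year window are moved beyond it by the lead time; that reading, and the function names, are assumptions.

```python
import math


def doublings_between(small_cm: float, large_cm: float) -> float:
    """Volume doublings needed to grow between two diameters (volume scales as diameter cubed)."""
    return 3 * math.log2(large_cm / small_cm)


def survival_with_lead_time(fatality: float, lead_years: float, window_years: float = 5) -> float:
    """Apparent survival once a share lead_years/window_years of deaths leaves the window."""
    return 1 - fatality * (1 - lead_years / window_years)


TVDT_YEARS = 1.0
lead_time = doublings_between(0.5, 1.0) * TVDT_YEARS
survival = survival_with_lead_time(fatality=0.30, lead_years=lead_time)
print(f"lead time: {lead_time:.0f} years, apparent 5 year survival: {survival:.0%}")
```

The result, about 88%, corresponds to the ≈90% quoted above; no death is prevented in this calculation, only deferred past the measurement window.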
Non-LC mortality in the screened cohort of the Mayo trial27 was 21.6/1000 p-y (all-cause mortality, 24.8/1000 p-y, minus LC mortality, 3.2/1000 p-y). The ratio of non-LC mortality to LC mortality was (21.6÷3.2) = 7:1. Based on the assumption that CT increases lead time by 3 years, the percentage of individuals diagnosed with clinically irrelevant stage I LC would increase by an unquantifiable but substantial amount, for three reasons: (1) the lead time estimate is conservative; (2) persons who develop LC have more severe obstructive airways disease (and coronary heart disease52) than the cohort from which they are drawn, and would therefore be expected to have a higher non-LC mortality; (3) pulmonary resection is likely to foreshorten the course of lethal cardiopulmonary comorbidities.53
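A brief Python check of the mortality figures quoted for the Mayo screened cohort, using only the rates given in the text (per 1000 person-years); the variable names are illustrative.

```python
ALL_CAUSE_MORTALITY = 24.8  # per 1000 p-y, Mayo screened cohort
LC_MORTALITY = 3.2          # per 1000 p-y

non_lc_mortality = ALL_CAUSE_MORTALITY - LC_MORTALITY
ratio = non_lc_mortality / LC_MORTALITY
print(f"non-LC mortality: {non_lc_mortality:.1f}/1000 p-y, ratio to LC mortality: {ratio:.2f}:1")
```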
CT screening introduces a phenotypic bias because of the identification of an LC population differing in histological composition and biological potential from those identified either clinically or by CXR screening.41 BACs have greater transradiancy because they retain the underlying air containing alveolar structure (lepidic growth). This “ground glass” appearance offers far less contrast with the surrounding lung.54 For this reason, they are less detectable with CXR than are adeno- and squamous carcinomas, which, by contrast, have a solid density (hilic growth). Thus BACs are overrepresented in CT screenings, relative both to their proportion in CXR screenings and, reflecting their slow growth, to the proportion of LC that become clinically evident.
Shimizu and colleagues54 reported on 136 thin section CT evaluated air containing carcinomas under 20 mm in size, p-stage IA. One per cent showed pleural involvement; 2% vascular invasion; none showed lymphatic invasion. Relapse free 5 year survival was 100%. Hasegawa and colleagues49 reported on TVDT of CT identified small SPN according to their high resolution CT characteristics. The mean TVDT was 813 days among 19 with a pure ground glass appearance. The authors pointed out that a large number of these were CXR undetectable. Assuming no evolution to a more aggressive phenotype, a 1 cm BAC with an 813 day TVDT would require 22 years to achieve a lethal size (10 cm).
Overdiagnosis and outcome
Although LC mortality is the most sensitive test of screening efficacy, outcome can best be gauged by complementing LC mortality with all-cause mortality.5 55 All-cause mortality assesses the benefit conferred by earlier intervention with the offsetting harm done to persons with overdiagnosed and understaged LC and false positive tests.
All-cause mortality was higher in the screened than in the control cohort in both the Mayo and Czech trials.55 Under the assumption that vital status has been ascertained in all participants after an adequate period of observation, which appears to be correct,28 30 this suggests that CXR with cytological screening exerts a negative net effect. To offset the postulated adverse effect of increased overdiagnosis, CT screening would have to substantially reduce the number of cases with occult metastases and diminish the absolute number with advanced (unresectable) LC. Whether it will achieve the former by advancing the diagnosis by about 3 TVD is unknown. No reduction in the number of advanced LC was achieved in the CXR imaging studies27 30; none is evident in preliminary analyses of CT screening.41 51 56 57
EFFICACY VERSUS EFFECTIVENESS
Mortality (efficacy), surgical morbidity and overdiagnosis will be far lower in screening trials conducted in centres of excellence, which systematically exclude persons with identifiable significant comorbidities41 than in population based settings (which gauge effectiveness).58
Although difficult to quantify, CXR with cytological LC screening results in a clinically important amount of overdiagnosis.
To improve outcome, CT screening, which will increase the proportion of overdiagnosed cases, will need to achieve a reduction in LC mortality sufficient to offset the immediate and long term adverse effects imposed by surgery in overdiagnosed, understaged and false positive cases.
Overdiagnosis will be greatest in series that confine their outcome assessment to CT identified stage I LC.
I am greatly indebted to Pamela Marcus, MS, PhD, epidemiologist, National Cancer Institute, for her critical reading of the manuscript and for her numerous incisive comments and contributions.
Tumour diameter can be computed from the exponential equation:
$y = 0.001(1.26)^{x}$
whose derivation is supplied in a previous paper,59 where y is tumour diameter (cm) and x is the number of tumour volume doublings (TVD). The log form
$x = \ln(1000y) / \ln(1.26)$
facilitates computation. The salient points agree with Geddes’12 comprehensive analysis (see table A1 and fig A1).
27 TVDs are required to achieve a 0.5 cm diameter, the intervention threshold in US CT trials.
30 TVDs are required to achieve a 1 cm diameter (0.5 g), the minimal threshold for CXR recognition, at which time the neoplasm has completed 75% of its natural history.
35 TVDs are required to achieve a diameter of 3 cm, the upper limit of stage IA LC.
At 40 TVDs, tumour diameter is 10 cm (500 g), a lethal size.
An increase in diameter from 0.5 to 1 cm requires an eightfold increase in volume, which is accomplished by 3 TVDs.
At a mean TVDT of 1 year, 10 years are required to grow from the CXR threshold of 1 cm to 10 cm.
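The milestones listed above can be reproduced from the appendix equation and its log form. The sketch below is illustrative only: it takes y in centimetres, and the function names `diameter_cm` and `doublings_for` are assumptions introduced here. The growth factor 1.26 is approximately the cube root of 2, the factor by which the diameter of a sphere increases when its volume doubles.

```python
import math

GROWTH_PER_DOUBLING = 1.26   # approx. 2 ** (1 / 3), diameter gain per TVD
SCALE_CM = 0.001             # leading constant of the appendix equation (cm)

def diameter_cm(doublings):
    """Tumour diameter (cm) after a given number of volume doublings."""
    return SCALE_CM * GROWTH_PER_DOUBLING ** doublings

def doublings_for(diameter):
    """Log form: volume doublings needed to reach a given diameter (cm)."""
    return math.log(1000 * diameter) / math.log(GROWTH_PER_DOUBLING)

# Milestones named in the text: 0.5 cm, 1 cm, 3 cm and 10 cm
for target_cm in (0.5, 1.0, 3.0, 10.0):
    x = round(doublings_for(target_cm))
    print(f"{target_cm:>4} cm: about {x} TVDs "
          f"(diameter at {x} TVDs = {diameter_cm(x):.2f} cm)")
```

Rounded to whole doublings, the log form returns 27, 30, 35 and 40 TVDs for the four thresholds, matching the list above.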
The LC incidence in the Mayo screened cohort was 5.5/1000 p-y. The proportion (46/206) of putatively overdiagnosed cases is 0.22. Overdiagnosis incidence is therefore 0.22 × 5.5/1000 p-y = 1.2/1000 p-y, or 16% of the incidence of clinically unsuspected LC discovered as a “surprise” (7.7/1000 p-y) in a concurrent US series of male autopsies.35
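The arithmetic behind this comparison can be laid out step by step. The sketch below uses only the figures quoted above; the variable names are illustrative, and the unrounded proportion 46/206 is used in place of 0.22, which does not change the rounded results.

```python
# All rates are per 1000 person-years (p-y), as quoted in the text.
lc_incidence = 5.5             # LC incidence, Mayo screened cohort
overdiagnosed_fraction = 46 / 206   # putatively overdiagnosed cases
autopsy_surprise_incidence = 7.7    # unsuspected LC in the autopsy series

# Incidence of overdiagnosed LC in the screened cohort
overdiagnosis_incidence = overdiagnosed_fraction * lc_incidence

# Share of the autopsy "surprise" incidence that this represents
share_of_surprise = overdiagnosis_incidence / autopsy_surprise_incidence

print(f"Overdiagnosed fraction:  {overdiagnosed_fraction:.2f}")
print(f"Overdiagnosis incidence: {overdiagnosis_incidence:.1f}/1000 p-y")
print(f"Share of autopsy rate:   {share_of_surprise:.0%}")
```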
Four biases introduced by screening:
Lead time bias refers to earlier detection of disease, in which intervention, as long as it is not harmful, will lengthen survival without necessarily improving longevity.
Length biased sampling refers to the screen identification of a more phenotypically favourable population of LCs due to the lengthier preclinical detectable phase of slower growing cancers. It is most in evidence in prevalence screens. Preclinical detectable phase refers to the asymptomatic period during which LC is detectable.
Overdiagnosis bias designates screen detection of cancers that would not otherwise have become symptomatic or clinically apparent before the individual died of other causes.
Biological (phenotypic) bias designates the histologically and prognostically favourable population of cases identified by screening versus clinically diagnosed lung cancer. It is most evident in CT screening, in which well differentiated adenocarcinomas and bronchioloalveolar carcinomas are disproportionately represented. These histological types are far slower growing and therefore prognostically more favourable than the population of clinically diagnosed cancers, in which small cell, squamous cell and undifferentiated lung cancer are better represented.
Survival designates the per cent of cases alive x years—conventionally, 5 years—after diagnosis. It may be given as a cause specific or an all-cause value.
Fatality is the complement of survival (1−survival). Both survival and fatality may be reported as cause specific or all-cause values.
Mortality is a measure of longevity. It is a rate, usually age adjusted, expressed as the number of deaths, over time, in a population or sample. In common with survival, mortality may be provided as a cause specific or an all-cause figure. Non-LC mortality = (all-cause mortality) − (LC mortality).
Prevalence designates the number of cases at a given time in the sampled population.
Incidence designates the number of new cases over time, usually annually, in the sampled population.
P = I × D
Prevalence (P) can be computed as the product of incidence (I) and mean duration (D) of the condition. For example, the prevalence of stage IA LC would be the product of its incidence and the mean period of time it remains in stage IA.
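The relation P = I × D also shows why slower growing cancers dominate prevalence screens, as described under length biased sampling. The following is a minimal sketch with purely hypothetical numbers: the two phenotypes, their incidences and their durations are invented for illustration and are not drawn from any study.

```python
def prevalence(incidence, mean_duration_years):
    """Prevalence as the product of incidence and mean duration (P = I * D)."""
    return incidence * mean_duration_years

# Hypothetical example: two phenotypes with the same incidence
# (1 per 1000 p-y) but different detectable durations.
fast = prevalence(incidence=1.0, mean_duration_years=1.0)
slow = prevalence(incidence=1.0, mean_duration_years=4.0)

# Fraction of prevalent (screen detectable) cases that are slow growing
slow_share = slow / (fast + slow)

print(f"Fast growing: {fast:.1f} per 1000")
print(f"Slow growing: {slow:.1f} per 1000")
print(f"Slow growing share of prevalent cases: {slow_share:.0%}")
```

Under these assumed values the slow growing phenotype accounts for half of new cases but four fifths of the cases present at any one time, which is the pattern a prevalence screen samples.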
Interval cancers are those that typically present symptomatically between screenings. Their absence on earlier screenings may reflect either exceedingly rapid growth or their location within the mediastinum or proximal bronchi, sites in which radiographic imaging lacks sensitivity.
Stage IA tumours, classified as T1N0M0, are peripheral tumours (T1) ⩽3 cm, surrounded by lung or visceral pleura, with bronchoscopic evidence of spread no more proximal than the lobar bronchus. N0 signifies no hilar or mediastinal metastases; M0 indicates no discernible extranodal metastases. Stage IB (T2N0M0) is distinguished from IA by any of the following features: (1) T2 diameter >3 cm; (2) involvement of the main bronchus ⩾2 cm from the main carina; (3) invasion of the visceral pleura; (4) obstructive pneumonitis or atelectasis not involving the entire lung. The prefixes “c” and “p” refer to clinical versus surgical–pathological staging.
Stage migration (“Will Rogers effect”) refers to the stage advance imposed by diagnostic methods that are more sensitive (eg, surgical–pathological versus radiographic staging). Stage migration improves the prognosis for a given stage.
Stage shift refers either to a change in proportion of lower versus higher stages of cancer or to a reduction in the absolute number of advanced stage cases. Both a diminution in the proportion and a reduction in the absolute number of advanced stage cancers are necessary, although insufficient, indicators of screening efficacy. Lung cancer screening can be efficacious if, and only if, it achieves a reduction in the absolute number of advanced LC.
Hazard is the rate of death within 1 year for persons alive at the beginning of the year. Non-cancer hazard is the rate of death from some other cause. Non-cancer relative hazard is the ratio: non-cancer hazard of a population with cancer ÷ non-cancer hazard in a matched population.