Population screening for lung cancer using computed tomography, is there evidence of clinical effectiveness? A systematic review of the literature
Corri Black (1), Robyn de Verteuil (2), Shonagh Walker (3), Jon Ayres (2), Angela Boland (4), Adrian Bagust (4), Norman Waugh (2)

  (1) Aberdeen Health Technology Assessment Group, Department of Public Health, University of Aberdeen, Foresterhill, Aberdeen, UK
  (2) University of Aberdeen, Aberdeen, UK
  (3) NHS Grampian, Aberdeen, UK
  (4) University of Liverpool, Liverpool, UK

Correspondence to: Dr C Black, Aberdeen Health Technology Assessment Group, Department of Public Health, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, UK; Corri.black{at}abdn.ac.uk

Abstract

Lung cancer is the leading cause of death among all cancer types in the UK, killing approximately 34 000 people per year. By the time symptoms develop, the tumour is often at an advanced stage and the prognosis is bleak. Treatment at a less advanced stage of disease by surgical resection has been shown to substantially reduce mortality. Screening would be attractive if it could detect presymptomatic lung cancer at a stage when surgical intervention is feasible, but it has been the subject of scientific debate for the past three decades. The aim of this review was to examine the current evidence on the clinical effectiveness of screening for lung cancer using computed tomography. A systematic literature review searching 15 electronic databases and Internet resources from 1994 until December 2004/January 2005 was carried out. Information was summarised narratively. A total of 12 studies of computed tomography screening for lung cancer were identified, including two RCTs and 10 studies of screening without comparator groups. The two RCTs were of short duration (1 year). None examined the effect of screening on mortality compared with no screening. The proportion of people with abnormal computed tomography findings varied widely between studies (5–51%). The prevalence of lung cancer detected was between 0.4% and 3.2% (number needed to screen to detect one lung cancer = 31 to 249). Incidence rates of lung cancer were lower (0.1–1%). Among the detected tumours, a high proportion were stage I or resectable tumours, 100% in some studies. Currently, there is insufficient evidence that computed tomography screening is clinically effective in reducing mortality from lung cancer.

Lung cancer is the leading cause of death among all cancer types in the UK, killing approximately 34 000 people each year and is the most common cancer worldwide accounting for around 1.2 million incident cases annually (http://www.who.int). Advances in chemotherapy and radiotherapy, in particular the introduction of continuous hyperfractionated accelerated radiotherapy, have resulted in modest improvements in outcomes for people with lung cancer but despite this the principal hope for curative treatment remains surgical resection.1,2 For the surgery to be effective, tumours need to be recognised early, before local invasion or remote spread of disease prevents resection. By the time symptoms develop the tumour is often at an advanced stage and the prognosis is bleak (<10% survival at 5 years).3

Carcinoma of the bronchus can be broadly classified into two main types: non-small-cell lung cancer (NSCLC) and small-cell lung cancer. NSCLC accounts for around 80% of all lung cancers and, because surgical resection is feasible if it is identified at an early stage, people with NSCLC have the greatest potential to benefit from any screening programme. Survival varies substantially with the clinical stage of the tumour at the time of diagnosis, with a 60–70% 5-year survival among those with stage 1A disease but <10% for those with stage III disease or worse; >50% of people with lung cancer present with tumours at stage III or later.3,4 Histologically, the most common type of NSCLC presenting clinically is squamous cell carcinoma. In the UK, it accounts for 35–45% of all lung cancers with adenocarcinoma and large-cell lung cancer accounting for around 15% and 10%, respectively.

The incidence of lung cancer rises with age, and the disease is rare below the age of 40 years. The highest incidence rate occurs in those aged >70 years, with the median age of diagnosis in the UK being around 70 years.3 In the UK, lung cancer remains more common in men than women despite the incidence falling in men in recent years. Lung cancer now exceeds breast cancer as the most common cause of cancer deaths in women in the UK.5 The biggest single risk factor for lung cancer is smoking, with 85–90% of all people who present with lung cancer having a history of smoking.6 However, occupational exposures provide an often forgotten risk (with the exception of asbestos for mesothelioma). The International Agency for Research on Cancer (IARC) has identified 16 specific occupations, exposures or processes that it classes as having “sufficient evidence” for being causes of lung cancer (IARC grade 1; table 1).7,8 So, while older smokers represent a potential group for screening for lung cancer, some workforce groups may also merit consideration.

Table 1

 Agents with sufficient evidence (IARC grade 1) to be classed as occupational lung carcinogens8

Despite the high number of annual deaths associated with lung cancer, and evidence to support the effectiveness of treatment for early disease, screening for lung cancer has been the subject of debate for the past three decades. Previous systematic reviews of screening for lung cancer using chest x ray, alone or in combination with sputum cytology, concluded that there is insufficient evidence of benefit in terms of reduction in disease-specific mortality.9–11 There were several problems with the research base: firstly, several of the studies did not use “no screening” as the comparator and instead compared intensive screening to less frequent screening, or chest x ray alone to chest x ray plus sputum cytology. These studies could not, therefore, assess the main issue of whether screening per se was better than no screening. Secondly, the studies, despite consistently showing improved survival, did not show improvements in disease-specific or total mortality. Comparison of survival may be subject to three types of bias: over-diagnosis, length bias and lead-time bias (box 1).12

Box 1: Defining bias in screening trials

  • Over-diagnosis bias*: Cancers are detected by screening that, in the absence of screening, might never have become symptomatic or been detected within a patient’s lifetime because of competing causes of mortality.

  • Length bias: Screening introduces a bias in relation to expected survival by detecting more patients with less aggressive disease (who have longer survival) and fewer with more aggressive disease, because the duration of asymptomatic disease is longer in less aggressive tumours.

  • Lead-time bias: Screening-detected patients are accorded extended survival times solely because cancer was detected earlier due to screening, though death occurred at the same time as would have happened without screening (ie, the intervention yields no benefit).
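Lead-time bias can be made concrete with a small worked example. The sketch below is purely illustrative: the ages and the 3-year lead time are hypothetical assumptions, not data from any of the reviewed studies. It shows how survival measured from diagnosis can lengthen, and a patient can cross the 5-year survival threshold, while the age at death is unchanged.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years survived after diagnosis."""
    return age_at_death - age_at_diagnosis


def is_five_year_survivor(age_at_diagnosis, age_at_death):
    """True if the patient lived at least 5 years after diagnosis."""
    return survival_from_diagnosis(age_at_diagnosis, age_at_death) >= 5


# Hypothetical patient: every number here is an illustrative assumption.
age_at_death = 70.0          # the same with and without screening
age_symptomatic_dx = 67.0    # diagnosis once symptoms appear
lead_time = 3.0              # screening detects the tumour 3 years earlier
age_screen_dx = age_symptomatic_dx - lead_time

clinical_survival = survival_from_diagnosis(age_symptomatic_dx, age_at_death)
screened_survival = survival_from_diagnosis(age_screen_dx, age_at_death)

print("Survival after symptomatic diagnosis:", clinical_survival, "years")
print("Survival after screen diagnosis:", screened_survival, "years")
print("Counted as 5-year survivor without screening:",
      is_five_year_survivor(age_symptomatic_dx, age_at_death))
print("Counted as 5-year survivor with screening:",
      is_five_year_survivor(age_screen_dx, age_at_death))
```

Because the date of death is identical in both scenarios, a mortality rate computed over the whole screened population would not change, which is why mortality rather than survival is the preferred outcome.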

The past three decades have brought revolutionary developments in imaging techniques, in particular, the development of computed tomography. Increasing data-acquisition speeds have made it feasible to image the lungs within one breath hold, thus making movement artefacts a much less significant problem, enhancing the potential of computed tomography as a method of screening for lung cancer.13 The sensitivity of computed tomography to detect pulmonary nodules, using surgical exploration and histological analysis as the gold standard, has been reported to be 100% for intrapulmonary nodules >10 mm, 95% in nodules >5 mm and 66% in nodules ⩽5 mm.14

In screening, the balance between image quality and radiation dose is particularly important. To minimise the radiation dose, low-dose schedules have been developed (effective radiation dose 0.3–0.6 mSv,15 similar to that of mammography) without considerable negative effect on sensitivity or specificity.14,16 Discrete pulmonary nodules are the most commonly reported abnormality suggestive of malignancy but abnormal scarring and ground glass opacities are also recognised as potentially malignant changes. Unfortunately, computed tomography abnormalities are not specific for malignancy and most series report >90% of computed tomography nodules to be benign.17 Nodules suspicious of malignancy are often referred to as non-calcified nodules (NCNs) in the screening literature. The term NCN refers to the fact that a nodule cannot be regarded as non-malignant on the basis of the pattern of calcification.18

The potential of computed tomography as a screening tool for lung cancer has been the subject of several trials. The UK Health Technology Assessment (HTA) Programme, on behalf of the National Screening Committee, commissioned a literature review and economic assessment to update their policy statement regarding computed tomography screening published in 2004. Here, we report the findings of the systematic literature review. The full HTA report has been published as part of the monograph series.19

AIM

To systematically review and critically appraise the evidence for the clinical effectiveness of computed tomography screening for lung cancer.

METHODS

After preliminary searches showed a lack of reports from RCTs of screening versus no screening, we decided that all primary studies and systematic reviews evaluating computed tomography screening for lung cancer should be considered for inclusion. A sensitive search strategy including key and text word searches for the terms lung cancer, computed tomography examination and mass screening was constructed to search MEDLINE, Embase, Cochrane Database of Systematic Reviews, Cochrane CENTRAL Register of Controlled Clinical Trials, NHS EED, HTA database, DARE, Bandolier, Health Management Information Consortium, American Society of Clinical Oncology, Research Findings Register, National Horizon Scanning Centre, SCI, Web of Science Proceedings, and National Research Register. The register of projects held by INAHTA was also checked. For completeness, the search strategy was not restricted by language; where foreign language reports were identified, they were noted but translations were not sought. The searches were restricted to cover from January 1994 to January 2005. The bibliographies of included studies were hand-searched but authors of included studies were not contacted for further information. Systematic reviews were used as a source of references for primary studies.

Each title and abstract was reviewed against the inclusion and exclusion criteria (box 2) by two of the authors independently (CB and RdV). We did not include studies evaluating the use of methods for screening for lung cancer other than computed tomography, or studies evaluating the use of computed tomography for diagnostic or staging purposes in lung cancer.

Box 2: Inclusion/exclusion criteria

  • Screening for lung cancer was the principal theme of the paper.

  • Primary research (RCT, cohort or case–control) or systematic review.

  • Computed tomography screening compared with no screening (or, if a study included a comparison group that were screened using an alternative screening method then only data from the computed tomography screening arm of the study were included).

The checklists and methods described in Centre for Reviews and Dissemination Report 4 were used for assessing quality.20

We included studies from any country and made no restrictions based on age, sex or smoking history of the study participants. In this review, we were not simply interested in the effectiveness of computed tomography in detecting lung cancer but in the effectiveness in the context of a mass population-screening programme. The principal outcome of interest was the effect of the screening programme on lung cancer mortality and total mortality.

Data were extracted on detection of NCN, detection of lung cancer, histology and survival. We also sought outcomes likely to have a service effect—that is, follow-up requirements, quality of life issues and adverse events. We identified a priori several subgroups of interest defined by: age, sex, smoking status and occupation. Each is reported under a separate subheading.

We have reported two components of the screening programmes: baseline screening and subsequent screening rounds. Baseline screening, also referred to as prevalence screening, describes the first time a population is screened for lung cancer. Subsequent, or incidence, screening refers to all computed tomography examinations conducted at a known time interval and where only new or altered NCNs are reported.

RESULTS

Descriptive summary of included studies

A total of 12 studies of computed tomography screening for lung cancer were identified for inclusion in the review (fig 1). Several of these studies have been described in multiple publications. Table 2 summarises these 12 studies. Two RCTs were identified but one of these used a comparator group who received chest x ray screening.21,22 We therefore only included the experience from the computed tomography screened arm in the analysis. A further 10 studies without comparator groups were reported. Five of the 12 studies were conducted in the USA and three in Japan. Only five of the studies reported the findings after one round of computed tomography screening; the other seven reported after at least one subsequent screening round.

Table 2

 Summary of the 12 studies included in the review of clinical effectiveness

Figure 1

 Flow of studies through the review process.

In total, 25 749 people have participated in at least baseline computed tomography screening and, in all, 54 342 computed tomography examinations for screening have been undertaken.

All studies based the entry criteria for screening on three participant characteristics: age, smoking history and fitness for surgery. The youngest study participants reported were 40 years old. Two of the studies from Japan included smokers and non-smokers.32,42 The remaining studies restricted their screened populations to those with a smoking history of at least 10 “pack-years” (that is, an average of one pack of 20 cigarettes per day for 10 years). Three studies restricted screening to those who had been smokers within the 10 years before recruitment.22,37,43 Most studies reported some degree of assessment of fitness for surgery before proceeding to screening but none reported the proportion of people failing to meet this criterion. In addition to the population criteria identified above, one study included only workers who had been exposed to asbestos.44 A further three studies included subgroups (2–14% of the total study population) who had a history of asbestos exposure.17,30,43 None of these trials quantified the extent of asbestos exposure. One study that dealt with occupational exposure reported a screening programme in a workforce employed in the nuclear fuel industry, with exposure to several risk factors including radiation, asbestos and beryllium.46
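The pack-year definition used in the entry criteria is a simple product of packs per day and years smoked. A minimal sketch follows, assuming a 20-cigarette pack as stated above; the function names and the example smokers are hypothetical and are not taken from any study protocol.

```python
def pack_years(cigarettes_per_day, years_smoked, cigarettes_per_pack=20):
    """Cumulative smoking exposure: packs smoked per day times years smoked."""
    return (cigarettes_per_day / cigarettes_per_pack) * years_smoked


def meets_smoking_criterion(cigarettes_per_day, years_smoked, threshold=10):
    """True if exposure reaches the 10 pack-year threshold described above."""
    return pack_years(cigarettes_per_day, years_smoked) >= threshold


# Hypothetical smokers, for illustration only.
print(pack_years(20, 10))               # one pack a day for 10 years
print(pack_years(10, 20))               # half a pack a day for 20 years
print(meets_smoking_criterion(10, 15))  # 7.5 pack-years, below the threshold
```

Different smoking patterns can give the same pack-year total, so the measure captures cumulative exposure but not intensity or recency.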

Quality of studies

There were three main issues of study quality that had implications for the interpretation of results. Firstly, none of the studies reported information about the representativeness of their samples. All samples were obtained on a volunteer basis and it is difficult to interpret how well these volunteers represent the general population. Secondly, the duration of follow-up was limited in most studies, with few presenting data beyond 2 years. Given the outcomes of interest, total and disease-specific mortality, this short duration is a problem. It was further complicated by the high attrition rates in the studies of longer duration, where compliance with the screening programme of annual computed tomography appeared to be poor. Finally, the lack of comparator groups in most of the studies meant that it was not possible to determine the effect of screening on lung cancer and total mortality rates in comparison with no screening.

Principal outcome: mortality rate

None of the studies reported total or disease-specific mortality rates for the screened population (or comparator populations where present). For the most part, follow-up was too short. Where several years of screening had occurred, follow-up was limited to those still participating in the screening programme. The one randomised trial comparing computed tomography screening to no screening was a pilot study and did not continue long enough to determine the effect on mortality. None of the other studies had comparator groups; therefore, change in disease-specific mortality between those screened with computed tomography and those not screened could not be assessed.

In the absence of any trial data regarding the effect of computed tomography screening on mortality, we summarise below what evidence exists regarding the effectiveness of screening in detecting lung cancer.

Other outcomes

Positive computed tomography examinations

The proportion of positive screening computed tomography examinations at baseline ranged from 5.1%32 to 51%.37 Some of this variation could be explained by the variation between studies in the definition of a positive computed tomography result. However, even in the three studies which restricted the definition of positive to only those with NCNs >5 mm in diameter,41,42 two reported baseline screening to be positive in 5.9–6.8% of the population and the third44 reported a high baseline positive rate (18.4%). Sone et al32 required the radiologists to identify lesions as “suspicious of cancer” or “indeterminate” where there was doubt, producing the lowest positive screening results (5.1%) but Sobue et al27, using similar definitions, found 11.5% of baseline computed tomography examinations to be positive. It therefore seems unlikely that the definition of an NCN alone can explain the variation described. Differences in common benign nodular lung conditions in certain countries (for example, histoplasmosis in the USA) may also contribute to the variations reported.

Seven studies reported results from incidence computed tomography screening. The incidence rate for positive computed tomography was lower than baseline (2.7–11.5%; table 2).

Detection of lung cancer

Between 1.8%43 and 32%41 of people with positive screening computed tomography examinations went on to receive a diagnosis of lung cancer.

A total of 215 patients with lung cancers were diagnosed as a result of baseline computed tomography screening (including one person falsely reassured by biopsy but confirmed later42). The highest prevalence of lung cancer was reported in the five studies from the USA (0.6–3.2%). Prevalence rates in the studies from Europe and Japan were generally lower (<1.1%), except for the German study with a reported prevalence of 2.1%.30 In the seven studies with incidence data, a further 87 cancers were identified. Incidence rates varied from 0.07%42 to 1%.41
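The number needed to screen to detect one lung cancer follows directly from the prevalence as its reciprocal. The sketch below applies this to the two ends of the baseline prevalence range reported in this review (0.4% and 3.2%); the function name is hypothetical. The abstract gives the range as 31 to 249, and the small difference at the upper end presumably arises from using unrounded study counts rather than the rounded 0.4%.

```python
def number_needed_to_screen(prevalence_percent):
    """People screened per lung cancer detected, given prevalence in percent."""
    if prevalence_percent <= 0:
        raise ValueError("prevalence must be positive")
    return 100.0 / prevalence_percent


# Ends of the baseline prevalence range reported in the review.
for prevalence in (3.2, 0.4):
    nns = number_needed_to_screen(prevalence)
    print(f"prevalence {prevalence}% -> about {round(nns)} screened per cancer")
```

This figure counts only screening examinations per detected cancer; it says nothing about deaths prevented, which would require mortality data from a controlled comparison.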

Most lung cancers detected by baseline screening were NSCLC (205 of 215, 95.3%). At incidence screening, a similar histological pattern was seen (table 2).

Stage

Between 53% and 100% of tumours were identified as stage I disease at baseline screening (with the exception of the small Huuskonen study where none of the five tumours identified were stage I).44 For incident tumours, 63–100% (except Diederich where only three incidence tumours were detected30) were reported to be stage I disease. Where reported, the resectability of tumours found by screening was high, >78% in most studies.

Survival

Only one study27 reported 5-year survival; 76.2% of the people with cancer detected at baseline (n = 14) survived 5 years with 64.9% of the patients with incident computed tomography-detected cancers surviving 5 years (n = 22). In ELCAP, after 2 years of follow-up for tumours diagnosed at the baseline screening computed tomography, none had died from cancer (n = 27).17 Sone et al32 report two deaths among the 56 people diagnosed and surgically treated in the first 2 years of their screening programme (follow-up period estimated to be 2–3 years). In one of the occupational screening group studies (Huuskonen), survival was particularly poor, with only one surviving for >2 years (n = 6).44 Follow-up in the remaining studies was short and the duration of individual follow-up was not adequately reported in these studies to comment on survival.

Test accuracy results

One of the difficulties in estimating test accuracy was the absence of a short-term gold standard. Therefore, true cases of lung cancer were determined by tissue confirmation at biopsy or surgery, or, in a few cases, the diagnosis was based on detailed computed tomography enhanced by contrast medium (where a tissue sample was not possible). Truly negative results could only be determined by the absence of presentation with disease over a prolonged period (or at subsequent screening). Interval tumours were reported in three of the studies and some authors commented on whether, in retrospect, these lesions were visible at screening but had been missed. The positive predictive value was universally poor (<20%), regardless of the protocol adopted for defining a positive computed tomography, whereas the negative predictive value of computed tomography was high (>95% where it could be estimated). In the studies where it was possible to make some estimate of false negatives the sensitivity was around 80–90%.
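The relationship between these accuracy measures can be sketched from a 2×2 table of screening result against true disease status. The counts below are hypothetical, chosen only so that the outputs fall in the ranges described above (positive predictive value below 20%, negative predictive value above 95%, sensitivity around 90%); they are not data from any included study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """2x2 table of screening result against confirmed disease status."""
    true_pos: int   # positive scan, cancer confirmed
    false_pos: int  # positive scan, no cancer
    false_neg: int  # negative scan, cancer presented later (interval tumour)
    true_neg: int   # negative scan, no cancer

    def ppv(self):
        """Proportion of positive scans that are cancer."""
        return self.true_pos / (self.true_pos + self.false_pos)

    def npv(self):
        """Proportion of negative scans that are truly cancer-free."""
        return self.true_neg / (self.true_neg + self.false_neg)

    def sensitivity(self):
        """Proportion of cancers that the scan detected."""
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self):
        """Proportion of cancer-free people with a negative scan."""
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical cohort of 1000 screened people with 1% cancer prevalence.
counts = ScreeningCounts(true_pos=9, false_pos=91, false_neg=1, true_neg=899)

print(f"PPV         {counts.ppv():.1%}")
print(f"NPV         {counts.npv():.1%}")
print(f"Sensitivity {counts.sensitivity():.1%}")
print(f"Specificity {counts.specificity():.1%}")
```

With a low prevalence, even a test with good sensitivity and reasonable specificity yields a poor positive predictive value, because false positives greatly outnumber true positives.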

Quality of life

None of the studies reported any data about quality of life in the screening participants, nor the effect of a false positive screening computed tomography.

At risk subgroups

Age

Three of the studies27,30,32 presented results by age band. All were consistent in not identifying cases of lung cancers in those aged <40 years. Higher rates were identified in those aged >60 years.

Gender

None of the studies reported findings separately for men and women.

Smoking

Two of the Japanese studies included smokers and non-smokers in the screening programme32,42 but only one32 reported results in sufficient detail to allow comparisons between smokers and non-smokers. Of the 5483 screening participants, 54% were people who had never smoked. The proportion of people at baseline screening with lung cancer who never smoked was 0.44%, similar to that of smokers (0.40%). In the non-smokers, tumours were more likely to be well-differentiated adenocarcinomas (90% of tumours in non-smokers v 48% in smokers, p<0.001). The proportion of adenocarcinomas is higher than usually seen in lung cancer.

Occupational exposure

Huuskonen reported a workforce-screening programme for 602 participants with asbestos-related disease identified in an earlier study of asbestos-exposed workers in Finland.44 Of the workers screened, 65 (11%) required post-computed tomography follow-up, six were found to have lung cancer (1% lung cancer prevalence), five of whom died within 21 months of diagnosis. None of the other studies reported their findings for asbestos-exposed participants separately.

A US study in the nuclear fuel industry workforce showed 32% with positive computed tomography examinations on screening but only 0.61% were diagnosed with lung cancer.46 The low prevalence of cancer may reflect the screened population being restricted to working age although the healthy-worker effect may also have contributed.

The two studies do not report the age distribution of the screened population in any detail. Neither workforce study reported the extent of exposure to potential carcinogens among the workforce.

Adverse events

General

Adverse events, as a result of any part of the screening programme, were poorly reported. Only Gohagan et al47 fully reported adverse events potentially associated with investigation for positive screening computed tomography. Six patients experienced adverse events (a total of eight complications: three pneumothorax; two infections/fever requiring antibiotics; one atelectasis; one stroke; one acute respiratory failure) out of a total of 1586 screenings. Kaneko et al28 reported two deaths in the 6 months after surgery from infection in the absence of tumour recurrence. No surgical deaths or surgery-related morbidity were reported.

Incidental findings

Reporting of incidental findings (ie, other than lung cancer) was variable and related in part to the specified screening protocol. Two studies reported in detail findings other than NCNs showing 14%37 and 49.2%43 with non-NCN findings that merited further investigation. Swensen et al37 reported 17 other cancers, 35 adrenal masses and 33 renal masses among the 817 screened individuals. MacRedmond et al43 reported COPD (29%) and coronary artery calcification (14.3%) as the most common findings.43 Sobue et al27, who only reported other malignancies, identified 14 additional malignancies of the chest wall and mediastinum among the 7891 computed tomography screening examinations conducted.

Service implications of screening

Management of positive findings on screening varied but generally involved either follow-up by further computed tomography to look for change in size, or biopsy. Almost all of those with positive computed tomography screening were recommended for follow-up with at least one further computed tomography. Of positive screenings, 3.7–27% underwent biopsy after baseline computed tomography. After an incidence computed tomography, 4.9–33% of positive screenings underwent biopsy (table 2).

While several studies recognised the substantial costs to health services in terms of screening per se and the follow-up of large numbers of positive screening computed tomography examinations, none reported the costs beyond that of conducting the screening computed tomography (including the examination itself, staff and reporting costs). No discussion of quality control criteria or mechanisms, administration requirements or effect on oncology or surgical services was reported.

DISCUSSION

Evidence from RCTs is one of the criteria used by the UK National Screening Committee to evaluate screening technologies because the RCT is regarded as the gold standard design with the lowest risk of bias.20 In the absence of RCT evidence of adequate duration, the clinical effectiveness of computed tomography screening for lung cancer remains unclear. Screening did identify more stage I disease than the literature reports for series of lung cancer in the absence of screening. Resection rates were high (89–100%), substantially higher than the current UK resection rates of <10%,4,48 but this only implies a reduction in disease-specific mortality if we accept the assumption that screening-detected lung cancer is essentially the same as lung cancer that presents clinically and, as such, is a universally aggressive and fairly rapidly progressing condition.

However, there are several pieces of evidence from the screening studies that raise doubts about the validity of this assumption. Firstly, the histopathology of tumours among smokers detected by screening was not typical of that recognised in clinical practice, with a higher than usual prevalence of adenocarcinoma. Further, the Japanese studies that included non-smokers32,42 identified a rate of cancer similar among smokers and non-smokers. This is not in keeping with experience from tumours presenting clinically, where >80% occur in smokers. In the non-smokers, the tumours identified were again well-differentiated adenocarcinoma or bronchoalveolar carcinoma.

The natural history of these well-differentiated, small, screening-detected adenocarcinomas is not yet well understood but raises the possibility that screening is detecting tumours that are different from those seen presenting clinically and that some screening-detected tumours may never have caused symptoms or death, and would therefore have gone undetected in the absence of screening (ie, over-diagnosis). This may be because computed tomography is better at picking up peripheral tumours, which are more likely to be adenocarcinomas than central ones.

From a health service perspective, one of the largest challenges of introducing computed tomography screening for lung cancer would come from the substantial number of positive computed tomography examinations, particularly at baseline screening. As a result, a high proportion of screening participants in the reviewed studies underwent further follow-up, either by further computed tomography or biopsy. The rate of biopsy for benign disease varied with different follow-up protocols (table 2).

None of the studies were of sufficient duration to assess the risk associated with radiation exposure. None of the studies explored the psychological or quality of life effect of the high false positive rates of screening computed tomography, in particular the potentially long periods of uncertainty while follow-up was undertaken.

Substantial variation was seen between studies in terms of the proportion of screenings with positive computed tomography examinations and the rate of cancer detected. The difference in results is to some extent explained by the different definitions of a positive computed tomography examination. Despite this, where similar definitions were used, variation still existed, and therefore caution must be used when generalising data from one country to another.

To show a reduction in mortality, RCT evidence, or at least a population-based study with a similar population-based control group, will be necessary. We identified several proposed RCTs in the literature.49–52 Several of these appear to have had funding difficulties. A US trial (National Lung Screening Trial www.cancer.gov/nlst) was launched in 2002, with recruitment completed in April 2004 and results of screening expected in the next few years.49 The status of other studies was not clear from the literature.

In terms of what information is needed to inform a decision about the clinical effectiveness of computed tomography screening for lung cancer, we can identify the following research needs:

  1. Controlled trial evidence that computed tomography screening reduces mortality either with whole population screening, or for particular subgroups.

  2. There is a need to better understand the natural history and epidemiology of screening-detected lung cancers, particularly small, well-differentiated adenocarcinomas.

  3. Assessment of any morbidity improvements arising from early detection and the quality of life effects of a positive screen while waiting for a diagnosis (whether eventually malignant or benign).

In conclusion, the current evidence base for computed tomography screening for lung cancer is insufficient to show clinical effectiveness in terms of a reduction in mortality. The ongoing uncontrolled studies may provide a better understanding of the natural history of computed tomography screening-detected NCNs and lung cancer.

Acknowledgments

We thank Lynda Bain for conducting the literature searching, and Prof D Godden and Prof J Weir for their expert advice. We also acknowledge Sain Thomas for her contribution to the data extraction. We thank Prof AK Dixon, Dr T Eisen (on behalf of the NCRI Lung Clinical Studies Group) and Dr J Baird for their comments on the original HTA Monograph from which this paper is derived.

REFERENCES

Footnotes

  • * Patients with lung cancer have, because of age and smoking, high mortality from other causes such as ischaemic heart disease.

  • Funding: Full systematic review and economic assessment was funded by the NHS R&D Health Technology Assessment Programme.

  • Competing interests: None.

  • The views and opinions expressed do not necessarily reflect those of the Department of Health.
