Two large randomised controlled trials of screening for lung cancer with low-dose CT (LDCT)—the National Lung Screening Trial (NLST) and the Dutch-Belgian lung cancer screening trial (Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON) trial)—have both shown substantial lung cancer mortality reduction in the LDCT arm.1
Although there is now strong evidence that screening for lung cancer with LDCT reduces lung cancer mortality, concerns have been raised about the potential costs and harms of implementing annual lung cancer screening programmes. Annual screening with LDCT has been recommended in the USA, based on the evidence provided by the NLST design and modelling extrapolations.2 3 However, uptake of screening has been poor, with the latest data showing only 3.9% of those eligible have actually been screened.4 A cost-effectiveness analysis for Ontario, Canada, showed that annual screening scenarios were more cost-effective than biennial screening,5 but there is ongoing debate about whether all of those eligible for lung cancer screening actually require an annual LDCT.
NELSON was the only trial to compare the effects of different screening intervals between rounds. While the proportion of advanced stage cancers increased somewhat, a 2-year interval still performed well compared with a 1-year interval. However, a 2.5-year interval produced more interval cancers and a higher proportion of advanced stage cancers than a 2-year interval, suggesting that 2.5 years may be too long.6
Risk prediction models have been suggested to select eligible participants at high risk of lung cancer and have been shown to be superior to selection using age and smoking status alone.7 8 Incorporating information from LDCT results may allow them to aid in the personalisation of the screening regimen. In NELSON, participants with negative LDCT results were less likely to have lung cancer detected at a subsequent screening round.9 Post-hoc analyses of the NLST showed that those with a negative baseline LDCT have a lower lung cancer incidence in subsequent rounds and may not therefore need annual screening.10 A recent publication by Tammemägi et al updated the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO)m2012 risk prediction model with LDCT results from NLST to derive a new model called PLCOm2012 results.11 They suggested that the addition of previous screening round findings to the PLCOm2012 risk score could be used to identify a lower risk group for whom biennial screening may be beneficial.
In this edition of Thorax, Yong et al 12 used data from NLST to look at the role of radiographic emphysema in those with a negative LDCT at baseline and after the first annual screen to see if it was possible to further distinguish high-risk subgroups. They used three well-validated risk stratification models which have been suggested to identify those at high risk for entry into screening (Bach, PLCOm2012 and Liverpool Lung Project) and looked at the effect of adding radiographic emphysema to these models. All of these models include some assessment of smoking status and age (which influence development of emphysema) and PLCOm2012 also includes self-reported chronic obstructive pulmonary disease (COPD). The authors showed that presence of radiographic emphysema is an independent predictor of lung cancer risk; those with emphysema had nearly double the hazard of lung cancer diagnosis at a subsequent screening round compared with those with no emphysema. Furthermore, the number needed to screen to detect one cancer was lower in the participants with radiographic emphysema. This study supports the work previously discussed, which suggests that additional risk stratification on the basis of LDCT, alongside risk prediction models, may be used to inform the future screening interval and provide a more individualised and risk-based approach. While risk models may be used to identify those at high risk of developing lung cancer, it is also important to ensure that those invited for screening are likely to benefit, as the two are not necessarily the same. For example, Young et al used data from NLST to stratify patients according to Global Initiative for Chronic Obstructive Lung Disease (GOLD) score, which is used to categorise patients with COPD into groups by severity of airflow limitation.
They showed that the benefit of annual screening by LDCT was greatest in those with normal or mild to moderate impairment of lung function, while those with severe or very severe COPD showed no mortality benefit.13 It may be that future risk scores will also combine clinical data on comorbid conditions to ensure that only high-risk individuals who are likely to benefit from screening for lung cancer are invited. This will minimise harms from overdiagnosis in those who have a greater likelihood of dying of a non-lung cancer cause.
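The number needed to screen referred to above is simply the number of participants screened divided by the number of cancers detected. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical, chosen only for illustration, and are not taken from Yong et al.

```python
def number_needed_to_screen(cancers_detected: int, participants_screened: int) -> float:
    """Return the number of participants screened per cancer detected."""
    if cancers_detected <= 0:
        raise ValueError("at least one detected cancer is required")
    return participants_screened / cancers_detected


# Hypothetical counts (assumptions for illustration, not trial data).
nns_emphysema = number_needed_to_screen(cancers_detected=20, participants_screened=1000)
nns_no_emphysema = number_needed_to_screen(cancers_detected=10, participants_screened=1000)

print(f"NNS with emphysema:    {nns_emphysema:.0f}")
print(f"NNS without emphysema: {nns_no_emphysema:.0f}")
```

In this made-up example the detection rate is twice as high in the emphysema group, so half as many participants need to be screened to find one cancer.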
Pulmonary nodules are a common finding on CT scans; in the NELSON study, around 50% of those screened had at least one pulmonary nodule on the baseline LDCT and around 5%–7% of participants develop a new nodule each year.14 15 Accurate assessment of the malignant potential of a nodule is therefore important to minimise harm from over-investigation of benign nodules, particularly as over 99% of screen-detected nodules are benign. One criticism of NLST was the high false-positive rate (23%),1 which generates costs both in terms of follow-up diagnostic procedures and psychological harm to the participant. A clinical demonstration project conducted in selected Veterans Health Administration hospitals in the USA reported that 60% of participants had a ‘positive’ scan; however, 55% of these positive results were for nodules <5 mm in diameter, which would not routinely warrant follow-up.16 The use of semiautomated nodule volumetric measurements and robust, evidence-based nodule management protocols are therefore key to reducing false positives and potential harm. The British Thoracic Society (BTS) guideline for the investigation and management of pulmonary nodules advocates the use of the Brock score to assess risk of malignancy in a pulmonary nodule on a baseline LDCT.17 This was derived by McWilliams et al 18 using data from the Pan-Canadian Early Detection of Lung Cancer Study and has performed well across multiple external validation studies. In general, however, these studies did not assess calibration or whether the model would benefit from recalibration. In this issue of Thorax, Winter et al 19 use data from the NLST to try to refine the Brock score to meet the requirements of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. The statement comprises a 22-item checklist and is designed to ensure that the results of prediction modelling are reported in a more standardised fashion.
In their study, the Brock score performed well in external validation, with an area under the curve (AUC) of 0.91, but was shown to overestimate the probability of cancer when applied to the NLST dataset (a lower-risk population than the derivation dataset). The authors therefore used the TRIPOD statement requirements to recalibrate and refine the Brock score. When the model was recalibrated, the sensitivity of the score remained the same but specificity improved, and the authors suggest that using the recalibrated score would avoid 176 unnecessary investigations for one missed cancer in their population. A model’s estimated risk is dependent on the population it was estimated in, which is why model-specific thresholds are required.8 However, one can also argue that good application of a prediction model requires a population-specific threshold for that model, which in turn requires good calibration within that population. It is therefore important to check whether recalibration is needed. If the derivation population and new population have similar background risks and the effects of the various risk factors in the model are the same for both populations, then recalibration may not be required. However, if the baseline risk is different or the effect of a risk variable differs, then the model may benefit from updating. For external validation, one usually requires at least 100 events and 100 non-events, but the exact number required to appropriately update a model will depend on which factor or factors differ between the two populations.20
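One common way to check for a difference in baseline risk or in overall predictor effects is logistic recalibration: the observed outcome is regressed on the logit of the original predicted risk, giving a calibration intercept and slope. The sketch below illustrates that general technique on synthetic data. It is not the method used by Winter et al; the function names, the simulated cohort and the assumed shift in baseline risk are all assumptions made for illustration.

```python
import numpy as np


def logit(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return np.log(p / (1 - p))


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_recalibration(pred_risk, outcome, n_iter=50):
    """Fit outcome ~ expit(a + b * logit(pred_risk)) by Newton-Raphson.

    Returns (a, b): the calibration intercept and slope.
    """
    lp = logit(np.asarray(pred_risk, dtype=float))
    y = np.asarray(outcome, dtype=float)
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.array([0.0, 1.0])  # start from "no recalibration needed"
    for _ in range(n_iter):
        mu = expit(X @ beta)
        w = mu * (1 - mu)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta[0], beta[1]


def recalibrate(pred_risk, intercept, slope):
    """Apply a fitted intercept and slope to the original predicted risks."""
    return expit(intercept + slope * logit(np.asarray(pred_risk, dtype=float)))


# Synthetic cohort (assumption): the original model's risks are too high
# because the new population has a lower baseline risk (logit shift of -1).
rng = np.random.default_rng(0)
original_risk = rng.beta(1, 12, size=20_000)
true_risk = expit(logit(original_risk) - 1.0)
cancer = (rng.random(original_risk.size) < true_risk).astype(float)

intercept, slope = fit_recalibration(original_risk, cancer)
updated_risk = recalibrate(original_risk, intercept, slope)

print(f"calibration intercept: {intercept:.2f}, slope: {slope:.2f}")
print(f"mean original risk: {original_risk.mean():.3f}")
print(f"mean updated risk:  {updated_risk.mean():.3f}")
print(f"observed cancer rate: {cancer.mean():.3f}")
```

A negative intercept with a slope near 1 corresponds to the first scenario described above (a different baseline risk with similar predictor effects), while a slope far from 1 would suggest that the effects of the risk variables themselves differ.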
Only one study has looked at validating the Brock score in the UK population and this was a single-centre study comprising 244 patients.21 It showed that the discrimination of the model was good in this group (AUC 0.902). In this population, most non-cancer cases using the Brock model had a risk of <5%, while most lung cancers had a risk of 20% or higher. There remained some overlap in the 5%–15% range, similar to the paper by Winter et al, which suggests that there may be a benefit in recalibration in the UK population. This is important given that the BTS guideline recommendation is to consider further investigation with Positron Emission Tomography CT (PET-CT) in those with a risk of ≥10%. If the specificity could be improved (particularly for risk thresholds between 5% and 15%) by recalibration without losing sensitivity, then this has the potential to reduce the proportion of patients who require additional investigation with PET-CT. This would reduce harm from radiation exposure and the psychological distress of undergoing additional investigations and also free up radiological resources.
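The effect of recalibration at a fixed referral threshold can be illustrated with a small calculation of sensitivity and specificity. This is a sketch only: the nodule risks below are invented for illustration and are not taken from any of the studies discussed, and the 10% threshold is the BTS guideline value mentioned above.

```python
def sensitivity_specificity(risks, has_cancer, threshold):
    """Classify each nodule as 'investigate' if risk >= threshold."""
    tp = fp = tn = fn = 0
    for risk, cancer in zip(risks, has_cancer):
        flagged = risk >= threshold
        if cancer and flagged:
            tp += 1
        elif cancer and not flagged:
            fn += 1
        elif not cancer and flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical nodules: five benign followed by three malignant.
has_cancer = [False, False, False, False, False, True, True, True]
original_risk = [0.02, 0.04, 0.08, 0.11, 0.13, 0.25, 0.40, 0.60]
recalibrated_risk = [0.01, 0.02, 0.04, 0.06, 0.07, 0.15, 0.28, 0.47]

THRESHOLD = 0.10  # risk at or above which PET-CT would be considered

sens_orig, spec_orig = sensitivity_specificity(original_risk, has_cancer, THRESHOLD)
sens_recal, spec_recal = sensitivity_specificity(recalibrated_risk, has_cancer, THRESHOLD)

print(f"original:     sensitivity {sens_orig:.2f}, specificity {spec_orig:.2f}")
print(f"recalibrated: sensitivity {sens_recal:.2f}, specificity {spec_recal:.2f}")
```

In this invented example, two benign nodules fall below the threshold after recalibration while all three cancers remain above it, so specificity improves with no loss of sensitivity.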
Using the Brock score appropriately also relies on accurate radiological interpretation and reporting of the pulmonary nodules. A recent study asked 107 radiologists from 25 countries to evaluate 69 CT-detected nodules and showed variation between radiologists in the reporting of nodule type, assessment of size (particularly for smaller nodules), reporting of spiculation and radiological management recommendations.22 It seems likely that, in future, computer-aided detection and artificial intelligence (AI) systems will have an increasing role to play in the analysis of LDCT scans and in nodule detection and interpretation. Automated reporting systems may also reduce variability in reporting and help to mitigate some of the radiology workforce concerns that have been raised when implementation of screening for lung cancer is discussed.
Lung cancer screening risk prediction models are likely to evolve to incorporate previous LDCT results, possibly with the addition of AI reporting of parenchymal abnormalities such as emphysema and nodule interpretation, which in the future may be used to more accurately refine individual patient management. Validation, recalibration and testing of these models in clinical practice are needed to ensure that they work as well in a real-life setting as in a trial setting and also to assess the cost-effectiveness of implementing them.
References
Footnotes
Contributors EO'D and KtH co-wrote this editorial.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests KtH is a researcher affiliated with the NELSON trial and the Cancer Intervention and Surveillance Modelling Network. KtH was involved in a Health Technology Assessment study for CT Lung Cancer Screening in Canada (Dr Paszat, Cancer Care Ontario) and received a grant from the University of Zurich to assess the cost-effectiveness of CT lung cancer screening in Switzerland.
Patient consent for publication Not required.
Provenance and peer review Commissioned; externally peer reviewed.