The effect of different radiological models on diagnostic accuracy and lung cancer screening performance

Henry M Marshall; Henry Zhao; Rayleen V Bowman; Linda H Passmore; Elizabeth M McCaul; Ian A Yang; Kwun M Fong

doi:10.1136/thoraxjnl-2016-209624

Research letter

ORCID (Henry M Marshall): http://orcid.org/0000-0002-9626-8014

University of Queensland Thoracic Research Centre and Department of Thoracic Medicine, The Prince Charles Hospital, Queensland, Australia

Correspondence to Dr Henry M Marshall, University of Queensland Thoracic Research Centre, Department of Thoracic Medicine, The Prince Charles Hospital, Rode Rd, Chermside, QLD 4032, Australia; henry.marshall{at}health.qld.gov.au

Abstract

High false-positive (FP) scan rates associated with low-dose computed tomography (LDCT) lung cancer screening result in unnecessary follow-up tests and exposure to harm. The definition of a ‘positive’ scan can impact FP rates and screening performance. We explored the effect of Lung Imaging Reporting and Data System (Lung-RADS) criteria, PanCan Nodule Malignancy Probability Model and varying nodule size thresholds (≥4 mm, ≥6 mm, ≥8 mm) on diagnostic accuracy and screening performance compared with original trial definitions (National Lung Screening Trial (NLST) criteria) in a secondary analysis of a lung cancer screening cohort. We found Lung-RADS criteria and the PanCan Nodule Malignancy Probability Model could substantially improve screening performance and reduce FP scan rates compared with NLST definitions of positivity but that this needs to be balanced against possible risk of false-negative results.

Trial registration number: Australian New Zealand Clinical Trials Registry, ACTRN12610000007033.

Lung Cancer
Imaging/CT MRI etc

Background

The landmark National Lung Screening Trial (NLST)1 demonstrated lung cancer mortality reduction by low-dose computed tomography (LDCT) screening, paving the way for US Preventive Services Task Force and Medicare screening recommendations. However, a major limitation of LDCT screening is the high positive scan rate, averaging 24.2% in NLST (threshold axial diameter ≥4 mm)1; furthermore, over 95% of these nodules were benign.1 Detected nodules require radiological follow-up, creating a burden on healthcare systems and exposing participants to potential harm.

Strategies to improve this situation could include the following: (1) increasing the threshold nodule size, reflecting lower cancer risk in smaller nodules,2,3 (2) estimating the probability of malignancy using the PanCan multivariable Nodule Malignancy Probability Model (incorporating nodule size, location, attenuation, total count, spiculation, participant age, sex, family history of lung cancer and emphysema, henceforth referred to as the PanCan Model)4 and (3) categorisation using the American College of Radiology Lung Imaging Reporting and Data System (Lung-RADS).5 The Lung-RADS classification is based on nodule average diameter, density and growth and also recommends the use of the PanCan Model to guide management of larger nodules (category 4B).

Larger nodule size thresholds are simple to implement but ignore other risk factors.2,3 The PanCan Model is more complex to administer but appears highly discriminatory4,6,7 and is recommended in British Thoracic Society guidelines8 (suggested cut-point for investigation ≥10% risk). The Lung-RADS system is simple to use and reduces the false-positive rate (FPR, 1 − specificity) at the cost of lower sensitivity.9,10 Until now, no study has simultaneously compared the effect of these competing methods of nodule assessment on false-positive (FP) scan rates and other screening performance metrics.

Aim

To evaluate the impact of different definitions of positive scan results on screening performance by retrospective application of Lung-RADS, the PanCan Model and various nodule size thresholds to screening cohort data.

Method

Participant eligibility and recruitment have been described elsewhere.11,12 Two hundred and fifty-six healthy current or former smokers (171 men; 85 women) aged 60–74 years were enrolled in an LDCT screening study; median age was 64.5 years, median smoking history was 55 pack-years and 47% were current smokers. Participants received baseline scans (T₀) and up to two annual incidence scans (T₁ and T₂) using a 64-detector helical CT based on NLST protocols. Criteria for scan positivity, based on NLST, were any nodule ≥4 mm diameter (baseline scan) and any new or growing nodule (incidence scan).11,12 Indeterminate nodules underwent 2 years of radiological follow-up. Health status follow-up continued for 5 years. Data were censored on 1 March 2016.

Scan results were reclassified using Lung-RADS (category 3 or 4 considered positive), the PanCan Model (full model with spiculation, ≥10% risk considered positive) and increasing nodule size thresholds (exploratory cut-points set at maximum diameter of ≥4 mm, ≥6 mm or ≥8 mm and labelled d4, d6 and d8, respectively). Performance metrics were calculated at the participant level using the scan result and NLST Lung Cancer Status definition (cancer present; cancer absent13). Scans were excluded if cancer status could not be determined (eg, participant lost to follow-up). Model discrimination (assessed using the area under the receiver operating characteristic curve, AUC) and calibration (assessed by visual plot and Hosmer-Lemeshow test) were calculated at the nodule level.

95% CIs were calculated for sensitivity, specificity, positive predictive value and negative predictive value (Clopper-Pearson exact method14) and FP rate reduction (1000 bootstrapped samples). AUCs were estimated and compared non-parametrically using the method of Obuchowski which accounts for clustering of nodules within individuals and correlation between AUC comparisons.15 Statistical analysis was performed using R V.3.2.4.
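For readers unfamiliar with the Clopper-Pearson exact method, the interval can be sketched in Python by inverting the binomial distribution with bisection. This is an illustrative standard-library implementation, not the R code used in the study; the function names are hypothetical and the 5-of-5 input is only an example.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""

    def solve(k: int, target: float) -> float:
        # binom_cdf(k, n, p) decreases as p increases, so bisect on p
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= x) equals alpha/2
    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


# Example input: 5 events in 5 trials
lower, upper = clopper_pearson(5, 5)
print(f"{lower:.3f} to {upper:.3f}")
```

With small numbers of cancers the exact interval is wide even when the point estimate is 100%, which is worth keeping in mind when reading the sensitivity estimates.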

Results

Baseline scans

Two hundred and fifty-six participants received T₀ scans (table 1); 136 participants had 301 nodules (median diameter 5.4 mm, range 4.0–34.0 mm). One hundred and twenty-eight (50.0%) scans were deemed positive (127 scans without historical images demonstrating nodule stability plus one scan with suspicious non-nodule findings). Lung cancer was diagnosed in five (2.0%) participants. One individual with missing lung cancer status was excluded. PanCan, Lung-RADS, d4, d6 and d8 correctly identified all participants with lung cancer. PanCan and d8 produced the greatest reduction in FP scans (89.4% reduction, 95% CI 83.6 to 95.2 and 78.0% reduction, 95% CI 69.4 to 86.6, respectively), improving specificity and positive predictive value without reducing sensitivity or negative predictive value (table 2).
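The bootstrapped CIs for FP reduction quoted above can be illustrated with a short Python sketch. It assumes a percentile bootstrap that resamples cancer-free participants, which the text does not specify; the data are invented, and the function names, seed and sample sizes are hypothetical.

```python
import random
from typing import List, Tuple

# Each pair is (positive under original criteria, positive under alternative)
# for one participant without cancer.
Pair = Tuple[bool, bool]


def fp_reduction(pairs: List[Pair]) -> float:
    """Percentage reduction in FP scans relative to the original criteria."""
    fp_orig = sum(1 for orig, _ in pairs if orig)
    fp_new = sum(1 for _, alt in pairs if alt)
    return 100.0 * (fp_orig - fp_new) / fp_orig


def bootstrap_ci(pairs: List[Pair], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling participants with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        if not any(orig for orig, _ in sample):
            continue  # skip resamples with no original FP scans (zero denominator)
        stats.append(fp_reduction(sample))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Invented data: 100 cancer-free participants, 50 FP originally, 10 FP after reclassification
pairs = [(True, True)] * 10 + [(True, False)] * 40 + [(False, False)] * 50
point = fp_reduction(pairs)
ci_low, ci_high = bootstrap_ci(pairs)
print(point, ci_low, ci_high)
```

Resampling participants rather than scans keeps each person's paired classifications together, so the interval reflects the correlation between the two definitions of positivity.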

Table 1

Lung cancer screening scan results: comparison of Queensland Lung Cancer Screening Study original definitions, Lung-RADS criteria, PanCan Model and differing nodule size thresholds

Table 2

Lung cancer screening scan performance metrics: comparison of Queensland Lung Cancer Screening Study original definitions, Lung-RADS criteria, PanCan Model and differing nodule size thresholds

Incidence scans

One hundred and fifty-six of 472 (33.1%) incidence scans (239 T₁; 233 T₂) were positive according to our original criteria (new nodule of any size or growth detected) (table 1). Lung cancer was diagnosed in three participants (1.3%) at T₁ and four (1.7%) at T₂. Six individuals without lung cancer status were excluded. Increasing nodule size threshold for a positive scan result reduced FPR but also reduced sensitivity. Lung-RADS and d4 had similar performance characteristics (table 2).

Downstream effects

At baseline, the PanCan Model would have avoided 110/127 (86.6%) interval CT scans and 3/5 positron emission tomography (PET) scans in participants without cancer; Lung-RADS would have avoided 75 interval CT scans and no PET scans. At the incidence rounds, the Queensland Lung Cancer Screening Study generated 229 interval CT scans from FP nodules and Lung-RADS would have reduced this number to 44. Across three rounds of screening, Lung-RADS would have avoided 260/356 (73.0%) interval CT scans, 2/7 (28.6%) PET scans and 2/3 (66.7%) bronchoscopies in participants without cancer. However, Lung-RADS would have misclassified one participant with lung cancer present at the T₂ screening round (tables 1 and 3).
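The downstream counts quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the 356 interval CT scans are the sum of the baseline (127) and incidence (229) scans, and that the 260 avoided scans combine the 75 avoided at baseline with the incidence-round reduction from 229 to 44; the variable names are illustrative.

```python
def pct(num: int, den: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * num / den, 1)


baseline_interval_ct = 127          # interval CT scans from FP results at baseline
incidence_interval_ct = 229         # interval CT scans from FP nodules at incidence rounds
lungrads_baseline_avoided = 75      # avoided by Lung-RADS at baseline
lungrads_incidence_remaining = 44   # still generated under Lung-RADS at incidence rounds

total_interval_ct = baseline_interval_ct + incidence_interval_ct
lungrads_avoided = lungrads_baseline_avoided + (
    incidence_interval_ct - lungrads_incidence_remaining
)

print(total_interval_ct, lungrads_avoided)      # totals across three rounds
print(pct(110, baseline_interval_ct))           # PanCan, baseline interval CT
print(pct(lungrads_avoided, total_interval_ct)) # Lung-RADS, all rounds
print(pct(2, 7), pct(2, 3))                     # PET scans, bronchoscopies
```

Under these assumptions the totals reproduce the reported 260/356 figure.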

Table 3

Downstream tests generated from false-positive (FP) scan results

Model performance

Ten baseline nodules in eight participants were diagnosed as lung cancer during follow-up (median 29.9 months, range 2.0–69.7). PanCan Model discrimination was very good in the 301 baseline nodules (AUC 0.90; 95% CI 0.75 to 1) but not statistically different from Lung-RADS (AUC 0.84; 95% CI 0.69 to 0.98; difference in AUC=0.06, 95% CI −0.04 to 0.16, p value=0.25). Visual plot and Hosmer-Lemeshow goodness-of-fit test (χ²=8.8, df=8, p=0.36) did not indicate significant miscalibration.
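As a worked check of the calibration result, the p value for the reported Hosmer-Lemeshow statistic can be recomputed from the χ² distribution. The sketch uses the closed-form survival function that holds for even degrees of freedom; the function name is hypothetical and only the statistic and degrees of freedom come from the text.

```python
from math import exp, factorial


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For df = 2m, the survival function is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(8.8, 8)
print(round(p_value, 2))
```

The recomputed value agrees with the reported p=0.36 to two decimal places.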

Discussion

As lung cancer screening gains traction internationally, attention focuses on minimising harm and controlling costs by reducing FP scan rates. In this exploratory comparative study, stricter definitions of positivity decreased FP results, improved performance metrics and reduced downstream tests at the risk of increasing false-negative scans at the incidence round.

At baseline, all methods were highly sensitive. Increasing nodule size thresholds and Lung-RADS improved specificity to 76–89%, but the PanCan Model had the highest specificity of 94.8%. At incidence scan rounds, stricter size definitions improved specificity and reduced FP scans at the cost of lower sensitivity. FPR was reduced by a similar degree to that seen in a retrospective application of Lung-RADS to NLST data (52–89% and 74–92% reduction at baseline and incidence rounds, respectively, compared with 52% and 76%, respectively10).

Since baseline scans have no historical comparison against which to assess growth, they inherently generate more positive scans and downstream tests than incidence scans.1 In contrast, incident nodules are more likely to be malignant.16 In the absence of a validated multivariable risk model for incidence scans, a more conservative, smaller size threshold compared with the baseline scan, such as recommended by Lung-RADS, seems prudent. However, defining an optimal threshold at prevalence and incidence rounds goes beyond the pure metrics of the test and requires a health economic perspective which may differ between countries and healthcare settings.

All measurements in this study were based on maximum axial diameter, yet volumetric nodule analysis reportedly gives a more accurate assessment of size and interval growth. The NELSON trial compared diagnostic accuracy between diameter-based management and volume-based management and found similar sensitivity but higher specificity for a volumetric approach17 (diameter sensitivity 92.4% and specificity 90.0%; volumetry sensitivity 90.9% and specificity 94.9%). In NELSON, the diameter measurements were generated from volumetric software results, and it is possible this underestimated the true difference by avoiding the variability associated with human reader measurements. The true advantage of volumetry, which requires specialised software analysis, against a multivariable risk assessment merits further investigation.

Our study is the first to simultaneously compare these differing methods of scan classification. It has limitations inherent to retrospectivity and relatively small cohort size; however, participants were well characterised and long, near-complete follow-up made cancer status misclassification unlikely.

In conclusion, we demonstrate that lung cancer screening performance and FP scan rate can be improved by varying the definition of a positive scan, but this must be balanced against possible false-negative results. At baseline, the PanCan Model yielded the best results, whereas at incidence scan, d4 followed by Lung-RADS yielded the best specificity and sensitivity.

Acknowledgments

The Queensland Lung Cancer Study team based at The University of Queensland Thoracic Research Centre, The Prince Charles Hospital, Brisbane: Dr John Ayres, BM, FRCR; Dr Jane Crossin, MB BCh, FRCR, FRANZCR; Dr Melanie Lau, MBBS, FRANZCR; Adjunct Professor Richard E Slaughter, MBBS, FRANZCR; Stanley Redmond, Dip App Sci; Deborah Courtney, BN; Dr Steven C Leong, MBBS, FRACP; Dr Morgan Windsor, MBBS, FRACS; Associate Professor Paul V Zimmerman, BSc, MBBS, FRACP, MD. Patients and staff at The Prince Charles Hospital, Brisbane.

References

1. Aberle DR, Adams AM, Berg CD, et al., National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409. doi:10.1056/NEJMoa1102873
2. Gierada DS, Pinsky P, Nath H, et al. Projected outcomes using different nodule sizes to define a positive CT lung cancer screening examination. J Natl Cancer Inst 2014;106:dju284. doi:10.1093/jnci/dju284
3. Henschke CI, Yip R, Yankelevitz DF, et al. Definition of a positive test result in computed tomography screening for lung cancer: a cohort study. Ann Intern Med 2013;158:246–52. doi:10.7326/0003-4819-158-4-201302190-00004
4. McWilliams A, Tammemagi MC, Mayo JR, et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med 2013;369:910–19. doi:10.1056/NEJMoa1214726
5. American College of Radiology. Lung CT Screening Reporting and Data System (Lung-RADS). http://www.acr.org/Quality-Safety/Resources/LungRADS (accessed 21 Oct 2016).
6. Winkler Wille MM, van Riel SJ, Saghir Z, et al. Predictive accuracy of the PanCan lung cancer risk prediction model - external validation based on CT from the Danish lung cancer screening trial. Eur Radiol 2015;25:3093–9. doi:10.1007/s00330-015-3689-0
7. Al-Ameri A, Malhotra P, Thygesen H, et al. Risk of malignancy in pulmonary nodules: a validation study of four prediction models. Lung Cancer 2015;89:27–30. doi:10.1016/j.lungcan.2015.03.018
8. Callister MEJ, Baldwin DR, Akram AR, et al. British Thoracic Society guidelines for the investigation and management of pulmonary nodules: accredited by NICE. Thorax 2015;70(Suppl 2):ii1–54. doi:10.1136/thoraxjnl-2015-207168
9. McKee BJ, Regis SM, McKee AB, et al. Performance of ACR Lung-RADS in a clinical CT lung screening program. J Am Coll Radiol 2015;12:273–6. doi:10.1016/j.jacr.2014.08.004
10. Pinsky PF, Gierada DS, Black W, et al. Performance of Lung-RADS in The National Lung Screening Trial: a retrospective assessment. Ann Intern Med 2015;162:485–91. doi:10.7326/M14-2086
11. Marshall HM, Bowman RV, Ayres J, et al. Lung cancer screening feasibility in Australia. Eur Respir J 2015;45:1734–7. doi:10.1183/09031936.00208714
12. Marshall HM, Bowman RV, Crossin J, et al. Queensland lung cancer screening study: rationale, design and methods. Intern Med J 2013;43:174–82. doi:10.1111/j.1445-5994.2012.02789.x
13. Church TR, Black WC, Aberle DR, et al., The National Lung Screening Trial Research Team. Results of initial low-dose computed tomographic screening for lung cancer. N Engl J Med 2013;368:1980–91. doi:10.1056/NEJMoa1209120
14. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934;26:404–13. doi:10.1093/biomet/26.4.404
15. Obuchowski NA. Nonparametric analysis of clustered ROC curve data. Biometrics 1997;53:567–78. doi:10.2307/2533958
16. Walter JE, Heuvelmans MA, de Jong PA, et al. Occurrence and lung cancer probability of new solid nodules at incidence screening with low-dose CT: analysis of data from the randomised, controlled NELSON trial. Lancet Oncol 2016;17:907–16. doi:10.1016/S1470-2045(16)30069-9
17. Horeweg N, van Rosmalen J, Heuvelmans MA, et al. Lung cancer probability in patients with CT-detected pulmonary nodules: a prespecified analysis of data from the NELSON trial of low-dose CT screening. Lancet Oncol 2014;15:1332–41. doi:10.1016/S1470-2045(14)70389-4

Footnotes

Contributors: HMM had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: HMM, KMF, IAY, RVB. Acquisition, analysis or interpretation of data: all authors. Drafting of the manuscript: HMM, HZ. Critical revision of the manuscript for important intellectual content: KMF, IAY, RVB. Statistical analysis: HMM, HZ. Obtained funding: KMF, IAY, RVB. Study supervision: KMF, IAY, RVB.
Funding: National Health and Medical Research Council (Practitioner Fellowship 1019891 (KMF); Career Development Fellowship 1026215 (IAY); Medical PhD Scholarship 631306 (HMM)); Smart State Project Grant, Queensland Health; National Centre for Asbestos Related Diseases Project Grant and The Prince Charles Hospital Foundation.
Competing interests: None declared.
Ethics approval: The Prince Charles Hospital human research ethics committee.
Provenance and peer review: Not commissioned; externally peer reviewed.
