Abstract
Lung cancer screening is effective if offered to people at increased risk of the disease. Currently, direct contact with potential participants is required for evaluating risk. A way to reduce the number of ineligible people contacted might be to apply risk-prediction models directly to digital primary care data, but model performance in this setting is unknown.
Method The Clinical Practice Research Datalink, a computerised, longitudinal primary care database, was used to evaluate the Liverpool Lung Project V.2 (LLPv2) and Prostate Lung Colorectal and Ovarian (modified 2012) (PLCOm2012) models. Lung cancer occurrence over 5–6 years was measured in ever-smokers aged 50–80 years and compared with 5-year (LLPv2) and 6-year (PLCOm2012) predicted risk.
Results Over 5 and 6 years, 7123 and 7876 lung cancers occurred, respectively, from a cohort of 842 109 ever-smokers. After recalibration, LLPv2 produced a c-statistic of 0.700 (0.694–0.710), but mean predicted risk was over-estimated (predicted: 4.61%, actual: 0.9%). PLCOm2012 showed similar performance (c-statistic: 0.679 (0.673–0.685), predicted risk: 3.76%). Applying risk-thresholds of 1% (LLPv2) and 0.15% (PLCOm2012) would avoid contacting 42.7% and 27.4% of ever-smokers who did not develop lung cancer for screening eligibility assessment, at the cost of missing 15.6% and 11.4% of lung cancers.
Conclusion Risk-prediction models showed only moderate discrimination when applied to routinely collected primary care data, which may be explained by quality and completeness of data. However, they may substantially reduce the number of people for initial evaluation of screening eligibility, at the cost of missing some lung cancers. Further work is needed to establish whether newer models have improved performance in primary care data.
- imaging/CT MRI etc
- lung cancer
Data availability statement
Data may be obtained from a third party and are not publicly available. Data available through CPRD and NCRAS.
Key messages
What is the key question?
How do multivariable risk prediction models used to identify people at risk of lung cancer for CT screening perform when applied directly to routinely collected primary care electronic data?
What is the bottom line?
When restricted to people aged 50–80 years who have ever smoked, two multivariable models recommended for use in the NHS England Targeted Lung Health Check showed only moderate discrimination and over-estimated risk. Applying the models at low risk thresholds could nevertheless substantially reduce the number of people contacted and scanned, although some people with lung cancer would be missed.
Why read on?
Targeted CT screening for lung cancer has the potential to save lives but the cost-effectiveness of the intervention is under scrutiny so using primary care data as a way to exclude people who are at low risk may be one way to reduce the number of people invited and therefore limit cost.
Introduction
Randomised controlled trials have shown that screening with low dose CT (LDCT) reduces lung cancer mortality.1–3 Many countries are therefore planning implementation, but questions remain around how to identify the population most likely to benefit. Most lung cancer screening trials used age and smoking pack year criteria to select participants. However, since the publication of the National Lung Screening Trial (NLST), further analyses have demonstrated that substantial variations in risk exist within trial populations.4 Risk prediction models have been suggested to select eligible participants at high risk of lung cancer and have been shown to be more sensitive and specific compared with using age and smoking history alone.5–10 This may be in part due to these models incorporating more detailed smoking data and considering other risk factors such as chronic obstructive pulmonary disease (COPD) or asbestos exposure. The UK Lung Screening (UKLS) trial used a multifactorial risk prediction model (Liverpool Lung Project V.2 (LLPV2)) to select patients.11 12 The results of the UKLS showed that the proportion of patients in whom lung cancer was detected was similar in a single screening round to that achieved by three annual rounds in the NLST. However, the trade-off of selecting higher risk groups is that only a small proportion of the total population at risk of lung cancer is included and there is potential for those selected to be at greater risk of competing causes of death. To maximise the impact of a screening programme, models with better sensitivity and specificity are needed to ensure the greatest number of eligible people benefit while reducing the number of LDCTs required. As well as being accurate, models for use on a whole population need simple methods of data collection, or must use existing high-quality data.
In the UK, primary care records have been used to identify ever-smokers for further risk stratification as a way to limit the number of approaches that have to be made to cover the target population.13 14 Although, inevitably, a small proportion may be missed, a much larger proportion of ineligible people are not approached, reducing inconvenience, worry and costs. UK pilots have used both the LLPv2, at a threshold of either 2.5% or 5%, and the Prostate Lung Colorectal and Ovarian (modified 2012) (PLCOm2012) at a threshold of 1.51%. An earlier version of this model was used to select subjects for the Pan-Canadian Early Detection of Lung Cancer (PanCan) study.7 The UK pilots found baseline cancer rates of 2%–3% and the PanCan study 5%.5 The NHS England Lung Health Check targeted screening programme has therefore recommended using either a PLCOm2012 6-year risk-threshold for lung cancer of 1.51% and/or an LLPv2 5-year risk-threshold of 2.5% to define eligibility. However, these risk models have not been validated or calibrated in primary care data. Previous external validations have compared models in well-defined data derived from screening trials.15 16 A recent ‘real-world’ UK pilot screening programme found that the PLCOm2012 model performed much as expected, although the investigators found some degree of miscalibration.17 However, their population had received screening, which may partly account for this miscalibration. It is therefore important to understand whether models can be applied to routinely collected primary care data of non-screened individuals and to establish the most appropriate risk threshold for further evaluating screening eligibility.
Methods
Data source
We used data from the Clinical Practice Research Datalink (CPRD), a computerised, longitudinal primary care database, linked to a range of other health-related data to provide a representative UK population health data set. The data encompass 50 million patients, including 14 million who are currently registered.18 All symptoms, medical diagnoses, prescriptions, investigations and results are entered into the computer system either during a consultation with a general practitioner (GP) or following communication from other healthcare providers.
Patient data
A general population cohort ≥40 years of age who were registered and contributing data for at least 12 months between 1 January 2000 and 31 December 2015 was extracted from CPRD. Patients ≥40 years of age who were diagnosed with lung cancer during this timeframe were identified from this cohort. To ensure that these were incident rather than prevalent cases, we excluded patients who registered less than 12 months prior to their diagnosis date. Data for English patients were linked to Cancer Registry data which provided additional information, including lung cancer pathological subtype and stage at diagnosis.
Lung cancer prediction models
This study evaluates two lung cancer prediction models: the LLPv2 and the PLCOm2012.5 8 19 CPRD data were used to identify and categorise the required variables to fit the models and derive a risk score for 5-year (LLPv2) and 6-year (PLCOm2012) risk of lung cancer, respectively. Personal history of pneumonia, COPD, smoking status, any cancer and family history of cancer were identified using medical code lists. Asbestos exposure is not routinely available in CPRD and to avoid bias by assuming that all patients were not exposed to asbestos, we searched CPRD for medical codes indicating ‘asbestosis’.20 Data on ethnicity and education were not available so we assumed all patients to be white and have basic education (ie, assuming normal secondary school completion in the UK approximates to completing high school in the USA). Additional CPRD files were used to extract data on body mass index (BMI). LLPv2 incorporates age at lung cancer diagnosis for a first-degree relative; however, details on the type of cancer in family members and age at diagnosis are not routinely collected in CPRD and therefore any lung cancer in a first-degree relative was considered to be early onset (age <60). Models were also assessed with this variable excluded.
Smoking data in CPRD
Unlike trial data, which record detailed individual smoking data at the time of a risk assessment, smoking data in CPRD are recorded whenever the person visits the GP. The GP records the details using medical and Read codes to indicate a patient’s smoking status (current, ex or never) and the intensity of smoking in categories. These categories are defined as: (1) very heavy smoker 40+ cigarettes/day, (2) heavy smoker 20–40 cigarettes/day, (3) moderate smoker 10–19 cigarettes/day, (4) light smoker 1–9 cigarettes/day, (5) trivial smoker <1 cigarettes/day and (6) smoker quantity unknown. In the PLCOm2012 model, smoking intensity is incorporated as a continuous variable,7 so in order to apply the model we had to convert the categorical variable to a specific number of cigarettes smoked per day. We therefore assumed very heavy smokers to have smoked 40 cigarettes/day, heavy smokers 20 cigarettes/day, moderate smokers 10 cigarettes/day, light smokers 5 cigarettes/day and trivial smokers to have smoked 2 cigarettes/day. Patients with missing smoking data throughout their follow-up were considered to be never-smokers.
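The category-to-quantity conversion described above can be sketched as a simple lookup. This is a minimal illustration in Python, not the authors' Stata code: the category labels and the function name `cigarettes_per_day` are hypothetical, while the assigned values are those stated in the text. An unknown quantity returns `None` so that it can be handled separately (for example, by imputation).

```python
from typing import Optional

# Assumed cigarettes/day for each CPRD smoking-intensity category.
# The numeric values are those stated in the text; the labels are illustrative.
ASSUMED_CIGS_PER_DAY = {
    "very heavy": 40,
    "heavy": 20,
    "moderate": 10,
    "light": 5,
    "trivial": 2,
}


def cigarettes_per_day(category: str) -> Optional[int]:
    """Return the assumed cigarettes/day, or None when the quantity is unknown."""
    return ASSUMED_CIGS_PER_DAY.get(category.strip().lower())
```

Any label outside the five listed categories, including "smoker quantity unknown", maps to `None` rather than to a guessed value.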
Only 10% of the population who were categorised as ever-smokers had a documented age of starting smoking. Based on published literature, we assumed the age at which people started smoking to be 18 years, which also coincides with the legal age to buy cigarettes in UK from 2007.21–23 Date of smoking cessation is recorded in the additional CPRD files for 68% of ex-smokers. Median day difference between smoking cessation date and the risk assessment date was calculated and substituted for 32% of the ex-smokers with missing date of smoking cessation.
Data setup
LLPv2 and PLCOm2012 predict 5-year and 6-year lung cancer incidence, respectively. We calculated 5-year and 6-year risk scores for all CPRD patients registered on 1 January 2009. We looked at 5-year incidence of having lung cancer for LLPv2 model, that is, until 31 December 2013; and 6-year incidence of having lung cancer for PLCOm2012 model, that is, until 31 December 2014. Lung cancer screening is unlikely to be offered to people aged below 50 years or above 80 years based on current modelling: therefore we excluded people aged <50 years or >80 years at the point of the risk assessment (1 January 2009). Similarly lung cancer screening is unlikely to be offered to never-smokers and so only ever-smokers were included in the cohort. This resulted in 842 109 individuals in our CPRD cohort.
Statistical analysis and multiple imputation
All data management and statistical analysis were performed using Stata V.16 (StataCorp) and the study was conducted and reported in line with the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines.24 Occurrence of lung cancer was treated as a binary outcome at 5 years for LLPv2 and 6 years for PLCOm2012. Distributions of demographic variables were compared between patients with and without lung cancer. All patients actively participating in CPRD on 1 January 2009 were used to assess the performance of LLPv2 (n=842 109). Multiple imputation by chained equations, to replace missing data on BMI (10%) and smoking quantity (28%), was performed based on all candidate predictors before applying the PLCOm2012 model. We created five imputed data sets for our cohort and combined them using Rubin’s rules to obtain final model estimates.25 On the basis of the most conservative figure of 7123 lung cancer events during the 5 years after risk assessment for LLPv2 and 11 risk predictors in PLCOm2012, we had approximately 648 lung cancer diagnoses per predictor, well above the minimum requirement of 100 (or preferably 200) events per predictor suggested by Collins et al.26
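Two of the calculations in this paragraph can be illustrated with a short Python sketch. It is not the authors' Stata code. `pool_rubin` applies the standard form of Rubin's rules to a single parameter estimated in each imputed data set, and `events_per_predictor` reproduces the sample-size arithmetic. Both function names are hypothetical, and any estimates or variances passed to `pool_rubin` would be the analyst's own.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)             # average of the m estimates
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


def events_per_predictor(events, predictors):
    """Number of outcome events available per candidate predictor."""
    return events / predictors
```

With the figures given in the text, `events_per_predictor(7123, 11)` is about 647.5, which rounds to the 648 quoted.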
We compared the demographics of the CPRD derived data set with that of the original development sets for the LLP and PLCOm2012.7 19
We assessed the performance of the models in terms of discrimination and calibration plots.6 27 The area under the receiver operating characteristic curve (AUC) was used to assess discrimination, ranging from 0.5 indicating no discrimination to 1 indicating perfect discrimination. The ‘pmcalplot’ package in Stata was used to plot observed and predicted risk probabilities. We also assessed the performance of models by risk-thresholds and calculated the number needed to screen to identify one patient with lung cancer based on those risk-threshold figures. For the LLPv2 model, four risk categories were set at risks: <1%, 1%–<2.5%, 2.5%–<5% and 5% or greater, while for PLCOm2012, our cohort was divided into three risk categories: <0.15%, 0.15%–1.5% and >1.5%. The values of the considered risk thresholds for the two models differ, as the models differ in absolute risk estimates due to differences in risk-levels between their development data sets. For comparison we also calculated AUCs for each model using data from NLST and PLCO.
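As an illustration of what the AUC (c-statistic) measures, the following Python sketch computes it directly from its definition: the proportion of case/non-case pairs in which the person who developed lung cancer had the higher predicted risk, with ties counted as one half. The function name `c_statistic` is hypothetical and this is not the routine used in the study. The pairwise loop is written for clarity only; a rank-based calculation would be needed for a cohort of this size.

```python
def c_statistic(risks, outcomes):
    """Concordance (AUC) of predicted risks against binary outcomes (1 = case)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_risk in cases:
        for non_case_risk in non_cases:
            if case_risk > non_case_risk:
                concordant += 1.0      # case correctly ranked above non-case
            elif case_risk == non_case_risk:
                concordant += 0.5      # ties count as half
    return concordant / (len(cases) * len(non_cases))
```

A value of 0.5 corresponds to ranking no better than chance and 1 to every case being ranked above every non-case.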
Sensitivity analyses
Further analyses were conducted to assess if model performance was affected by exclusion of family history for both models and without the inclusion of cases with missing data of BMI and smoking intensity for PLCO.
Results
Study participants and comparison with original models
We analysed data on all (n=5 997 270) people actively contributing to CPRD between 1 January 2000 and 31 December 2015. Lung cancer incidence was 85.8 per 100 000 person-years. The overall incidence was higher for men compared with women (98.5 vs 73.4 per 100 000 person-years). Smoking status was recorded in 98% of the records. People aged between 50 and 80 years who were ever-smokers were selected to form the evaluation population for LLPv2 and PLCOm2012 models. This comprised 842 109 participants. An overview of the demographics/model characteristics of the LLPv2 and PLCOm2012 development cohorts and the CPRD cohort is presented in the online supplemental material with a description of the differences. Tables 1A and 1B show these details for the CPRD cohort for LLPv2 and PLCOm2012, respectively. Complete information for all risk factors was available for 100% of the population for LLPv2 evaluation, but only 66% had complete information for PLCOm2012, mainly due to missing data on BMI and smoking intensity.
Comparison of risk prediction model performance in CPRD
LLPv2
In CPRD, 7123 lung cancer events took place in 5 years between 1 January 2009 and 31 December 2013 (table 1A). The original LLPv2 model, which included never-smokers, produced a c-statistic of 0.70 in 10-fold cross validation.19 After recalibration of the model intercept, LLPv2 produced a c-statistic of 0.700 (0.694–0.710) in CPRD data (table 2). The calibration plot of the recalibrated model is shown in figure 1. The calibration slope was 0.675 and intercept 0.
There was an under-prediction of lung cancer cases at the lowest risk scores, followed by an over-prediction. The overall mean predicted risk of patients with lung cancer in the CPRD cohort was 4.61%. This compares with the actual risk of 0.9%. The calibration slope was 0.679 and intercept 0.005.
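The text does not specify how the model intercept was recalibrated. One common approach, sketched here in Python purely as an assumption, is to add a constant to every individual's log-odds so that the mean predicted risk equals the observed event rate. The function names are hypothetical. Because this shift preserves the ranking of individuals, it leaves the c-statistic unchanged and cannot correct a calibration slope that differs from 1.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(predicted, observed_rate, tol=1e-10):
    """Find the log-odds shift that makes mean predicted risk equal observed_rate.

    Uses bisection; mean risk increases monotonically with the shift.
    """
    log_odds = [logit(p) for p in predicted]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(expit(v + mid) for v in log_odds) / len(log_odds)
        if mean_risk > observed_rate:
            hi = mid   # shifted risks too high: move the shift down
        else:
            lo = mid   # shifted risks too low: move the shift up
    return (lo + hi) / 2.0


def recalibrate(predicted, shift):
    """Apply the intercept shift to each predicted risk."""
    return [expit(logit(p) + shift) for p in predicted]
```

When a model over-estimates risk on average, the shift returned is negative.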
Table 3 shows the patient features, proportion of patients with lung cancer identified and number of individuals needed to screen to detect one patient with lung cancer using a variety of risk categories. Patients with lung cancer had a higher mean predicted risk score compared with non-lung cancer cases in each category. Approximately 71% of the patients with lung cancer had a predicted risk score of >2.5%. The number of individuals needed to screen to detect one cancer (NNS) ranged from 322 in individuals with a risk of <1% to 54 in individuals with a risk >5%. A risk threshold of >5% included 43.7% of lung cancers and 20% of the total cohort. The corresponding figures for >2.5% and >1% were 70.8% of cancers and 40.8% of the cohort, and 84.5% of cancers and 57.5% of the cohort. Setting a risk threshold of 1% gives a NNS of 80, but would still miss 15.6% of the lung cancer cases, chiefly those with a younger median age (56 years) and shorter duration of smoking (all ≤40 years duration). However, 42.7% of the cohort without cancer would not need to be screened.
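The threshold figures in this paragraph are linked by simple arithmetic, which the following Python sketch makes explicit. The function name `threshold_summary` and its output keys are hypothetical. Because the inputs are the rounded percentages reported in the text, the outputs are only approximate.

```python
def threshold_summary(cohort_size, cancers, frac_cohort_selected, frac_cancers_selected):
    """Summarise a risk threshold from the proportions selected above it.

    cohort_size: total number of people assessed
    cancers: number who developed lung cancer during follow-up
    frac_cohort_selected: proportion of the cohort above the threshold
    frac_cancers_selected: proportion of lung cancers above the threshold
    """
    selected = cohort_size * frac_cohort_selected
    detected = cancers * frac_cancers_selected
    non_cancer_total = cohort_size - cancers
    # People below the threshold who did not develop lung cancer
    non_cancer_not_selected = (cohort_size - selected) - (cancers - detected)
    return {
        "nns": selected / detected,
        "cancers_missed": 1.0 - frac_cancers_selected,
        "non_cancer_avoided": non_cancer_not_selected / non_cancer_total,
    }
```

For example, `threshold_summary(842109, 7123, 0.575, 0.845)` gives an NNS of approximately 80 and a proportion of people without cancer not selected of approximately 42.7%, in line with the figures reported for the 1% LLPv2 threshold.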
PLCOm2012
We identified 7876 lung cancer events that took place in the 6 years following PLCOm2012 risk assessment on 1 January 2009 (table 1B). After imputing missing BMI and smoking intensity values, PLCOm2012 produced a c-statistic of 0.679 (0.673–0.685) in CPRD data. Furthermore, even following recalibration of the model intercept (figure 2) there was still poor calibration of PLCOm2012 in CPRD data. The overall mean predicted risk for patients with lung cancer by PLCOm2012 model in the cohort was 3.76%. Similar to LLPv2, PLCOm2012 under-predicted lung cancer cases at the lowest risk scores, followed by over-prediction.
Table 4 shows the patient features, proportion of patients with lung cancer identified and number of individuals needed to screen to detect one patient with lung cancer using a variety of risk thresholds. Using imputed data, if a PLCOm2012 risk-threshold of >1.51% were applied to the CPRD population, it would detect 48.6% of the total lung cancer cases, with a NNS of 52 (23.5% of the total cohort selected). Setting the risk threshold to 0.15% increases the NNS to 88 (72.8% total cohort). This misses 11.41% of lung cancer cases, predominantly those with the lowest smoking intensity or where smoking data were incorrectly recorded in CPRD (table 4). However, 32.3% of people without cancer would not need to be screened. Those in the highest risk threshold group (>1.5%) had a higher median age (71 years vs 66 years for <0.15%) and were more likely to be current smokers (63% vs 36%, respectively). Proportionally, more in the highest risk threshold also had a diagnosis of COPD (29% vs 14%). At the higher risk threshold, PLCOm2012 selected a similar proportion of male and female cases to the overall population of ever-smokers, but selected slightly more males than females at lower thresholds. In total there were 72 fewer females selected than would be expected from the overall population, approximately 1% of the total cancers.
Sensitivity analyses
LLPv2 (recalibrated) and PLCOm2012 showed similar discrimination when family history was excluded. AUCs were 0.697 (0.691–0.702) and 0.679 (0.672–0.684), respectively. The race classifications in PLCOm2012 (which were based on US classifications) do not reliably match to UK classifiers and were therefore not appropriate for sensitivity analysis. Analysis of PLCOm2012 restricted to cases with complete data showed similar findings for the AUC (0.680 (0.673–0.687)) and calibration plot (online supplemental file and figure 1A).
Discussion
Main findings
This is the first study to evaluate and recalibrate the LLPv2 and PLCOm2012 models using primary care data. After restricting the primary care data to include only ever-smokers aged 50–80 years, our work showed that discrimination was only moderate for both models and, following recalibration of the model intercept, there was still poor calibration of the PLCOm2012 model in CPRD. Both models showed under-prediction at low risk followed by an over-prediction of those at highest risk. The detail required to use these models in practice is considerable (particularly with regards to smoking data) and would require a face-to-face or telephone consultation in order to replace the data already held in primary care records. Using both models at the current suggested risk thresholds (>1.51% for PLCOm2012 and >2.5% or >5% for LLPv2) missed 51%, 29% and 56% of lung cancer cases, respectively. This concerned largely those with younger median age and lower smoking duration for LLP. Those who were missed by PLCOm2012 at lower thresholds were less likely to have COPD and were more likely to be ex-smokers with lower smoking intensity. The relatively poor performance of the models in terms of discrimination and calibration (even after recalibration of the model intercept) has implications for the choice of risk threshold for selecting individuals for screening. Comparatively low risk-thresholds were required to capture a worthwhile proportion of the people who develop lung cancer; however, this would also select substantial numbers of low-risk individuals while the cost-effectiveness of lung cancer screening will depend on the total number selected for screening. In CPRD we show how many more cancers are detected at lower thresholds and how this impacts the number of screens, highlighting the need for a two-step approach to improve the assessment of screening eligibility.
Strengths and limitations
This is the largest external evaluation of the LLPv2 and PLCOm2012 risk models in the literature and, to our knowledge, the first using primary care data. Data in CPRD are prospectively recorded at the time of consultation in primary care, which minimises reporting and recall bias; however, the information relies on accurate coding and timely data entry in primary care. To minimise errors related to this, we only used data entered by practices after the practice met the CPRD data quality and completeness standard. This study has tested the risk models in a context outside of their intended use. Both models should be populated with data collected from a potential participant in screening. Instead the study shows how the models perform in routinely collected primary care data. Therefore, one of the key limitations is the lack of detailed smoking data in CPRD. One could argue that this places the PLCOm2012 model at a disadvantage in validation, as the primary risk factors which drive the model are age and detailed smoking history. Evaluations for the USA, such as the recent study by Pasquinelli et al, highlight differences in lung cancer risk by race and ethnicity.28 However, the racial and ethnic groups evaluated in the PLCOm2012 model probably do not reflect the racial and ethnic groups present in the UK, nor their lung cancer risk. A previous investigation found lung cancer incidence rate-ratios for different ethnic groups to be much lower than for white men and women, with the exception of men of Bangladeshi descent.29 Future research might quantify the latest magnitude of lung cancer risk differences across racial and ethnic communities in the UK. Similarly, while educational level (an indicator for socioeconomic status) was not available, the UKLS indicated lung cancer risk is higher in socioeconomically deprived groups.30 Consequently, efforts should be made to integrate information on socioeconomic status in the assessment of lung cancer risk in the UK.
While BMI is included as a factor in PLCOm2012, it was not a significant risk factor in CPRD. Although BMI is a risk factor in PLCOm2012, other large studies have not found a simple relationship. In one large study of Americans, Europeans and Asians, with 23 732 incident lung cancers, BMI was associated with decreased risk but measures of central obesity with higher risk.31 BMI is a calculated field in primary care electronic data and has been shown to be reliable.32 Despite our finding that many of the weaker risk factors were either not recorded or at a low frequency, both models still over-estimated risk. This might suggest that if the additional data were available, they may over-estimate to an even greater extent.
However, the objective of this piece of work was not only to evaluate and calibrate the models in primary care data, but also to look at the feasibility of applying these models in routinely collected data to select patients for entry into screening. Smoking data in primary care are recorded at the time of registration at a surgery using a questionnaire or during face-to-face consultations. Many practices record smoking status at regular intervals as part of the Quality and Outcomes Framework (QOF).33 The 2004 QOF mandated recording every 15 months in patients with comorbid illness and in 2006, recording smoking status in non-morbid patients was required every 27 months to attract payment. In the full CPRD data set, prior to restricting to ever-smokers, only 2% of the population had missing smoking data.
Studies looking at the validity of smoking records in electronic primary care data have shown that it is in line with that obtained from population surveys such as the Health Survey for England in terms of proportion of people who are current or ex-smokers in age categories.34 Those with no recorded smoking status are likely to be never-smokers or smokers who quit before the age of 30 years, so it is unlikely that we would be excluding or misclassifying a substantial number of eligible smokers by labelling these people as never-smokers.35–37 As smoking intensity is grouped into categories in CPRD we had to assign each participant a specific number of cigarettes smoked per day, which was largely in multiples of 5 or 10 (apart from trivial smokers). Work by Shiffman has shown that even when a contemporaneous smoking history is taken from a person it is prone to digit bias.38 In his study, two-thirds of participants asked about daily smoking consumption recorded smoking quantity in multiples of 10, suggesting that our approach is not unreasonable. The key drivers in these risk models are age, sex and smoking, so missing data on other predictor variables is less likely to impact the performance of the risk models. This was confirmed in our sensitivity analyses where we found the impact of other variables to be minimal. Thus, although it is easy to criticise data completeness and accuracy in routinely collected primary care data, the reality is that it is often better than assumed. Therefore, using these data has to be balanced against the extra cost of directly acquired data which itself may be subject to incompleteness and bias. It may be that less costly methods such as the use of online forms or mobile applications might be a solution, although it is important to establish how effective these are, particularly in the deprived population.
Recently the LLPv2 risk model and a recalibrated version (LLPv3) have been validated and calibrated using questionnaire data from the 75 958 UKLS individuals who responded to the first approach questionnaire and have been followed up for lung cancer for over 5 years.39 This cohort included never-smokers (47%), which may inflate measures of discrimination. The authors found the AUC to be 0.81 for both LLPv2 and LLPv3 but LLPv2 was found to overestimate the absolute risk approximately twofold. The LLPv3, which was calibrated to contemporary English incidence, achieved substantially more accurate prediction of absolute incidence, and would now be an appropriate update to LLPv2 in selecting a high-risk group for screening in the UK.
Other work in the literature
A study by Li et al 16 compared the performance of four risk prediction models including LLP and PLCOm2012 in 20 700 German participants of the European Prospective Investigation into Cancer and Nutrition cohort. This showed better discrimination for the PLCOm2012 model (c-index 0.81, 95% CI 0.76 to 0.86) compared with the LLP model (c-index 0.79, 95% CI 0.73 to 0.83). However, the cohort had an overall rate of lung cancer of less than 0.5% with fewer than 100 lung cancer events.
Weber et al 40 externally validated PLCOm2012 in a cohort of 95 882 Australian ever-smokers aged 45 years and older. They used questionnaire data completed as part of the 45 and Up Study,41 linked to a number of population data sets. They demonstrated an AUC of 0.80 (95% CI 0.78 to 0.81) with good calibration in their population (mean and 90th percentile absolute difference between observed and predicted probabilities of 0.006 and 0.016, respectively). The authors assessed the model performance at a risk threshold of 1.51% and showed a sensitivity of 70% (95% CI 67.1% to 72.7%) and specificity of 75.4% (95% CI 75.2% to 75.7%). In a subset of the population (those aged 55–74 years) they also assessed a variety of additional risk thresholds, namely 1.49%, 1.73% and 2%, but did not show that changing the risk threshold made a substantial improvement to the sensitivity and specificity. The 45 and Up Study cohort may be more similar to those who participate in trials, from which PLCOm2012 was derived, as people had to personally complete a questionnaire, consent form and mail it to the study centre to be included. As the authors acknowledge, this means that there may be a selection bias in favour of less deprived people. The good calibration suggests that the population is similar to that from which the model was derived. The data required to compute the risk score, particularly with regards to detailed smoking data, were largely derived from these questionnaires rather than already available in routinely collected data.
Ten Haaf et al 6 conducted a retrospective validation of nine risk prediction models using data from NLST and PLCO. Both calibration and discriminative ability were better for all models using PLCO data than NLST. PLCOm2012 showed better discrimination than LLP (0.789 (95% CI 0.781 to 0.797) vs 0.745 (95% CI 0.736 to 0.755) in the PLCO control arm) but the PLCOm2012 was derived from this data set which places PLCOm2012 at an advantage compared with LLPv2. Interestingly, most of the models tested in this study had greater discriminative ability in predicting 6-year lung cancer mortality rather than 6-year lung cancer incidence.
Katki et al 9 evaluated nine risk prediction models in US data on ever-smokers from the National Institutes of Health–AARP Diet and Health Study (NIH-AARP) and the Cancer Prevention Study II Nutrition Survey (ACS CPSII) cohort to compare model performance. Both LLP and PLCOm2012 showed some overestimation of risk, but PLCOm2012 was better calibrated in this cohort. Both showed moderate discrimination; PLCOm2012 with AUC of 0.769 (95% CI 0.766 to 0.772) and 0.754 (95% CI 0.741 to 0.767) for NIH-AARP and ACS CPSII, respectively, and LLP with AUC values of 0.726 (95% CI 0.722 to 0.731) and 0.726 (95% CI 0.711 to 0.740), respectively. When the authors set screening eligibility at 2% lung cancer risk over 5 years the well calibrated models, including PLCOm2012 selected fewer participants for inclusion (ranges between 7.6–10.9 million, compared with 14.5–26 million for the less well calibrated models).
Clinical relevance and conclusions
Primary care records in the UK are likely to be used to identify those who may be eligible for CT screening because they provide an efficient way to identify ever-smokers and thus minimise contact with people who would not be eligible and reduce cost and potential distress from being contacted about cancer screening when there is no benefit. Other countries, where similar data exist will likely do the same. Ideally, a model with good sensitivity and specificity should be applied directly to primary care data and only then would potential participants be contacted. We have shown that two existing models, even at very low risk thresholds, would miss a significant number of people if applied in this way. We do not know how much better the models perform when applied in the lower risk categories to more detailed, directly-derived data from participant questionnaires, but it is likely that a significant proportion of people who develop cancer would be below the threshold. If models are to be used to derive a first-step ‘enriched’ population, then the second step would likely involve increasing the risk threshold to comply with cost-effectiveness standards. The principle of the two-step approach is the use of an initial model at a low risk threshold in order to maximise sensitivity, with a second model that uses the integrity of detailed and directly-acquired data to improve specificity and reduce cost. This study has tested models at different thresholds and we conclude that specificity in the first step of the two-step approach would only be improved by obtaining more accurate data to use in the risk prediction, or by the development of new models. Obtaining better data in a first-step approach could place a considerable burden on services with limited gain and extra cost. 
However, once national screening programmes are in place, obtaining better data could become the subject of data quality improvement in primary care, with additional data fields completed that are important in risk prediction, for example, detail of family history. It will be important to compare the performance of two-step approaches with newer, single-step models developed in primary care data. The main value of improved models is in identifying those who are at lower or intermediate risk on the basis of current risk models, but who arguably may have more life-years to gain from screening due to younger age, lower smoking intensity and consequently fewer comorbidities. It is key that future risk prediction models are able to predict not only eligibility for entry into screening but also whether, and by how much, participants can expect to benefit. Some studies have suggested that risk models may identify those who are less likely to benefit from screening due to competing causes of mortality and morbidity.42 Optimal risk thresholds need to be identified based on local population data, and further work is needed to determine the best strategy for identifying and inviting those who have most to gain from screening for lung cancer.
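The two-step approach discussed above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the analysis performed in this study. The `Person` fields, both thresholds and the toy cohort are hypothetical. The first step applies a low threshold to risk predicted from the primary care record. The second applies a higher threshold to risk predicted from detailed questionnaire data, and only to those contacted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Person:
    record_risk: float         # risk predicted from the primary care record (hypothetical)
    questionnaire_risk: float  # risk predicted from questionnaire data (hypothetical)
    developed_cancer: bool     # outcome over the prediction horizon


def first_step(people: List[Person], threshold: float) -> List[Person]:
    """Low threshold on record-based risk: decides who is contacted."""
    return [p for p in people if p.record_risk >= threshold]


def second_step(people: List[Person], threshold: float) -> List[Person]:
    """Higher threshold on questionnaire-based risk: decides who is eligible."""
    return [p for p in people if p.questionnaire_risk >= threshold]


def trade_off(cohort: List[Person], selected: List[Person]) -> Tuple[float, float]:
    """Return (fraction of non-cases not selected, fraction of cancers missed)."""
    selected_ids = {id(p) for p in selected}
    non_cases = [p for p in cohort if not p.developed_cancer]
    cases = [p for p in cohort if p.developed_cancer]
    avoided = sum(id(p) not in selected_ids for p in non_cases) / len(non_cases)
    missed = sum(id(p) not in selected_ids for p in cases) / len(cases)
    return avoided, missed


# Toy cohort (hypothetical values chosen only to exercise the logic).
cohort = [
    Person(0.002, 0.004, False),
    Person(0.004, 0.006, False),
    Person(0.008, 0.010, True),
    Person(0.012, 0.009, False),
    Person(0.015, 0.020, False),
    Person(0.020, 0.030, True),
    Person(0.030, 0.012, False),
    Person(0.050, 0.060, True),
]

contacted = first_step(cohort, threshold=0.01)     # illustrative first-step threshold
eligible = second_step(contacted, threshold=0.015)  # illustrative second-step threshold
avoided, missed = trade_off(cohort, contacted)
print(f"contacted: {len(contacted)}, eligible: {len(eligible)}")
print(f"non-cases not contacted: {avoided:.1%}, cancers missed: {missed:.1%}")
```

Raising the first-step threshold increases the share of non-cases who are never contacted but also the share of cancers missed, which is the trade-off this study quantifies.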
In conclusion, two validated multivariable models perform less well than previously reported when applied to routinely collected primary care data restricted to ever-smokers aged 50–80 years, which may be explained by the quality and completeness of the data. However, they may be used to reduce, by a third to a half, the total number of ever-smokers in this higher risk group who are contacted as part of a screening programme, at the cost of excluding 10%–15% of people who develop lung cancer from more detailed evaluation. The cost-effectiveness of screening programmes is currently under evaluation and the cost of the CT is a major driver.43 Hence, reducing the total number screened could be pivotal. While many of the excluded people may not be at high enough risk to be eligible, further work is needed to establish how many are incorrectly excluded and to what extent newer models can improve on this, as even the best models will miss some lung cancers.
Ethics statements
Patient consent for publication
Ethics approval
Approval for use of data for this project was granted by the CPRD Independent Scientific Advisory Committee (ISAC) (Protocol numbers 18_223 and 20_014R).
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Footnotes
Correction notice This article has been corrected since it was published Online First. The funding statement and competing interests statements have been updated.
Contributors All authors contributed to the research concept and design. All authors edited the manuscript and approved the final version. Analysis was conducted by JK and EO and further refined by DRB, SWD, KtH and RH. All these latter authors verified the underlying data. DRB is guarantor for the work.
Funding This research was funded by Cancer Research UK C35238/A26388. This research is linked to the CanTest Collaborative, which is funded by Cancer Research UK [C8640/A23385].
Competing interests KtH reports grants from Cancer Research UK, during the conduct of the study; grants from European Union (Horizon 2020), grants from University of Zurich, Switzerland, non-financial support from International Association for the Study of Lung Cancer, non-financial support from Russian Society of Clinical Oncology, non-financial support and other from Biomedical Research In Endstage And Obstructive Lung Disease Hannover (BREATH), grants from NIH/National Cancer Institute, outside the submitted work. WH is Co-PI of CanTest Collaborative, funded by Cancer Research UK. RBH reports personal fees from Galapagos, outside the submitted work. SMJ reports grants from GRAIL, personal fees from AstraZeneca, personal fees from BARD1 Bioscience, personal fees from Achilles Therapeutics, grants from Owlstone, other from Optellum, personal fees from Johnson and Johnson, other from AstraZeneca, outside the submitted work. HJdK reports grants from Cancer Research UK, during the conduct of the study; grants from European Union (Horizon 2020), personal fees from University of Zurich, Switzerland / MSD, personal fees from IPSOS London, grants from NIH/National Cancer Institute, personal fees from Teva, Copenhagen, Denmark, outside the submitted work. DRB reports grants from Cancer Research UK, during the conduct of the study; personal fees from Roche, personal fees from AstraZeneca, personal fees from MSD, personal fees from BMS, outside the submitted work.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.