Background: Effective strategies for managing patients with solitary pulmonary nodules (SPN) depend critically on the pre-test probability of malignancy.
Objective: To validate two previously developed models that estimate the probability that an indeterminate SPN is malignant, based on clinical characteristics and radiographic findings.
Methods: Data on age, smoking and cancer history, nodule size, location and spiculation were collected retrospectively from the medical records of 151 veterans (145 men, 6 women; age range 39–87 years) with an SPN measuring 7–30 mm (inclusive) and a final diagnosis established by histopathology or 2-year follow-up. Each patient’s final diagnosis was compared with the probability of malignancy predicted by two models: one developed by investigators at the Mayo Clinic and the other developed from patients enrolled in a VA Cooperative Study. The accuracy of each model was assessed by calculating areas under the receiver operating characteristic (ROC) curve and the models were calibrated by comparing predicted and observed rates of malignancy.
Results: The area under the ROC curve for the Mayo Clinic model (0.80; 95% CI 0.72 to 0.88) was higher than that of the VA model (0.73; 95% CI 0.64 to 0.82), but this difference was not statistically significant (Δ = 0.07; 95% CI −0.03 to 0.16). Calibration curves showed that the probability of malignancy was underestimated by the Mayo Clinic model and overestimated by the VA model.
Conclusions: Two existing prediction models are sufficiently accurate to guide decisions about the selection and interpretation of subsequent diagnostic tests in patients with SPNs, although clinicians should also consider the prevalence of malignancy in their practice setting when choosing a model.
The solitary pulmonary nodule (SPN) is a common and challenging clinical problem.1 2 In patients who are surgical candidates, malignancy should be identified promptly (when present) to permit timely resection. Ideally, surgery should be avoided in patients with nodules that prove to be benign. Previous studies have shown that the effectiveness and cost-effectiveness of SPN management strategies depend critically on the “pre-test” probability of malignancy, that is, the probability of malignancy based on clinical characteristics and radiographic findings before performing other tests.3–6 While most clinicians use their intuition and clinical judgement to make this assessment, quantitative prediction models7 8 and neural networks9–11 have been developed to facilitate this task.
Swensen and colleagues7 at the Mayo Clinic retrospectively reviewed the medical records and imaging tests of 629 patients (51% male) with lung nodules measuring 4–30 mm in diameter that were newly discovered between 1984 and 1986; 65% of the nodules were benign, 23% were malignant and 12% were without a final diagnosis. The authors divided the sample into development (n = 419) and validation sets (n = 210). Using logistic regression analysis, they identified six independent predictors of malignancy: older age, a history of smoking, a history of an extrathoracic cancer >5 years before nodule detection, larger nodule diameter, upper lobe location and spiculated margins (box 1). Model accuracy was good, with an area under the receiver operating characteristic (ROC) curve of 0.83 and a standard error (SE) of 0.02 in the development set. The area under the ROC curve was 0.80 (SE = 0.04) in the validation set, and the model was well calibrated in both sets, that is, the observed frequency of malignancy was similar to the predicted probability of malignancy in patients grouped by deciles of predicted probability. Of note, this study excluded patients with a history of lung cancer and patients with a history of an extrathoracic cancer that was diagnosed within 5 years of nodule identification.
Members of our group8 derived another model using data from 375 patients (98% male) with solitary nodules measuring 7–30 mm (inclusive) in diameter who were enrolled in a VA Cooperative Study that examined the diagnostic accuracy of positron emission tomography (PET) imaging for SPN diagnosis. The prevalence of malignancy was 54%. We identified four independent predictors of malignant SPNs by using logistic regression analysis: older age, a history of smoking, larger nodule diameter and shorter time since quitting smoking (box 1). This model had accuracy similar to the Mayo Clinic model with an area under the ROC curve of 0.79 (SE = 0.02) in the development set and an area under the ROC curve of 0.78 (SE = 0.02) by internal cross-validation. The model was well calibrated in patients with predicted probabilities that were <29% or >66%, but it slightly overestimated the probability of malignancy in patients with low to moderate predicted probability (29–48%) and slightly underestimated the probability of malignancy in patients with moderate to high predicted probability (49–66%).
In this study we validated two existing clinical prediction models that estimate the probability that an indeterminate SPN will be malignant.
Box 1 Equations for models that estimate the pre-test probability of malignant SPN
The Mayo Clinic model is defined by the equations:
Pre-test probability of a malignant SPN = e^x/(1 + e^x)
x = −6.8272 + (0.0391*age) + (0.7917*smoke) + (1.3388*cancer) + (0.1274*diameter) + (1.0407*spiculation) + (0.7838*upper)
where e is the base of the natural logarithm; age indicates the patient’s age in years; smoke indicates smoking history (1 = current or former smoker, 0 = never smoker); cancer indicates history of an extrathoracic cancer ⩾5 years before nodule identification (1 = yes, 0 = no or not specified); diameter indicates the largest nodule measurement (in mm) reported on initial chest radiograph or CT scan; spiculation indicates mention of nodule spiculation on any imaging test report (1 = yes, 0 = no or not specified); and upper is location of the nodule within the upper lobe of either lung (1 = yes, 0 = no).
The VA model is defined by the equations:
Pre-test probability of a malignant SPN = e^x/(1 + e^x)
x = −8.404 + (2.061*smoke) + (0.779*age10) + (0.112*diameter) − (0.567*yearsquit10)
where e is the base of the natural logarithm; smoke indicates smoking history (1 = current or former smoker, 0 = never smoker); age10 indicates age in years at the time of nodule identification, divided by 10; diameter indicates the largest nodule measurement (in mm) reported on initial chest radiograph or CT scan; and yearsquit10 indicates the number of years since quitting smoking, divided by 10 (0 indicates not applicable).
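The two equations in box 1 can be written as a short Python sketch. The function and argument names are illustrative choices, not part of the original models; the coefficients are copied from box 1. The first example reproduces the hypothetical 54-year-old former smoker described later in the text. The second example assumes that the 68-year-old patient is a current smoker (years since quitting set to 0), which the text does not state explicitly.

```python
import math


def logistic(x):
    """Convert a linear predictor x into a probability, e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def mayo_probability(age, smoke, cancer, diameter, spiculation, upper):
    """Mayo Clinic model (box 1).

    age in years; diameter in mm; smoke, cancer, spiculation and upper
    are coded 1 = yes, 0 = no (or not specified), as defined in box 1.
    """
    x = (-6.8272 + 0.0391 * age + 0.7917 * smoke + 1.3388 * cancer
         + 0.1274 * diameter + 1.0407 * spiculation + 0.7838 * upper)
    return logistic(x)


def va_probability(age, smoke, diameter, years_quit):
    """VA model (box 1).

    age and years_quit are given in years and divided by 10 here;
    years_quit is 0 when not applicable (never or current smoker).
    """
    x = (-8.404 + 2.061 * smoke + 0.779 * (age / 10.0)
         + 0.112 * diameter - 0.567 * (years_quit / 10.0))
    return logistic(x)


# Hypothetical case from the text: 54-year-old former smoker who quit
# 10 years ago, smooth-bordered 16 mm nodule in the right upper lobe.
case1_mayo = mayo_probability(age=54, smoke=1, cancer=0, diameter=16,
                              spiculation=0, upper=1)
case1_va = va_probability(age=54, smoke=1, diameter=16, years_quit=10)

# Second case from the text: 68-year-old smoker, 22 mm spiculated nodule
# in the right upper lobe. ASSUMPTION: current smoker, so years_quit = 0.
case2_mayo = mayo_probability(age=68, smoke=1, cancer=0, diameter=22,
                              spiculation=1, upper=1)
case2_va = va_probability(age=68, smoke=1, diameter=22, years_quit=0)

print(f"Case 1: Mayo {case1_mayo:.0%}, VA {case1_va:.0%}")
print(f"Case 2: Mayo {case2_mayo:.0%}, VA {case2_va:.0%}")
```

Under these inputs the estimates round to the values quoted in the text (25% and 29% for the first case, 78% and 81% for the second).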
We retrospectively reviewed the electronic medical records of an independent sample of 151 asymptomatic veterans with SPNs discovered incidentally and collected information about patient history, nodule characteristics and final diagnosis. Using these data, we evaluated the accuracy and calibration of the Mayo Clinic and VA models for estimating the probability of malignancy in patients with indeterminate lung nodules.
We examined the records of 991 consecutive patients with known or suspected lung cancer who underwent PET imaging at a veterans hospital between January 2000 and February 2006 (fig 1). During that time almost all patients with an SPN underwent both PET and CT scanning at this institution. A trained medical abstractor reviewed the medical records to identify patients who had an SPN that measured 7–30 mm in largest diameter on the chest radiograph or CT scan. We excluded patients who did not have an SPN based on the presence of any of the following radiographic findings: lung mass >3 cm in diameter, atelectasis (except linear or subsegmental), pneumonia, consolidation, infiltrate, pleural effusion, hilar or mediastinal lymphadenopathy, or bony abnormalities suggestive of malignancy. In addition, we excluded patients with more than one nodule on the chest radiograph or more than six nodules on the chest CT scan. We also excluded patients who did not have a CT scan or whose initial CT scan was performed more than 30 days after the PET scan (n = 31), because we used data from this same sample to validate a decision analysis model that estimates post-test probabilities of malignant nodules following CT and PET.12 Finally, we excluded patients who lacked a final diagnosis (n = 94), patients with nodules that were observed to grow before the initial PET scan was performed (n = 64) and patients for whom the diagnosis was made before the PET scan (n = 11).
We considered the final diagnosis to be malignant when histopathology revealed a specific malignant diagnosis. We considered the final diagnosis to be benign when histopathology revealed a specific benign diagnosis, when the nodule was stable over 2 years of clinical and radiographic follow-up, or when the nodule decreased in size at any time. We also considered the final diagnosis to be benign when patients died with a nodule that was stable after at least 1 year of follow-up.
For eligible patients the abstractor reviewed the initial chest radiograph and chest CT reports to collect information about nodule diameter, location and edge characteristics. In addition, the abstractor reviewed progress notes, admission reports and demographic files to collect information about age, gender, race, ethnicity, pulmonary function, history of cancer and smoking behaviour. Information about smoking behaviour included smoking history (ever or never), smoking status (current or former), average packs/day, years of smoking, pack-years and number of years since quitting.
We tested for differences in clinical and radiological characteristics between patients with malignant and benign nodules using t tests for continuous variables with a normal distribution and the Mann-Whitney U test for continuous variables with a distribution that differed significantly from normal, as determined by the one-sample Kolmogorov-Smirnov test. For categorical variables we performed χ² tests or Fisher exact tests as indicated. Mean values and standard deviation (SD) are reported for continuous variables.
Identical validation techniques were used for both models. We described accuracy by comparing the predicted probability of malignancy with the final diagnosis and constructing ROC curves.13 We report the area under the ROC curve and 95% confidence intervals (CI). To compare the accuracy of the two models we sampled with replacement and used the bootstrap to generate a 95% CI for the difference in the areas under the ROC curve.14
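The comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SPSS or SAS code: it assumes both models are scored on the same patients, uses a percentile bootstrap with 2000 replicates, and computes the area under the ROC curve as the proportion of malignant-benign pairs in which the malignant case has the higher predicted probability (ties count one half). None of these details are specified in the text, and all function names are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve: the probability that a malignant case
    (label 1) scores higher than a benign case (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """Return (difference, lower, upper): the observed AUC difference
    (model A minus model B) and a 95% percentile bootstrap interval.

    Patients are resampled with replacement; each resample keeps a
    patient's two model scores and outcome together (paired design).
    """
    point = auc(scores_a, labels) - auc(scores_b, labels)
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples lacking one of the outcomes
            a = auc([scores_a[i] for i in idx], y)
            b = auc([scores_b[i] for i in idx], y)
            diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return point, lower, upper
```

An interval that includes zero, such as the one reported in the Results, indicates that the difference in accuracy is not statistically significant at the 5% level.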
Calibration refers to the relationship between the observed and predicted probabilities of malignant SPNs. We assessed the calibration of each model by dividing the sample into five equal groups (quintiles) based on predicted probability and plotting the median predicted probability of each quintile against the observed frequency of malignancy for that group. Analyses were performed using SPSS for Windows Version 15.0 (SPSS, Chicago, Illinois, USA) and SAS for Windows Version 9.1 (Cary, North Carolina, USA). Two-sided p values <0.05 were considered to be statistically significant.
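A minimal Python sketch of this quintile-based calibration summary is shown below. The function name and the handling of samples that do not divide evenly into five groups are assumptions made for illustration; the text does not say how uneven group sizes or tied probabilities were handled.

```python
from statistics import median


def calibration_by_quintile(predicted, outcomes, n_groups=5):
    """Summarise calibration by groups of predicted probability.

    predicted: predicted probabilities of malignancy (0 to 1)
    outcomes:  final diagnoses (1 = malignant, 0 = benign)

    Returns one tuple per group, ordered from lowest to highest predicted
    probability: (median predicted, observed frequency of malignancy,
    lowest predicted, highest predicted).
    """
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    if len(predicted) < n_groups:
        raise ValueError("need at least one patient per group")
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        # Integer arithmetic keeps group sizes within one patient of each other.
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        probs = [p for p, _ in group]
        observed = sum(y for _, y in group) / len(group)
        rows.append((median(probs), observed, min(probs), max(probs)))
    return rows
```

Plotting the first element of each tuple against the second gives a calibration curve of the kind described here. A model underestimates risk in a group when the observed frequency exceeds the median predicted probability, and overestimates it when the reverse is true.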
The sample of 151 patients was predominantly male (96%) with a mean (SD) age of 66.9 (10.1) years (table 1). Information about race was not available for 59 patients (39%) and information about ethnicity was not available for 66 patients (44%), but of those for whom it was available, 74 (80%) were white and 81 (95%) were not Hispanic or Latino. The prevalence of malignancy was 44%.
Patients with a malignant diagnosis had larger nodules than those with a benign diagnosis (p<0.001) and were more likely to have nodules with spiculated borders (p<0.001, table 1). Participants with a malignant diagnosis were somewhat more likely to be older and to be current or former smokers, although these differences were not statistically significant. Former smokers with malignant nodules tended to have quit smoking more recently than those with benign nodules, but these differences were also not statistically significant.
Validation of the Mayo Clinic model
We excluded 33 patients from the validation of this model because they had a history of lung cancer or had been diagnosed with an extrathoracic cancer within 5 years of nodule identification. The prevalence of malignancy in the validation sample was 45%. Of the remaining 118 patients, three were missing information about nodule location and no information on smoking history was available for one.
When these four patients were excluded from the validation, the area under the ROC curve (AUC) was 0.80 (95% CI 0.72 to 0.88), as shown in fig 2A. The AUC changed only slightly when we performed a sensitivity analysis in which we assumed that all four patients with missing data were smokers with nodules in the upper lobe (AUC = 0.81, 95% CI 0.73 to 0.89). The AUC and 95% CI were identical when we assumed that patients with missing data were smokers with nodules in the middle or lower lobes. In these analyses, we always assumed that the one individual with missing information about smoking history was a smoker because the prevalence of smoking was so high in this sample (89%).
A calibration curve showed that, for patients in all but the lowest quintile of predicted probability (fig 2B), the median predicted probability was lower than the observed frequency of malignant nodules by approximately 2% (fourth quintile) to 12% (second quintile). However, the observed frequency of malignant nodules fell within the range of predicted probabilities for patients in each quintile, except for patients in the third quintile in whom the observed frequency of malignant nodules was 48% and the range of predicted probabilities was 31–47%.
The accuracy (AUC 0.78, 95% CI 0.71 to 0.86) and calibration of the Mayo Clinic model were similar when we included the 33 patients who had a prior diagnosis of lung cancer or an extrathoracic cancer that was diagnosed within 5 years of nodule identification.
Validation of the VA model
We excluded 12 patients from validation of this model because they were included in the sample used to develop the model.8 Excluded patients tended to be younger, were more frequently current smokers, were heavier smokers, had a higher proportion of positive PET scans and had larger nodules, although none of these differences was statistically significant. The prevalence of malignancy among patients used to validate the VA model was 42% compared with 67% for those excluded. These differences probably reflect the very high prevalence of malignancy in the development set (54%). Patients excluded from the analysis did not differ significantly or meaningfully with respect to sex, race, ethnicity, nodule location or spiculation.
Of the remaining 139 patients used for validation, 15 (11%) were missing information necessary for estimating pre-test probability with the VA model. Of these, four patients were known to be former smokers but were missing information about time since quitting; seven patients were known to have smoked but were missing information about their current smoking status and years since quitting smoking; and four patients were missing all information about smoking history.
Patients missing values for smoking-related variables did not differ significantly from those with complete information with respect to age, sex, race, ethnicity, cancer history, nodule location, nodule edge characteristics, method of first nodule detection or method of final diagnosis. However, patients with missing information tended to have smaller nodules (mean difference −5 mm, p = 0.07) and were more likely to have a benign final diagnosis (p = 0.07).
Excluding patients with missing data, the area under the ROC curve for the VA model was 0.73 (95% CI 0.64 to 0.82), as shown in fig 3A. The results for the VA model did not vary when we used three different sets of assumptions (Appendix) to impute values for missing data (AUC 0.72–0.75; lower limit of 95% CI 0.64 to 0.67; upper limit of 95% CI 0.81 to 0.83). In contrast to the Mayo Clinic model, the median predicted probability for patients in each quintile was higher than the observed frequency of malignant nodules by approximately 2% (first quintile) to 16% (second quintile) (fig 3B). When constructing the ROC and calibration curves shown in fig 3, we excluded patients with missing data.
The Mayo Clinic model appeared to be slightly more accurate than the VA model, although the difference in the areas under the ROC curves was not statistically significant (Δ = 0.07; 95% CI −0.03 to 0.16).
We have validated two models that use clinical history and radiographic findings to estimate the probability of malignancy in patients with SPNs. Both models appeared to be sufficiently accurate to inform clinical decisions about the choice and interpretation of subsequent diagnostic tests. The accuracy of both models was similar to that reported in the papers that described their development.7 8 The Mayo Clinic model was slightly more accurate than the VA model, but this difference was not statistically significant.
In our validation, the Mayo Clinic model underestimated the probability of malignancy for patients in all but the lowest quintile of predicted probabilities. This is perhaps not surprising, given that the Mayo Clinic model was developed in a sample of patients with a much lower prevalence of malignancy (23%) than our sample (44%).
Another group recently examined the accuracy of the Mayo Clinic model and obtained similar results. Using data from 106 patients with SPNs in the Netherlands, of whom 61 (57%) had malignant nodules, Herder and colleagues15 found that the AUC was 0.79 (95% CI 0.70 to 0.87), almost identical to our estimate. As in our study, the authors found that the model underestimated the clinical probability of malignancy.
In contrast to the Mayo Clinic model, the VA model was developed from a sample with a relatively high prevalence of malignancy (54%). As might be expected, this model overestimated probabilities of malignancy for patients in our validation sample, among whom prevalence was lower (44%).
Thus, one option for practising clinicians would be to use both models to generate a range of pre-test probabilities: the Mayo Clinic model would provide the lower bound and the VA model would provide the upper bound. Alternatively, it may be reasonable to use the VA model instead of the Mayo Clinic model in some circumstances because it may be preferable to overestimate, rather than underestimate, pre-test probability. While overestimates might result in unnecessary surgery or biopsy in some patients with benign nodules (false positive diagnosis), underestimates might have more serious consequences, namely delayed diagnosis and missed opportunities for surgical cure in patients with malignant nodules (false negative diagnosis).
At present we recommend that, when choosing a model, clinicians should carefully weigh the relative consequences of a false positive diagnosis compared with those for a false negative diagnosis. They should also be mindful of the prevalence of malignant nodules in their practice setting when interpreting model estimates. Although further research is needed to determine whether the pattern of over- and under-calibration that we observed is generalisable to other patient groups or other models, the Mayo Clinic model is probably a better choice in practice settings with a low prevalence of malignant lung nodules. Conversely, the VA model may be a better choice in settings with a high prevalence of malignant SPN.
For an example of how to use these models, we present the hypothetical case of a previously healthy 54-year-old former smoker who quit smoking 10 years ago. The patient has a smooth-bordered non-calcified nodule in the right upper lobe that measures 16 mm in diameter. Using the equations presented in box 1, the estimated probability of a malignant SPN is 25% according to the Mayo Clinic model and 29% according to the VA model. Previous research has shown that use of PET imaging was most cost-effective in patients with indeterminate nodules on CT scanning and a low to moderate (<40%) probability of malignancy,6 so it would be reasonable to order a PET scan in this case. In contrast, the estimated probability of malignancy in a 68-year-old smoker with a 22 mm spiculated nodule in the right upper lobe is 78% by the Mayo Clinic model and 81% by the VA model. If this patient is a good surgical candidate and has no evidence of intrathoracic or distant metastasis on the CT scan, the probability of malignancy is so high that it would be reasonable to proceed directly to surgical resection. The probability estimates generated by the models could also be incorporated into future clinical decision support systems.
Our study has several limitations. We collected information about nodule size, location and spiculation from radiology reports rather than by examining chest radiographs and CT images directly. Similarly, we relied on progress notes in patients’ medical records for information about smoking and cancer history. Inaccurate or incomplete information taken from these records, when used to calculate probabilities of malignancy, would result in lower model accuracy. Accordingly, this study provides a conservative test of model accuracy. More accurate data would yield higher, rather than lower, measures of accuracy.
When available, we used the largest nodule diameter from the first abnormal chest radiograph to calculate nodule size. However, 73 patients (48%) did not have a chest radiograph or measurements from the first abnormal chest radiograph were not recorded. When a chest radiograph measurement was unavailable, we calculated nodule size using measurements from the CT scan instead. If chest radiograph and CT measurements differed substantially, this would affect our results. However, when we used only CT measurements, which were available for 96% of patients, we found that the areas under the ROC curve and 95% confidence intervals for both models were nearly identical to results using both chest radiograph and CT measurements. Model calibrations were also very similar. This suggests that both chest radiograph and CT measurements may be used with these models without affecting accuracy, making them more widely applicable in clinical practice.
Our validation was also limited by missing data. In the sample used to validate the VA model, 11% of patients were missing information about smoking history. However, we found that model accuracy and calibration were not sensitive to multiple methods of imputing missing values for these patients. Fewer than 5% of patients in the sample used to validate the Mayo Clinic model were missing information. Again, we found very similar model accuracy and calibration using multiple methods of imputing missing values. In general, characteristics of patients with and without missing data were similar.
Finally, because our sample was composed primarily of older white men with a history of smoking, our results may not be generalisable to other groups of patients. In particular, this study does not directly address questions about the accuracy of these models when predicting pre-test probabilities of malignant SPNs in women. However, there is no evidence that characteristics of benign and malignant pulmonary nodules differ between men and women. The potential advantages of quantitatively estimating the pre-test probability of malignancy in women with SPNs probably outweigh the risk in generalising to this population. In addition, the validity of the VA model for predicting malignancy in nodules that measure <7 mm in diameter has not been established. Additional research using this or other models to accurately assess the probability of malignancy in small “sub-centimetre” nodules is needed because the detection of small nodules is on the rise as more CT scans of the chest are being performed.
In conclusion, our study demonstrates that both the Mayo Clinic and VA models predict the pre-test probability of malignant SPNs with accuracy that is sufficient to make them clinically useful, especially since neither model is intended to be used as a stand-alone diagnostic test. By using a validated model to estimate pre-test probability, clinicians will be able to make better informed decisions about the selection and interpretation of subsequent diagnostic tests in patients with SPNs, and may thereby deliver more effective and efficient care.
We would like to thank Mithat Gönen for his assistance in applying the bootstrap method.
(A) Assumptions for imputing missing data
We used three different sets of assumptions to impute values for missing data about smoking behaviour when validating the VA model. In all three cases, if a patient was known or assumed to be a former smoker, we assumed that time since quitting smoking was 15.1 years (the mean value in all former smokers within the VA validation sample). In the first set of assumptions we assumed that participants with missing data about smoking history had smoked and that patients with missing data about smoking status were current smokers. In the second set of assumptions we assumed that patients with missing data about smoking history had smoked and that patients with missing data about current smoking status were former smokers. In the third set of assumptions we assumed that patients with missing data about smoking history had never smoked and that patients with missing data about current smoking status were former smokers. All three sets of assumptions yielded similar results.
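The three sets of assumptions can be expressed as a small Python sketch that fills in the smoking inputs required by the VA model. The function name, the dictionary layout and the use of `None` to mark missing values are illustrative assumptions; only the rules themselves and the 15.1-year mean come from the text.

```python
# Mean time since quitting among former smokers in the VA validation
# sample, as reported in the text.
MEAN_YEARS_SINCE_QUITTING = 15.1

# Values assumed when smoking history (ever/never) or smoking status
# (current/former) is missing, for each set of assumptions.
SCENARIOS = {
    1: {"ever_smoked": True, "status": "current"},
    2: {"ever_smoked": True, "status": "former"},
    3: {"ever_smoked": False, "status": "former"},
}


def impute_smoking(ever_smoked, status, years_quit, scenario):
    """Return (smoke, years_quit) inputs for the VA model.

    ever_smoked: True, False, or None if missing
    status:      "current", "former", or None if missing
    years_quit:  years since quitting, or None if missing
    scenario:    1, 2 or 3 (the set of assumptions to apply)
    """
    rule = SCENARIOS[scenario]
    if ever_smoked is None:
        ever_smoked = rule["ever_smoked"]
    if not ever_smoked:
        return 0, 0.0  # never smoker: time since quitting not applicable
    if status is None:
        status = rule["status"]
    if status == "current":
        return 1, 0.0  # current smoker: time since quitting not applicable
    if years_quit is None:
        years_quit = MEAN_YEARS_SINCE_QUITTING
    return 1, years_quit
```

For example, a patient with no recorded smoking information is treated as a current smoker under the first set of assumptions, as a former smoker who quit 15.1 years ago under the second, and as a never smoker under the third.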
(B) Characteristics of samples used to validate each model
Of the 151 patients included in the study, 33 were excluded from validation of the Mayo Clinic model because they had a history of lung cancer or a history of an extrathoracic cancer within 5 years of nodule identification. The characteristics of the 118 patients used to validate the Mayo Clinic model are shown in table B1. Of the 151 patients included in this study, 12 were excluded from validation of the VA model because those patients were used to develop the model in previous work.8 The characteristics of the 139 patients used to validate the VA model are shown in table B2.
Funding: This study was supported by the National Cancer Institute (RO1 CA117840). The National Cancer Institute had no role in study design, data collection, data analysis or manuscript preparation. The views expressed in this paper are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs or the National Cancer Institute.
Competing interests: None.
Ethics approval: The Institutional Review Board at Stanford University approved this study and waived the requirements for informed consent.