The contribution of exercise testing for risk assessment for lung resection is well established and has been embedded in international guidelines from Europe1 and the USA.2 There are many forms of exercise tests (6 min walk, 12 min walk, shuttle walk, stair climbing), but the most established investigation is formal assessment of maximum oxygen consumption during exercise (Vo2 max). British and American (American College of Chest Physicians (ACCP)) guidelines use Vo2 max as the ultimate assessment of operative risk, positioned at or near the end of the functional algorithm,3 whereas European guidelines recommend the use of this test much earlier in patients with a forced expiratory volume in 1 s (FEV1) or carbon monoxide transfer factor (Tlco) <80% predicted.1 2
Numerous cohort studies and a meta-analysis report the association of low Vo2 max and ‘high risk’ lung resection.4–18 However, ‘high’ is not quantified and ‘risk’ is not defined, two fundamentally important definitions if guidelines that use these terms are to be applied clinically. Here we focus on the validity of the Vo2 max studies and the clinical utility of the available evidence with respect to individual interpretation of risk.
Sample size and precision of risk estimation of death
Arguably, the most important outcome when considering surgery for lung cancer is the ability to survive the procedure. The most apparent limitation of the currently available evidence is the lack of appropriately powered studies to address this. The precision of a risk model depends not on sample size as such but on the number of events (that is, deaths), an uncommon outcome in thoracic surgery. In the UK, lobectomy, the most common procedure for lung cancer, carried an operative mortality of ∼2% in 2004–2005,19 and in the USA the mortality rate has been reported to range from 2.3% to 4.1%.20 Reflecting this, the largest study of Vo2 max in this context (422 patients) had only 15 deaths. More disconcerting still, the publications on which recommendations for estimating operative mortality risk have been based have sample sizes ranging from 8 to 160.21
Upper limits of uncertainty for safe cut-off values
Many studies have defined arbitrary values ranging from 15 to 20 ml/kg/min as a ‘safe’ cut-off4 12–14 because above these levels no patient experienced an adverse event. What is the validity of this type of recommendation?
The answer lies in the uncertainty that surrounds the observation of no events (ie, the upper limit of the 95% CI), which is a function of the sample size. For a standard binomial distribution, the upper limit of the CI for zero events with sample sizes of 8 and 160 corresponds to 42% and 2.7%, respectively (figure 1), illustrating the high limits of uncertainty in the majority of studies with smaller sample sizes.
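The dependence of this upper limit on sample size can be illustrated with a minimal sketch. It assumes an exact (Clopper-Pearson) two-sided 95% interval; the method used to produce the figures quoted above is not stated, so the values printed here need not match them exactly. The function name and the list of sample sizes are illustrative choices only.

```python
def upper_limit_zero_events(n, confidence=0.95):
    """Exact (Clopper-Pearson) two-sided upper confidence limit for the
    event rate when 0 events are observed in n patients.

    Assumption: exact binomial interval; other interval methods
    (e.g. Wilson, Agresti-Coull) give somewhat different limits.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    # With zero events, P(X = 0) = (1 - p)^n.
    # The upper limit is the p that solves (1 - p)^n = alpha / 2.
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# Illustrative sample sizes spanning the range discussed in the text (8 to 160).
for n in (8, 20, 50, 100, 160):
    print(f"n = {n:3d}: upper 95% limit = {upper_limit_zero_events(n):.1%}")
```

Whatever the interval method, the pattern is the same: observing no adverse events in a handful of patients is compatible with a substantial true event rate, and the limit only falls to a few per cent once the sample approaches the upper end of the range.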
Alternative models for risk estimation of death
Given these limitations, are there any alternatives for the assessment of the risk of operative mortality? Thoracoscore is a composite scoring system that can be used to quantify risk. It is a logistic regression-derived model with coefficients provided for individual risk factors, calculated to provide a percentage probability of death. It is currently the best model and was developed from a sample size of 15 183 patients with 338 deaths, and provides excellent discrimination with an area under the curve of 0.82.22 Furthermore, it has been validated in different populations.23 Apart from superior statistical power, much larger sample size, external validity and excellent performance, the logistic risk model carries two further attractive advantages compared with Vo2 max assessment: it is cost free and can be universally available.
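The way a logistic regression-derived model converts individual risk factors into a percentage probability of death can be sketched as follows. The intercept, coefficients and risk-factor names below are hypothetical placeholders chosen purely for illustration; they are not the published Thoracoscore values and must not be used for clinical estimation.

```python
import math

# HYPOTHETICAL values for illustration only.
# These are NOT the published Thoracoscore intercept or coefficients.
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_over_65": 0.8,
    "pneumonectomy": 1.2,
    "urgent_surgery": 0.9,
}


def predicted_risk(patient):
    """Return the predicted probability of death from a logistic model.

    patient: dict mapping risk-factor name to 0 or 1 (missing factors count as 0).
    The model is p = 1 / (1 + exp(-logit)), where
    logit = intercept + sum(coefficient * risk-factor value).
    """
    logit = INTERCEPT + sum(
        coef * float(patient.get(name, 0)) for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient with a single risk factor present.
example = {"age_over_65": 1, "pneumonectomy": 0, "urgent_surgery": 0}
print(f"Predicted risk: {predicted_risk(example):.1%}")
```

The point of the sketch is that such a model returns a numerical probability for an individual patient, rather than a dichotomous ‘high’ or ‘standard’ label.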
Other outcomes and composite end points
The consistent message that lower values of Vo2 max are associated with a higher risk of complications is to be expected, given that Vo2 max is a measure of cardiovascular fitness. From a patient's and clinician's perspective, however, the nature of the complications is of central importance. All studies to date have used composite end points and, when multiple outcomes are combined, it becomes difficult to interpret the impact of each individual component. It has been recommended that each outcome should have a similar weighting or clinical importance to facilitate clinical interpretation.24 For example, death and myocardial infarction would be combined to estimate the total number of patients that may have experienced a myocardial infarction and survived, added to those that have (presumably) experienced a myocardial infarction and died.
Researchers, however, may use composite outcomes to increase the power of the study (by increasing the event rate) and thereby increase the chances of achieving statistically significant results.25 The corollary is that important outcomes such as death can be piggybacked within the pool of less important outcomes such as atelectasis5 7–9 11–14 16 or purulent sputum,4 giving rise to considerable difficulties for the clinician and, more importantly, the patient in evaluating the importance of the overall result. We believe that most patients would not consider readmission to the intensive care unit,11 atelectasis,5 7–9 11–14 16 arrhythmia4 6–8 10 13 14 or postoperative CO2 retention6 8–10 13 16 as ‘prohibitive’ complications leading them to refuse surgery.
Quantification and interpretation of risk
A clear explanation of risks and benefits is central to good consenting practice when offering treatment options to our patients. Dichotomous categorisation of ‘high’ and ‘standard’ risk using Vo2 max for risk assessment, accompanied by a combination of varied outcomes (some of which have little influence on patient decision making), renders the information difficult to apply in practice. The lack of a numerical estimate leads to subjective interpretation of ‘high’; moreover, many studies do not document the uncertainty (confidence limits) that surrounds their estimates. As there is no accepted level of baseline risk, it is not possible to quantify the relative magnitude of ‘high’ to facilitate the interpretation.
Cost of getting it wrong
It is intuitive that clinicians seek to protect the interests of their patients, and some may wonder whether a discussion of the quantification and interpretation of risk is relevant, as opposed to acceptance and avoidance of risk based on published values. In the CALGB 9238 study, the largest in the series (with 422 patients), physicians were allowed to offer surgical treatment to patients at ‘very high risk’, defined as FEV1 <900 ml and Vo2 max of <15 ml/kg/min. Of the 68 patients in the ‘very high risk’ group, there was only one postoperative death within 30 days and a total of three in-hospital deaths.17 More importantly, on follow-up, the operated patients in the very high risk group had more than double the median survival compared with the non-operated patients (36.0 months vs 15.8 months, p<0.001), illustrating acceptable procedural mortality and morbidity with twice the median survival with case selection on parameters independent of Vo2 max. Denying patients with ‘prohibitive’ values of Vo2 max the opportunity to consider surgery as a management option may in fact be against their best interests. As the study was not randomised, it is important to bear in mind the inevitable presence of selection bias, and the possibility that a better result was achieved by offering surgery to fitter patients with less co-morbidity. Our point is more to question the ‘conventional’ lower limit of safety and the results that can be achieved by further selection.
We acknowledge the consistent message that low levels of Vo2 max are associated with increased complications from surgery. However, we believe current recommendations are flawed by small sample sizes, resulting in imprecise risk estimates. Moreover, the lack of numerical quantification leads to difficulties in defining the level of acceptable risk. Furthermore, the use of composite outcomes leads to a lack of agreement on the importance of the risks, and the incongruence limits the clinical applicability to inform patients on the decision to undergo surgery. We believe that management options should be discussed at a multidisciplinary level but decisions should be undertaken at patient level. This is because patients are heterogeneous, with individual perceptions on the value of benefit and risk. As the lower limits of safety remain imprecisely defined, patients with multidisciplinary team-defined ‘prohibitive’ levels of risk may not be offered the opportunity to consider surgery as an option and denied the possibility of increased life expectancy.
There may also be a degree of concern that postoperative quality of life may be traded off against any increase in life expectancy in the high risk cohort; however, prospective studies indicated that patients traditionally considered at higher risk of lung resection had postoperative physical and emotional quality of life scores similar to those observed in younger and fitter patients.26
Before widespread use, further work needs to be performed to determine if cardiopulmonary exercise testing is an independent predictor of mortality (eg, above and beyond that of Thoracoscore), to relate the study to individual outcomes that would influence the decision to undergo surgery, to provide numerical quantification of risk with an estimate of uncertainty and to demonstrate validity in different cohorts.
Competing interests None.
Provenance and peer review Commissioned; externally peer reviewed.