Original research
Performance at stair-climbing test is associated with postoperative complications after lung resection: a systematic review and meta-analysis
  1. Fairuz Boujibar1,2,
  2. André Gillibert3,
  3. Francis Edouard Gravier4,5,
  4. Timothée Gillot6,7,
  5. Tristan Bonnevie4,5,
  6. Antoine Cuvelier5,8,
  7. Jean-marc Baste1,2
  1. 1Department of General and Thoracic Surgery, CHU Rouen, Rouen, France
  2. 2Normandie University UNIROUEN, INSERM U1096, Rouen, France
  3. 3Biostatistics Unit, CHU Rouen, Rouen, Normandie, France
  4. 4ADIR Association, Bois Guillaume, France
  5. 5Normandie University, UNIROUEN, UPRES EA 3830, Rouen University Hospital, Haute Normandie Research and Biomedical Innovation, Rouen, France
  6. 6CETAPS EA 3832, Mont Saint Aignan, France
  7. 7ERFPS, CHU Rouen, Rouen, France
  8. 8Pulmonary & Respiratory Intensive Care Department, CHU Rouen, Rouen, Normandie, France
  1. Correspondence to Fairuz Boujibar, Department of General and Thoracic Surgery, CHU Rouen, Rouen, France; fairuz.boujibar{at}


Background Thoracic surgery is the optimal treatment for early-stage lung cancer, but there is a high risk of postoperative morbidity. Therefore, it is necessary to evaluate patients’ preoperative general condition and cardiorespiratory capacity to determine the risk of postoperative complications. The objective of this study was to assess whether the stair-climbing test could be used in the preoperative evaluation of lung resection patients to predict postoperative morbidity following thoracic surgery.

Methods We performed a systematic review and meta-analysis of the association between stair-climbing test results and morbidity/mortality after thoracic surgery. We analysed all articles published until May 2020 in the following databases: PubMed/MEDLINE, PEDro, the Cochrane Library, Embase and CINAHL. The risk of bias was assessed using the Quality in Prognosis Studies tool. This meta-analysis is registered as PROSPERO CRD42019121348.

Results Thirteen articles, totalling 2038 patients, were included in the systematic review, and 6 of them in the meta-analysis. There were multiple test evaluation criteria: rise time, height, desaturation and heart rate change. For the meta-analysis, we were able to pool data on the height of rise at a variable threshold: risk ratio 2.34 (95% CI 1.59 to 3.43) with I²=53% (p=0.06). The threshold for occurrence of complications was estimated at a 10 m climb.

Conclusions Our results indicate that the stair-climbing test could be used as a first-line functional screening test to predict postoperative morbidity following thoracic surgery and that patients with a poor test result (<10 m) should be referred to formal cardiopulmonary exercise testing.

  • thoracic surgery
  • lung cancer
  • lung physiology
  • pulmonary rehabilitation

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key messages

What is the key question?

  • Can the stair-climbing test be used to predict postoperative complications in lung resection candidates?

What is the bottom line?

  • This meta-analysis provides high-quality evidence that a threshold of 10 m climbed during the stair-climbing test can predict postoperative morbidity.

Why read on?

  • In this original systematic review and meta-analysis, we summarise current evidence of the role of the stair-climbing test in the preoperative assessment of patients referred for pulmonary resection.

Introduction

Surgery is considered the optimal treatment for patients with early-stage lung cancer.1 Nevertheless, lung resection remains a major intervention with significant morbidity.2 Candidate selection influences morbidity and mortality; therefore, an appropriate assessment is crucial. According to the guidelines of the American College of Chest Physicians,3 the European Respiratory Society (ERS) and the European Society of Thoracic Surgeons (ESTS),4 all patients with a predicted postoperative value of diffusing capacity of the lung for carbon monoxide (DLCO) or forced expiratory volume in 1 s (FEV1) under 60% would benefit from further exercise testing such as cardiopulmonary exercise testing (CPET). CPET on an ergocycle with an incremental exercise protocol is currently recognised as the gold standard stress test, allowing cardiopulmonary evaluation.5 It is essential to determine whether a patient is at risk after surgery or even operable and whether there is a need for prehabilitation to improve cardiorespiratory function before surgery.6

However, CPET is largely underused, with surgeons declaring that only between 10% and 30% of their patients receive this evaluation even though 75% of surgeons prescribe CPET.7 As an alternative to CPET, some low-technology tests such as the incremental shuttle walking test, the 6 min walk test (6MWT) and the stair-climbing test (SCT) have been described.8

For the SCT, patients are required to climb stairs to evaluate their cardiorespiratory function. In the context of thoracic surgery, this assessment is intended to identify patients at risk for postoperative complications. Indeed, there is a high risk of complications after pulmonary resection, especially if the patient is operated on by thoracotomy.9 First described in 1968,10 this test has been used for years. It is safe, easy to understand, fast and cheap. SCT can be performed as a submaximal test (a set number of flights to climb) or as a maximal test (symptom-limited test).8

The objective of this study was to assess whether the SCT could be used in the preoperative evaluation of lung resection patients to predict postoperative complications.

Methods

A systematic review and meta-analysis were performed and are reported in accordance with the ‘Preferred Reporting Items for Systematic Reviews and Meta-Analyses’ (PRISMA) statement11 and the Cochrane Handbook for systematic reviews of diagnostic test accuracy.12 A review protocol was written before starting the literature search.

The research protocol was prospectively published online at the PROSPERO International prospective register of systematic reviews under registration number CRD42019121348.

Literature search

We queried PubMed/MEDLINE, PEDro, the Cochrane Library, Embase and CINAHL in October 2018 and updated the search in May 2020. The search strategy is described in online supplementary material S1.

Eligibility criteria

We included cohort studies of patients aged over 18 years who were diagnosed with lung cancer and who underwent SCT before surgery to predict postoperative complications. Studies with patients who had SCT for other reasons were excluded. There were no exclusion criteria based on language or publication date. For the analysis, we retained only studies in which SCT was performed as a maximal test, that is, the patient had to provide an uninterrupted maximal effort until exhaustion.

Outcome reporting

Since all studies that defined postoperative complications defined them as any adverse event causing a cardiorespiratory disorder or death, this definition was kept for the meta-analysis. Since follow-up of complications was limited to hospital stay,13 to 1 month14 15 or to 3 months postoperatively,16 or had no defined time limit,17 the maximum follow-up period of each article was used to define complications. Table 1 shows a high level of heterogeneity in the categories of complications, with 98% of respiratory complications in Arruda18 vs 36% of respiratory and 59% of cardiac complications in Dong.19

Table 1

Characteristics of included studies on systematic review and meta-analysis

Study selection and methodological quality assessment

Two independent reviewers screened and selected studies based on title, abstract and full text. The articles were translated if they were not reported in English or French.

Risk of bias was assessed with the Quality in Prognosis Studies (QUIPS) tool20 used by Cochrane. A low risk of bias was given if all six domains were scored as low or if no more than two moderate or unknown risks of bias were identified. A moderate risk of bias was given if three or fewer domains were scored as moderate or unknown, in combination with no high risk of bias. A moderate risk of bias was also given if one domain was scored as high in combination with at most one moderate or unknown risk of bias. A high risk of bias was given when two or more domains were scored as high or four or more domains were scored as moderate or unknown risk of bias.4 5 The QUIPS assessment for each study was independently completed by two authors. Differences were resolved by referral to a third author.
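
The classification rules above can be written as a small decision function. The following Python sketch is illustrative only and is not the authors' code: the function name and the rating labels are hypothetical, the overlap between the 'low' and 'moderate' rules is resolved by testing 'low' first, and the combination of one high-risk domain with two or three moderate/unknown domains, which the text does not cover, is returned as 'unspecified' rather than guessed.

```python
from collections import Counter


def overall_risk_of_bias(domains):
    """Combine six QUIPS domain ratings into an overall rating.

    `domains` is a list of six strings among "low", "moderate",
    "unknown" and "high" (labels are an assumption of this sketch).
    """
    if len(domains) != 6:
        raise ValueError("QUIPS has six domains")
    counts = Counter(domains)
    high = counts["high"]
    moderate = counts["moderate"] + counts["unknown"]

    # Two or more high domains, or four or more moderate/unknown domains.
    if high >= 2 or moderate >= 4:
        return "high"
    # Exactly one high domain: moderate only with at most one moderate/unknown.
    if high == 1:
        # Assumption: the text gives no rule for 1 high + 2-3 moderate/unknown.
        return "moderate" if moderate <= 1 else "unspecified"
    # No high domain: up to two moderate/unknown is low, exactly three is moderate.
    return "low" if moderate <= 2 else "moderate"
```

For example, a study rated moderate on three domains and low on the other three would be classified as moderate overall under this reading of the rules.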

Data extraction and analysis

The following data were extracted from the studies: name of first author, publication year, country, study period, size of cohort, frequency of postoperative complications and crude number of complications in groups below and above a study-specific threshold separating low-risk and high-risk patients for each exposure. When there was no specified threshold, but detailed data were available, the median height was chosen as the threshold. When no threshold was defined and there were insufficient data to calculate the median or number of subjects above, the study was excluded from meta-analysis pooling.

Statistical analysis

Pooling effects of desaturation, heart rate change and rise time at the SCT was planned but cancelled because of heterogeneity and too few studies for each exposure.

Unadjusted relative risks (RRs) were pooled in a random-effects model (primary analysis) by the DerSimonian-Laird method and in a fixed-effects model based on the Mantel-Haenszel estimator (sensitivity analysis). Heterogeneity was tested by Cochran’s Q test21 and described by tau² and I².
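
To make the pooling step concrete, the sketch below shows one common way of computing a DerSimonian-Laird random-effects estimate from per-study 2×2 counts. It is a minimal illustration in Python, not the R code used for the analysis: the function names and the example counts are hypothetical, Q and tau² are computed with inverse-variance weights (the usual DerSimonian-Laird formulation, which differs from the Mantel-Haenszel weighting of the fixed-effects sensitivity analysis), and no continuity correction is applied, so every cell must be non-zero.

```python
import math


def log_rr(a, n1, c, n2):
    """Log relative risk and its variance.

    a/n1: complications among patients below the height threshold,
    c/n2: complications among patients above it.
    """
    y = math.log((a / n1) / (c / n2))
    v = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return y, v


def dersimonian_laird(studies):
    """Pool (a, n1, c, n2) tuples with the DerSimonian-Laird estimator."""
    effects = [log_rr(*s) for s in studies]
    ys = [y for y, _ in effects]
    vs = [v for _, v in effects]

    # Inverse-variance fixed-effect estimate, needed for Cochran's Q.
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1

    # Method-of-moments estimate of the between-study variance.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate with a Wald 95% CI.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "rr": math.exp(y_re),
        "ci": (math.exp(y_re - 1.96 * se), math.exp(y_re + 1.96 * se)),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts, for illustration only (not data from the included studies).
example = [(20, 60, 10, 80), (15, 40, 12, 90), (8, 30, 9, 70)]
result = dersimonian_laird(example)
```

When tau² is estimated as zero, the random-effects weights reduce to the inverse-variance fixed-effect weights, which is why the two models give similar point estimates when heterogeneity is low.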

Because thresholds varied between studies, sensitivity and specificity are variable and negatively correlated, calling for joint interpretation. We modelled the bivariate distribution of the logit of sensitivity and logit of specificity in the five-parameter model of Reitsma et al:22 mean sensitivity, mean specificity, between-study variance of sensitivity, between-study variance of specificity, between-study covariance of sensitivity and specificity. It is equivalent to the hierarchical summary receiver operating characteristic model.23 It was estimated by restricted maximum likelihood using the ‘mada’ package of the R statistical software. Bivariate confidence regions and univariate CIs of sensitivity and specificity were generated from this model. Positive predictive value (PPV) and negative predictive value (NPV) were modelled in another Reitsma model, inverting the roles of the test and the outcome as suggested by Leeflang et al.24
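
The per-study quantities that enter such a bivariate model can be illustrated as follows. This Python sketch only computes sensitivity, specificity, PPV, NPV and the logit-transformed sensitivity and specificity from one 2×2 table; it does not fit the bivariate model itself. The function names are hypothetical, a 'positive' test is taken to mean a climbed height below the study threshold, and the 0.5 continuity correction applied when a cell is empty is an assumption of this sketch.

```python
import math


def logit(p):
    """Log-odds transform used for sensitivity and specificity."""
    return math.log(p / (1 - p))


def accuracy_measures(tp, fp, fn, tn, correction=0.5):
    """Accuracy measures for one study.

    tp: below threshold, complication     fp: below threshold, no complication
    fn: above threshold, complication     tn: above threshold, no complication
    """
    # Assumption: add a continuity correction to all cells if any cell is zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "logit_sensitivity": logit(sens),
        "logit_specificity": logit(spec),
    }


# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=43, fp=12, fn=57, tn=88)
```

Swapping the roles of test and outcome, as described above for the predictive values, amounts to passing the table transposed, so that the same transform is applied to PPV and NPV.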

All analyses were performed using R statistical software (V.3.5.0, The R Foundation for Statistical Computing, Vienna, Austria).

Results
Database searches and reference lists yielded a total of 168 articles (figure 1). After screening and exclusions, 13 studies were included in the systematic review.

Figure 1

PRISMA diagram showing selection of studies for systematic review and meta-analysis. SCT, stair-climbing test.

Of the 13 cohort studies included, 8 used the climbed height as an evaluation criterion, of which 2 gave means and SDs in the groups with and without complications but did not provide enough data to assess the RR of complications at a given threshold. Therefore, 6 of the 13 studies were included in the meta-analysis assessing the predictive value of the climbed height.

Of the studies included in the systematic review, four used desaturation as an evaluation criterion,19 25–27 but only two of them had data on RRs, and they used different definitions of desaturation. Consequently, results were not pooled for this evaluation criterion. Three articles used time to climb as an evaluation criterion and two analysed heart rate, but none of them had data on RRs.

Study characteristics

The study characteristics of the 13 included articles are summarised in table 1. There were 11 prospective cohort studies and 2 retrospective cohort studies. Studies were published between 1991 and 2017 and included data from 2038 patients, with a mean age ranging from 52.7 years to 68.5 years. All were conducted in a hospital setting.

Two studies published by the same team included patients during overlapping periods.14 27 However, the first study14 reported climbed height as the assessment criterion while the second27 reported desaturation and height during the test. Only the first was included in the meta-analysis.

Exposure reporting

Desaturation
Four cohort studies investigated the association between oxygen desaturation and postoperative complications. The distance to climb differed between studies. In the first study, patients had to climb six floors and the saturation change was noted at that point, with a mean desaturation of 6.15% (SD 5.86%, n=46) in patients with complications vs 4.06% (SD 3.94%, n=124) in patients without complications19 (p=0.008). In the second study, in which patients had to climb three floors,26 the mean desaturation was 2.50% (SD 2.58%) in patients with complications vs 1.18% (SD 0.98%) in patients without complications (p=0.017). In the third study, in which patients had to provide a maximal effort climb, a saturation <90% at the end of the test was observed in 26 (51.0%) patients with major complications or death and in 28 (56.0%) patients without complications or with minor complications25 (p=0.61). In the fourth study, after a maximal effort, 27 (21%) patients with complications had an SpO2 drop ≥4% vs 48 (12%) patients without complications (p=0.008). In the same study, 6 (4.7%) patients with complications had an SpO2 under 90% at the end of the test vs 21 (5.1%) patients without complications (p=0.80).27 Because definitions of desaturation and evaluation methods differed between studies, we were not able to pool these data.

Time to climb

Three cohort studies reported time as the assessment criterion.25 26 28 For Nikolić et al,25 the duration of maximal effort SCT was only predictive of postoperative complications in the lobectomy subgroup, with a mean test duration of 96.1 s (SD 14.5) in patients without complications or with minor complications and of 80.7 s (SD 14.2) in patients with complications. Ambrozin et al28 reported, for six floors climbed, 46.6 s (SD 17.4) for patients with complications vs 35.8 s (SD 12.8) for patients without complications (p=0.005), while Toker et al26 reported 21.67 s (SD 6.55) vs 19.65 s (SD 6.32) for 3 m (p=0.16).

Heart rate change

Only two studies reported this exposure. Dong et al19 reported, for five floors climbed, a mean heart rate change of 48.8 (SD 13.5) in patients with complications vs 54.1 (SD 14.5) in patients without complications (p=0.03), while Toker et al26 reported, for 3 m climbed, respectively, 14.85 (SD 3.77) and 14.93 (SD 3.55) (p=0.90).

Climbed height

Eight studies investigated the association between the height climbed and postoperative morbidity. The overall homogeneity allowed us to perform a meta-analysis. One study29 (n=11) was excluded from quantitative analysis because it did not define a climb threshold, and RRs at the median could not be calculated. All patients included in the meta-analysis were instructed to climb the maximum number of floors until exhaustion. Studies in which patients did not perform the test under maximum conditions were excluded from the analysis.19 For the study of Olsen et al,17 two thresholds were presented (19 m and 10 m). The one closest to the mean of the other studies was selected (10 m).

Results of meta-analysis: relative risk

Figure 2 shows RR estimations of the random-effects model at the threshold defined by each study. The overall pooled RR of complications was 2.34 (95% CI 1.59 to 3.43) with heterogeneity coefficient I²=53% (p=0.06). The pooled RR of complications in the fixed-effects model (sensitivity analysis) was 2.31 (95% CI 1.79 to 2.96). A further sensitivity analysis, using a fixed-effects model and excluding the two studies with the lowest quality score, gave an RR of 2.00 (95% CI 1.53 to 2.61), with non-significant heterogeneity (I²=34%, p=0.21) (online supplementary figure S2).
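
The reported I² values and heterogeneity p values can be related to each other through Cochran's Q, since I² = (Q − df)/Q with df equal to the number of studies minus one. The Python sketch below is a rough consistency check rather than a re-analysis: the function names are hypothetical, it assumes six studies in the primary analysis and four after excluding the two lowest-quality studies, and because I² is reported as a rounded percentage the recovered Q and p values are approximate.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        half = x / 2
        term = 1.0
        total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in half-integer powers of x.
    total = 0.0
    term = math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def q_from_i2(i2, k):
    """Cochran's Q implied by I² (as a proportion) for k studies."""
    df = k - 1
    return df / (1.0 - i2)


for label, i2, k in [("primary analysis", 0.53, 6), ("sensitivity analysis", 0.34, 4)]:
    q = q_from_i2(i2, k)
    p = chi2_sf(q, k - 1)
    print(f"{label}: Q ~ {q:.2f}, p ~ {p:.2f}")
```

Under these assumptions, the implied p values are about 0.06 and 0.21, in line with the figures reported above.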

Figure 2

Forest plot of the relative risks of postoperative cardiopulmonary complications for a height of climbing below (vs above) the study-specific threshold.

Each study compared the risk of complications of patients exceeding a threshold to that of those not exceeding it. This threshold was different in each study, varying between 2.43 m30 and 12 m.14 We therefore analysed for each study the association between complications and height climbed at its own threshold (figure 2). A pooled threshold of height climbed, indicating a high risk of complications, was obtained by the unweighted mean of thresholds of all the studies included in the meta-analysis. In a sensitivity analysis, the two studies with poor methodological quality were excluded. The mean threshold including the studies with poor methodological quality13 30 was 8.11 m whereas the threshold excluding these studies was 9.91 m.

Predictive performances

The sensitivity, specificity, PPV and NPV of the SCT were estimated respectively at 43% (95% CI 21% to 68%), 88% (95% CI 82% to 92%), 62% (95% CI 42% to 78%) and 75% (95% CI 65% to 83%). Confidence regions of predictive values are shown in figure 3.

Figure 3

Sensitivity, specificity (panel A), PPV and NPV (panel B) of each study (grey crosses and 95% thin ovoid confidence regions) and of the meta-analysis (round and bold 95% ovoid confidence region) in Reitsma’s random-effects model, taking into account the correlation between sensitivity and specificity and between PPV and NPV. NPV, negative predictive value; PPV, positive predictive value.

Risk of bias

The overall ‘Risk of Bias’ of each included study is presented in figure 4. Overall crude agreement on methodological quality scores between the reviewers was 84.6%. Five studies had a low risk of bias, four had a moderate risk of bias and four had a high risk of bias, mainly due to the variety of confounding factors.

Figure 4

Risk of bias according to the QUIPS tool. Red circle: high risk of bias, orange circle: moderate risk of bias, green circle: low risk of bias. QUIPS, Quality in Prognosis Studies.

We assessed the overall level of evidence for the univariate prognostic value of the SCT for postoperative complications with the adapted GRADE framework defined by Huguet et al for prognostic meta-analyses.31 The investigational phase was 2 for all studies. No problems were identified with inconsistency, indirectness or publication bias. There was a downgrade for serious study limitations (high risk of bias in several studies) and an upgrade for moderate/large effect size. Overall, the level of evidence was high.

Assessment of risk of bias

Selection bias: two studies included not only thoracic surgery candidates but also a proportion of patients undergoing abdominal surgery.15 30

Information bias on exposure: the SCT is not perfectly standardised. The types of stairs differ between studies, and this can have an impact on the result. Furthermore, the reproducibility of the test is unknown.

Information bias on outcome: the duration of follow-up for complications differed between studies, and there was no standardised definition of postoperative complications.

Discussion

This is the first meta-analysis of observational studies on the association between the height climbed during SCT and postoperative morbidity. We found evidence of a 2.34-fold (95% CI 1.59 to 3.43) increased risk of postoperative complications in patients who climbed less than the study-specific height threshold.

This meta-analysis included two low-quality studies. However, their exclusion had little impact on estimations of effects. The high methodological heterogeneity of studies included in this systematic review made it impossible to pool any effect other than that of the climbed height. Even that analysis was limited by variable climbing thresholds. Because the climbing threshold varied from study to study, the predictive value at the mean threshold (9.91 m) could not be computed on aggregated data. Individual data meta-analysis was not possible since we failed to acquire study databases despite our requests to the authors.

Exercise tests simulate surgical stress and detect potential defects in the lung, heart and/or muscular chain of the oxygen transport system.5 This encourages the widespread use of preoperative exercise testing in thoracic surgery units. The SCT involves a large muscle mass and it is a functional test, informing clinicians about the daily capacities of patients. The number and height of steps in stair flights varied between centres. We tried to standardise the results by reporting the performance in metres. This study shows that the number of floors is not a reliable measure: 12 m corresponded in some studies to six floors,14 while 10 m corresponded to 2.5 floors in others.17 Standardising height in metres allows therapists to adapt the risk threshold to the environment in which patients are assessed.

ERS/ESTS guidelines recommend that SCT should be used as a first-line functional screening test to select patients for safe surgery (height of ascent >22 m) (grade of recommendation: B).4 The 10 m threshold computed in our work is based on pooling studies that directly estimated the risk of complications, while the 22 m threshold is based on a single study32 that did not measure postoperative complications, but estimated them indirectly via VO2max. A total of 98% of patients who were above the 22 m threshold had high VO2max (>15 mL/kg/min) but only 23% of patients who were below the threshold had low VO2max (<15 mL/kg/min). From the same study, a threshold of 14 m would provide a PPV of 56%, a sensitivity of 64% and a specificity of 87%.

The climbed height or climbing speed is correlated (r=0.63–0.72) with the VO2peak reached during SCT.19 28 VO2peak reflects a patient's aerobic capacity and ability to respond to intense physical stress, as during major surgery. SCT yields greater values of VO2peak than ergocycle.33 This can be explained by the visual feedback and the motivation of the patient to reach the next floor before stopping. Only 10% of patients performing SCT stopped the test before reaching the next floor.32

After reanalysis of raw data published by Bolliger et al, the PPV of VO2peak at a threshold of 15 mL/min/kg was 47% (23%–72%) and the NPV was 87% (77%–94%). In our meta-analysis, the PPV of SCT was higher (62%), but the NPV was lower (75%).34

Like the 6MWT, the SCT can be used to predict postoperative outcomes.35 The 6MWT is very easy to perform but requires a 30 m long corridor, which is not always available. The SCT requires stairs that are high enough. To improve risk control, some teams prefer the use of ergocycle in a secured environment.36

As SCT is a highly demanding cardiorespiratory test, there are associated risks.37 It can identify patients at risk, but it is difficult to determine whether the risk is cardiac, respiratory and/or neuromuscular. Given these results, we suggest using SCT as a screening tool to identify patients who would benefit from CPET, rather than as a substitute for this major examination.

Currently, the use of CPET is based on European and American recommendations.4 5 These recommendations rely on resting respiratory parameters (FEV1/DLCO), whereas the patient's evaluation should be dynamic. In addition, these recommendations, like the majority of the studies included in this meta-analysis, relate to patients operated on by thoracotomy, whereas at present, video-assisted or robotic-assisted thoracic surgery (VATS or RATS) is largely preferred. There are too few prospective studies including these techniques.

The strengths of this meta-analysis are an extensive literature search, reporting based on the PRISMA statement and a standardised assessment of bias, whereas most articles report small case series limited to a single institution. As part of this analysis, patients treated with VATS were combined with those treated with thoracotomy.19 30 Compared with thoracotomy, VATS is associated with less postoperative pain, fewer complications and shorter hospitalisation.38 Most of the reviewed studies did not include VATS resection. Modern thoracic surgery may allow more complex cases to be performed. All patients in the meta-analysed studies had major pulmonary resection except in the Salahuddin et al30 and Girish et al15 studies, in which 65% and 50% of patients, respectively, had thoracic surgery and the remainder had upper abdominal surgery.

As with all meta-analyses, the quality of pooled results is dependent on the quality of the included studies. The included studies were small, with variable methodological quality. The high variability of the average height climbed suggests that the populations and/or test conditions were variable. The surgery extent (from wedge resection to pneumonectomy) was variable and subgroup analyses were impossible with the published data. Postoperative complications were defined in all studies as cardiorespiratory impairment resulting in morbidity or mortality. However, complications were not detailed in all studies. It would be helpful to know which complications were most frequently detected by the SCT. Some may be relatively easily managed such as atrial arrhythmia whereas others such as ventilation insufficiency may not. None used the Clavien-Dindo classification to rank complications.39

Moreover, postoperative follow-up was not of the same duration across studies (from hospital stay to up to 90 days), even though, according to Agostini’s work, postoperative complications occur mostly during hospital stay.40

Our meta-analysis complements the systematic review of Moran et al investigating the predictive capacity of field tests in abdominal surgery and calls for further evaluation of SCT in a context of minimally invasive surgery or combined with other low-technology tests.41

Conclusion

This meta-analysis and systematic review highlight an association between the height climbed in the stair-climbing test and postoperative complications. SCT could be used as a screening tool in patients with lung cancer in order to select candidates for thoracic surgery who could benefit from preoperative CPET and prehabilitation. However, better designed prospective studies are necessary to assess the predictive value of low-technology tests and risk classification algorithms in a context of minimally invasive thoracic surgery.


The authors are grateful to Nikki Sabourin-Gibbs, Rouen University Hospital, for her help in editing the manuscript.



  • Twitter @boujibarf

  • Contributors Contributed to conception and design, acquisition, analysis and interpretation of data: FB, AG, JMB, TG, FEG, AC, TB. Contributed to drafting the work and critical revisions: FB, AG, JMB, FEG, TB. Contributed to final approval of the version submitted: FB, AG, JMB, TG, FEG, AC, TB

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available on reasonable request.
