Introduction The gender-age-physiology (GAP) index is an easy-to-use baseline mortality prediction model in idiopathic pulmonary fibrosis (IPF). The GAP index does not incorporate exercise capacity parameters such as 6 min walk distance (6MWD) or exertional hypoxia. We evaluated if the addition of 6MWD and exertional hypoxia to the GAP index improves survival prediction in IPF.
Methods Patients with IPF were identified at a tertiary care referral centre. Discrimination and calibration of the original GAP index were assessed. The cohort was then randomly divided into a derivation and validation set and performance of the GAP index with the addition of 6MWD and exertional hypoxia was evaluated. A final model was selected based on improvement in discrimination. Application of this model was then evaluated in a geographically distinct external cohort.
Results There were 562 patients with IPF identified in the internal cohort. The original GAP index had a C-statistic of 0.676 (95% CI 0.635 to 0.717) and overestimated observed risk. 6MWD and exertional hypoxia were strongly predictive of mortality. The addition of these variables to the GAP index significantly improved model discrimination. A revised index incorporating exercise capacity parameters was constructed and performed well in the internal validation set (C-statistic: 0.752; 95% CI 0.701 to 0.802; difference in C-statistic compared with the refit GAP index: 0.050; 95% CI 0.004 to 0.097) and in the external validation set (N=108; C-statistic: 0.780; 95% CI 0.682 to 0.877).
Conclusion A simple point-based baseline-risk prediction model incorporating exercise capacity predictors into the original GAP index may improve prognostication in patients with IPF.
- idiopathic pulmonary fibrosis
- interstitial fibrosis
- respiratory measurement
Data availability statement
Data are available upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known on this topic
The gender-age-physiology (GAP) index is an easy-to-use baseline mortality prediction model in idiopathic pulmonary fibrosis (IPF). Despite its practicality, the GAP index is limited by insufficient precision and the lack of evaluation of measures of exercise capacity.
What this study adds
Distance ambulated during 6 min walk testing and exertional hypoxia are strongly predictive of mortality in IPF and the addition of these factors to the GAP index improves overall model performance.
How this study might affect research, practice or policy
The distance-oxygen-GAP index, a simple, point-based baseline-risk prediction model incorporating exercise capacity parameters into the GAP index, improves mortality prediction in patients with IPF and has potential utility in research studies as well as clinical practice.
Idiopathic pulmonary fibrosis (IPF) is characterised by progressive fibrosis that results in physiologic restriction, diminished exercise capacity and premature death. Despite a median survival of approximately 4.5 years, individuals demonstrate marked variability in their clinical decline.1–3
Given this clinical heterogeneity, multidimensional indices have been developed to predict the course of illness based on clinical factors.4–8 An easy-to-use risk prediction model, the gender-age-physiology (GAP) index, was developed and validated in diverse patient populations.9–11 This model provides valuable insight through simple objective predictors of disease progression (gender, age, forced vital capacity (FVC) and diffusing capacity of the lung for carbon monoxide (DLCO)). Despite its practicality, the model is limited by a lack of precision and by overestimation of risk in some cohorts.4 12 Furthermore, the GAP index does not incorporate exercise capacity and therefore may be of limited value in capturing decline over time.5 9 13
Exercise capacity testing, such as 6 min walk testing (6MWT), and the need for supplemental oxygen may outweigh other predictors of mortality in IPF and may outperform the predictive ability of the GAP index itself.10 14
We sought to build on the GAP index to develop and validate a simple, clinically useful predictive model incorporating exercise capacity which can be applied at the time of initial clinical evaluation.
We reviewed records of all patients with IPF evaluated at the Inova Advanced Lung Disease Program, a tertiary referral centre, between 1 December 2007 and 15 March 2020 (derivation and internal validation set) and all patients with IPF evaluated at the Hôpital Européen Georges Pompidou (Paris, France) between 15 July 2005 and 21 January 2021 (external validation set). Patients were included if confidently diagnosed with IPF based on available international consensus criteria at the time of evaluation and had pulmonary function testing available within 6 months of their initial clinic visit.15–17 The study was approved by the Institutional Review Boards (IRB# U20-03-3956 and CEPRO2021-029) at both institutions. Data were collected from the time of initial lung disease consultation and for up to 3 years. Outcome variables were obtained from the electronic medical record, the clinic databases and the Social Security death index.
Predictor variables, in addition to those included in the original GAP index, were prespecified based on clinical relevance and evidence demonstrating their association with poor prognosis in IPF.10 14 Predictors evaluated were exertional hypoxia and the 6 min walk distance (6MWD). Categorical thresholds in the original GAP index were retained from the clinically meaningful categories defined in this prior model and a threshold for 6MWD of less than 250 m was selected a priori as the cutpoint most likely to best discriminate survival. 6MWD threshold selection was based on prior research in large cohorts, which found this distance to be independently associated with a greater than twofold increased risk of 1-year mortality.18 19
The 6MWD was included if the test was performed within 6 months of the initial clinic evaluation. Exertional hypoxia was defined as either an active prescription for supplemental oxygen (for previously documented severe resting or exertional hypoxemia related to IPF) or desaturation (SpO2 <88%) observed during a 6MWT performed on room air.20 Both the 6MWT and pulmonary function tests were performed according to standard criteria; patients on chronic supplemental oxygen received oxygen at their prescribed rate during the 6MWT.21 22 It is routine practice at both centres for all new consults to undergo pulmonary function testing and 6MWT during their initial consultation.
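The predictor definitions above translate directly into classification rules. The Python sketch below is illustrative only: the `BaselineVisit` record and its field names are hypothetical, not the study's data dictionary, and treating a patient with neither an oxygen prescription nor a room-air desaturation as non-hypoxic is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaselineVisit:
    """Hypothetical baseline record; field names are illustrative only."""
    six_mwd_m: Optional[float]            # 6MWD in metres (None if no 6MWT within 6 months)
    on_prescribed_oxygen: bool            # active supplemental oxygen prescription
    nadir_spo2_room_air: Optional[float]  # lowest SpO2 (%) on a room-air 6MWT, if done


def exertional_hypoxia(v: BaselineVisit) -> bool:
    """Oxygen prescription, or desaturation below 88% during a room-air 6MWT.

    Assumption: a patient with neither finding is classified as False.
    """
    if v.on_prescribed_oxygen:
        return True
    return v.nadir_spo2_room_air is not None and v.nadir_spo2_room_air < 88.0


def walk_below_threshold(v: BaselineVisit, threshold_m: float = 250.0) -> Optional[bool]:
    """True if 6MWD < 250 m; None if missing (dropped under complete-case analysis)."""
    if v.six_mwd_m is None:
        return None
    return v.six_mwd_m < threshold_m
```

For example, a patient not on oxygen who desaturates to 86% on a room-air 6MWT is classified as having exertional hypoxia, whereas a nadir of exactly 88% is not, because the threshold is strict (SpO2 <88%).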
The primary outcome was time from baseline (defined as the date of initial pulmonary function test at our institution) to death from any cause. Patients were right censored at the time of loss to follow-up, end of follow-up and at the time of lung transplantation.
Survival analysis was performed using the Kaplan-Meier method and the log-rank test to compare groups. The Cox proportional hazards model was used to calculate hazard ratios (HRs) with 95% CIs to analyse individual risk factors of interest and their relationship to all-cause mortality. The proportional hazards assumption was tested using Schoenfeld residuals and was found to be valid. The necessary sample size (N=246) for model development was calculated using the method described by Riley et al for time-to-event analysis and was based on follow-up of 2.4 years, a C-statistic of 0.687 (original GAP index validation) and an estimated event rate of 22.8 cases per 100 person-years.5 23 24 As all modelled variables were routinely collected at both study sites, incomplete data were infrequent (<1%); thus, missing data were handled via complete-case analysis.
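A minimal Kaplan-Meier estimator under the censoring scheme described in the Methods (death as the event; loss to follow-up, end of follow-up and lung transplantation as right censoring) can be sketched as follows. This is a generic textbook implementation written for illustration, not the Stata routine used in the study; by the usual convention, deaths at a given time are counted before censorings at that same time.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) at each distinct death time.

    events[i] is 1 for death and 0 for right censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve
```

For follow-up times [1, 2, 2, 3, 4] years with event indicators [1, 1, 0, 1, 0], the estimated survival steps down to 0.8, 0.6 and 0.3 at years 1, 2 and 3; the censored patient at year 2 still counts in the risk set for the death at year 2.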
We first evaluated the predictive performance of the original GAP index in the full study cohort using the previously validated categorised model with point-score assignments.5 We then randomly split the internal cohort in half into a derivation and a validation set by sorting the data on automatically generated, uniformly distributed random numbers. While retaining the original categories, the GAP index was refit in the derivation cohort, and a screening procedure (a ‘base-model’ approach) was used to select a suitable model that optimised outcome prediction. Model discrimination with the GAP index plus additional predictors was compared with that of the refit GAP index primarily by the change in Harrell’s C-statistic in the validation cohorts, in accordance with published methodology.25 The incremental improvement from the two new predictors was also evaluated by category-free net reclassification improvement (NRI) and integrated discrimination improvement (IDI) for events, non-events and in aggregate at 1 year after initial pulmonary function testing. NRI measures whether estimated risk increases in patients who ultimately died and decreases in patients who ultimately survived. IDI is a similar metric that incorporates the magnitude of risk change for each individual rather than only its direction.26 Bootstrap resampling with 500 repetitions was used to calculate 95% bias-corrected CIs. New categories in the final model were assigned point values by rescaling the Cox regression coefficients by a multiplicative factor of 5 (a linear transformation to maintain the weighting of the original model) and rounding to the nearest whole number.
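The category-free NRI and the IDI described above can be computed from each patient's 1-year predicted risk under the old and new models. The sketch below is a simplification: it assumes 1-year vital status is known for every patient, whereas the study's survival setting, with censoring before 1 year, requires weighting that is omitted here; the function name `nri_idi` is hypothetical. Each NRI component ranges from -1 to 1, so the total ranges from -2 to 2 (-200% to 200%).

```python
from typing import Dict, List, Sequence, Tuple


def nri_idi(p_old: Sequence[float], p_new: Sequence[float], died: Sequence[int]) -> Dict[str, float]:
    """Category-free NRI and IDI for a binary outcome (e.g. death by 1 year)."""
    events: List[Tuple[float, float]] = [(o, n) for o, n, d in zip(p_old, p_new, died) if d]
    nonevents: List[Tuple[float, float]] = [(o, n) for o, n, d in zip(p_old, p_new, died) if not d]
    if not events or not nonevents:
        raise ValueError("need both events and non-events")

    def share(pairs: List[Tuple[float, float]], moved_up: bool) -> float:
        # Fraction of patients whose predicted risk moved up (or down).
        hits = sum(1 for o, n in pairs if (n > o if moved_up else n < o))
        return hits / len(pairs)

    # NRI: deaths should move up, survivors should move down.
    nri_events = share(events, True) - share(events, False)
    nri_nonevents = share(nonevents, False) - share(nonevents, True)
    # IDI: mean rise in risk among deaths plus mean fall among survivors.
    idi_events = sum(n - o for o, n in events) / len(events)
    idi_nonevents = sum(o - n for o, n in nonevents) / len(nonevents)
    return {
        "nri_events": nri_events,
        "nri_nonevents": nri_nonevents,
        "nri": nri_events + nri_nonevents,
        "idi_events": idi_events,
        "idi_nonevents": idi_nonevents,
        "idi": idi_events + idi_nonevents,
    }
```

If every patient who died is assigned a higher risk by the new model and every survivor a lower risk, the total NRI reaches its maximum of 2; the IDI additionally reflects how far each estimate moved.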
A staging system was developed by organising point scores into three groups (40% with the lowest estimated risk, 40% with intermediate risk and 20% with the highest risk) to approximate subsequent risk, analogous to the method used for the original GAP index.5 Calibration of the final model was examined overall over 3 years of follow-up and by visual examination at multiple time points by comparing observed with predicted model values. The baseline cumulative hazard function was estimated from the internal derivation set, and the relationship between observed and predicted model values was displayed graphically using calibration plots; for a well-calibrated model, the intercept should not differ significantly from 0 and the slope should not differ significantly from 1.27
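Predicted probabilities for calibration follow from the baseline cumulative hazard estimated in the derivation set. Under the proportional hazards model, S(t | x) = exp(-H0(t) * exp(lp)), where lp is the linear predictor, so the predicted risk of death by time t is 1 - S(t | x). The sketch below applies this standard formula; the function name and inputs are hypothetical, and it assumes that H0(t) and lp were estimated with the same covariate centring.

```python
import math


def predicted_risk(cum_baseline_hazard_t: float, linear_predictor: float) -> float:
    """Cox-model probability of death by time t: 1 - exp(-H0(t) * exp(lp))."""
    if cum_baseline_hazard_t < 0:
        raise ValueError("cumulative hazard cannot be negative")
    return 1.0 - math.exp(-cum_baseline_hazard_t * math.exp(linear_predictor))
```

In a calibration plot, these predicted risks are grouped (for example, by deciles or by stage) and plotted against the observed risk in each group, typically 1 minus the Kaplan-Meier survival at time t.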
As lung transplantation in IPF may alter expected survival, right censoring of patients at the time of lung transplantation may introduce bias. To explore this further, a sensitivity analysis using Fine and Gray competing-risk regression was performed, treating lung transplantation as a competing risk. Finally, to explore utility in patients receiving IPF therapy and to ensure applicability to the most contemporaneously treated patients, the discrimination of the selected point-score model was assessed in patients in the internal validation set treated with antifibrotics for at least 6 months and in those diagnosed with IPF within the last 5 years. All analyses were performed using STATA V.14 (StataCorp LP; College Station, Texas).
Internal cohort characteristics
There were 611 patients diagnosed with IPF during the study period. Forty-one patients were excluded because a baseline GAP index could not be calculated, and 8 patients were excluded for incomplete follow-up data. In total, 562 patients qualified for analysis and were included in the final cohort.
Demographics, baseline characteristics and relevant pulmonary metrics of the cohort are summarised in table 1.
The median age of the cohort was 72 years (IQR: 66, 77) and 129 patients (23%) were women. Mean FVC% predicted and DLCO% predicted were 65.6±19.1 and 41.0±15.4, respectively. Median 6MWD was 366 m (IQR: 262, 445) and a total of 228 patients (40.6%) exhibited exertional hypoxia. Median follow-up was 1.7 years (range 0.01–3), during which 163 deaths and 55 lung transplantations occurred.
External cohort characteristics
A total of 108 patients were diagnosed with IPF during the study period at the Hôpital Européen Georges Pompidou. All patients were included in the final cohort and baseline characteristics are included in table 2.
Original GAP index
The original GAP index had a C-statistic of 0.676 (95% CI 0.635 to 0.717) when applied to our cohort. Calibration of the staging system of the original GAP index was evaluated by comparing previously published predicted mortality to that observed in our cohort.5 The GAP index consistently overestimated mortality. Overestimated mortality (and thus, poor model calibration) was especially problematic for those patients classified as having ‘stage I’ disease (table 3).
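The C-statistics reported here are Harrell’s concordance index, the probability that, of two comparable patients, the one who died earlier had the higher risk score (0.5 is chance, 1 is perfect ordering). A simplified Python sketch follows; it is not the Stata implementation the authors used (in Stata, Harrell’s C is available after `stcox` via `estat concordance`), and it skips pairs with tied follow-up times.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C for right-censored survival data.

    A pair is usable when the patient with the strictly shorter follow-up
    died; it is concordant when that patient also has the higher risk
    score, and tied risk scores count one half.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

Because a point score such as the GAP index takes only a few integer values, many pairs have tied scores and contribute one half each, which is one reason a categorised index can discriminate less well than a continuous model.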
New predictors and mortality
6MWD <250 m and exertional hypoxia were associated with mortality in the entire cohort (HR 4.43, 95% CI 3.21 to 6.10 and HR 4.11, 95% CI 2.97 to 5.68, respectively) (figure 1). The original GAP index, 6MWD and exertional hypoxia were all associated with mortality in univariate analysis of the derivation cohort.
In multivariable analysis, all predictors remained independently associated with all-cause mortality. Inability to walk further than 250 m and exertional hypoxia were associated with a greater than twofold and a nearly threefold independent increase in risk of mortality, respectively (table 4).
Additive value of new predictors
The ‘base-model’ (the GAP index) and models containing the new predictors were fit in the derivation cohort, and their predictive power was then measured in the internal validation cohort. The applied refit GAP index is included in online supplemental table E1. The discriminative value of 6MWD and exertional hypoxia when added to the original and refit base models is provided in table 5.
Notably, the addition of 6MWD and exertional hypoxia together resulted in the largest improvement in discrimination (C-statistic 0.756; 95% CI 0.706 to 0.806, difference in C-statistic compared with the GAP index: 0.055; 95% CI 0.011 to 0.098). Furthermore, this model demonstrated significant risk reclassification improvement as reflected by the NRI (61.9%; 95% CI 26.9% to 102.9%) and IDI (0.074; 95% CI 0.032 to 0.143).
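The CIs for these estimates were obtained by bootstrap resampling with 500 repetitions and bias-corrected intervals, as described in the Methods. A generic bias-corrected (BC) percentile interval can be sketched in Python as below; this is an illustrative implementation, not the study's code, and the function name and defaults are hypothetical. The bias-correction constant z0 is derived from the share of bootstrap replicates falling below the original estimate.

```python
import random
from statistics import NormalDist
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bc_bootstrap_ci(
    data: Sequence[T],
    stat: Callable[[List[T]], float],
    reps: int = 500,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Bias-corrected (BC) bootstrap CI for stat(data), resampling patients."""
    rng = random.Random(seed)
    n = len(data)
    estimate = stat(list(data))
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)
    )
    # z0 measures median bias: the share of replicates below the estimate.
    share_below = sum(b < estimate for b in boots) / reps
    share_below = min(max(share_below, 1 / reps), 1 - 1 / reps)  # keep inv_cdf finite
    nd = NormalDist()
    z0 = nd.inv_cdf(share_below)
    z = nd.inv_cdf(1 - alpha / 2)
    # Adjusted percentiles Phi(2*z0 - z) and Phi(2*z0 + z) replace alpha/2 and 1 - alpha/2.
    p_lo, p_hi = nd.cdf(2 * z0 - z), nd.cdf(2 * z0 + z)

    def percentile(p: float) -> float:
        return boots[min(reps - 1, max(0, int(p * reps)))]

    return percentile(p_lo), percentile(p_hi)
```

For a difference in C-statistics, each element of `data` would be one patient's record, and `stat` would compute both models' C-statistics on the resample and return their difference, so that the pairing between models is preserved within each replicate.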
Given the large improvement in discrimination of the combined model, a simple point-score model was constructed based on regression coefficients in the derivation cohort (table 6).
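The point assignment described in the Methods (rescaling each Cox coefficient by a multiplicative factor of 5 and rounding to the nearest whole number) can be sketched as follows. The coefficients in the example are hypothetical and do not reproduce table 6; rounding halves away from zero is an assumption of the sketch.

```python
import math


def points_from_coefficient(beta: float, scale: float = 5.0) -> int:
    """Scale a Cox log-hazard coefficient and round to the nearest whole point.

    Halves are rounded away from zero; Python's built-in round() would
    instead round 2.5 to 2 (round-half-to-even).
    """
    x = scale * beta
    return int(math.copysign(math.floor(abs(x) + 0.5), x))
```

For example, a hypothetical coefficient of 0.45 (an HR of about 1.57) maps to 2 points, and 0.8 (an HR of about 2.23) maps to 4 points. Using a single common scale factor keeps the new categories' points on the same relative weighting as the existing GAP categories.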
This distance-oxygen-GAP (DO-GAP) index was then tested in the internal validation cohort and demonstrated significantly improved discrimination over the original and refit GAP index. Finally, the DO-GAP index was applied to the external validation cohort and again demonstrated good outcome discrimination (C-statistic 0.780; 95% CI 0.682 to 0.877). Total DO-GAP score points were then grouped into three stages (table 6) and overall survival in the entire internal cohort based on DO-GAP stage is displayed in figure 2. When applied to the internal validation set, the DO-GAP staging system resulted in a stage reclassification from their original GAP stage in 31% of patients (online supplemental table E2).
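The three-stage grouping (roughly the lowest 40%, the middle 40% and the highest 20% of point scores, as specified in the Methods) can be sketched as below. The actual DO-GAP cut-points are given in table 6 and are not reproduced here; the function names are hypothetical, and with discrete, tied scores the stage shares can only approximate 40/40/20.

```python
import math
from typing import List, Tuple


def stage_cutpoints(scores: List[int], lower_q: float = 0.4, upper_q: float = 0.8) -> Tuple[int, int]:
    """Return (c1, c2): stage I if score <= c1, II if c1 < score <= c2, else III.

    Each cut-point is the smallest observed score whose cumulative share
    of patients reaches the requested quantile.
    """
    s = sorted(scores)
    n = len(s)
    if n == 0:
        raise ValueError("no scores")

    def at_quantile(q: float) -> int:
        return s[max(0, math.ceil(q * n) - 1)]

    return at_quantile(lower_q), at_quantile(upper_q)


def assign_stage(score: int, cuts: Tuple[int, int]) -> str:
    c1, c2 = cuts
    if score <= c1:
        return "I"
    if score <= c2:
        return "II"
    return "III"
```

With ten patients scoring 0 to 9, the cut-points are 3 and 7, giving four patients in stage I, four in stage II and two in stage III.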
The calibration of the new score was evaluated in the validation cohorts by comparing observed mortality with the predicted mortality from the derivation cohort. Results of the intercept, slope and joint intercept and slope tests were all non-significant, indicative of good overall model calibration in the internal and external validation sets. Though not statistically significant, some miscalibration (overestimation of observed risk) and wide CIs between predicted and observed mortality were evident at the highest predicted event probabilities in the external cohort (online supplemental figure E1).
A sensitivity analysis modelling mortality with lung transplantation as a competing risk of death was also performed. The original GAP index, 6MWD and exertional hypoxia all remained similarly associated with mortality in univariate and multivariable analyses compared with the original method using right censoring (online supplemental table E3), and this analysis resulted in no change to the proposed DO-GAP index. Furthermore, DO-GAP model performance remained consistent when lung transplantation was treated as a competing risk of death in the internal validation set (C-statistic 0.746; 95% CI 0.697 to 0.796; difference in C-statistic compared with the GAP index: 0.068; 95% CI 0.018 to 0.119) and in the external validation cohort (C-statistic 0.768; 95% CI 0.664 to 0.873). Calibration also remained good overall and was visually indistinguishable from that of the right-censored model (data not shown to avoid redundancy). Model performance was also evaluated in the subgroup of patients in the internal cohort who received antifibrotic therapy (online supplemental figure E2) and in patients diagnosed with IPF within the 5 years before the conclusion of the study. Discrimination of both the original GAP index and the DO-GAP index in these subgroups was consistent with that in the entire cohort. Differences in discrimination between the DO-GAP index and the GAP index for patients in the internal validation cohort on antifibrotics (N=174) and for those evaluated most recently (N=141) both favoured the DO-GAP index; however, these differences were not statistically significant (online supplemental tables E4 and E5). The DO-GAP index again demonstrated significant risk reclassification improvement compared with the GAP index in both subgroups.
We assessed the performance of the GAP index in a contemporary cohort of patients treated for IPF. The GAP index provided slightly lower discrimination than in the original cohort in which it was validated.5 Furthermore, the index was not well calibrated and overestimated mortality, especially in mild and moderate disease. Notably, reduced 6MWD and exertional hypoxia were strongly predictive of overall mortality. The addition of these variables to the GAP index significantly improved model discrimination. Finally, we derived and internally and externally validated an extension of the original GAP index, termed the DO-GAP index, incorporating 6MWD and exertional hypoxia into a simple point-score model. Our new index demonstrated improved discrimination in the internal validation cohort compared with the original and refitted GAP index, as well as satisfactory calibration. In addition, it maintained utility in discriminating outcomes when applied to patients with a recent diagnosis of IPF, those treated with antifibrotic therapy and patients with IPF from a geographically distinct external cohort.
IPF is a progressive disease with a prognosis worse than many types of cancer and associated debility that threatens patient independence and quality of life.28 The time of diagnosis has been described as a valuable ‘touchpoint’, a point of patient vulnerability heralding a period of inevitable lifestyle transition.29 The information provided at this encounter, including precise prognostic information, helps patients and clinicians plan for future events, including the possible need for supplemental oxygen, lung transplantation, palliative care consultation and, ultimately, death.
A number of existing models provide prognostic insight regarding survival in IPF.4–8 Of these models, several have been constructed that incorporate longitudinal data (factors collected over multiple visits) to predict long-term outcomes.4 18 Though this approach is expected to produce accurate long-term prediction of outcome, it is limited by the need for repeated testing. A subset of these existing models incorporates baseline factors in order to predict prognosis through immediately available objective data.5 6 This approach has the advantage of ease of implementation in clinical practice and allows valuable prognostic information to be shared with patients during the initial ‘touchpoint’. However, of the available baseline models, the most widely used, the GAP index, has shortcomings that make its application in clinical practice challenging. Since the development of the original GAP index in 2011, a number of changes in the care of patients with IPF have occurred. First, a large percentage of patients in the original cohort were treated with prednisone, azathioprine and N-acetylcysteine, which current clinical practice guidelines no longer recommend.30 Second, two antifibrotic medications, pirfenidone and nintedanib, have emerged as effective therapies for IPF, associated with a reduction in the decline of lung function, reduced risk of acute deterioration and possibly improved life expectancy.31 Finally, mortality in IPF appears to be decreasing.32 These changes in management and prognosis may explain the observation that the GAP index overestimates mortality in more contemporary cohorts.4 Furthermore, though easy to use, the GAP index may not be predictive of pulmonary function decline over time and may be outperformed as a means of mortality prediction by measures of exercise capacity such as the 6MWD or the need for supplemental oxygen.14 33
The 6MWD is a reliable and reproducible measure of disease status in IPF and can be performed by nearly all patients, even in the presence of severe functional limitations.19 22 34 The test assesses overall functional ability and captures the convergence of multiple influences on exercise capacity, including respiratory, cardiovascular, rheumatologic and neurologic contributions.35 Most notably, cardiovascular disease, which is more prevalent in patients with IPF, can directly impact mortality.36 This may be one reason that the addition of the 6MWD to the original GAP score improves its prognostication for all-cause mortality. Furthermore, 6MWD may impart additional prognostic value compared with spirometry data in patients with comorbid obstructive lung disease by identifying impairment that might otherwise be missed owing to spurious preservation of the FVC.37 Additionally, the 6MWD captures functional impairment imposed by complicating pulmonary hypertension.38 Other investigators have found that the 6MWD added significant prognostic value to an existing longitudinal model for IPF, and that model was also updated to include this factor.18 Likewise, exertional hypoxia is an objective measure of disease severity that has been associated with mortality in patients with IPF.10 39–41 We demonstrated that the incorporation of these two measures of exercise capacity into the original GAP index significantly improved the predictive value of the model. The increases in C-statistic of the DO-GAP index relative to the original and refitted GAP index (0.068 and 0.050, respectively) observed in the internal validation set are consistent with a large overall improvement in model discrimination.42
Our study has a few limitations. The new scoring system was developed through retrospective evaluation of records from a single tertiary care centre, with the possibility of referral bias. Although our centre has an advanced lung disease programme and also houses a lung transplant programme, we evaluate patients with IPF across a wide range of disease severity. Indeed, the mean FVC of the cohort (65.6%) is similar to that reported in phase 3 IPF clinical trials, which generally included patients with mild to moderate disease.43 In addition, model performance was consistent when applied to an external, geographically distinct cohort from a hospital without an advanced lung disease service.
Several testing limitations are also important to recognise. First, during the study time period, reference values for pulmonary function testing were introduced and updated. These updated reference equations have generally been found to produce concordant results; however, differences in per cent-predicted lung function related to extreme height or age or to under-represented ethnicity could not be accounted for in our results.44 The performance of a DLCO manoeuvre was protocolised for all patients at their initial visit. If DLCO was unavailable, this was assumed to be related to the patient’s respiratory limitation; however, in a very small minority of cases, this absence may have been due to other factors. Furthermore, both centres involved in this study are located at sea level, and exertional hypoxia may be identified earlier in the disease course at sites at higher elevations. It is important to note that our model includes ‘exertional hypoxia’ rather than supplemental oxygen use, since we were unable to account for compliance with oxygen therapy. Both of these caveats may have introduced bias in the inclusion of exertional hypoxia in our risk prediction model. Furthermore, we evaluated combinations of predictors using the GAP index as a ‘base-model’. The model in its current form requires that all components of the risk score be available to accurately assess risk. Other potentially useful predictors, including radiographic abnormalities and serum biomarkers, were not evaluated. This approach was chosen to maximise simplicity, practicality, reproducibility and statistical power and is similar to that of other investigators who have sought to improve on original models in IPF.4 18 We chose not to alter the categorical thresholds in the original GAP index, and it is possible that modifying one or more of the GAP index predictors could have improved its final performance in our more contemporary IPF cohort.
Likewise, in their original work, Ley et al reported a ‘GAP calculator’ which provided risk estimation based on a fully specified continuous model.5 It is likely that such a continuous model would provide more precise risk estimation than a point-score index. However, as our goal was to derive a model suited for implementation in clinical practice, we chose not to explore this further. Finally, while the necessary sample size for model development was achieved, the external cohort and subgroup sample sizes were small. Specifically, though the proportion of women in our sample likely reflects the epidemiology of IPF, the overall number of women, the prevalence of lung cancer and the proportion of patients administered antifibrotics were low, which may limit the generalisability of our results. Furthermore, though the prediction model was well calibrated overall, a visual trend towards miscalibration, with fewer observed than predicted events and wide CIs at the highest predicted event probabilities, was apparent in the external cohort. This likely reflects the small number of high-risk patients in the external validation cohort. Thus, despite validation in a randomly derived internal subset of patients as well as an external cohort, our DO-GAP index should be further evaluated in other, larger cohorts of patients with IPF. Although the DO-GAP index performed better than the original GAP index in patients on antifibrotics and in the more recent era, this improvement in discrimination was not statistically significant, which likely reflects the smaller size of these subgroups.
In summary, we assessed the utility of the GAP index in a modern real-world cohort of patients with IPF. We found that the original index was not well calibrated to predict outcomes observed in our cohort. 6MWD and exertional hypoxia were significant prognostic factors strongly associated with overall survival. These prognostic factors were then combined into a new model, the DO-GAP index, which demonstrated improved outcome discrimination and calibration compared with the original GAP index.
Patient consent for publication
This study involves human participants and was approved by the Institutional Review Boards (IRB Numbers U20-03-3956 and CEPRO2021-029) at Inova Fairfax Hospital and Hôpital Européen Georges Pompidou. The requirement for informed consent was waived by the Institutional Review Board at both institutions in part due to the retrospective study design and low risk for patient identification during performance of this research.
Contributors AC and SDN take full responsibility for the content of this paper and act as guarantors for the data. AC, JP, CSK, and SDN conceived the study design. JP and SV were responsible for data acquisition. AC performed the data analysis. All authors contributed to the interpretation of the findings, critically revised the paper for intellectual content, and approved the final version of the manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests SDN is a consultant and is on the speakers bureau for Boehringer-Ingelheim, Roche and United Therapeutics. The other authors have no conflicts of interest or disclosures pertinent to this manuscript.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.