Longitudinal assessment of lung clearance index to monitor disease progression in children and adults with cystic fibrosis

Background Lung clearance index (LCI) is a valuable research tool in cystic fibrosis (CF), but its clinical application has been limited by technical challenges and uncertainty about how to interpret longitudinal change. To help inform clinical practice, this study aimed to assess the feasibility, repeatability and longitudinal change of LCI in children and adults with CF with predominantly mild baseline disease. Methods Prospective, 3-year, multicentre, observational study of repeated LCI measurement at the time of clinical review in patients with CF aged >5 years, delivered using a rapid wash-in system. Results 112 patients completed at least one LCI assessment and 98 (90%) were still under follow-up at study end. The median (IQR) age was 14.7 (8.6–22.2) years and the mean (SD) FEV1 z-score was −1.2 (1.3). Of 81 subjects with normal FEV1 (>−2 z-scores), 63% had raised LCI (indicating worse lung function). For repeat stable measurements within 6 months, the mean (limits of agreement) change in LCI was 0.9% (−18.8% to 20.7%). A latent class growth model analysis identified four discrete clusters with high accuracy, differentiated by baseline LCI and FEV1. Baseline LCI was the strongest factor associated with longitudinal change. The median total test time was under 19 min. Conclusions Most patients with CF with well-preserved lung function show stable LCI over time. Cluster behaviours can be identified and baseline LCI is a risk factor for future progression. These results support the use of LCI in clinical practice to identify patients at risk of lung function decline.

After wash-in, participants were switched to room air using fast-responding pneumatic valves within the breathing unit and instructed to maintain tidal breathing. Washout was continued until the expired end-tidal SF6 concentration fell below 2.5% of the starting concentration. No delay was required between the end of one washout and the start of the next wash-in; subjects started the next test as soon as they were able. Distraction was provided by showing age-appropriate movies or TV shows. For adults, visual feedback of inspiratory volume, with a target typically set at 10 to 15 mL/kg, was also available to aid reproducibility of the breathing pattern. Washouts were performed in outpatient clinic rooms or on the ward using a portable system, as previously described (2).
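The washout end-point above (end-tidal SF6 below 2.5%, i.e. 1/40, of the starting concentration) is the point at which LCI is conventionally read: LCI is the cumulative expired volume (CEV) needed to reach that end-point divided by FRC. The Python sketch below illustrates that arithmetic under simplifying assumptions that are ours, not the study's: breath-by-breath values are already available, there is no deadspace correction, no tracer is re-inspired, and FRC is estimated by tracer mass balance. The `Breath` record and `lci_from_washout` function are hypothetical names and do not describe the study's analysis software.

```python
from dataclasses import dataclass


@dataclass
class Breath:
    """One expiration during washout (hypothetical record layout)."""
    expired_volume_ml: float    # volume expired in this breath (mL)
    mixed_expired_conc: float   # mean tracer fraction in the expired gas
    end_tidal_conc: float       # end-tidal tracer fraction


def lci_from_washout(breaths, start_conc, end_fraction=0.025):
    """Return (LCI, FRC in mL, breath number at the end-point).

    Simplified: no deadspace correction and no re-inspired tracer. FRC is the
    volume of tracer expired divided by the fall in end-tidal concentration.
    """
    threshold = end_fraction * start_conc   # 1/40 of the starting concentration
    cev_ml = 0.0          # cumulative expired volume
    tracer_out_ml = 0.0   # cumulative volume of tracer gas expired
    for n, breath in enumerate(breaths, start=1):
        cev_ml += breath.expired_volume_ml
        tracer_out_ml += breath.expired_volume_ml * breath.mixed_expired_conc
        if breath.end_tidal_conc < threshold:
            frc_ml = tracer_out_ml / (start_conc - breath.end_tidal_conc)
            return cev_ml / frc_ml, frc_ml, n
    raise ValueError("end-tidal tracer never fell below the end-point")


if __name__ == "__main__":
    # Idealised single-compartment lung: FRC 2000 mL, tidal volume 500 mL,
    # so each breath dilutes the tracer by 2000 / 2500 = 0.8.
    c0 = 0.04
    breaths = [Breath(500.0, c0 * 0.8**n, c0 * 0.8**n) for n in range(1, 30)]
    lci, frc, n = lci_from_washout(breaths, c0)
    print(f"LCI {lci:.2f}, FRC {frc:.0f} mL, end-point at breath {n}")
```

In this idealised, perfectly homogeneous example the end-point is reached on breath 17 and LCI is 8500/2000 = 4.25; ventilation inhomogeneity delays the end-point and so raises LCI.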
Children and adults used identical patient interfaces and mouthpieces; the only difference was that a smaller filter was used in children (subjects aged <18 years). Because the patient interface was refined over the 3 years of the study, total deadspace varied from 50 to 58 mL for the paediatric setup and from 55 to 65 mL for the adult setup. Children transitioning to adult care also transitioned from the paediatric to the adult setup. These small changes in deadspace volume were assumed not to have affected measurements.
Subjects completed three washouts; if one or more tests were obviously compromised (e.g. evidence of leak), additional tests were performed. Detailed analysis and quality control were performed offline in a custom-built washout analysis package written in Igor Pro v6 (Wavemetrics Inc., Lake Oswego, OR, USA), as previously described (2)(3)(4)(5). Washout repeats were excluded if there was evidence of leak or if LCI or FRC differed by >25% from the median of the repeats (6). Final LCI and FRC values are the average of at least two reproducible repeats.
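The repeat-level quality control described above can be sketched as follows. This is a minimal illustration, assuming each repeat is summarised by its LCI, FRC and a leak flag, and that the 25% rule is applied against the median of the leak-free repeats; the `Repeat` type and `combine_repeats` function are hypothetical and do not reproduce the Igor Pro package.

```python
from statistics import mean, median
from typing import NamedTuple, Optional


class Repeat(NamedTuple):
    lci: float
    frc_ml: float
    leak: bool  # hypothetical QC flag for evidence of leak


def combine_repeats(repeats, max_rel_dev=0.25, min_repeats=2) -> Optional[tuple]:
    """Return (mean LCI, mean FRC, number of repeats used), or None if QC fails."""
    # 1. Drop repeats with evidence of leak.
    candidates = [r for r in repeats if not r.leak]
    if len(candidates) < min_repeats:
        return None
    # 2. Drop repeats whose LCI or FRC lies more than 25% from the median.
    med_lci = median(r.lci for r in candidates)
    med_frc = median(r.frc_ml for r in candidates)
    kept = [
        r for r in candidates
        if abs(r.lci - med_lci) <= max_rel_dev * med_lci
        and abs(r.frc_ml - med_frc) <= max_rel_dev * med_frc
    ]
    # 3. Report the average of at least two acceptable repeats.
    if len(kept) < min_repeats:
        return None
    return mean(r.lci for r in kept), mean(r.frc_ml for r in kept), len(kept)
```

For example, repeats with LCI 7.0, 7.4 and 10.0 give a median of 7.4; the third repeat lies more than 25% away and is dropped, leaving a reported LCI of 7.2 from two repeats.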
Operator training and quality control were led by the study lead (AH). Completed test files were sent electronically for centralised review by AH, who also analysed all washouts.
Washout test time was taken from the length of the washout file. This is the total time to complete all wash-in and washout tests, including additional tests required, any interval between tests, and analyser warm-up time (60 seconds). It does not include time taken to explain the test to the participants, or time taken to clean the apparatus between volunteers.

Patient experience questionnaire
Adult study participants were asked to complete a participant experience form immediately after testing, on a single occasion in the final 12 months of the study. The form provided an opportunity for free-text feedback about the MBW test, asked subjects to identify the worst part, and included two visual analogue scales (VAS) scored out of 100: "How easy was the test to perform?" (from "Not at all easy" to "Very easy") and "Rate the time taken to complete" (from "Far too long" to "Just right").

Assessment of clinical impact
For the final 6 months of the study period, clinicians were asked to rate the impact of LCI on clinical decision making. Patient data were loaded into the study database, which was used to generate graphical reports showing all stored LCI and spirometry results to date, with exacerbations marked on the graph. Clinicians were given training in interpreting LCI, together with data on LCI variability generated in the first half of the study period. Assessment of impact required that the LCI measurement had been completed before clinical review (in some cases not possible for logistical reasons) and that the data had been analysed and QC-checked before clinical review, which required the presence of an experienced operator (AH). Provided these conditions were met, the physician recorded the clinical outcome immediately after reviewing the patient and rated the impact of the LCI measurement on that decision process as follows:
1 - None. LCI data not relevant to clinical decision/outcome.
2 - Partial. LCI data played some role in clinical decision/outcome.
3 - Strong. LCI data were a major factor in clinical decision/outcome.
Reasons for no impact could include all data being concordant (e.g. patient clinically stable with no change in LCI or other lung function measures) or the patient being clearly unwell and likely to receive treatment irrespective of LCI.

Statistical analyses
Data were analysed using Prism version 8 (GraphPad, San Diego, USA), R version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria) and Stata version 15.1 (StataCorp, College Station, TX, USA). Normally distributed data are expressed as mean (standard deviation) and non-normally distributed data as median (interquartile range). Comparisons were performed using the unpaired t test for normally distributed data, the Mann-Whitney U test for non-normally distributed data, and the two-tailed Fisher's exact test for proportions. No adjustment was made for multiple comparisons, and p<0.05 was considered statistically significant.
Target population size was 70 patients with regular follow-up, estimated to provide sufficient numbers for robust longitudinal monitoring. There was no formal power calculation and over-recruitment was permitted.

Latent class growth analysis
Latent class growth analysis (LCGA) is a person-centred method that can be considered a special type of latent variable modelling (7)(8)(9). LCGA models allocate individuals to different groups, or classes, based on the shape of their latent growth curve trajectory (9). Each class is thus summarised by a latent growth curve with an estimated mean intercept and mean slope. The class intercept and slope are referred to as "latent parameters" because they are unobserved before the analysis is undertaken (10). In LCGA, the variance and covariance within each latent class are eliminated by fixing the variances of the intercept and slope to zero (7,9,10). Because there is no within-class variance, LCGA models assume that all individual growth trajectories within a class are homogeneous (7,8,10). LCGA can thus be thought of as a fixed-effects model (11): all members of a class share the same intercept, linear slope and quadratic slope (7,10). Individuals are assigned probabilistically, to the class for which they have the highest posterior probability, i.e. the class that best reflects their latent trajectory (12). The analysis sequentially increases the number of latent classes until the optimal number of classes is determined (10).
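To make the fixed-effect assumption and the posterior-probability assignment concrete, the sketch below computes posterior class probabilities for one individual, given already-estimated class trajectories (intercept, linear and quadratic slope) and class proportions, and then assigns the individual to the modal class. It assumes normally distributed residuals with a common SD and no within-class variance, consistent with LCGA as described above. It does not estimate the class parameters (which LCGA software does iteratively, e.g. by maximum likelihood), and all parameter values are hypothetical.

```python
import math


def trajectory(t, intercept, linear, quadratic):
    """Class-mean LCI at time t (years) for a quadratic growth curve."""
    return intercept + linear * t + quadratic * t * t


def normal_logpdf(x, mu, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mu) ** 2 / (2.0 * sd * sd)


def class_posteriors(times, values, classes, residual_sd):
    """Posterior probability of each class for one individual.

    classes: list of (proportion, intercept, linear, quadratic). Every member of a
    class shares these fixed values (no within-class variance), so the only
    randomness is the residual around the class curve.
    """
    log_joint = []
    for proportion, b0, b1, b2 in classes:
        loglik = sum(normal_logpdf(y, trajectory(t, b0, b1, b2), residual_sd)
                     for t, y in zip(times, values))
        log_joint.append(math.log(proportion) + loglik)
    top = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def modal_class(posteriors):
    """Index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


if __name__ == "__main__":
    # Hypothetical classes: stable LCI around 7 versus LCI rising by 1 unit/year.
    classes = [(0.6, 7.0, 0.0, 0.0), (0.4, 7.0, 1.0, 0.0)]
    post = class_posteriors([0, 1, 2, 3], [7.0, 8.1, 8.9, 10.0], classes, residual_sd=0.5)
    print([round(p, 3) for p in post], "-> class", modal_class(post))
```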

Feasibility of LCI
Excluding visits at which LCI was not attempted, and excluding the patients described in the main manuscript who were unable to perform LCI, 846 successful LCI measurements were made in 112 subjects. In addition, there were 6 visits at which only a single usable washout repeat was obtained and 61 visits at which no usable washout repeats were completed. These 67 failed assessments represented 7.3% of all visits at which washout was attempted, giving a success rate of 92.7%.
Failed assessments were more common in children (52/462 visits, 11.3%) than in adults (15/451 visits, 3.3%; p<0.001). Reasons for failing to obtain quality-controlled washouts included patient-related issues, such as inability to concentrate on and complete a washout test (n=3; 4.5% of all test failures). At another 22 visits (32.8%), failure was due to washout technique issues (such as incomplete wash-in or washout, or excessive breath volumes) that had not been corrected at the time of testing.
In 2 cases (3.0%), washout repeats were not reproducible enough to combine. The most common cause of failure, however, accounting for 40 visits (59.7% of all failed visits), was technical issues with the washout system. Some of these were easily corrected, whereas one LCI machine had a leaking valve that took longer to correct and resulted in the loss of several washout datasets.
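The children-versus-adults comparison of failed visits uses the two-tailed Fisher's exact test named in the statistical analyses section. A dependency-free Python sketch is shown below. It uses the common definition of the two-tailed p value (the sum over all tables with the same margins that are no more probable than the observed one), which we assume matches the software used in the study. The 2x2 table is rebuilt from the counts above: children 52 failed of 462 visits, adults 15 failed of 451 visits.

```python
from math import comb


def hypergeom_pmf(k, row1_total, col1_total, grand_total):
    """P(top-left cell = k) for a 2x2 table with fixed margins."""
    return (comb(col1_total, k) * comb(grand_total - col1_total, row1_total - k)
            / comb(grand_total, row1_total))


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    The p value sums the probabilities of every table with the same margins
    that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = hypergeom_pmf(a, row1, col1, n)
    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        p_k = hypergeom_pmf(k, row1, col1, n)
        if p_k <= p_observed * (1 + 1e-7):  # relative tolerance for floating-point ties
            p_value += p_k
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Rows: children, adults. Columns: failed visits, successful visits.
    p = fisher_exact_two_tailed(52, 462 - 52, 15, 451 - 15)
    print(f"two-tailed p = {p:.2g}")
```

Python's exact integer `math.comb` avoids overflow in the binomial coefficients even for several hundred visits; the result for these counts is well below the 0.001 reported.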
Of the 846 successful LCI visits, a full set of triplicate LCI repeats was available for 683 (81%).
In total, 163 repeats were excluded (6.4% of the total). Repeats were usually excluded for poor reproducibility or for failing quality control (e.g. inadequate wash-in, air leak). These reasons, and the number of visits that required a fourth washout to obtain a triplicate dataset, were not recorded separately. Operators were encouraged to perform a fourth washout if they suspected poor quality. Total test time includes all attempted washout repeats, whether or not they were included.
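The feasibility percentages above can be reproduced from the raw counts. The short check below assumes two denominators that are our inferences rather than stated in the text: failed visits are expressed relative to all visits at which washout was attempted (846 + 67 = 913), and excluded repeats relative to three planned repeats per successful visit (3 × 846 = 2538). Both assumptions reproduce the reported figures.

```python
# Counts reported in the feasibility text
successful_visits = 846
failed_visits = 6 + 61              # single usable repeat + no usable repeat
attempted_visits = successful_visits + failed_visits

failed_children, visits_children = 52, 462
failed_adults, visits_adults = 15, 451
failure_causes = {"patient-related": 3, "technique": 22,
                  "not reproducible": 2, "technical": 40}

full_triplicate_visits = 683
excluded_repeats = 163
planned_repeats = 3 * successful_visits  # assumption: three planned repeats per visit


def pct(numerator, denominator, ndigits=1):
    """Percentage rounded as in the text."""
    return round(100 * numerator / denominator, ndigits)


if __name__ == "__main__":
    print("failed visits:", pct(failed_visits, attempted_visits), "% of", attempted_visits)
    print("success rate:", pct(successful_visits, attempted_visits), "%")
    print("children:", pct(failed_children, visits_children), "%, adults:",
          pct(failed_adults, visits_adults), "%")
    for cause, count in failure_causes.items():
        print(f"  {cause}: {pct(count, failed_visits)}% of failures")
    print("full triplicates:", pct(full_triplicate_visits, successful_visits, 0), "%")
    print("excluded repeats:", pct(excluded_repeats, planned_repeats), "%")
```

As a consistency check, the children's and adults' visit counts (462 + 451) also sum to the 913 attempted visits, and 846 − 683 = 163 matches the number of excluded repeats.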

Baseline FEV1
In the original protocol, mild disease was defined as a best FEV1 in the previous 6 months of >60% predicted. This lower limit was subsequently reduced to capture patients with recent dips in FEV1, an issue identified in some adults. In addition, some patients were unable to achieve their recent best spirometry at visit 1. Overall, 6 adults (14%) and 2 children (3%) had an FEV1 below 60% predicted at visit 1, whilst 25 adults (57%) and 46 children (70%) had an FEV1 above 80% predicted.
Distribution of FEV1 at visit 1 is shown in Figure E2.
Analyses were also conducted for adults and children separately. The adults contributed 208 of the data pairs (66%). In adults, the median % difference in LCI at visit 2 was −0.5%, whilst in children it was 1.9% (p=0.015).
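Repeatability is summarised as the percentage change in LCI between paired stable visits, with Bland-Altman style limits of agreement (mean ± 1.96 SD) as reported in the abstract. The sketch below assumes the percentage change is calculated relative to the first visit of each pair; the exact definition used by the authors is not stated here, and the example pairs are invented for illustration.

```python
from statistics import mean, median, stdev


def percent_change(first, second):
    """Percentage change in LCI from the first to the second visit of a pair.

    Assumption: expressed relative to the first visit.
    """
    return 100.0 * (second - first) / first


def limits_of_agreement(changes, z=1.96):
    """Mean change with Bland-Altman 95% limits of agreement (mean +/- 1.96 SD)."""
    m = mean(changes)
    sd = stdev(changes)  # sample SD, n - 1 denominator
    return m, m - z * sd, m + z * sd


if __name__ == "__main__":
    # Invented LCI pairs (visit 1, visit 2) for illustration only
    pairs = [(7.0, 7.7), (8.0, 7.6), (10.0, 10.0), (6.0, 6.3)]
    changes = [percent_change(a, b) for a, b in pairs]
    m, lo, hi = limits_of_agreement(changes)
    print(f"median {median(changes):.1f}%, mean {m:.1f}% (LoA {lo:.1f}% to {hi:.1f}%)")
```

Reporting the median alongside the mean, as done for the adult and paediatric subgroups, is less sensitive to a few large changes than the mean and SD.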
BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance Supplemental material placed on this supplemental material which has been supplied by the author(s)

Latent Class Growth Analysis
The 4-cluster solution was considered the best fit because it returned the lowest BIC, SABIC and AIC and yielded high entropy (Table E2). All class sizes were >1% of the total number of participants. The degree to which the trajectory classes captured distinct and important patterns in the data was assessed by estimating the average posterior probability for each cluster. These values are presented for the 4-cluster solution in Table E3, following the GRoLTS-Checklist (14). Individual and mean LCI trajectories for the 4-cluster solution for the combined dataset are shown in Figure E6 and described in the main manuscript text.
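The model-selection statistics and average posterior probabilities referred to above follow standard definitions, sketched below: AIC = −2LL + 2p, BIC = −2LL + p ln n, sample-size-adjusted BIC (SABIC) with n replaced by (n + 2)/24, and relative entropy 1 − Σ(−p ln p)/(n ln K) over an n × K matrix of posterior class probabilities. Here n is taken to be the number of individuals, which is the usual convention in LCGA software but an assumption on our part; the log-likelihood and parameter count would come from the fitted models and are not reproduced here.

```python
import math


def information_criteria(log_likelihood, n_params, n):
    """AIC, BIC and sample-size-adjusted BIC (lower is better).

    n is assumed to be the number of individuals in the model.
    """
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n),
        "SABIC": deviance + n_params * math.log((n + 2) / 24),
    }


def relative_entropy(posteriors):
    """1 - sum(-p ln p) / (n ln K); 1 means perfectly separated classes."""
    n, k = len(posteriors), len(posteriors[0])
    if k == 1:
        return 1.0
    h = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - h / (n * math.log(k))


def average_posterior_probabilities(posteriors):
    """Mean posterior probability of class c among individuals modally assigned to c."""
    k = len(posteriors[0])
    modal = [max(range(k), key=row.__getitem__) for row in posteriors]
    result = []
    for c in range(k):
        members = [row[c] for row, m in zip(posteriors, modal) if m == c]
        result.append(sum(members) / len(members) if members else float("nan"))
    return result
```

Each row of `posteriors` holds one individual's class probabilities; average posterior probabilities close to 1 (the text reports >0.9 for almost all clusters) indicate that individuals are assigned with little ambiguity.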
These analyses were repeated for the adults (n=29) and children (n=52) separately and are presented in Figures E6 and E7. Due to the smaller numbers of subjects in each of these cohorts, these analyses should be considered as exploratory only. In both cases the cohorts clustered into 3 groups; there were insufficient data to form the same four groups seen with the full dataset. Also in both cases the largest group was those with stable LCI. Posterior probabilities were high for almost all clusters (>0.9).
In both adults and children, univariate modelling identified differences in baseline LCI across clusters (p<0.0001). Differences in FEV1 across clusters were seen only in the adult data (p=0.008). No other factors were significantly associated with cluster membership.

Trajectory based on initial FEV1 and LCI
Patients with normal-range FEV1 (z-score >−2) but raised LCI were investigated further and compared with those with normal FEV1 and normal LCI. These patients would all be classed as "normal" by spirometry; here they are divided by LCI to explore whether this measurement, at a single visit, could provide insight into future outcomes. This analysis was performed post hoc and was not part of the original analysis plan.
This analysis was conducted only in patients included in the longitudinal dataset. Of these, 63 had normal FEV1 at their first visit, of whom 41 (65%) had elevated LCI (>6.9). Visit 1 characteristics, LCI trajectory and cluster distribution are shown below for these two groups.
(29%) indicating change in LCI outcomes over time. Positive and negative changes in LCI (clusters 2 and 4 vs cluster 3) were equally likely, leading to minimal overall change in mean LCI.
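The post-hoc grouping uses two thresholds stated in this section: normal FEV1 as a z-score > −2 and elevated LCI as > 6.9. A minimal classifier is sketched below. How values exactly at the thresholds were handled is an assumption: following the strict inequalities in the text, a z-score of exactly −2 is treated as abnormal and an LCI of exactly 6.9 as normal. The function names are hypothetical.

```python
from collections import Counter

FEV1_Z_LOWER = -2.0   # normal FEV1: z-score > -2 (from the text)
LCI_UPPER = 6.9       # elevated LCI: > 6.9 (from the text)


def baseline_group(fev1_z, lci):
    """Classify a visit-1 result using the thresholds stated in the text."""
    if fev1_z <= FEV1_Z_LOWER:
        return "abnormal FEV1"
    if lci > LCI_UPPER:
        return "normal FEV1, raised LCI"
    return "normal FEV1, normal LCI"


def summarise(visit1_results):
    """Count patients per group from (FEV1 z-score, LCI) pairs."""
    return Counter(baseline_group(z, lci) for z, lci in visit1_results)
```

Applied to the 63 patients with normal FEV1, 41 in the raised-LCI group corresponds to 41/63 = 65%, as reported.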

Participant experience questionnaires
Questionnaires were given to participants at the time of LCI review and handed in separately to the clinic nurse. Five previously recruited subjects were not seen during the questionnaire period, making the eligible population 41 adult subjects (including those who transitioned to adult care from paediatrics).
Eighteen completed questionnaires were received (44% of the eligible population). Free-text responses were grouped into categories. Visual analogue scores (VAS) were measured at the point where the patient's mark crossed the score line and are shown graphically in bins of 10 mm (Figure E10).
Question 1: "How did you find the washout testing?" 32% reported that they had experienced no problems and a further 32% answered "ok". Some added that the test required little effort or time (n=3), one indicated that the first test had been the hardest, and one that it was harder when unwell. One respondent described the test as "long" and another as "boring".
Question 2: "What was the worst part of the test?" Responses to this question are shown in Figure E9.
Question 3: "How could the test be improved?" Seven of 18 respondents (39%) were unable to identify any ways to improve the test. Five (28%) felt the test time was too long, one that the apparatus needed to provide more leg room, and two recommended more practice before starting testing. Three subjects felt there should be a better selection of films for distraction during tidal breathing.
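The VAS binning used for Figure E10 can be sketched as follows, assuming each score is the distance in mm (0 to 100) from the left end of the line to the participant's mark and that a mark at exactly 100 mm is counted in the top bin. The function names are hypothetical.

```python
def bin_vas_scores(scores_mm, bin_width=10, scale_max=100):
    """Count VAS scores (mm from the left anchor) in bins of bin_width mm.

    Bin i covers [i * bin_width, (i + 1) * bin_width); a mark at exactly
    scale_max is counted in the top bin.
    """
    n_bins = scale_max // bin_width
    counts = [0] * n_bins
    for score in scores_mm:
        if not 0 <= score <= scale_max:
            raise ValueError(f"VAS score {score} mm is outside 0 to {scale_max} mm")
        counts[min(int(score // bin_width), n_bins - 1)] += 1
    return counts


def bin_labels(bin_width=10, scale_max=100):
    """Labels such as '0-10' for each bin, for use on a histogram axis."""
    return [f"{lo}-{lo + bin_width}" for lo in range(0, scale_max, bin_width)]
```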