Computed tomography (CT) has two potential roles in the evaluation of patients with cystic fibrosis (CF) lung disease: as a diagnostic test primarily for the detection of supervening complications and as a monitoring tool in clinical research. Interest in the latter role has gained momentum in the last 5 years because of two factors: (1) therapeutic options for CF lung disease are developing rapidly, hence the need for an outcome measure that can be applied in clinical intervention trials; and (2) it has become clear that traditional outcome measures such as pulmonary function tests are relatively insensitive to the early structural damage that occurs in CF. Several recent studies have shown that CT can be used as a potential surrogate outcome measure, although its suitability for this specific role is controversial and still under investigation. This review summarises current concepts relating to the research applications of CT in CF, with particular emphasis on the evidence supporting the use of CT as a surrogate outcome measure in clinical trials.
- CF, cystic fibrosis
- CT, computed tomography
- FEV1, forced expiratory volume in 1 second
- HRCT, high resolution computed tomography
- LCI, lung clearance index
- cystic fibrosis
- computed tomography
The first observational studies using computed tomography (CT) in patients with cystic fibrosis (CF) were published in the late 1980s.1–3 These studies described the CT features of the disease and demonstrated the superiority of CT over the chest radiograph in depicting early or subtle disease of the airways. The most ubiquitous feature of CF clearly identifiable on CT is bronchiectasis. Other morphological abnormalities include bronchial wall thickening, a mosaic attenuation pattern (a term used to describe the abnormal inhomogeneity of lung parenchymal density), centrilobular nodules/“tree-in-bud” pattern, areas of consolidation, atelectasis, and bullae. In contrast to many other lung diseases, there has been a distinct lack of CT-histological correlative studies in CF (largely because of the lack of biopsy material). The views that, for example, centrilobular nodules represent exudative small airways disease, or that areas of decreased attenuation (as part of a mosaic attenuation pattern) represent the consequences of small airways obliteration, have therefore been formed from studies of related but distinct diseases such as diffuse panbronchiolitis and idiopathic bronchiectasis.
Expiratory CT enhances a mosaic attenuation pattern, with the areas of decreased attenuation representing air trapping. In CT studies of CF the terms mosaic attenuation pattern, air trapping, and decreased attenuation lung are used synonymously when describing the extent of small airways disease (either due to obliteration or bronchial hyperreactivity). For the remainder of this review, the terms mosaic attenuation pattern or air trapping will be used, the latter only if expiratory CT was obtained.
CT SCORING AND DISEASE QUANTIFICATION
In 1991 Bhalla published the first paper outlining a method of scoring CT abnormalities,4 thus providing an approach to quantifying disease and enabling structure-function relationships to be explored. Since then a number of additional scoring systems have been proposed.5–7 Current scoring systems are similar in terms of the range of CT features scored and most are semi-quantitative. These range from fairly coarse scores (such as the use of a two-point scale to indicate the presence or absence of a feature) to an attempt at a more precise estimation of disease (the use of a four-grade scale to quantify bronchial wall thickness). A recent study by de Jong et al8 showed that the inter-observer agreement for five established scoring systems (those of Castile, Bhalla, Helbich, Santamaria and Brody) was good, with intraclass correlation coefficients ranging from 0.74 to 0.97.
A problem with the plethora of scoring systems reported in the literature is that there is no standardised approach to CT scoring; with each new generation of studies using CT there is an amalgamation of the different scoring systems with a few additions or modifications according to the preferences of individual radiologists. CT is increasingly being used in clinical intervention studies, and it would clearly be advantageous if there were some consensus as to the scoring system used. Clear definitions of the abnormalities being scored would help ensure that observers score and report the same morphological feature. This is particularly pertinent to the scoring of small airways disease, which is confusingly defined in some studies as areas of “hyperinflation” and in others as areas of “air trapping” or a “mosaic attenuation pattern”. A robust scoring system devised by Brody9 and recently refined10,11 has been shown to be both reproducible and sensitive to variation in the severity of CF lung disease.11
The concern over the lack of perceived objectivity associated with conventional semi-quantitative (non-automated) scoring, coupled with the obvious demands of non-automated methods on radiologists’ time, has led to interest in the use of software driven automated techniques to quantify CT features. At present these techniques are confined to the quantitation of bronchial wall thickness and bronchial dilatation12,13 and air trapping.14–16 There are both proponents and detractors of automated methods; the benefits and problems of both automated (so-called objective) and non-automated (subjective) methods are discussed below, focusing on bronchial dimensions and air trapping.
Direct comparison of automated versus non-automated methods is difficult because different statistical tests are often used to express inter-observer variation. The inter-observer variation of automated and non-automated scoring methods for airway dimensions has been expressed as either the weighted kappa coefficient17,18 or the intraclass correlation coefficient.12,19–21 Table 1 summarises the values obtained for inter- and intra-observer variation in recent studies that have used either automated or non-automated techniques to quantify bronchial dimensions. In summary, both automated and non-automated scoring methods would appear to have acceptable reproducibility.
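To make one of the agreement statistics mentioned above concrete, the following is a minimal sketch of Cohen's weighted kappa for two observers scoring the same cases on an ordinal scale. The function name, the 0-based category coding, and the example scores are illustrative assumptions; they are not taken from any of the cited studies.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two observers.

    Scores are assumed to be integers on an ordinal scale 0..n_categories-1.
    weighting is "linear" or "quadratic" (penalty grows with score distance).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both observers must score the same non-empty set of cases")
    if n_categories < 2:
        raise ValueError("at least two categories are needed")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    n = len(rater_a)
    power = 1 if weighting == "linear" else 2
    max_distance = (n_categories - 1) ** power

    def disagreement(i, j):
        # 0 for identical scores, 1 for scores at opposite ends of the scale.
        return abs(i - j) ** power / max_distance

    # Observed disagreement: mean weighted distance between paired scores.
    observed = sum(disagreement(a, b) for a, b in zip(rater_a, rater_b)) / n

    # Expected disagreement if the two observers scored independently,
    # computed from each observer's marginal score distribution.
    marginal_a = Counter(rater_a)
    marginal_b = Counter(rater_b)
    expected = sum(
        disagreement(i, j) * marginal_a[i] * marginal_b[j]
        for i in range(n_categories)
        for j in range(n_categories)
    ) / (n * n)

    if expected == 0:
        raise ValueError("kappa is undefined when both observers use one identical category")
    return 1.0 - observed / expected


# Hypothetical four-grade scores (0-3) from two observers on eight scans.
observer_1 = [0, 1, 1, 2, 3, 2, 0, 1]
observer_2 = [0, 1, 2, 2, 3, 1, 0, 1]
print(weighted_kappa(observer_1, observer_2, n_categories=4))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Because kappa and the intraclass correlation coefficient are computed differently, values from the two statistics should not be compared directly across studies.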
The reproducibility of automated techniques used to quantify air trapping (areas of decreased attenuation below a given threshold) has not been fully documented but can be simplistically assumed to be near perfect—that is, repeated measurements obtained by software derived algorithms have no reason to differ. The inter-observer variation of non-automated methods for the quantification of a mosaic attenuation pattern/air trapping is more variable, some studies reporting low inter-observer variation17,19 while others have reported unacceptably high levels of variation between observers.18 The experience of the radiologist involved, the coarseness of the grading system, and problems of definition are factors that influence inter-observer variation so, in terms of reproducibility alone, there are potential benefits of using automated techniques. However, it should be appreciated that at present no work exists to validate automated methods for the quantification of air trapping.
Quantification of air trapping and bronchial dimensions
The second issue is that of accuracy—the ability of the method of quantification to reflect the desired “target” (in the context of CT, the target is a single CT feature under scrutiny). Experienced radiologists can readily identify obliterative small airways disease by the paucity and reduced calibre of vessels in the affected lung which is of decreased attenuation (blacker than expected) (fig 1). Regions of lung which are normally avascular (adjacent to the fissures and at the apices) and other artefactual causes of decreased attenuation are intuitively ignored when images are interrogated. Automated methods of quantifying air trapping are either based on a fixed density threshold approach15 or a varying threshold,14 but these methods have problems.
Quantitative image analysis using a threshold based approach has been most widely studied in emphysema.25–27 The most robust measure introduced by Muller et al is the density mask approach in which a computer highlights pixels below a set threshold. In this study, a threshold was set at −910 HU based on the fact that normal lung has an attenuation greater than −910 HU and an excellent correlation was achieved with extent of disease on pathological specimens.25 Applying the threshold approach to obliterative small airways disease is less straightforward. The abnormal decreased attenuation lung in small airways disease is represented by a spectrum of CT densities and thus is not entirely analogous to the quantification of emphysema (in which areas of emphysema are characterised by uniform air density). As no gold standard exists, it is impossible to determine whether automated methods are more or less accurate in quantifying small airways disease.
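The fixed threshold (density mask) idea described above can be sketched in a few lines. This is an illustration only: it assumes the lung has already been segmented and that attenuation values are supplied in Hounsfield units, and the function name and sample values are hypothetical. The −910 HU default is the emphysema threshold quoted above; as noted, its suitability for small airways disease has not been established.

```python
def density_mask_fraction(lung_hu_values, threshold_hu=-910):
    """Fraction of segmented lung pixels with attenuation below threshold_hu.

    lung_hu_values: iterable of attenuation values (HU) for pixels already
    identified as lung; segmentation itself is outside this sketch.
    """
    values = list(lung_hu_values)
    if not values:
        raise ValueError("no lung pixels supplied")
    # Count pixels that the density mask would highlight.
    below = sum(1 for hu in values if hu < threshold_hu)
    return below / len(values)


# Hypothetical attenuation values for a handful of lung pixels.
sample_pixels = [-950, -920, -900, -850]
print(density_mask_fraction(sample_pixels))  # 0.5
```

A varying threshold approach would replace the fixed `threshold_hu` with a value derived from each scan, which is one reason results from the two approaches are not directly interchangeable.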
The quantification of bronchial wall thickness—whether by automated or non-automated methods—is difficult as retained secretions can cause apparent wall thickening (fig 2). In studies using automated methods, airways that are obviously mucus-lined are excluded from computer analysis.12
Sensitivity to change
It is crucial that any method of quantification should be sensitive enough to detect change, be it following a therapeutic intervention or simply serial change following a defined time interval. A recent study compared air trapping measurements obtained using an automated method (threshold approach) and a non-automated method (visually scored CT scans in which a 4-point scale was used to quantify air trapping) during a 1 year placebo controlled dornase alfa interventional trial in children with mild CF.16 All subjects underwent CT scanning and lung function tests at baseline, 3 months, and 1 year. At 3 months and 1 year there were no statistical differences between treated and non-treated groups for lung function tests and visual CT scores, although there were significant (3 months) and near significant (1 year) differences between the two groups for quantitative measurements of air trapping. Intriguingly, the trend in air trapping scores assessed visually was towards an increase in the treated group and a decrease in the non-treated group, in contrast to the results of the automated method in which air trapping decreased in the treated group and increased in the non-treated group. This disparity was also evident at 1 year (when the visual CT air trapping score was significantly higher in the treated group compared with baseline). The study confirms that both methods are able to show significant change from baseline measurements following a clinical intervention (even when a relatively coarse scale is used to visually quantify air trapping). However, an explanation for the conflicting results obtained with the visual and the automated system is awaited.
De Jong et al used a software programme to evaluate bronchial wall thickness and identified a small (0.03 mm) but significant increase in patients on serial CT scans (2 year interval) whereas visual scoring systems showed no change.12 Further studies are needed to establish whether these small changes are coupled to clinically significant changes in true outcome measures.
These two studies suggest that automated techniques may be more sensitive to small changes than non-automated methods. The fact that using an automated method enables a change to be demonstrated is certainly encouraging, and it seems likely that, once the relevant software is more widely available, automated methods may, if validated, become the norm, particularly for interventional studies. The fact that automated techniques may not always accurately represent the CT feature under evaluation may not be so crucial if the change that is demonstrated is produced by consistently measuring the same feature.
VOLUME CONTROLLED ACQUISITION
In children under the age of 5 years it has been suggested that controlled ventilation techniques should be used in order to minimise artefact from respiratory motion and the problem of imaging at low resting tidal volumes which can easily obscure the early changes of bronchiectasis.28 This technique requires sedation and positive pressure facemask ventilation but provides motion-free images of the lung at full inflation.
For interventional studies in CF it is likely that volumetric high resolution CT scanning (HRCT) will replace interspaced HRCT. The main benefit of volumetric HRCT is that it yields contiguous thin sections of the entire lung. Generating serial images that are anatomically comparable is possible with volumetric HRCT (this is not the case with interspaced sections), and this facilitates the ability to detect change in longitudinal studies. The increased sampling of the lung achievable with a volumetric acquisition should also improve the accuracy of scoring CT features such as bronchial dilatation and mucus plugging. Matched inspiratory/expiratory pairs also facilitate the accurate assessment of air trapping. It is likely that CT protocols for research will include limited expiratory images as these accentuate what is often a subtle mosaic attenuation pattern.
While the advantages of controlled ventilation CT scanning protocols are recognised in younger children,28 the necessity of this approach is debated in older patients. Spirometer triggered gating has the advantage of allowing measurements of air trapping to be made at standardised lung volumes when evaluating serial CT scans. This is important as the extent of air trapping demonstrated on CT is dependent on the degree of expiration. However, studies have shown that, in reality, most patients are able to breath hold reliably at end-expiration following rehearsed verbal instructions and, more importantly, that lung volumes at end-expiration do not change significantly between examinations. A study by Bankier et al showed that, in patients with obliterative bronchiolitis, residual volumes determined by CT software were not significantly different on repeated examinations and that the extent of air trapping scored using visual assessment was unchanged.29 Additionally, in a study comparing spirometric gated CT with automatic patient instruction, Kauczor et al concluded that there were no significant differences between the mean lung density of expiratory scans in normal subjects when spirometric and the “verbal instruction” sets of images were compared.30
Non-gated acquisition has been used in numerous published CF clinical trials and, for centres without suitable apparatus, volitional breath holding is probably an acceptable method of acquiring expiratory images. Most modern CT scanners allow volume measurements to be made using in-built computer software which segments out the lung on each consecutive image. Thus, for those comparative studies in which it is important to ensure that there are no significant differences in expiratory lung volume between serial examinations, it is possible to verify the lung volumes at which the CT scans were acquired.
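The volume check described above amounts to counting segmented lung voxels and multiplying by the voxel size. The sketch below shows only that arithmetic; it is not a description of any vendor's software. It assumes contiguous sections and a per-section count of lung pixels, and all names and numbers are hypothetical.

```python
def lung_volume_ml(lung_pixels_per_section, pixel_spacing_mm, section_spacing_mm):
    """Lung volume in millilitres from per-section counts of segmented lung pixels.

    pixel_spacing_mm: (row, column) in-plane pixel size in mm.
    section_spacing_mm: distance between consecutive sections in mm
    (assumed equal to section thickness, i.e. a contiguous acquisition).
    """
    row_mm, col_mm = pixel_spacing_mm
    voxel_mm3 = row_mm * col_mm * section_spacing_mm
    return sum(lung_pixels_per_section) * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3


def relative_volume_difference(volume_a_ml, volume_b_ml):
    """Absolute difference between two volumes as a fraction of their mean."""
    mean_volume = (volume_a_ml + volume_b_ml) / 2.0
    return abs(volume_a_ml - volume_b_ml) / mean_volume


# Hypothetical expiratory scans from two visits.
visit_1 = lung_volume_ml([1000, 2000], pixel_spacing_mm=(0.5, 0.5), section_spacing_mm=1.0)
visit_2 = lung_volume_ml([1100, 2100], pixel_spacing_mm=(0.5, 0.5), section_spacing_mm=1.0)
print(visit_1, visit_2, relative_volume_difference(visit_1, visit_2))
```

What counts as an acceptable difference between serial examinations is a study design decision and is deliberately left out of the sketch.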
Most CT examinations performed for clinical reasons in older children and adults are interspaced HRCT scans (thin sections acquired at 10 mm intervals at full inspiration). Expiratory CT images are not routinely obtained in all centres. A recent study set out to determine whether increasing the interval between CT sections to 20 mm and 30 mm was a feasible dose reducing strategy in children with CF.31 It was concluded that increasing the interval between sections to more than 10 mm resulted in a significant loss of information compared with scans obtained at 10 mm intervals.
The nature of clinical intervention trials necessitates, at the very least, a before and after CT scan. With new treatments such as gene therapy, in which the timing of the effect is largely unknown, several CT scans may be required to avoid missing an early or late response. Some CF centres routinely perform surveillance CT scans every two32 or three years.20 Multiple examinations increase the radiation dose to children and young adults and recent concerns regarding dose are justified, particularly in light of the increasing survival in patients with CF. A recent study has shown that the survival reduction associated with annual CT scans (interspaced HRCT with an average estimated dose of 1 mSv using 120 kV and 120–160 mAs) from 2 years until death is approximately 1 month and 2 years for CF cohorts with a median survival of 26 and 50 years, respectively, indicating that the overall risk is relatively low but will increase as survival in CF patients improves.33
There are clear advantages to the use of volumetric HRCT over an interspaced technique for research studies as described above. However, if conventional parameters are used in the acquisition of volumetric CT, the radiation dose incurred can be considerable: sometimes as much as 7–10 times the effective radiation dose of an interspaced CT scan. At present there are no recommendations for scanning parameters for volumetric HRCT scanning in children and young adults. Some institutions use 1 mA/kg for children and young adults <50 kg in conjunction with 100 kVp. A timely publication by Siegel et al34 has recommended mA and kV settings for children of different weights. A reduction in mA can lead to a substantial decrease in dose. Lucaya et al35 showed that the dose of a conventional HRCT scan performed at 120 kVp and 180 mA was nearly four times higher than a similar examination using 50 mA. No differences in image quality were found between the different tube currents. Adapting the kVp in CT scanning of children is more controversial. Cody et al36 have shown that, even in the smallest patients, 80 kVp is associated with unacceptable beam hardening; however, other workers favour a setting of 80 kVp in children weighing <50 kg.37 It is clear that attention to scanning parameters is needed to ensure that patients are exposed to an acceptable level of radiation. Using a 1 mA/kg protocol in all patients weighing <50 kg with 100 kVp incurs a dose of 0.77–1.14 mSv for a volumetric high resolution scan obtained on a Siemens Sensation 64 multislice scanner, similar to that of an incremental high resolution scan performed using conventional parameters, 0.9 mSv (120 kVp and 90 mA).
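The arithmetic behind the weight-based protocol and the tube current comparison above can be sketched as follows. The sketch assumes that, with kVp, exposure time, and scan geometry held fixed, dose scales linearly with tube current; this is a first-order approximation, not a dosimetry model. The function names are hypothetical.

```python
def weight_based_tube_current_ma(weight_kg, ma_per_kg=1.0, weight_limit_kg=50.0):
    """Tube current (mA) under a weight-based protocol such as 1 mA/kg.

    The protocol described in the text applies only below the weight limit,
    so heavier patients are rejected rather than extrapolated.
    """
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg >= weight_limit_kg:
        raise ValueError("weight-based protocol is described only below the weight limit")
    return weight_kg * ma_per_kg


def relative_dose(tube_current_a_ma, tube_current_b_ma):
    """Dose of setting A relative to setting B, assuming dose is proportional to mA."""
    if tube_current_b_ma <= 0:
        raise ValueError("reference tube current must be positive")
    return tube_current_a_ma / tube_current_b_ma


print(weight_based_tube_current_ma(20.0))  # 20.0 mA for a 20 kg child
print(relative_dose(180, 50))              # 3.6
```

Under this assumption the ratio for 180 mA versus 50 mA is 3.6, in keeping with the "nearly four times" figure quoted above.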
CT AS A SURROGATE OUTCOME MEASURE: WHAT IS THE EVIDENCE?
The validation of an imaging biomarker as a surrogate outcome measure requires three criteria to be met: (1) the presence of the imaging biomarker is closely coupled or linked to the severity of the target disease; (2) the detection and quantitative measurement of the imaging biomarker is accurate and reproducible; and (3) the measured changes over time in the imaging biomarker are closely coupled to the success or failure of the therapeutic effect and the true end point sought for the medical treatment being evaluated.38–40 It has also been suggested that an outcome surrogate should ideally improve rapidly with effective treatment and be correlated with true outcomes rather than short term measures of disease.41
Forced expiratory volume in 1 second (FEV1) has traditionally been used as a reliable measure for monitoring the course of CF lung disease, and has been shown to be the best surrogate for survival (a true outcome measure). However, there are several limitations associated with using FEV1 as a surrogate end point in interventional studies: (1) in patients with early or mild disease, the maintenance of a normal FEV1 does not necessarily indicate a lack of lung damage, which limits its use as a means of demonstrating a lack of progressive lung damage following a specific treatment; (2) FEV1 is commonly used to stratify subjects in research studies according to the severity of their disease, but if differences in severity of lung disease are not reflected by FEV1, this could confound the results of controlled studies; and (3) even in subjects with reduced FEV1 the rate of decline can be very slow, thus requiring studies involving large numbers of patients followed over a long period of time. These concerns have prompted a search for another measure of lung disease in CF.
Based on criteria outlined at the beginning of this section, what is the evidence that CT could be used as a potential outcome measure? Firstly, it is clear that abnormalities demonstrated on CT do reflect the severity of disease in patients with CF. In addition, the detection and quantification of CT features are, as previously discussed, relatively reproducible. Theoretically, CT can be performed serially (a necessary attribute of an applicable outcome measure), with the caveat that radiation burden is an important issue which will ultimately limit the number of repeated CT examinations. The accuracy of the detection and quantification of CT abnormalities is much harder to assess in the absence of a gold standard. Most studies have used correlation with lung function (FEV1) as a means of validating CT scoring systems. However, if it is accepted that FEV1 is a blunt tool and, in particular, can be insensitive to mild or localised lung disease, then its role as a “validator” is questionable. Thus, it can often be misleading to make a statement on the suitability of a biomarker as a potential surrogate outcome measure based on the correlation (or lack of correlation) with another “established” surrogate outcome measure. The fact that CT demonstrates abnormalities such as mild bronchiectasis or a mosaic attenuation pattern even when FEV1 is normal,42,43 or that disease progression can be seen on sequential CT scans in patients in whom FEV1 shows either stability or even improvement,32 has led to the perception that, at least in some respects, CT may be superior at identifying early lung damage in patients with CF and more accurately represents disease burden than FEV1. Long et al22 showed that infants (mean age 2 years) with CF have greater airway wall thickness and lumen diameter than matched controls—further evidence that CT abnormalities can be identified very early in patients with CF. 
The lung clearance index (LCI) is emerging as a very sensitive test in the detection of early CF lung disease. The technique involves the inhalation of a known concentration of a marker gas. When equilibrium has been reached, supply of the marker gas is stopped and the falling concentration is measured as it is washed out during continued tidal breathing. LCI is calculated as the number of lung volume turnovers (cumulative expired volume divided by functional residual capacity) required for the marker gas concentration to fall to 1/40th of the starting level.44 The LCI has been shown to be more sensitive in separating individuals with CF from control individuals than spirometry or plethysmography,45 and a better predictor of progression of disease than lung function in children and adolescents with CF.46 The correlation of CT scores with LCI measurements (as opposed to spirometric measurements) should be addressed as valuable insights may be gained.
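The washout calculation can be illustrated with a short sketch. It assumes the usual convention that the index is reported as lung volume turnovers (cumulative expired volume divided by functional residual capacity) at the first breath where the end-tidal marker gas concentration reaches 1/40th of the starting level, and that functional residual capacity is already known. The function name and the breath data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def lung_clearance_index(breaths, start_concentration, frc_litres):
    """Lung clearance index from a breath-by-breath washout record.

    breaths: sequence of (expired_volume_litres, end_tidal_concentration),
    in the order the breaths were taken during washout.
    start_concentration: marker gas concentration at equilibrium.
    frc_litres: functional residual capacity, assumed already measured.
    """
    if start_concentration <= 0 or frc_litres <= 0:
        raise ValueError("starting concentration and FRC must be positive")
    target = start_concentration / 40.0
    cumulative_expired = 0.0
    for expired_volume, end_tidal in breaths:
        cumulative_expired += expired_volume
        if end_tidal <= target:
            # Washout complete: express cumulative expired volume in FRC units.
            return cumulative_expired / frc_litres
    raise ValueError("washout did not reach 1/40th of the starting concentration")


# Hypothetical, deliberately short washout record (real tests take many more breaths).
washout = [(0.5, 2.0), (0.5, 1.0), (0.5, 0.5), (0.5, 0.09)]
print(lung_clearance_index(washout, start_concentration=4.0, frc_litres=1.0))  # 2.0
```

A higher index means more turnovers were needed to clear the marker gas, which is the sense in which the test reflects ventilation inhomogeneity.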
Sensitivity to change, an important prerequisite of a surrogate outcome measure, has also been confirmed in several CT studies that have evaluated specific features before and after a clinical intervention—either dornase alfa (Pulmozyme)47,48 or conventional treatment for an acute exacerbation.49–51 Mucus plugging, centrilobular nodules, and bronchial wall thickening are reversible CT features.50,51 Mosaic perfusion was found not to improve following conventional antibiotic treatment,50 progressed following dornase alfa,49 but was found to improve in two studies.16,48 The patients in the latter two studies had mild CF lung disease (FEV1 ⩾70%, FVC ⩾85%), and it is presumed that air trapping in these patients may be a result of bronchial hyperreactivity from inflammation or increased airway compliance in this age group rather than an obliterative bronchiolitis which is most likely to be the dominant factor in the air trapping seen in patients with more severe disease.
A recent study by Brody et al10 has shown that the change in overall CT score, bronchiectasis, and parenchymal disease on CT correlated significantly with the number of infective exacerbations that occurred during a 2 year period. This is the first study to show that changes in CT are coupled to what is considered to be a “true” outcome measure. The change in pulmonary function over the 2 year period did not correlate with the number of respiratory tract infections.
A recent paper by de Jong et al20 provides further evidence for the use of CT scanning as an outcome measure. CT scans and lung function tests were evaluated in 119 CF patients at baseline and after a 2 or 3 year interval. CT scans were scored using a system developed by Brody which results in both a total/composite CT score and individual component scores. CT composite scores, individual component scores, and certain lung function parameters worsened with time, but the peripheral bronchiectasis score showed the largest annual numerical change in children—an increase of 1.7% per year (p<0.0001). On the assumption that the feature (be it structural or functional) that produces the largest annual change is the most “sensitive”, the authors conclude that the peripheral bronchiectasis score could be used as an outcome parameter in clinical studies. It is suggested by the authors that halting the progression of a particular feature that is known to be largely irreversible (such as bronchiectasis) may be more readily identifiable than looking for a therapeutic effect on a potentially reversible feature such as mucus plugging or bronchial wall thickness. Further studies which aim to determine if the worsening in peripheral bronchiectasis is coupled to a change in a true outcome measure (quality of life scores, frequency of pulmonary exacerbations) are required.
In summary, there is enough evidence at present to justify the inclusion of CT scanning as one of the many potential outcome measurements in clinical intervention studies. What specific CT end point should be used is still under investigation and is likely to depend on the stage of disease of the study population. Some would favour using a total CT score encompassing all CT morphological features, while others advocate selecting a single CT feature such as air trapping (in the younger population),15 a bronchiectasis score,12,20 or a composite score which incorporates both CT and certain lung function parameters.47
Interest in CT has been fuelled by the need to identify new potential outcome measures to assess novel treatments in patients with CF. While CT appears to fulfil some of the necessary criteria, further studies showing that it is an effective outcome measure in intervention studies are needed before it should be accepted as a potential surrogate measure. There is also a need for longitudinal studies to establish the trends of the different CT features over a defined period of time. This will enable true change as a result of a therapeutic intervention to be distinguished from expected fluctuations in the different CT features in patients with CF. Further work is also needed to increase the accuracy and availability of automated techniques for disease quantification. Correlation of CT scores against non-imaging assays such as the LCI may provide valuable information. The use of hyperpolarised helium-3 magnetic resonance imaging in patients with CF lung disease is currently being investigated. The initial results look promising, with preliminary studies indicating that ventilation defects show good correlation with spirometry,52,53 change with treatment, and are increased in number in CF patients with normal spirometry results.53
Finally, a consensus regarding the most appropriate non-automated scoring system should be reached to facilitate the comparison of results across centres. Prevailing concerns regarding dose to young children require manufacturers and radiologists to continue to lower the radiation dose, although further substantial reductions in dose are unlikely at present if tailored protocols and parameters such as those outlined in this review are adhered to.
Competing interests: none declared.