Recommendations for the standardisation of bronchial challenge were made in 1985,1 and updated in 1993.2 However, neither document recommended a single protocol or even criteria for choosing a particular method. Even the terminology lacks uniformity, bronchial responsiveness being used in this review in the hope that it offends the fewest readers. Distinction has been made, but not consistently, between (hyper)responsiveness as a general term,2 (hyper)sensitivity as a leftward shift in the dose-response curve, and (hyper)reactivity as an increase in the dose-response slope.3 “Airway” and “bronchial” have been used interchangeably. The prefix “hyper” has connotations of excess, implying a bimodal distribution in the population. Although this has not been substantiated, the acronyms BHR (bronchial hyperresponsiveness) and AHR (airway hyperresponsiveness) are well established, the former being adopted in this article.

A review of the methodology of BHR might be expected to cover all aspects of bronchial challenge, but one short article cannot provide this. Rather than reproduce recommendations which were comprehensive on most details,2 this paper emphasises variations in protocol that lead to results from different studies being non-comparable, explains how different methods of summary have arisen, and discusses the implications of these.

## Variations in protocols

Whichever term for BHR is used, challenge with a pharmacological agent or with exercise in children, rather than allergen challenge, is usually implied. Subjects are assessed for eligibility, including adequate baseline lung function. Many different provocation agents have been used1; histamine and methacholine now predominate, although a number of others have recently become popular.2 The aerosol generated by a nebuliser is inhaled during inspiration or tidal breathing in increasing doubling concentrations until the chosen measure of lung function has fallen by a predetermined amount from its value measured after inhalation of the diluent, the chosen maximum concentration is reached, or the test is stopped for some other reason.

Although specific conductance and other measures were adopted in some early work,1 3 4 forced expiratory volume in one second (FEV_{1}) is now almost always used as the measure of lung function because of its greater reproducibility.2 5 Peak expiratory flow is used in exercise challenge if measurement of FEV_{1} is impracticable.6 Exercise challenge is not considered further here. As exercise challenge has generally been made with a single “dose”, there are fewer options for expressing the response than with a pharmacological agent.

The use of FEV_{1} is virtually the only point on which testing in adults is standardised, variations in eligibility for challenge, the provoking agent, mode of delivery, starting and maximum concentrations, and expression of the response being found in every possible combination. Clinical and community studies tend to differ in eligibility criteria and maximum dose or concentration; termination of the test when a 20% fall in FEV_{1} occurs is usual in community studies but greater falls and higher doses may be achieved when justified clinically and ethically.

## Dose or concentration

The result of challenge is a dose or concentration response curve. With the tidal breathing method developed for clinical use results are expressed in terms of concentration of drug delivered,7 but it is more common with methods recommended for epidemiological studies to calculate the cumulative dose.8 9 Juniper *et al* suggested that methacholine had a small cumulative effect but that the effect of histamine was non-cumulative.10 However, within one study there will be a close relation between cumulative dose and either the final dose or concentration delivered, and this may explain why the issue was not discussed in the 1993 recommendations.2

### CALCULATION OF DOSE

Calculation of dose requires a measure of nebuliser output. The output by weight of the nebuliser multiplied by the concentration gives a nominal dose. This will in general exceed the actual dose as aerosol output is generally less than 100% of total output.11 If precise calculation of the dose delivered is thought to be essential, then nebulisers should be pre-calibrated and standardised for all aspects of operation.12 It is probable that differences in reports of post-study calibrations in different laboratories of Mefar jet nebulisers used in the European Community Respiratory Health Survey (ECRHS)13 were due to differences in driving pressure of the dosimeters used in post-study calibration (E H Walters, personal communication).
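The nominal dose calculation described above can be sketched as follows. This is only an illustration: the function name, the number of inhalations per step, and the assumption that solution density is about 1 g/ml (so that output by weight in grams can be multiplied directly by concentration in mg/ml) are not taken from any published protocol.

```python
def nominal_doses(concentrations_mg_per_ml, output_g_per_puff, puffs_per_step):
    """Return (per-step, cumulative) nominal doses in mg.

    Assumes a density of about 1 g/ml, so output by weight (g) times
    concentration (mg/ml) gives a nominal dose in mg. The actual dose
    delivered as aerosol will generally be lower than this figure.
    """
    step_doses = [
        c * output_g_per_puff * puffs_per_step
        for c in concentrations_mg_per_ml
    ]
    cumulative = []
    running_total = 0.0
    for dose in step_doses:
        running_total += dose
        cumulative.append(running_total)
    return step_doses, cumulative


# Hypothetical doubling concentrations and nebuliser output, for illustration.
steps, cumulative = nominal_doses([1.0, 2.0, 4.0, 8.0], 0.01, 5)
```

Because the concentrations double at each step, the cumulative dose is dominated by the final step, which is consistent with the close relation between cumulative dose and final dose noted earlier.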

Even if the nebuliser output is pre-calibrated, output by weight constantly checked, and nebulisers filled frequently with fresh solutions of histamine or methacholine, the amount of drug delivered with each nebulisation may not be constant with use. At best the calculated dose is a good approximation to the dose delivered to the upper airway, but does not necessarily represent that received by the lung.

Results from studies that use different provoking agents, methods of delivery, or maximum concentrations will not be directly comparable even if the expression of the results appears similar. This is sometimes overlooked in arguments over how the data are analysed. The rest of this review discusses how the different summary statistics have arisen and their implications for interpretation.

## Shape of the dose-response curve

The term “dose” will be used from now on, but all that follows applies equally if results are expressed in terms of concentration. If FEV_{1} is plotted against the log dose of histamine14 or methacholine,15 a sigmoid curve is obtained in non-asthmatic subjects and in some mild asthmatics.14 Curves may differ in the maximal response (plateau), the slope of the steep part of the sigmoid, or in position. However, it is necessary to administer high doses in order to observe the plateau of the curve in non-asthmatics. Woolcock *et al* gave a cumulative dose of up to 122 μmol histamine to laboratory staff,14 refraining from higher doses because of the side effects. The test was terminated in asthmatic subjects when FEV_{1} had declined by 60%, maximal response being assumed to be 100%.

Hence, while high doses may occasionally be given to volunteers,16 it is rarely possible to measure the maximal response in clinical studies and never in epidemiological studies in which a high response rate is required. The consequences of this are that at most the mid-curve slope and position of the curve can be estimated, and the majority of subjects will have reached neither a specified fall in FEV_{1} nor a plateau.

## Expression of the results

### SENSITIVITY

A distinction was made between bronchial sensitivity (the dose causing a specified decrease in lung function) and bronchial reactivity (the slope of the dose-response curve beyond this point),3 with the recommendation that both should be determined. Habib *et al* suggested “threshold”, the dose causing a fall in lung function greater than two standard deviations below the mean of pre-histamine values, as a measure of sensitivity,17 but PC_{20}, the concentration causing a 20% fall in FEV_{1}, showed greater reproducibility5 18 and better discrimination between asthmatic and normal subjects.18

### REACTIVITY

Reactivity was generally defined as the slope of the dose-response curve beyond the threshold dose. This has been little used recently, primarily because studies showed it to have little relation to the clinical state in asthmatic subjects,19 that it added little to the information in PC_{20},20 and that higher doses are required to estimate reactivity than PD_{20} alone.21 However, it has been suggested that sensitivity and reactivity have different clinical implications.22

In a community study, if both mid-dose slope and sensitivity are estimated from the small number of data points that will be obtained, the two measures are likely to be highly correlated for statistical reasons. Hence, for all practical purposes, only one measure of BHR is justified.

## Estimation and analysis of PD_{20}

Although originally thought of as a measure of the sensitivity component of BHR, PD_{20} (or PC_{20}) has become the most used summary of BHR itself. Provided at least a 20% fall in FEV_{1} has been observed, the PD_{20} can be estimated by linear interpolation between the last two points on the dose-response curve. As doubling doses are used, interpolation is usually carried out on the log dose scale, although there is some evidence that FEV_{1} is linearly related to dose.23 24 When the FEV_{1}-log dose is sigmoid, the observed part of that curve is approximately exponential.25 26 However, the inherent variability of FEV_{1} makes the shape of the curve often difficult to observe for an individual subject, and in clinical use the practicality of linear interpolation on the log dose scale outweighs any other consideration.
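Linear interpolation on the log dose scale might be sketched as below. The function name and the handling of edge cases are illustrative assumptions: a subject who never reaches the target fall is returned as `None` (a censored observation), and a subject who reaches it at the first dose raises an error because there is no lower point to interpolate from.

```python
import math


def pd20_log_interpolation(doses, falls, target=20.0):
    """Estimate the dose giving `target` % fall in FEV1.

    `doses` are increasing (cumulative) doses and `falls` the matching
    percentage falls from the post-diluent FEV1. Interpolation is linear
    in log dose between the first point at or above the target and the
    point before it.
    """
    if len(doses) != len(falls):
        raise ValueError("doses and falls must have the same length")
    for i, fall in enumerate(falls):
        if fall >= target:
            if i == 0:
                raise ValueError("target reached at first dose; cannot interpolate")
            d1, d2 = doses[i - 1], doses[i]
            f1, f2 = falls[i - 1], falls[i]
            fraction = (target - f1) / (f2 - f1)
            log_pd = math.log(d1) + fraction * (math.log(d2) - math.log(d1))
            return math.exp(log_pd)
    return None  # target fall not reached: PD20 exceeds the maximum dose


# Illustrative values only.
estimate = pd20_log_interpolation([0.5, 1.0, 2.0], [5.0, 10.0, 30.0])
```

With doubling doses, a target that lies halfway between the two observed falls gives a PD_{20} equal to the lower dose multiplied by the square root of 2, rather than the arithmetic midpoint.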

It is for community studies that other methods have been recommended, when data are necessarily computerised and maximising information is the priority. Curve fitting enables all the information to be used, rather than just the final two points, and has been shown to improve repeatability slightly.25 In order to increase the number of subjects with a measurable PD_{20}, extrapolation by one doubling dose has sometimes been used but is now thought inadvisable.26

Inevitably, whatever maximum dose of histamine or methacholine is permitted, and whether or not extrapolation is used, most studies find that fewer than 50% of the population achieve a 20% fall in FEV_{1}. Hence the percentage with PD_{20} less than an arbitrary cut off point is the most used summary statistic, and logistic regression is the most used method of analysis to identify risk factors for BHR. The percentage summary is simple to understand but has several drawbacks: (1) it wastes the information in the size of PD_{20} in those with an estimate; (2) it suggests that BHR is a dichotomy whereas the evidence is that the distribution is unimodal27 28; and (3) the value depends on the cut off point used, which is normally the maximum dose except for reasons of comparability with other studies.29

Methods to overcome the first and second problems depend on assumptions. These are essentially statistical methods for censored data, an observation being “censored” when it is known only to be above a certain limit, in this case the maximum dose.30 Either methods designed for censored data can be used directly,30 or survival analysis methods can be exploited, in which dose is treated as “time”, reaching a 20% fall in FEV_{1} being the equivalent of failure or death in conventional survival analysis.31 Each method requires the assumption of a distribution for PD_{20}. As most subjects have censored data, only the left hand end of the distribution can be observed, and power to detect departure from any distribution is low. Evidence in favour of a log-Normal distribution has been presented28 32 and taken as a fact by some authors,33 but a good fit to a Weibull distribution has also been reported.31 Whether data exist that would show any material difference in goodness of fit, whether this would be generalisable, or whether other positively skewed distributions would fit equally well is not known. The log-Normal distribution leads to estimation of the geometric mean PD_{20} and “fold difference” or “percent change” between comparison groups.33 The results, assuming a Weibull distribution, are expressed in relative percentiles, the ratio of the doses required for a given percentage of individuals to achieve a 20% fall in FEV_{1} in different groups.31
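To make the censored-data idea concrete, the sketch below writes out a log-likelihood for right-censored PD_{20} under an assumed log-Normal distribution. It is not the method of any cited study: the function names are hypothetical, natural logarithms are used, and the crude grid search stands in for the proper maximisation a statistical package would perform. Subjects with a measured PD_{20} contribute a Normal density on the log scale; subjects without a 20% fall contribute the probability that PD_{20} exceeds their maximum dose.

```python
import math


def censored_lognormal_loglik(mu, sigma, observed, censored_at):
    """Log-likelihood on the natural-log dose scale.

    observed    : measured PD20 values
    censored_at : maximum dose given to each subject with no 20% fall
    The Jacobian term (-log x) is omitted as it does not depend on mu or sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    loglik = 0.0
    for x in observed:
        z = (math.log(x) - mu) / sigma
        loglik += -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
    for c in censored_at:
        z = (math.log(c) - mu) / sigma
        # P(log PD20 > log c) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
        loglik += math.log(0.5 * math.erfc(z / math.sqrt(2)))
    return loglik


def fit_by_grid(observed, censored_at, mus, sigmas):
    """Return the (mu, sigma) pair on the grid with the highest log-likelihood."""
    return max(
        ((m, s) for m in mus for s in sigmas),
        key=lambda p: censored_lognormal_loglik(p[0], p[1], observed, censored_at),
    )
```

Under this model the geometric mean PD_{20} is `exp(mu)`. Adding censored subjects pulls the estimate of `mu` upwards, which is why ignoring them would understate the geometric mean.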

Although statistical programs are now readily available for both analyses, the degree of censoring for PD_{20} far exceeds that which is recommended for such analyses, and other assumptions of the analyses are untestable. These are more crucial than the distribution assumption for the calculation of p values and confidence intervals—namely, homogeneity of variance of the log-Normal distributions or proportionality of hazard functions in survival analysis.

## Continuous measures of BHR

While some authors have tried to maximise use of the information in PD_{20} by the above methods, others have sought alternative measures of BHR that would enable data from all tested subjects in a population study to be included in a standard statistical analysis. As the percentage decline in FEV_{1} with cumulative dose is approximately linear over the range of doses permitted, the slope of this line is very highly correlated with PD_{20} but can be measured in all subjects challenged. O’Connor *et al* proposed that the slope should be estimated simply by dividing percentage fall from post-saline FEV_{1} at the highest dose given by that final dose.34 Thus an estimate is possible for all subjects given at least one dose of histamine or methacholine. Abramson *et al* proposed the slope estimated using linear regression.35 This least squares slope requires at least two doses to be administered for estimation to be possible, but uses all information.32
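The two slope measures can be sketched as follows. The function names are illustrative, and the least squares version fits an intercept using only the dose points supplied; whether the post-saline point is included as a zero dose is a choice the text above does not specify.

```python
def two_point_slope(doses, falls):
    """Percentage fall from post-saline FEV1 at the highest dose given,
    divided by that final dose. Needs at least one dose."""
    if not doses or len(doses) != len(falls):
        raise ValueError("need matching, non-empty doses and falls")
    return falls[-1] / doses[-1]


def least_squares_slope(doses, falls):
    """Ordinary least squares slope of percentage fall on dose.
    Needs at least two distinct doses."""
    n = len(doses)
    if n < 2 or n != len(falls):
        raise ValueError("need at least two matching dose and fall values")
    mean_d = sum(doses) / n
    mean_f = sum(falls) / n
    sxx = sum((d - mean_d) ** 2 for d in doses)
    if sxx == 0:
        raise ValueError("doses must not all be equal")
    sxy = sum((d - mean_d) * (f - mean_f) for d, f in zip(doses, falls))
    return sxy / sxx
```

When the percentage fall is exactly proportional to dose the two functions agree; they diverge when the final measurement is noisy, since the two-point slope depends on that single value.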

These two measures have immediate appeal but need to be used with caution. Firstly, they require transformation in order to satisfy the statistical requirements.32 Secondly, whether a log transformation or a reciprocal transformation is used, a constant must first be added to remove negative values that can occur in subjects with low BHR. Thirdly, the fact that every subject has a value does not guarantee that the summary measure provides information extra to that in PD_{20}. The two-point slope was found to be poorly repeatable in subjects without a measurable PD_{20} on both occasions.32 Hence, its use is little better than analysing PD_{20} using censored regression. The least squares slope was found to be reasonably repeatable, but no simple transformation to a Normal distribution was found.32 Its use is equivalent to estimating PD_{20} by extrapolation beyond the maximum dose using a linear model. Verlato *et al* found this to give better agreement with observed values than extrapolation from lower doses using an exponential curve on a logarithmic scale, but with some overestimation of PD_{20}, and consequently cautioned against extrapolation.26 It should be noted that, as all data points are used, these slopes do not measure “reactivity” as defined above.

### PERCENTAGE DECLINE WITH LOG DOSE

A measure of decline in FEV_{1} with log dose has been proposed quite separately by two research groups. Burrows *et al* suggested BRindex, defined as the log ([% decline in FEV_{1}/log(final methacholine concentration in mg/dl)] + 10), as a measure of BHR in children after finding no Normalising transformation for O’Connor’s slope.36 Chinn *et al* proposed 100/(“log slope” + 10), where “log slope” was defined as the least squares slope of % decline in FEV_{1} with log_{10} (cumulative dose in mg), to overcome a potential problem with nebuliser batch variation in the ECRHS mentioned above.23 BRindex and ECRHS slope are not quite equivalent, even allowing for the slight difference in transformation, in the sense that O’Connor’s two-point slope is an approximation for the least squares slope on the linear scale. As zero log dose is unity on the original scale, BRindex assumes zero fall in FEV_{1} at a concentration of 0.01 mg/ml. BRindex does not use all information and mean values of it and the ECRHS slope may depend on the range of doses or concentrations used, as the % decline in FEV_{1} curve with log dose is not linear.
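The two transformed measures defined above might be computed as in this sketch. Base 10 logarithms are assumed throughout for BRindex, since the definition quoted says only “log”; the function names are hypothetical. Note that BRindex is undefined at a final concentration of exactly 1 mg/dl, where the log concentration is zero.

```python
import math


def br_index(percent_decline, final_conc_mg_per_dl):
    """log10(% decline / log10(final concentration in mg/dl) + 10).
    Base 10 is an assumption; undefined when the concentration is 1 mg/dl."""
    log_conc = math.log10(final_conc_mg_per_dl)
    if log_conc == 0:
        raise ValueError("BRindex is undefined at 1 mg/dl")
    return math.log10(percent_decline / log_conc + 10)


def ecrhs_slope(cumulative_doses_mg, falls):
    """100 / (log slope + 10), where log slope is the least squares slope
    of % decline in FEV1 on log10(cumulative dose in mg)."""
    xs = [math.log10(d) for d in cumulative_doses_mg]
    n = len(xs)
    if n < 2 or n != len(falls):
        raise ValueError("need at least two matching dose and fall values")
    mean_x = sum(xs) / n
    mean_f = sum(falls) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (f - mean_f) for x, f in zip(xs, falls))
    log_slope = sxy / sxx
    return 100.0 / (log_slope + 10.0)
```

The constant 10 keeps the argument positive for subjects whose FEV_{1} rises slightly during challenge, though a sufficiently negative slope would still break either transformation.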

## Importance of variations in protocol and expression of results

Within one study the protocol should be uniform. The continuous slope measures or methods for analysis of PD_{20} that use all the information, either by censored regression or survival analysis methods, will have greater power to detect group differences than logistic regression of the percentage achieving PD_{20} at some arbitrary dose.28 No other difference in results has been reported. This is not surprising given the very high correlation between least squares slope on the linear scale and PD_{20}. Although the ECRHS slope and PD_{20} are less highly correlated, these also have given similar results.29 37

At the other extreme, if results are to be compared between different studies, any of the variations in protocol may matter, particularly if levels of BHR are to be compared rather than relations of BHR to risk factors. Comparisons of prevalence of BHR between populations have been hampered, in particular, by the differences in provoking agents, variety of cut off points, and age ranges of the subjects.38 There is little alternative to large scale multicentre studies if truly comparative data are to be obtained. An additional problem is that eligibility criteria necessarily entail exclusion of subjects with poor lung function, many of whom may have high BHR, and this will differ between the populations studied. Here there is an advantage of logistic regression of PD_{20} as a sensitivity analysis of the assumption that all such subjects have PD_{20} below the cut off point can be carried out easily.37

Relations of BHR to risk factors can be compared qualitatively between studies employing different protocols, but results may appear different because of sample size differences or the limited power of some analyses. Increasingly, there is a desire to combine results from different studies using meta-analysis.39 Although the method was developed for combining randomised controlled trials, it can also be used to provide a quantitative summary of results from several observational studies, as used in relation to passive smoking.40 In the context of BHR, either odds ratios from logistic regression in relation to the risk factor of interest can be combined or differences in means of one of the continuous measures. If heterogeneity between studies is detected this may be due to any of the variations in protocol that occur. Conversely, differences in protocol may obscure true heterogeneity in the effect of interest. While sensitivity analyses may help to differentiate between true and spurious heterogeneity, the variations in protocol are likely to be too great for this to be convincing. An alternative is to use effect size, the difference in means divided by the within group standard deviation. This provides a dimensionless measure, although it does not guarantee comparability. Calculation of effect size requires a continuous outcome—that is, one of the slope measures—or an estimate of standard deviation of log(PD_{20}) from censored regression.
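The effect size described above, the difference in means divided by the within group standard deviation, can be sketched as below. Using the pooled standard deviation of the two groups is an assumption made here for illustration; the input values would be one of the transformed slope measures for each subject.

```python
import math


def effect_size(group_a, group_b):
    """(mean of A - mean of B) / pooled within-group standard deviation.
    Each group needs at least two values."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    pooled_sd = math.sqrt((ss_a + ss_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

The result is dimensionless, so studies using different slope measures can be placed on a common scale, with the caveat already noted that this does not guarantee comparability.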

## Conclusion

Given the very many variations in protocol, different ethical requirements in different countries, and suitability of methods in different circumstances, it seems unlikely that researchers or clinicians could agree to standardise the measurement of BHR. The variations may be unimportant in clinical use but hamper the progress of epidemiology. Analysis of percentage PD_{20} below an arbitrary cut off point by logistic regression is misleading, given the unimodal distribution of BHR in the population, lacks power, and is unhelpful for those wishing to combine results using meta-analysis. Least squares slope or ECRHS slope should be used, or analysis of PD_{20} using censored regression, while recognising that the first and third are essentially the same analysis in different guises, and that each measure has problems which reflect the nature of the data.

## Acknowledgments

I am indebted to my colleagues on the European Community Respiratory Health Survey, particularly Dr Deborah Jarvis and Professor Peter Burney, for many fruitful discussions.

## References
