
## Abstract

Background: There is no standardised protocol for the measurement of bronchial responsiveness. Results from different studies are difficult to compare and combine.

Methods: Analyses are divided between those of a continuous outcome, which can be directly standardised as effect size, and those based on a binary outcome. A published method is used to convert an odds ratio to equivalent effect size.

Results: The use of effect size allows comparison between studies using a continuous outcome but different protocols, provided the relevant standard deviation is reported. Effect size from a continuous outcome and that derived from an odds ratio from an equivalent analysis gave similar results.

Conclusions: Systematic reviews which include both continuous effect estimates and odds ratios can include both in one meta-analysis, provided relevant standard deviations are published for the former. Authors are encouraged to report these in all fields in which measurement protocols vary.

- bronchial responsiveness
- protocol variation
- effect size
- statistical analysis


The measurement of bronchial responsiveness (BHR) is advocated in population studies of asthma.^{1} While not synonymous with asthma, it is an objective measure that has particular advantages in multicultural studies.^{2} BHR has also been used in the diagnosis of asthma and the measurement of severity in clinical settings.^{3}

The results of a scientific study are of little use unless they are in a form that is useful to the readers and allows comparison with previous or future work. This normally requires that the measurement protocols are similar. While some protocol variation exists for many measurements, this is a particular problem for BHR.^{3,4}

Results of histamine or methacholine challenge are most commonly summarised in clinical studies by the dose (PD_{20}) or concentration (PC_{20}) that produces a 20% fall in forced expiratory volume in one second (FEV_{1}). A logarithmic scale is considered appropriate for the analysis,^{5} and results are frequently expressed in doubling doses or concentrations. For example, in a randomised controlled trial of two treatments for asthma, the result for BHR might be one doubling dose difference—that is, that after one treatment, compared to the other, twice as much of the provoking agent was required to produce a 20% fall in FEV_{1} or, equivalently, that BHR was reduced by one doubling dose.
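As a minimal sketch, a difference in doubling doses is simply the base-2 logarithm of the ratio of two PD_{20} values, and a difference already on the log_{10} scale is converted by dividing by log_{10}(2). The function names and PD_{20} values below are hypothetical.

```python
import math

def doubling_dose_difference(pd20_a: float, pd20_b: float) -> float:
    """Difference between two PD20 values in doubling-dose units: log2 of their ratio."""
    return math.log2(pd20_a / pd20_b)

def log10_to_doubling_doses(diff_log10: float) -> float:
    """Convert a difference on the log10 scale to doubling-dose (log2) units."""
    return diff_log10 / math.log10(2)

# Hypothetical PD20 values (umol histamine) after treatments A and B:
# twice as much agent is needed after A, i.e. one doubling dose
print(doubling_dose_difference(1.6, 0.8))
print(log10_to_doubling_doses(math.log10(1.6) - math.log10(0.8)))
```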

## CONSEQUENCES OF PROTOCOL VARIATION FOR CLINICAL STUDIES

Results reported as PD_{20}, PD_{10}, PC_{20}, etc., can be converted to doubling dose units, and authors of meta-analyses have used this method to combine results from different studies.^{6,7} This does not necessarily overcome the problem of protocol variation: the response to treatment may be more variable in one study than in another as a result of protocol differences. Provided a standard deviation of the outcome on the doubling dose or other logarithmic scale is reported, it can be used to calculate a standardised effect for each study. The estimate and its associated standard error are both divided by the standard deviation to give the “effect size”. Because both are divided by the same quantity, their ratio, and hence the associated p value, is unchanged. When the estimate is a difference in means, effect size is also known as the standardised difference. This approach was used by Abramson *et al* to combine results from 12 studies of immunotherapy in asthma with non-specific BHR as the outcome and 14 studies that measured allergen specific BHR.^{8}
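A minimal sketch of the standardisation, assuming the estimate, its standard error and the outcome standard deviation are all on the same logarithmic scale (here doubling doses). The `Estimate` class and the numbers are hypothetical; the point is that dividing both quantities by the same standard deviation leaves the test statistic, and so the p value, unchanged.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    value: float  # e.g. difference in mean log2(PD20), in doubling doses
    se: float     # its standard error

    @property
    def z(self) -> float:
        """Test statistic; unchanged by standardisation, so the p value is too."""
        return self.value / self.se

def to_effect_size(est: Estimate, sd: float) -> Estimate:
    """Divide the estimate and its SE by the outcome SD (same log scale)."""
    return Estimate(est.value / sd, est.se / sd)

# Hypothetical trial: 0.9 doubling doses difference (SE 0.3), outcome SD 1.8 doubling doses
raw = Estimate(0.9, 0.3)
es = to_effect_size(raw, 1.8)
print(f"effect size {es.value:.2f} (SE {es.se:.3f}); z {raw.z:.2f} -> {es.z:.2f}")
```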

## A FURTHER PROBLEM IN POPULATION STUDIES

In clinical studies on asthmatic patients, each participant is likely to have a measurable PD_{20} on each occasion. In a population study the maximum dose of provocation agent will be limited by safety considerations, and at least 50% of the population will have a less than 20% fall in FEV_{1} at the highest dose administered. For these participants, PD_{20} is said to be “censored”. There are three ways of dealing with this in the analysis.

One is to use a method known as censored regression, implemented in at least one major statistical program.^{9} This produces estimates on the log(PD_{20}) scale, but is heavily dependent on the assumption that log(PD_{20}) is normally distributed. There is, however, evidence supporting this assumption,^{10} and the method has been used in the analysis of several population studies of the relation of BHR to urinary electrolytes.^{11,12}

A second method is to use an alternative summary statistic for BHR. Several measures of “slope” have been proposed,^{3,4} which have the advantage that a value can be defined for each person. The simplest of these, due to O'Connor,^{13} is the fall in FEV_{1} divided by the final dose. Provided a standard deviation is reported, estimates using censored regression of PD_{20} or a slope outcome can be combined or compared with each other using effect size as described above.
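The simple slope is easy to compute directly. In this minimal sketch the fall is expressed as a percentage of baseline FEV_{1}, which is an assumption about units; the function name and the example values are hypothetical. Unlike PD_{20}, the slope has a value even when the fall never reaches 20%.

```python
def oconnor_slope(baseline_fev1: float, final_fev1: float, final_dose: float) -> float:
    """Percentage fall in FEV1 at the last dose divided by that dose (assumed units)."""
    if final_dose <= 0:
        raise ValueError("final dose must be positive")
    pct_fall = 100.0 * (baseline_fev1 - final_fev1) / baseline_fev1
    return pct_fall / final_dose

# Hypothetical participant: FEV1 3.0 L falls to 2.7 L at a final dose of 8 umol histamine.
# A 10% fall means PD20 is censored, but the slope is still defined.
print(oconnor_slope(3.0, 2.7, 8.0))
```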

A third method is the most common in population studies. Participants are simply divided into those “with” and those “without” BHR according to whether PD_{20} is less than the maximum dose. Descriptive statistics are the proportions of the sample “with BHR”, the analysis uses logistic regression, and results are given as odds ratios. This analysis makes few assumptions and is relatively simple to understand. However, it wastes information. In addition, BHR in the population has a unimodal distribution,^{14,15} so any division into subjects “with” or “without” BHR is arbitrary, and in practice it is usually determined by the maximum dose administered.
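The dichotomisation and the resulting odds ratio can be sketched as follows. Representing a censored PD_{20} as `None`, the helper names and the counts are all hypothetical. For a single binary exposure, an unadjusted logistic regression gives the same odds ratio as the 2x2 table, so the cross-product ratio with Woolf's standard error stands in for it here. Participants with very different PD_{20} values (0.5 and 7.9 μmol) land in the same category, which is the loss of information described above.

```python
import math
from typing import Optional, Tuple

def with_bhr(pd20: Optional[float], max_dose: float) -> bool:
    """'With BHR' if a 20% fall occurred at or below the maximum dose given.
    pd20 is None when FEV1 never fell by 20% (a censored PD20)."""
    return pd20 is not None and pd20 <= max_dose

def odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """OR and Woolf SE of ln(OR) from a 2x2 table:
    a, b = exposed with/without BHR; c, d = unexposed with/without BHR."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, se

# Hypothetical PD20 values (umol); None = no 20% fall by the 8 umol maximum
pd20s = [0.5, None, 3.2, None, None, 7.9]
print([with_bhr(p, 8.0) for p in pd20s])

# Hypothetical counts: 30/100 exposed and 20/100 unexposed children "with BHR"
or_, se = odds_ratio(30, 70, 20, 80)
print(f"OR {or_:.2f}, 95% CI {or_ * math.exp(-1.96 * se):.2f} to {or_ * math.exp(1.96 * se):.2f}")
```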

Cook and Strachan found 10 studies that reported an odds ratio for BHR in children exposed to environmental tobacco smoke compared with those who were not.^{16} Odds ratios can be combined on the log scale and antilogged to give a final summary. It is unclear whether Cook and Strachan^{16} or van Grunsven *et al*^{6} had to omit some studies from their meta-analyses. However, Abramson *et al*^{8} found a mixture of results reported as continuous or binary outcomes; their solution was to perform two separate meta-analyses.
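Combining odds ratios on the log scale can be sketched with fixed-effect inverse-variance weighting, one standard choice (a random-effects model is a common alternative when studies are heterogeneous). This assumes each study reports an odds ratio with a 95% confidence interval that is symmetric on the log scale; the function names and study values are hypothetical.

```python
import math

Z95 = 1.959964  # 97.5th percentile of the standard normal distribution

def log_or_and_se(or_: float, lower: float, upper: float) -> tuple:
    """ln(OR) and its SE, recovered from a 95% CI assumed symmetric on the log scale."""
    return math.log(or_), (math.log(upper) - math.log(lower)) / (2 * Z95)

def pool_odds_ratios(studies: list) -> tuple:
    """Fixed-effect inverse-variance pooling of (OR, lower, upper); returns OR and 95% CI."""
    total_w = weighted_sum = 0.0
    for or_, lower, upper in studies:
        ln_or, se = log_or_and_se(or_, lower, upper)
        w = 1.0 / se ** 2
        total_w += w
        weighted_sum += w * ln_or
    ln_pooled = weighted_sum / total_w
    se_pooled = math.sqrt(1.0 / total_w)
    # Antilog the pooled log odds ratio and its confidence limits
    return tuple(math.exp(ln_pooled + k * Z95 * se_pooled) for k in (0, -1, 1))

# Hypothetical odds ratios (95% CI) from three studies
print(pool_odds_ratios([(1.4, 1.1, 1.8), (1.2, 0.9, 1.6), (1.7, 1.0, 2.9)]))
```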

## COMPARING OR COMBINING ODDS RATIOS AND EFFECT SIZE

It is clearly desirable to be able to compare or combine results from studies using these different methods. A simple method for doing so was recently published^{17} and is illustrated in table 1. Results are taken from a study of the relation of BHR to sodium excretion.^{11} The report published both a censored multiple regression analysis, giving a regression coefficient of log_{10}(PD_{20}) on log_{10}(sodium), and a multiple logistic regression with PD_{20} dichotomised at the maximum dose of 8 μmol histamine, giving a logistic regression coefficient for the proportion “with BHR” on log_{10}(sodium). An estimated standard deviation of 0.788 was also reported from the first analysis, allowing effect size to be calculated simply by dividing both the regression coefficient and its standard error by this figure. The logistic regression coefficient is the natural logarithm of the odds ratio associated with a unit increase in log_{10}(sodium). It is converted to effect size by dividing by the factor 1.81, which was derived from the properties of the logistic distribution.^{17} This allows for the fact that the logistic transformation “stretches” the scale more than the normal equivalent deviate, or probit, transformation. For example, a proportion of 0.025, or 2.5%, has a normal equivalent deviate of –1.96 but is transformed to –3.66 (the natural logarithm of 0.025 divided by 0.975) on the logistic scale. The ratio of these transformed values is slightly greater than 1.81, which is the average over the whole scale and a good approximation over the range of proportions from 0.03 to 0.97.^{17} The conversion can thus be used provided the prevalence of BHR is not less than 3%.
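The conversion itself is a single division, and the stretching of the logistic scale relative to the probit scale can be checked numerically with the standard library. A minimal sketch; the function names and the example coefficient are hypothetical, and 1.81 is taken from the text (it is numerically close to π/√3 ≈ 1.814, the standard deviation of the standard logistic distribution).

```python
import math
from statistics import NormalDist

LOGIT_TO_PROBIT = 1.81  # divisor from the text; close to pi/sqrt(3) ~ 1.814

def effect_size_from_logistic(coef: float, se: float) -> tuple:
    """Convert a logistic regression coefficient, ln(OR), and its SE to effect size."""
    return coef / LOGIT_TO_PROBIT, se / LOGIT_TO_PROBIT

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def probit(p: float) -> float:
    """Normal equivalent deviate of a proportion."""
    return NormalDist().inv_cdf(p)

# The logit/probit ratio is largest in the tails and smaller near p = 0.5
for p in (0.025, 0.03, 0.1, 0.3):
    print(f"p={p}: logit {logit(p):.2f}, probit {probit(p):.2f}, ratio {logit(p) / probit(p):.2f}")

# Hypothetical logistic coefficient 0.9 (SE 0.4) per unit increase in a covariate
print(effect_size_from_logistic(0.9, 0.4))
```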

The two estimates of effect size in table 1 differ in sign because a decrease in PD_{20} is an increase in BHR. Allowing for this, the two estimates are close, within a difference that might be expected from the standard errors and assumptions of the two methods. Two further points should be noted. First, it is now more common to report an odds ratio with 95% confidence interval than a logistic regression coefficient and standard error. In that case the natural logarithm of the odds ratio and of each confidence limit must be taken before dividing by 1.81 to give the effect size and its 95% confidence interval. Second, when the estimate, as here, is a regression coefficient rather than a difference in means, the independent variable also needs to be on the same scale in each study. A unit increase in log_{10}(sodium) represents a 10-fold increase; a more useful quantity might be the change in BHR associated with a 50% increase, which can be derived by multiplying each estimate by log_{10}(1.5).
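Both points can be sketched in a few lines: log-transform the odds ratio and each confidence limit before dividing by 1.81, then rescale from a tenfold to a 50% increase by multiplying by log_{10}(1.5). The function names and the odds ratio below are hypothetical; for continuous estimates on the log PD_{20} scale, the sign would be reversed first, as described above.

```python
import math

FACTOR = 1.81  # ln(OR) -> effect size divisor, as in the text

def effect_size_ci_from_or(or_: float, lower: float, upper: float) -> tuple:
    """Take natural logs of the OR and its 95% limits, then divide each by 1.81."""
    return tuple(math.log(x) / FACTOR for x in (or_, lower, upper))

def per_fold_increase(estimates: tuple, fold: float = 1.5) -> tuple:
    """Rescale estimates expressed per unit of log10(x), i.e. per tenfold increase in x,
    to a `fold`-times increase in x by multiplying by log10(fold)."""
    k = math.log10(fold)
    return tuple(e * k for e in estimates)

# Hypothetical: OR 2.0 (95% CI 1.2 to 3.3) per tenfold increase in sodium excretion
es = effect_size_ci_from_or(2.0, 1.2, 3.3)
print("per tenfold increase:", [round(e, 3) for e in es])
print("per 50% increase:", [round(e, 3) for e in per_fold_increase(es)])
```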

Table 2 shows effect sizes derived from results in a paper that reported BHR summarised as a slope measure, devised for use in the European Community Respiratory Health Survey,^{2} and analysed by multiple regression (table 6 in that paper), together with equivalent results from logistic regression (table 7).^{18} The residual standard deviation from the multiple regression was not reported explicitly, but the total standard deviation was 2.10 and the proportion of variance explained was 33.3%, so the residual standard deviation was calculated as √(0.667 × 2.10^{2}) ≈ 1.72. As noted above, the sign of each effect size derived from the multiple regression was reversed so that it was compatible with the corresponding value from the logistic regression. The effect sizes derived from odds ratios generally have wider confidence intervals and larger p values because of the loss of information on dichotomising BHR. Where the two effect sizes do appear to differ, notably for the association with specific IgE to Timothy grass and, to a lesser extent, with specific IgE to birch, this is due not to a problem with the conversion of odds ratio to effect size but to the fact that the multiple regression and logistic regression did in this case give different results, with the p value for the latter smaller than that for the former.
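The residual standard deviation calculation and the sign reversal can be sketched as below, using the total standard deviation and proportion of variance explained quoted in the text. The function names and the example coefficient are hypothetical; the approximation ignores the degrees-of-freedom factor, which is close to 1 in a large survey.

```python
import math

def residual_sd(total_sd: float, r_squared: float) -> float:
    """Approximate residual SD from the total SD and the proportion of variance explained.
    Ignores the degrees-of-freedom factor (n - 1)/(n - p - 1), close to 1 in large samples."""
    return math.sqrt((1.0 - r_squared) * total_sd ** 2)

def reversed_effect_size(coef: float, se: float, sd: float) -> tuple:
    """Effect size and SE, with the sign reversed to match the logistic regression direction."""
    return -coef / sd, se / sd

# Total SD 2.10 and 33.3% of variance explained, as reported in the text
sd_res = residual_sd(2.10, 0.333)
print(f"residual SD = {sd_res:.3f}")  # residual SD = 1.715

# Hypothetical regression coefficient of slope on a covariate, with its SE
print(reversed_effect_size(-0.40, 0.12, sd_res))
```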

## DISCUSSION

The method above for converting ln(odds ratio) to effect size shows that the two are essentially equivalent. However, an analysis of a continuous outcome is generally more powerful than one based on an arbitrary division of the scale into two groups. This is not an argument for a slope measure of BHR over PD_{20}, as the existence of a value for each subject does not automatically imply more information.^{10} Although the result implies that an odds ratio may be little affected by altering the cut-off point, other statistics will change. Peat *et al* reported that BHR has high specificity and low sensitivity for asthma,^{1} but a greater maximum dose of provoking agent and cut-off point would increase sensitivity and decrease specificity.

The method was illustrated using results from Burney *et al*^{11} and Chinn *et al*^{18} because, in each study, the data were analysed in the two ways; in the former the residual standard deviation from the linear regression was reported and in the latter it could be calculated from published results, enabling the calculation of effect size in both cases. More often results are reported without the standard deviation^{12} or only as an odds ratio.^{19} When a mixture of odds ratios and estimates based on continuous outcomes are to be compared or combined, it is essential to obtain an estimate of the residual standard deviation for the latter. Publication of residual standard deviations from every analysis of variance and multiple linear regression should therefore be encouraged. This also applies to any outcome in addition to BHR for which a completely standardised protocol does not exist.

A result expressed as effect size is not, of course, as useful clinically as one in doubling dose units. An alternative would be to assume an underlying standard deviation for studies reporting just an odds ratio and use this to convert the effect size to approximate doubling dose units. Again, only if authors report standard deviations will it be possible to know what standard deviation might be assumed.
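Under an assumed standard deviation, the back-conversion is a multiplication followed by a change of logarithm base. A minimal sketch; the function name and the assumed standard deviation are hypothetical, and in practice the value would have to come from published standard deviations, as the text notes.

```python
import math

def effect_size_to_doubling_doses(es: float, assumed_sd: float, sd_log_base: float = 10.0) -> float:
    """Multiply an effect size by an assumed SD (on a log scale with base sd_log_base)
    and express the result in doubling doses (log2 units)."""
    return es * assumed_sd * math.log2(sd_log_base)

# Hypothetical: effect size 0.3 derived from an odds ratio,
# assumed SD of 0.8 on the log10(PD20) scale
print(round(effect_size_to_doubling_doses(0.3, 0.8), 2))
```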

It seems unlikely that a standardised protocol can be achieved for BHR, so there will be a continuing need to compare and combine results in the above manner. While any comparison may require this, the need is clearly greatest when conducting a systematic review and meta-analysis. Conducting two meta-analyses, one of odds ratios and the other of effect sizes, is unsatisfactory. Each will have reduced power to detect and explain heterogeneity between studies, they may give different answers, and, when they do not, each confidence interval will be wider than that for all studies combined. Curiously, Abramson *et al* realised in an earlier paper^{20} that conversion of odds ratios to effect sizes or vice versa was possible by probit transformation, but dismissed the former because “to express essentially categorical outcomes such as ... BHR as effect sizes would make the results too difficult to interpret”. The opposite is true, as BHR is continuous, not categorical.
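A single meta-analysis that mixes both kinds of study can be sketched by putting every estimate on the effect size scale and then pooling with fixed-effect inverse-variance weights, reporting Cochran's Q and I² for heterogeneity. The function names and study values are hypothetical; `reverse=True` flips the sign of log PD_{20} estimates so that a positive value means more BHR in every study.

```python
import math

FACTOR = 1.81     # ln(odds ratio) -> effect size divisor, as in the text
Z95 = 1.959964    # 97.5th percentile of the standard normal distribution

def es_from_continuous(coef, se, residual_sd, reverse=False):
    """Effect size and SE from a linear or censored regression estimate."""
    sign = -1.0 if reverse else 1.0
    return sign * coef / residual_sd, se / residual_sd

def es_from_odds_ratio(or_, lower, upper):
    """Effect size and SE from an odds ratio with a 95% CI (symmetric on the log scale)."""
    es = math.log(or_) / FACTOR
    se = (math.log(upper) - math.log(lower)) / (2 * Z95) / FACTOR
    return es, se

def pool(effects):
    """Fixed-effect inverse-variance pooling with Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for _, se in effects]
    total_w = sum(weights)
    est = sum(w * es for w, (es, _) in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    q = sum(w * (es - est) ** 2 for w, (es, _) in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return est, se, q, i2

# Hypothetical studies: two regressions of log PD20 and two odds ratios
studies = [
    es_from_continuous(-0.40, 0.15, 0.80, reverse=True),
    es_from_continuous(-0.25, 0.10, 0.75, reverse=True),
    es_from_odds_ratio(2.1, 1.2, 3.7),
    es_from_odds_ratio(1.6, 0.9, 2.8),
]
est, se, q, i2 = pool(studies)
print(f"pooled effect size {est:.2f} (95% CI {est - Z95 * se:.2f} to {est + Z95 * se:.2f}); "
      f"Q = {q:.2f}, I2 = {i2:.0%}")
```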

The above is not an argument for combining every outcome in a meta-analysis regardless of comparability. For example, specific and non-specific BHR should not be combined, and provocation with direct and indirect stimuli may measure different aspects of asthma.^{21} Care must be taken to include each study only once when more than one analysis or BHR outcome has been reported, and the choice should not be made on the grounds of “greatest significance”, which can lead to publication bias. However, where two analyses have been performed, the one based on a continuous outcome (whether linear regression of slope or censored regression of PD_{20}) should be preferred. The direction of each estimate must be determined, with signs reversed as necessary. Protocol variation should be considered as a factor when examining heterogeneity, whether of effect sizes or of unstandardised estimates.
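One standard way to examine whether protocol explains heterogeneity is to partition Cochran's Q into within-group and between-group components; Q_between is then referred to a chi-squared distribution on (number of groups − 1) degrees of freedom. A minimal sketch assuming fixed-effect inverse-variance weights; the function names, protocol labels and effect sizes are hypothetical.

```python
from collections import defaultdict

def cochran_q(effects):
    """Cochran's Q about the fixed-effect inverse-variance pooled estimate."""
    weights = [1.0 / se ** 2 for _, se in effects]
    pooled = sum(w * es for w, (es, _) in zip(weights, effects)) / sum(weights)
    return sum(w * (es - pooled) ** 2 for w, (es, _) in zip(weights, effects))

def q_between_protocols(studies):
    """Split Q into within- and between-protocol parts.
    studies: (protocol_label, effect_size, se) tuples; returns (Q_between, df)."""
    groups = defaultdict(list)
    for label, es, se in studies:
        groups[label].append((es, se))
    q_total = cochran_q([(es, se) for _, es, se in studies])
    q_within = sum(cochran_q(g) for g in groups.values())
    return q_total - q_within, len(groups) - 1

# Hypothetical effect sizes grouped by hypothetical challenge protocol labels
studies = [
    ("histamine_dosimeter", 0.35, 0.10),
    ("histamine_dosimeter", 0.28, 0.12),
    ("methacholine_tidal", 0.15, 0.09),
    ("methacholine_tidal", 0.10, 0.11),
]
q_b, df = q_between_protocols(studies)
print(f"Q_between = {q_b:.2f} on {df} df")
```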

The simple method outlined here should enable better comparison between studies employing different methods of summarising BHR, and promote meta-analyses, which seem to be uncommon in respiratory medicine compared with other fields.

## Acknowledgments

Susan Chinn is funded by the Higher Education Funding Council for England.

## REFERENCES
