Relearning an old lesson: stopping trials early

Najib M Rahman; Robert J O Davies

doi:10.1136/thx.2009.131219

Editorial

UKCRC Oxford Respiratory Trials Unit, University of Oxford and Oxford Centre for Respiratory Medicine, Oxford Radcliffe Hospital, Oxford, UK

Correspondence to Robert J O Davies, Oxford Radcliffe Hospital and Oxford University, Churchill Hospital, Headington, Oxford OX3 7LJ, UK; robert.davies{at}ndm.ox.ac.uk

Pleural disease

Well designed and well delivered clinical trials are the main tool for establishing whether medical interventions ‘work’, and how well. As such, they are potent weapons in the armoury of medical progress and, like all potent weapons, need to be used with care.

In this month's Thorax, Koegelenberg et al (see page 857) report the findings of a trial comparing the diagnostic accuracy of closed pleural biopsy (Abrams needle) and cutting needle pleural biopsy after thoracic ultrasound, for the diagnosis of pleural tuberculosis (TB).1 This question is clearly important given the global significance of TB, and the key role of pleural biopsy in the diagnosis and microbiological assessment of its pleural presentations. To date, there are no published studies assessing ultrasound-guided pleural biopsy for the diagnosis of TB-related pleural effusions. This study is a continuation of this group's research programme, which has a track record of delivering valuable evidence in the diagnosis of pleural TB, not least their previous study showing that thoracoscopy is superior to closed pleural biopsy in this disease.2

Their studies are conducted in an area with a high prevalence of TB, with all the recruited subjects receiving the compared diagnostic tests, allowing the comparison of diagnostic results. In this new study, all patients underwent both Abrams biopsy and cutting needle biopsy, performed in random order. The design is simple, logical and efficient, with the diagnostic accuracy for TB assessed against accepted ‘gold standards’. Given the biology of pleural TB, which manifests as diffuse pleural involvement, it is reasonable to propose that the two biopsy techniques may be similar. As such, the results of the study should be clinically important, but on this occasion caution needs to be exercised in interpreting these results in view of a flaw in study delivery.

The study was stopped prematurely, after recruiting 89 (40%) of a planned 220 subjects. It was originally, and appropriately, planned as a non-inferiority design requiring 220 patients to demonstrate ‘equivalence’ of the two techniques within a 10% threshold. It was halted when a statistically significant difference was identified at a preplanned interim analysis, with closed pleural biopsy appearing more sensitive than cutting needle biopsy (81.8% vs 65.2%, p=0.022). Unfortunately, this early cessation makes it difficult to interpret the result of the study and to assess the possible benefit in favour of closed biopsy.

A clinical trial would normally only be stopped early in the face of ‘overwhelming evidence’ of a difference in outcome between the study groups (conventionally a p value of about <0.0001), a criterion generally known as the Peto–Haybittle rule.3 4 The stopping decision would normally be taken by an independent group who are not part of the core investigator team, to avoid bias. This is key to the ability of the trial to fulfil its primary functions of delivering a result that is highly likely to be true, and is useful in quantifying the magnitude of any benefits seen. It is worth revisiting why this is so.

There are several reasons why trials should not stop when a statistically significant difference is first seen. ‘Statistically significant’ differences commonly arise by chance early during trial recruitment, and disappear later. This is the reason for the demanding (p<0.0001) threshold in the Peto–Haybittle rule. In this study, the statistical signal was p=0.022, creating the possibility that the result is a statistical fluke and hence wrong. If the next two cases recruited to the study happened to favour cutting needle biopsy, the statistical significance would have disappeared (p=0.08).
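
The way chance ‘significance’ comes and goes during recruitment can be illustrated with a small simulation. The sketch below is not taken from either trial discussed here. It assumes a hypothetical trial of 220 subjects with no true difference between arms, a simple two-sided z-test on normally distributed outcomes, and five equally spaced looks at the data. The function name and all parameters are illustrative assumptions.

```python
import random
from statistics import NormalDist


def simulate_null_trials(n_trials=2000, n_subjects=220,
                         looks=(0.2, 0.4, 0.6, 0.8, 1.0),
                         alpha=0.05, seed=0):
    """Simulate trials with NO true effect and test at each look.

    Returns (fraction of trials 'significant' at any look,
             fraction 'significant' at the final look only).
    """
    rng = random.Random(seed)
    # Two-sided critical value for the chosen threshold.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Cumulative sample sizes at which the data are examined.
    look_sizes = [round(f * n_subjects) for f in looks]

    ever_significant = 0
    significant_at_end = 0
    for _ in range(n_trials):
        total = 0.0
        crossed = False
        z = 0.0
        next_look = 0
        for n in range(1, n_subjects + 1):
            # Each subject contributes pure noise: the true effect is zero.
            total += rng.gauss(0.0, 1.0)
            if next_look < len(look_sizes) and n == look_sizes[next_look]:
                z = total / n ** 0.5  # z statistic on the data so far
                if abs(z) > z_crit:
                    crossed = True
                next_look += 1
        ever_significant += crossed
        # z now holds the statistic from the final look.
        significant_at_end += abs(z) > z_crit
    return ever_significant / n_trials, significant_at_end / n_trials


if __name__ == "__main__":
    any_look, final_look = simulate_null_trials()
    print("p<0.05 at any of 5 looks:", any_look)
    print("p<0.05 at final look only:", final_look)
    strict_any, _ = simulate_null_trials(alpha=0.0001)
    print("p<0.0001 at any of 5 looks:", strict_any)
```

Under these assumptions, the proportion of null trials that appear ‘significant’ at one or more looks is well above the nominal 5%, whereas a threshold as demanding as p<0.0001 is almost never crossed by chance.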

Our Unit's trial of adjuvant intrapleural streptokinase in pleural infection5 is a real-life example of early and misleading statistical significance. During recruitment to this trial the independent Data Monitoring Committee first reviewed the data for safety assessment reasons after ∼40% of the recruitment. At this review there was a ‘statistically significant’ difference between the study groups in the frequency of death and surgery, with a p value of ∼0.01. This was not thought to constitute ‘overwhelming evidence’ of a treatment difference, was not communicated to the trial team and recruitment was allowed to proceed. By the next review this difference had disappeared and the eventual trial result was completely negative. With the benefit of hindsight, if the study had been stopped and published when this first significant p value was identified, we would have reported a result that was untrue.

Secondly, if a statistically significant difference in the trial outcome between study arms is used as a reason to stop, it is tempting to assess the data repeatedly in order to deliver the study result rapidly. Unfortunately, every ‘peek’ at the data increases the likelihood of a false-positive result. The conventional (arbitrary) threshold for statistical significance is p<0.05, which means that, if there were truly no difference, a result at least this extreme would arise by chance less than 1 time in 20 (5%). However, if the results are assessed on 20 different occasions, a p value of 0.05 would be expected to occur by chance alone on about one occasion. The investigators in this study did preplan their interim analysis, but they did not adjust their p values for this extra ‘peek’ at the data. A correction for one ‘peek’ would imply a threshold p value of ∼0.025, which was barely achieved when the trial was stopped.
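
The arithmetic behind repeated looks can be sketched as follows. This is a simplification: it treats each look as an independent test, whereas successive looks at accumulating data are correlated, so the true inflation is somewhat smaller. Dividing the threshold by the number of looks is a Bonferroni-style correction, assumed here because it reproduces the ∼0.025 figure for one interim plus one final analysis. The function names are illustrative.

```python
def expected_false_positives(n_looks, alpha=0.05):
    """Average number of chance 'significant' results across n_looks tests."""
    return n_looks * alpha


def chance_of_any_false_positive(n_looks, alpha=0.05):
    """Probability of one or more false positives, assuming independent looks."""
    return 1 - (1 - alpha) ** n_looks


def bonferroni_threshold(n_looks, alpha=0.05):
    """Per-look threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_looks


if __name__ == "__main__":
    print(expected_false_positives(20))       # about one chance 'hit' in 20 looks
    print(chance_of_any_false_positive(20))   # chance of one or more such hits
    print(bonferroni_threshold(2))            # interim + final analysis
```

With 20 independent looks the expected number of chance ‘hits’ is one, and the probability of seeing one or more is roughly 64%.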

Finally, stopping trials earlier than planned reduces the precision of the estimate of the outcome being assessed. This is most commonly expressed using the 95% CI for the result. In this study, the observed difference in the diagnostic sensitivity between the two techniques is 16.7%; however, the CIs imply that it is somewhere between 1.9% and 31.5%. This means that there is anything from an enormous and vital advantage, through to a trivial and unimportant difference in using Abrams biopsy for the diagnosis of TB. If recruitment had been completed to the original target of 220 and the difference between the groups had remained unaltered, the result would have allowed us to be reasonably confident that the advantage to Abrams biopsy would lie between 10.9% and 27.3%—a more clinically useful estimate.
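
The gain in precision from completing recruitment can be approximated with a rule of thumb: for a fixed observed difference, the half-width of a CI shrinks roughly in proportion to the square root of the sample size. The sketch below applies that rule to the figures quoted above. It is an assumption-laden approximation with an illustrative function name, and it need not reproduce the projected interval quoted above, whose method of calculation is not stated.

```python
import math


def projected_interval(difference, lower, upper, n_now, n_planned):
    """Rescale a CI from n_now to n_planned subjects.

    Assumes the observed difference stays the same and that the CI
    half-width is proportional to 1 / sqrt(sample size).
    Returns (projected lower, projected upper, scale factor).
    """
    half_width = (upper - lower) / 2
    scale = math.sqrt(n_now / n_planned)
    new_half_width = half_width * scale
    return difference - new_half_width, difference + new_half_width, scale


if __name__ == "__main__":
    # Figures from the text: 16.7% difference, 95% CI 1.9% to 31.5%,
    # 89 subjects recruited of a planned 220.
    low, high, scale = projected_interval(16.7, 1.9, 31.5, 89, 220)
    print(f"scale factor: {scale:.2f}")
    print(f"projected 95% CI: {low:.1f}% to {high:.1f}%")
```

On this rule of thumb, recruiting all 220 subjects would narrow the interval to about 0.64 of its width at 89 subjects.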

So how should the clinician respond to these results? Despite the above limitations, these are the first high quality randomised data in this area and the authors should be congratulated on conducting such a difficult and intensive trial. The study also provides interesting data on the high diagnostic sensitivity for pleural malignancy using an ultrasound-guided closed pleural biopsy strategy (100% for combined Abrams and Tru-cut); this aspect now warrants further specific study. The results of this study suggest that there may be an advantage to Abrams biopsy over cutting needle biopsy for the diagnosis of TB. As this disease is highly prevalent in resource-poor areas, this would be an important finding with implications for practice. However, we cannot be confident that this advantage is real, or of its magnitude. Accordingly, clinical practice should not change on the basis of these results. This study demonstrates that such a trial is deliverable, and that there is a real possibility that a simpler diagnostic strategy may be advantageous, consistent with previous evidence on the relatively high yield of Abrams biopsies in the diagnosis of pleural TB. Given the lack of data in this field, a fully completed randomised study of similar design would be the appropriate next step to advance diagnostic strategies in this important disease, and this study provides accurate data on the sample size required for such a trial. If such a study again demonstrated high diagnostic yields for both TB and malignancy with an ultrasound-guided blind technique, future assessment of this strategy compared with thoracoscopy or CT-guided biopsy would be warranted.

References

1. Koegelenberg CFN, Bolliger CT, Theron J, et al. A direct comparison of the diagnostic yield of ultrasound-assisted Abrams and Tru-cut needle biopsies for pleural tuberculosis. Thorax 2010;65:857–62.
2. Diacon AH, Van de Wal BW, Wyser C, et al. Diagnostic tools in tuberculous pleurisy: a direct comparative study. Eur Respir J 2003;22:589–91.
3. Haybittle J. Repeated assessment of results in clinical trials of cancer treatment. Br J Radiol 1971;44:793–7.
4. Peto R, Pike M, Armitage P. Design and analysis of randomized clinical trials requiring prolonged observation of each patient: introduction. Br J Cancer 1976;34:585–612.
5. Maskell NA, Davies CW, Nunn AJ, et al. U.K. Controlled trial of intrapleural streptokinase for pleural infection. N Engl J Med 2005;352:865–74.

Footnotes

Linked articles 125146.
Competing interests None.
Provenance and peer review Commissioned; not externally peer reviewed.
