“Don’t worry about the physiology test—if everybody fails, everybody passes”, Anonymous, Jefferson Medical College, Fall 1977.
The first few months of medical school caused anxiety in most students, who typically went from environments where they were well above the “norm” in their classes to one in which they were just average. At the same time, the volume and difficulty of the material one was expected to master increased dramatically compared with undergraduate studies. The one hope students could hold onto was that tests were graded “on the curve”, and as long as you did better than the bottom 5% of the class (the lower limit of normal (LLN)) you would pass. This is the origin of the phrase “if everyone fails, everyone passes”. At the time, unbeknownst to the students, there was a push in our medical school to move towards a more standardised minimal passing grade of, somewhat ironically for this discussion, 70%, which would allow better comparison between classes and schools. The argument here was that if, in fact, everyone failed (ie, scored less than the minimum passing grade), even if that was “normal”, it was not good.
The conflict between what is statistically “normal” and statistically “abnormal”, and how these are defined, is central to a current controversy in the world of respiratory medicine. On one side of this debate is the idea that “normal” people lose lung function as they age, and because of the “normal” loss of elastic tissue in the lung, the forced expiratory volume in 1 s/forced vital capacity (FEV1/FVC) will also decrease with aging. Defenders of this position state that the definition of “abnormal” needs to vary by population and age, and that using a fixed FEV1/FVC ratio ends up “overdiagnosing” people as “abnormal” who are actually “normal”. On the other side of this debate is the argument that an easy and simple way of classifying patients that will be accurate in the majority of people we see is useful. Defenders of this position believe that using a fixed ratio of 70% is easy to remember, easy to teach to medical students and residents, and works most of the time. This also serves to “demystify” spirometric interpretation (ie, if the ratio is low, the spirometry is in the “obstructive” family, whereas if the ratio is not low the spirometry is in the “restrictive” or normal family). The differences between these two approaches are reflected in different guidelines. For example, the American Thoracic Society/European Respiratory Society (ATS/ERS) guidelines for the interpretation of spirometry recommend using an LLN approach to classify chronic obstructive pulmonary disease (COPD)1 whereas the ATS/ERS guidelines for COPD recommend using the fixed FEV1/FVC ratio of 70% to classify a person as “obstructed”.2
What are the downsides to these respective approaches? The LLN approach depends heavily on the choice of prediction equations and keeps spirometry interpretation in a “black box”, which is to say we need the computerised interpretation to tell us whether the tracing is “normal” or “abnormal (below the LLN)”, typically with some type of colour signal or flag. The downside of the fixed ratio is the risk of “underdiagnosing” obstruction in younger populations and “overdiagnosing” obstruction in older populations. While such overdiagnosis could potentially lead to “overtreatment”, there is no evidence that this actually occurs.
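The disagreement between the two rules can be made concrete with a small sketch. Everything numerical here is an assumption for illustration only: the linear `predicted_ratio` and the residual standard deviation `RSD` are hypothetical and are not taken from any published reference equation. The sketch assumes only that the LLN is the fifth percentile of a normally distributed predicted value.

```python
from statistics import NormalDist

FIXED_RATIO = 0.70  # fixed FEV1/FVC threshold discussed in the text
RSD = 0.06          # HYPOTHETICAL residual standard deviation of FEV1/FVC


def predicted_ratio(age_years: float) -> float:
    """HYPOTHETICAL predicted FEV1/FVC that declines linearly with age."""
    return 0.87 - 0.002 * age_years


def lln(age_years: float, percentile: float = 0.05) -> float:
    """Lower limit of normal: the given percentile of the predicted distribution."""
    z = NormalDist().inv_cdf(percentile)  # about -1.645 for the 5th percentile
    return predicted_ratio(age_years) + z * RSD


def classify(ratio: float, age_years: float) -> dict:
    """Report whether each rule labels the measured ratio as obstructed."""
    return {
        "fixed_ratio": ratio < FIXED_RATIO,
        "lln": ratio < lln(age_years),
    }


# A 30-year-old with a ratio of 0.705: missed by the fixed ratio, flagged by the LLN.
print(classify(0.705, 30))
# A 75-year-old with a ratio of 0.68: flagged by the fixed ratio, "normal" by the LLN.
print(classify(0.68, 75))
```

Under these made-up numbers the two rules disagree in exactly the directions described above; with a different reference equation the ages at which they diverge would shift.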
Should we trust “mathematical norms” and models to define the presence of disease? Is it possible to eliminate respiratory disease completely by expanding our definition of what is normal? Could we set the LLN at 1% rather than 5%? If everyone fails, does everyone pass?
Two papers in this issue of Thorax examine this problem.3 4 The first compares the LLN to the fixed ratio in a young population, where one would expect the fixed ratio to “underdiagnose” COPD compared with the LLN (see page 1040). The second examines prediction equations across multiple populations (see page 1046). The first paper concludes that it is important to use “statistically derived spirometric criteria to identify airflow obstruction”3 and the second that “airway obstruction should be defined by the FEV1/FVC and FEV1 being below the LLN derived from appropriate reference equations”.4
In the paper by Cerveri and colleagues,3 using the LLN in this relatively young population identified a subgroup of people at a higher risk of adverse outcomes during follow-up. If one looks across their classification strata in table 1 (“normal”, “below the LLN”, and “below the LLN and ratio of 70%”), the proportion with asthma increases from 14% to 27% to 54%. The point here is that if asthma is a marker for “obstruction”, which it appears to be, there are some subjects in the “normal” group who have this marker, and there are some in the most “abnormal” group who do not have it. I would also guess that a more “sensitive” indicator of obstruction, such as an FEV1/FVC less than 75%, compared with an FEV1/FVC less than 70%, would have similar results. Similarly, setting the LLN at different percentiles (perhaps the first percentile or the fifth percentile rather than the 2.5th percentile) would also result in “definitions of obstruction” with varying levels of sensitivity and specificity.
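How much the cut-off moves with the chosen percentile can be sketched as follows. The predicted ratio and standard deviation passed in below are hypothetical values chosen for illustration, and the calculation assumes the ratio is normally distributed in the healthy reference population. By construction, the chosen percentile is also the fraction of that reference population that falls below the cut-off.

```python
from statistics import NormalDist


def lln_cutoff(predicted: float, rsd: float, percentile: float) -> float:
    """Cut-off below which `percentile` of a normal reference population falls."""
    z = NormalDist().inv_cdf(percentile)  # negative for percentiles below 0.5
    return predicted + z * rsd


# HYPOTHETICAL predicted FEV1/FVC and residual standard deviation.
predicted, rsd = 0.80, 0.06

for percentile in (0.01, 0.025, 0.05):
    cutoff = lln_cutoff(predicted, rsd, percentile)
    print(f"percentile {percentile:.3f}: cut-off {cutoff:.3f}")
```

A lower percentile gives a lower cut-off, so fewer people are labelled “abnormal” (more specific, less sensitive); a higher percentile does the reverse.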
The paper by Swanney and colleagues4 looked at data from 57 different populations to determine the age at which the predicted FEV1/FVC fell below 70%. While the mean age at which this occurred was 42 years among men and 48 years among women, it varied in individual studies, from less than 18 years to more than 80 years (see fig 1 in their paper). This in many ways reflects some of the problems in using “statistically derived” criteria to determine abnormality: depending on the reference population used, the same person can be either “normal” or “abnormal”. Furthermore, the authors suggest that different populations will need different reference equations.
Why do we classify normal and abnormal and the presence and absence of disease? We do so both to understand the natural history of disease progression and to provide interventions for our patients. The FEV1/FVC undeniably declines with age.5 The prevalence of COPD also undeniably increases with age,6 as does the incidence of hypertension, diabetes, macular degeneration, Alzheimer’s disease, most malignancies and death.7 8 Classification of disease is useful both epidemiologically and clinically. For example, the link between lung disease, measures of inflammation and cardiac disease is important epidemiologically and may provide clinical guidance for our patients.
The disparate views that surround the definition of COPD are, at the end of the day, less important than one might think. People with moderate, severe and very severe disease by GOLD criteria would almost all be similarly categorised using the LLN or other criteria. The differences that we would find relate to the mild category. In the BOLD study, GOLD stage 1 was not included in the overall estimates,9 although others have shown that people in this category have increased morbidity and mortality.10 11 While mild disease may be more “treatable” it may also be part of the spectrum of “normal”. It may also be true that early evidence of disease may be more important as an indicator of non-respiratory disease, such as cardiovascular disease. Furthermore, in mild to moderate disease the recommended interventions are based on treating symptoms, whereas in severe to very severe disease they are based on both treating symptoms and preventing exacerbations.
To answer the question posed in the title, I do not believe that the use of statistics and mathematical “norms” is the best way to diagnose and classify disease. If everybody fails, nobody passes (but the tests and the teaching need to be critically evaluated). I continue to believe that a disease classification scheme that is easy to remember (such as the fixed FEV1/FVC ratio) and to teach others remains useful. I also strongly believe that interventions need to be based on factors other than lung function, particularly in mild to moderate disease. I also support continuing to evaluate this problem by focusing on outcomes and not simply mathematical distributions of data in populations.
Competing interests: DMM has received research grants from GlaxoSmithKline, Pfizer and Novartis, and serves as a consultant to GlaxoSmithKline, Pfizer, Boehringer-Ingelheim, Astra-Zeneca, Dey, Sepracor and Novartis.