Validity of spirometric testing in a general practice population of patients with chronic obstructive pulmonary disease (COPD)
  1. T R Schermer,
  2. J E Jacobs,
  3. N H Chavannes,
  4. J Hartman,
  5. H T Folgering,
  6. B J Bottema,
  7. C van Weel
  1. Department of General Practice, University Medical Centre Nijmegen; Centre for Quality of Care Research, University Medical Centre Nijmegen and University of Maastricht; Department of General Practice, University of Maastricht; Department of Pulmonology Dekkerswald, University Medical Centre Nijmegen, The Netherlands
  1. Correspondence to:
    T R Schermer
    229-HSV, PO Box 9101, 6500 HB Nijmegen, The Netherlands


Objective: To investigate the validity of spirometric tests performed in general practice.

Method: A repeated within subject comparison of spirometric tests with a “gold standard” (spirometric tests performed in a pulmonary function laboratory) was performed in 388 subjects with chronic obstructive pulmonary disease (COPD) from 61 general practices and four laboratories. General practitioners and practice assistants undertook a spirometry training programme. Within subject differences in forced expiratory volume in 1 second and forced vital capacity (ΔFEV1 and ΔFVC) between laboratory and general practice tests were measured (practice minus laboratory value). The proportion of tests with FEV1 reproducibility <5% or <200 ml served as a quality marker.

Results: Mean ΔFEV1 was 0.069 l (95% CI 0.054 to 0.084) and ΔFVC 0.081 l (95% CI 0.053 to 0.109) in the first year evaluation, indicating consistently higher values for general practice measurements. Second year results were similar. Laboratory and general practice FEV1 values differed by up to 0.5 l, FVC values by up to 1.0 l. The proportion of non-reproducible tests was 16% for laboratory tests and 18% for general practice tests (p=0.302) in the first year, and 18% for both in the second year evaluation (p=1.000).

Conclusions: Relevant spirometric indices measured by trained general practice staff were marginally but statistically significantly higher than those measured in pulmonary function laboratories. Because of the limited agreement between laboratory and general practice values, use of these measurements interchangeably should probably be avoided. With sufficient training of practice staff the current practice of performing spirometric tests in the primary care setting seems justifiable.

  • chronic obstructive pulmonary disease
  • spirometric testing
  • general practice

In recent years the use of spirometric tests has rapidly increased in primary health care. Practice guidelines assign a central role to spirometry in the management of patients with chronic obstructive pulmonary disease (COPD).1,2 As most of these patients are detected and treated in primary care, these guidelines are particularly relevant for general practice.3,4 There is some evidence that application of spirometric testing in general practice may reduce the number of undetected cases with chronic respiratory morbidity5 as well as diagnostic misclassification,6–8 which may lead to overall improved respiratory health.7

The validity (or “reliability”) of spirometric tests is a prerequisite for their use as an instrument for diagnosis, monitoring, and management of respiratory disease.9 Despite their widespread use, little is known about the validity of spirometric tests in the primary care setting. It has been reported that at least one third of tests performed in general practice do not meet the quality criteria which apply to pulmonary function laboratories.10 Training of practitioners and nurses seems to enhance the quality of testing only temporarily.10 Four studies have shown that spirometric indices obtained in general practices may be considerably lower than those obtained in laboratories, suggesting insufficient test validity in general practice.11–14 However, none of these reports has been peer reviewed and apparent methodological shortcomings justify further studies of this topic. The main objective of the current study was to assess the extent to which the results of spirometric tests performed in general practice correspond with the results of the same tests performed in a certified pulmonary function laboratory.


Study design and participants

The study was a repeated cross sectional within subject comparison of spirometric testing in pulmonary function laboratories and general practices. Four pulmonary function laboratories (two in universities, two in general hospitals) and 61 general practices comprising 149 general practitioners (GPs) and 185 practice assistants were involved. (In Dutch general practice the practice assistant is a paramedical professional who has been trained for administrative and patient care related activities.) A priori, we considered the laboratory spirometric tests as “gold standard”15 measurements.

GPs selected subjects who met the following inclusion criteria: age 30–75 years; current or ex-smoker; diagnosis of COPD as assigned by a GP; meeting the clinical definition of COPD (“increased cough, sputum and dyspnoea on most days for a minimum of 3 months a year for at least the previous 2 years”);16 post-bronchodilator forced expiratory volume in 1 second (FEV1) 40–90% of the predicted value and/or post-bronchodilator FEV1/FVC (forced vital capacity) <88% of the predicted value for men and <89% for women. Subjects with severe co-morbidity and/or a history of asthma, allergic rhinitis, or atopic rash were excluded.
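The selection rules above can be written as a single eligibility check. The following is only an illustrative sketch, not the procedure the GPs actually followed: the `Candidate` record and its field names are hypothetical, the age and FEV1 ranges are assumed to be inclusive, and "and/or" for the lung function criterion is read as a logical OR.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of the items a GP would check for one subject."""
    age: int
    sex: str                        # "M" or "F"
    ever_smoker: bool               # current or ex-smoker
    gp_copd_diagnosis: bool
    meets_clinical_definition: bool
    fev1_pct_predicted: float       # post-bronchodilator FEV1, % of predicted
    ratio_pct_predicted: float      # post-bronchodilator FEV1/FVC, % of predicted
    severe_comorbidity: bool = False
    asthma_or_atopy_history: bool = False


def is_eligible(c):
    """Apply the inclusion and exclusion criteria described in the text."""
    # Sex-specific cut off for FEV1/FVC as a percentage of predicted.
    ratio_cutoff = 88.0 if c.sex == "M" else 89.0
    # "and/or" is interpreted as: either lung function criterion suffices.
    lung_function_ok = (
        40.0 <= c.fev1_pct_predicted <= 90.0
        or c.ratio_pct_predicted < ratio_cutoff
    )
    included = (
        30 <= c.age <= 75
        and c.ever_smoker
        and c.gp_copd_diagnosis
        and c.meets_clinical_definition
        and lung_function_ok
    )
    excluded = c.severe_comorbidity or c.asthma_or_atopy_history
    return included and not excluded
```

For example, a 60 year old male ex-smoker with a GP diagnosis, a positive clinical definition, and an FEV1 of 70% predicted would pass this check, whereas the same subject with a history of asthma would not.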

The study was approved by the medical ethics committee of the University Medical Centre Nijmegen and all subjects gave written informed consent.

Spirometry training programme

A spirometry training programme for GPs and practice assistants was developed and pretested before the study. Training consisted of two 2.5 hour sessions separated by an interval of 1 month. The content of the training sessions is available online on the Thorax website. The training programme specifically focused on elements that need improvement in general practice spirometric tests.10,17

Spirometric testing

Data collection took place from December 1998 to January 2001. General practices and laboratories were all equipped with the same electronic spirometer (Microloop II; Micro Medical Ltd, Rochester, UK) and spirometry software (Spirare; Diagnostica Ltd, Oslo, Norway). Durability of the Microloop turbine flow sensor has proved to be acceptable.18 The spirometry software displays real time flow-volume curves, patient instructions, and a time indicator to monitor duration of expiratory and inspiratory flow but does not contain “built in” quality assurance prompts.4

In each study subject a pair of spirometric tests was performed. The first test always took place in one of the laboratories, the second in the subject’s general practice. Subjects with an interval of >30 days between the two tests were excluded from the analysis. In case of a recent exacerbation the measurement schedule was postponed until at least 6 weeks after clinical recovery. The test sequence was repeated one year later in the same subjects.

During laboratory and general practice visits subjects performed a full (pre-bronchodilator and post-bronchodilator) spirometric test. Subjects were instructed to abstain from short acting bronchodilators for 8 hours and long acting bronchodilators for 12 hours before testing. Post-bronchodilator tests were performed 15 minutes after administration of 400 μg aerosolised salbutamol by spacer. For each test at least three acceptable forced expiratory manoeuvres were required.9 The spirometric indices (including FEV1 and FVC) of the manoeuvre with the highest sum of FEV1+FVC were stored and used for analysis. Spirometers were checked for errors in readings by a research nurse every 3 months using a 3 litre syringe and “biological control”—that is, a manoeuvre performed by the research nurse herself. In cases with a deviation of ⩾3% in the volume reading or a divergent outcome of the biological control manoeuvre the spirometer was replaced.
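Two of the rules in this paragraph, the choice of the manoeuvre to be stored and the syringe check of the spirometer, lend themselves to a short sketch. The code below is illustrative only and does not reproduce the Spirare software; the `Manoeuvre` type and both function names are hypothetical, and the biological control is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Manoeuvre:
    """One acceptable forced expiratory manoeuvre (hypothetical record)."""
    fev1: float  # litres
    fvc: float   # litres


def best_manoeuvre(manoeuvres):
    """Return the manoeuvre with the highest sum of FEV1 + FVC."""
    if len(manoeuvres) < 3:
        raise ValueError("at least three acceptable manoeuvres are required")
    return max(manoeuvres, key=lambda m: m.fev1 + m.fvc)


def syringe_check_fails(measured_litres, syringe_litres=3.0, tolerance=0.03):
    """True when the volume reading deviates by 3% or more from the syringe."""
    deviation = abs(measured_litres - syringe_litres) / syringe_litres
    return deviation >= tolerance
```

Under these assumptions a reading of 3.12 l from the 3 litre syringe (a 4% deviation) would flag the spirometer for replacement, while a reading of 3.03 l (1%) would not.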

Outcomes and statistical analyses

The primary outcomes were the within subject differences between laboratory and general practice spirometric tests in terms of FEV1 and FVC (ΔFEV1 and ΔFVC, respectively). Crude mean ΔFEV1 and ΔFVC were calculated by subtracting a subject’s laboratory value from the general practice value. Mean values for the primary outcomes with 95% confidence intervals (95% CIs) were calculated and difference versus mean plots and accompanying limits of agreement produced to express the variability between laboratory and general practice measurements.19 Trimmed means at the 5% level (arithmetic mean without the largest 5% and the smallest 5% of observations) were also calculated to limit the impact of outliers. Adjusted mean estimates were calculated to control for potential bias in the primary outcomes due to differential timing of laboratory and general practice tests. This was done by defining three subgroups based on the circadian (“diurnal”) variation of lung function20: (1) a potential advantage of ⩾50 ml due to time of measurement favouring the laboratory test; (2) a potential advantage ⩾50 ml favouring the general practice test; and (3) no potential advantage for either test. (The 50 ml cut off reflects approximately half the maximum variation of FEV1 throughout the day.) The potential advantage in ml for groups 1 and 2 compared with group 3 was estimated using a one way analysis of variance (ANOVA) model. In groups 1 and 2 the actual measured values of FEV1 and FVC were corrected for the estimated values from the ANOVA model. We consider the crude estimates to be the main results.
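The crude summary statistics described above can be illustrated with a minimal sketch. This is not the SAS code used in the study. It assumes a normal approximation (z = 1.96) for the 95% CI, the conventional definition of the limits of agreement as the mean difference plus or minus 1.96 standard deviations, and a trimmed mean that drops the same whole number of observations from each tail. All function names are hypothetical, and the diurnal adjustment is not covered.

```python
import math
import statistics


def paired_differences(practice, laboratory):
    """Within subject differences: practice minus laboratory value (litres)."""
    if len(practice) != len(laboratory):
        raise ValueError("paired series must have equal length")
    return [p - l for p, l in zip(practice, laboratory)]


def mean_with_ci(diffs, z=1.96):
    """Mean difference with a normal-approximation 95% confidence interval."""
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - z * sem, mean + z * sem


def limits_of_agreement(diffs, z=1.96):
    """Lower and upper limits of agreement: mean difference -/+ 1.96 SD."""
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return mean - z * sd, mean + z * sd


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the smallest and largest `proportion` of values."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations dropped from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)
```

A positive mean difference in this sketch corresponds to higher general practice values, matching the sign convention used for ΔFEV1 and ΔFVC. Wide limits of agreement indicate that individual pairs can differ considerably even when the mean difference is small.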

Differences in the primary outcomes between laboratories were analysed using ANOVA, and associations between primary outcomes and the number of days elapsed between laboratory and general practice spirometric tests were analysed using Pearson correlation.
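For readers less familiar with these two tests, the sketch below shows the underlying test statistics in plain Python. It is an illustration under simplifying assumptions, not the SAS procedures used in the study. The function names are hypothetical, and the conversion of the statistics to p values is omitted.

```python
import math


def one_way_anova_f(groups):
    """F statistic of a one way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired series of at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

In the setting of this study, `groups` would hold the ΔFEV1 (or ΔFVC) values per laboratory, and `x` and `y` would hold the number of days between tests and the corresponding within subject difference.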

The proportion of tests with a reproducibility of <5% and <200 ml (test variance) between the two highest FEV1 values from the three accepted forced manoeuvres was considered as a marker of the quality of the spirometric tests.9,10 Differences in the proportion of non-reproducible tests in laboratories and general practices were analysed using McNemar’s test. The Statistical Analysis System (SAS, Version 6.12 for UNIX) was used for analysis.
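The reproducibility marker and the paired comparison can be sketched as follows. Several points are assumptions rather than facts from the text: the percentage difference is taken relative to the highest FEV1 value, a difference of exactly 200 ml is treated as non-reproducible, and McNemar's test is computed in its exact (binomial) two sided form, since the text does not state which variant was used. Function names are hypothetical.

```python
import math


def is_reproducible(fev1_values_litres):
    """True when the two highest FEV1 values differ by <5% and <200 ml."""
    top_two = sorted(fev1_values_litres, reverse=True)[:2]
    if len(top_two) < 2:
        raise ValueError("reproducibility needs at least two FEV1 values")
    best, second = top_two
    diff = best - second
    # Assumption: the percentage is expressed relative to the highest value.
    return diff / best < 0.05 and diff < 0.200


def mcnemar_exact_p(b, c):
    """Two sided exact McNemar p value from the two discordant pair counts.

    b: pairs non-reproducible in the laboratory only
    c: pairs non-reproducible in general practice only
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial tail probability with success probability 0.5, doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the discordant pairs enter McNemar's test: subjects whose laboratory and general practice tests were both reproducible, or both non-reproducible, do not affect the p value.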


Characteristics of general practices

Of the 61 general practices involved, 21 (34%) were single handed practices, 35 (58%) were two handed or group practices, and five (8%) were multidisciplinary health care centres. Forty practices (65%) already possessed a spirometer before the study was initiated. Descriptive characteristics of the general practices are shown in table 1. Attendance rates in the spirometric training programme were 57% for GPs and 78% for practice assistants. In two practices GPs performed the spirometric tests, while in the remaining 59 practices the practice assistants undertook the testing.

Table 1

Baseline characteristics of general practices and study subjects

Study subjects and primary outcomes

Matched pairs of laboratory and general practice spirometric tests were available for 388 subjects in the first year and 332 subjects in the second year evaluation (table 1). The mean (SD) number of days between laboratory and general practice tests was 7.2 (7.8) for the first year evaluation and 11.2 (8.1) for the second year evaluation. There was no significant correlation between the number of days between measurements and the primary outcomes (ΔFEV1: r=0.11; ΔFVC: r=0.13). In 24% of the spirometric test pairs the laboratory test was favoured by the circadian variation, in 21% of the tests the general practice test was favoured, and in 55% neither test was favoured.

Adjusted estimates of the primary outcomes were consistently (but only marginally) higher than crude estimates (table 2). First year and second year mean ΔFEV1 and ΔFVC values were all higher for the general practice measurements. These findings were consistent for each of the laboratories involved (table 3). The scatter of the ΔFEV1 and ΔFVC values did not vary in a systematic way over the range of measurements (figs 1 and 2). The interval between the limits of agreement was wide in both study years for ΔFEV1 as well as for ΔFVC, which indicates considerable discrepancies between the two measurements.

Table 2

Mean (95% CI) and trimmed mean for crude and adjusted estimates of the primary outcomes

Table 3

Mean (SD) crude estimates of the primary outcomes by pulmonary function laboratory

Figure 1

Difference in forced expiratory volume in 1 second (FEV1) against mean plots for (A) the first year (n=730) and (B) the second year (n=656) evaluation. Pre-bronchodilator and post-bronchodilator values are pooled. The dashed horizontal lines indicate the limits of agreement.19

Figure 2

Difference in forced vital capacity (FVC) against mean plots for (A) the first year (n=730) and (B) the second year (n=656) evaluation. Pre-bronchodilator and post-bronchodilator values are pooled. The dashed horizontal lines indicate the limits of agreement.19

Quality of spirometric test performance

Because of occasional imperfections in the data transfer between the spirometer and spirometric software, information on the number of forced manoeuvres performed and FEV1 reproducibility was missing for 12 (<1%) laboratory and 89 (3%) general practice tests. Within the set of tests with complete information there were no tests with fewer than two forced manoeuvres in either the laboratories or practices. Table 4 shows that the proportion of non-reproducible tests—that is, FEV1 reproducibility ⩾5% or >200 ml—in the first year evaluation was 16% for the laboratories and 18% for the general practices (p=0.302). The corresponding figures for the second year were 18% and 18%, respectively (p=1.000). The proportion of non-reproducible tests in the general practices ranged from 4% in the best to 35% in the worst performing practice for the pooled first and second year data. For the four pulmonary function laboratories the corresponding range was 13–20%.

Table 4

Differences between general practices and pulmonary function laboratories in reproducibility in FEV1 in spirometric tests for the first year data (n=693)*


The results of the current study indicate that, on average, the validity and quality of spirometric tests in Dutch general practices are satisfactory in comparison with the “gold standard” procedure, a spirometric test performed in a pulmonary function laboratory. We observed mean differences in the primary outcomes consistently in favour of general practice spirometric testing in the first as well as the second year evaluation. The overall proportion of non-reproducible spirometric tests was similar for laboratories and general practices. However, the agreement between laboratory and general practice measurements seems limited. This means that using laboratory and general practice measurements interchangeably should probably be avoided in practice.

Strengths and limitations of the study

We aimed to compare, as strictly as possible, the spirometric performance of general practice and laboratory staffs. Performance depends on a number of factors related to the executor of the test—quality of subject instruction, intensity of coaching during forced manoeuvres, critical assessment of acceptability of separate manoeuvres, and test reproducibility.21 As we wished to minimise any potential bias in the comparison, we chose to equip practices and laboratories with the same type of spirometer and to check spirometer readings at the same 3 monthly intervals at both locations. Although portable turbine spirometers like the one used in our study cannot easily be calibrated on the spot and are not commonly used in laboratories, we believe that ruling out the “equipment factor” makes the comparison fairer, as turbine spirometers may produce FEV1 and FVC values which diverge from the advanced equipment normally used in laboratories.22

From a methodological point of view, randomisation of the order in which laboratory and general practice tests took place would have been the preferred approach. However, because most of our study subjects (67%) were participating in an ongoing randomised controlled clinical trial,23 the order of the tests was dictated by the trial protocol. We cannot therefore rule out the possibility of a systematic “one sided” bias in favour of either general practice or laboratory spirometric testing due to natural variability in lung function. Sources of short term intra-individual variability such as airway reactivity21 and diurnal variation in lung function20 may have influenced our findings. Although we used a rather approximate method to adjust for the latter variable, this factor did not seem to bias the results significantly. We consider it implausible that other intra-individual factors may have systematically put the laboratory tests at a disadvantage.

Although we cannot rule out a possible “learning effect” in study subjects due to repetition of spirometric testing within a short time span,24 we believe that the order of tests alone cannot fully explain our results. Three arguments support this view: (1) Most subjects had been diagnosed as having COPD several years earlier, which makes it quite likely that most of them already had a “history” of spirometric testing before entering the study, especially since most practices had been using spirometric tests for some time. (2) All subjects performed a full spirometric test in their general practice several weeks before the first visit to the laboratory to assess study eligibility. In other words, they could not be entirely “naive” with regard to spirometric testing before the tests for the actual evaluation study were performed. (3) The differences in favour of general practice spirometric testing persisted after a year of regular monitoring of lung function. Individual learning curves in study subjects should have levelled off by that time.24 Another explanation for the observed higher general practice values may be the performance level of the laboratory technicians. It has previously been recognised that significant variation may exist between laboratories.25 This could mean that, at least in some cases, we actually used a “gilded standard” instead of a pure “gold standard”.

In spirometry a widely used criterion for unacceptable performance is fewer than two acceptable manoeuvres.9 Unfortunately we were not able to perform a full evaluation of all acceptability markers—that is, adequacy of the start of the forced expiration, duration of the expiration, abrupt ending or cough during the manoeuvre9—because the Spirare spirometry software used only stores one “best” manoeuvre. A study performed in primary care practices in New Zealand looked into quality assurance data and found that only a third of spirometric tests performed by trained practitioners and nurses fulfilled the minimum quality criterion of two or more acceptable manoeuvres.10 Although in the current study we could not evaluate all separate manoeuvres, we have previously observed that manoeuvres performed by a sample of trained practice assistants are mostly judged to be acceptable by experienced lung function technicians.26 Apart from the relatively small proportion of tests with missing data, we can be sure that at least two manoeuvres were obtained in all other spirometric tests. Reproducibility of FEV1 cannot be calculated on the basis of a single FEV1 value.

Comparison with previous studies

Our findings contradict previous reports on the validity of general practice spirometric testing.11–14 These studies consistently reported lower mean FEV1 and FVC values for general practice spirometric tests, with FEV1 differences of 70–280 ml (references 11 and 14) and an FVC difference of 360 ml (reference 11). The presence or absence of factors responsible for short term intra-individual lung function variability, as discussed above, may explain the discrepancy between the studies, as may the diverging study populations involved (asthmatics,11 subjects with respiratory symptoms,13 adult patients with limited airflow,14 those with severe COPD,12 and a heterogeneous group of patients with COPD in our study).

There are several reasons why we believe that our study reflects the actual validity of general practice spirometric testing. Firstly, our training programme was probably more elaborate than those in other studies because we specifically emphasised elements of test performance which are now known often to be insufficient in general practice.10,17 This tailored programme may have prepared practice assistants and GPs better for their task. Also, as far as can be extracted from the published reports, other studies did not use spirometers which display flow-volume curves. We have previously reported that real time feedback of information from flow-volume curves may lead to improved performance in spirometric testing.26 A final alternative explanation may be that in our study, unlike in some of the earlier studies,11,14 most of the spirometric tests were performed by practice assistants instead of practitioners. As practice assistants will generally have more time available, they might take more time to attain a satisfactory test result. In our view, similar results could be achieved in other countries or healthcare settings as long as training of the professionals who perform the spirometric tests is of sufficient quality and intensity.


We conclude that spirometric indices relevant for the management of COPD obtained in trained general practices were marginally but statistically significantly higher than those measured in certified pulmonary function laboratories. The quality of spirometric tests in laboratories and general practices in terms of test reproducibility seemed equivalent. However, as the agreement between spirometric tests performed in the laboratory and in general practice was limited, using these measurements interchangeably should probably be avoided in practice. The results of this study seem to support the already widespread practice of performing spirometric tests in primary care settings. Further encouragement of primary care physicians to implement spirometric tests therefore seems justifiable, provided that the training of practice staff is sufficient.


    Web-only Appendix
    Contents of the spirometry training programme for general practitioners and practice assistants