Introduction

Ventilator-associated pneumonia (VAP) is the most frequently occurring nosocomial infection in Intensive Care Unit (ICU) patients and has been associated with increased morbidity, prolonged duration of ventilation and ICU stay, and increased health care costs [1]. Diagnosing VAP is difficult and is usually based on criteria with high sensitivity and low specificity [2], such as fever, leukocytosis, and infiltrative abnormalities on chest radiographs. Improvement of the diagnostic armamentarium for VAP has become an important topic in intensive care research, and two different approaches have emerged [3, 4]: improving our capacity to establish a microbiological diagnosis, or improving our clinical judgment based on easily obtainable parameters.

The first approach includes invasive diagnostic techniques with quantitative microbiological analysis of samples obtained by bronchoalveolar lavage (BAL) or protected specimen brush (PSB). The second approach, also called the non-invasive strategy, relies on clinical criteria without microbiological confirmation by bronchoscopically obtained samples. When compared to a non-invasive approach, an invasive strategy based on direct examination of bronchoscopically obtained samples for the presence of intracellular bacteria and on quantitative cultures was associated with better survival and less antibiotic use [5]. However, bronchoscopy carries a certain, though probably small, risk of complications, and processing specimens is expensive and labor intensive.

A reliable algorithm for clinical judgment based on easily available parameters would overcome these potential drawbacks. For example, the Clinical Pulmonary Infection Score (CPIS) described by Pugin and coworkers [6] consists of six easily obtainable clinical and laboratory variables. The CPIS was validated retrospectively in a population of 28 patients, with and without clinical suspicion of VAP, undergoing 40 BAL procedures. When compared to quantitative cultures of samples obtained by BAL, a CPIS score >6 had an excellent predictive value for diagnosing VAP. Since then, the CPIS has become a widely used diagnostic tool for VAP, both in clinical practice and in clinical trials [2, 6, 7, 8, 9, 10, 11, 12]. Optimally, such a widely used diagnostic algorithm should have a high diagnostic accuracy and low inter-observer variability, but such data are scarce or even completely absent. The aims of the present study, consisting of two substudies, were to compare CPIS scores with quantitative cultures of BAL fluid in patients with a clinical suspicion of VAP, using quantitative cultures of BAL as gold standard for VAP (study 1), and to determine inter-observer variability of this scoring system (study 2).

Methods

Setting

Study 1 was performed in three ICUs of two university hospitals in the Netherlands: the medical ICU (ten beds) and neurosurgical ICU (seven beds) of the University Medical Centre Utrecht (UMCU) and a mixed ICU (15 beds) of the University Hospital Maastricht (UHM). Data were collected from 1 June 1999 through 1 March 2001 in Utrecht and from 1 January 1997 through 31 December 1999 in Maastricht. Study 2, measuring the inter-observer variability, was performed only in Utrecht.

Data collection/study population

Study 1

All adult (age ≥18 years) patients who had been mechanically ventilated for at least 48 h with a suspicion of VAP according to the diagnostic criteria for VAP (radiographic appearance of a new or progressive pulmonary infiltrate, and two of the following three: (1) fever, (2) leukocytosis, and (3) purulent tracheobronchial secretions [13]), and in whom bronchoscopy could be performed on the day of VAP suspicion, were eligible for the study. In case of concomitant antibiotic use, only patients who had received antibiotics for >48 h and in whom the clinical suspicion of VAP had developed during antimicrobial treatment were included. The following data were routinely and prospectively recorded in computerized patient charts: body temperature, blood leukocyte count and number of band forms, character of tracheal secretions [purulent (yellow-green colour, with more than 25 leukocytes per high-power field) or not], microscopic examination (Gram stain) of tracheal secretions, semi-quantitative culture of bronchial secretions, ratio of arterial oxygen tension to inspiratory fraction of oxygen (PaO2/FiO2), interpretations of chest X-ray, and the use of antibiotics. When a variable was measured more than once in the 24 h prior to bronchoscopy, the value rendering the highest score (i.e., the most abnormal value) in that period was used. In addition, gender, age, and number of days on the ventilator prior to bronchoscopy were recorded.

Bronchoalveolar lavage and microbiology

BAL was performed by wedging the tip of the bronchoscope in the bronchus where the infiltrate was suspected, and infusing four 50 ml (UHM) or five 20 ml (UMCU) aliquots of sterile saline [14]. The first aliquot was discarded, and subsequent aliquots were pooled. In the UHM, quantitative cultures of BAL fluid were performed by plating 2 μl and 10 μl on different culture media, and in the UMCU by preparing 100-fold serial dilutions of BAL fluid. Isolated bacteria were identified by standard laboratory methods, and susceptibility to antimicrobial agents was determined according to National Committee for Clinical Laboratory Standards (NCCLS) methods [15]. Results were expressed in colony forming units per ml (cfu/ml).

Quantification of culture results was expressed as the log count and the bacterial index (BI). The log count was calculated by converting the sum of absolute counts of bacterial concentration(s) to the logarithmic number. The BI was calculated by summing the logarithmic numbers of individual log counts of bacterial concentrations. For example, a sample growing two bacterial species in concentrations of 1,000 cfu/ml each has a log count of 3.30 and a BI of 6.
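The two measures can be illustrated with a minimal Python sketch that reproduces the worked example above. The function names are hypothetical, and assigning a value of 0 to a sample without growth is an assumption made here for illustration, not a rule stated in the methods.

```python
import math


def log_count(concentrations):
    """Log10 of the summed bacterial concentrations (cfu/ml).

    Returns 0.0 when nothing grew (assumption for this sketch).
    """
    total = sum(concentrations)
    return math.log10(total) if total > 0 else 0.0


def bacterial_index(concentrations):
    """Sum of the log10 concentrations of each species that grew."""
    return sum(math.log10(c) for c in concentrations if c > 0)


# Two species, each at 1,000 cfu/ml, as in the example in the text.
sample = [1_000, 1_000]
print(round(log_count(sample), 2))  # 3.3
print(bacterial_index(sample))      # 6.0
```

The two measures diverge as the number of species increases: the log count is dominated by the most abundant species, whereas the BI grows with every additional species present.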

Study 2

For measurement of inter-observer variability, two different intensivists calculated CPIS scores for 52 patients in the UMC Utrecht. In 42 patients, CPIS scores were calculated prospectively over a period of 3 months: on three randomly chosen days, the CPIS of all patients in the ICUs was determined. A clinical suspicion of VAP was not required for these patients. The remaining ten patients were selected on the basis of a clinical suspicion of VAP for which BAL had been performed (these patients were also included in study 1). In these patients, CPIS was calculated retrospectively by two intensivists. All intensivists had a list with CPIS definitions according to Pugin’s definition (Table 1) when calculating scores.

Table 1. Clinical Pulmonary Infection Score used for the diagnosis of ventilator-associated pneumonia [6]

Because of the disappointing results of study 2, the investigation was repeated with an adjusted flow-chart (Table 2). For this analysis CPIS were determined prospectively in 46 patients.

Table 2. Adjusted and precisely defined variables used in the second part of the inter-observer variability study. (ARDS adult respiratory distress syndrome)

Definitions

Clinical pulmonary infection score (CPIS)

The CPIS as described by Pugin et al. included six variables: (1) body temperature, (2) blood leukocyte count and number of band forms, (3) character of tracheal secretions (purulent or not) and quantity of tracheal aspirates, (4) microscopic examination (Gram stain) and semi-quantitative culture results of the bronchial secretions, (5) arterial oxygen tension/inspiratory fraction of oxygen (PaO2/FiO2), and (6) chest X-ray [6] (Table 1). In the original study, quantities of tracheal aspirates were measured by trained ICU nurses who scored quantities of secretions from 0 to 4+. The total volume of secretions per day was calculated by adding all the + values. Since the nurses in our ICUs were not specifically trained to score quantities of tracheal aspirates and amounts were not uniformly registered, this variable was not included in study 1. Therefore, the possible scores for that part of our study ranged from 0 to 10.

In the analysis of the inter-observer variability (study 2), the variable “quantity of tracheal aspirates” was included. In the first part of this study, the estimated quantities of tracheal aspirates noted by nurses were interpreted by the intensivists and translated into the 0–4+ scale described by Pugin.

Statistical analysis

Data were expressed as absolute numbers with percentages and as means or medians with standard deviations or ranges. Comparisons were performed by chi-square test or t-test, as appropriate. A probability value of less than 0.05 was considered statistically significant.

Using the results of quantitative cultures of BAL (BI ≥5 and log count ≥4) as “gold standard”, receiver operating characteristic (ROC) curves relating true and false positive rates (sensitivity and 1-specificity, respectively) for the different CPIS thresholds were constructed. Univariate and multivariate logistic regression analyses were used to determine the association between the individual scoring variables and VAP (defined as log count ≥4). The variables were dichotomized into ‘normal’ (CPIS variable 0) and ‘abnormal’ (CPIS variable 1 and 2). The inter-observer variability was analyzed for the individual variables of the CPIS as well as for total scores (using a cut-off point of >6) by calculating the kappa coefficient, a measure of agreement between two observers that corrects for agreement expected by chance [16].
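As an illustration of the kappa coefficient, the following Python sketch computes Cohen's kappa for two observers who each classify the same patients. The function name and the ratings are hypothetical and serve only to show the calculation; they are not data from this study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same, non-empty set of cases")
    n = len(ratings_a)
    # Observed proportion of cases on which the observers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each observer's category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: True means the observer scored CPIS >6.
observer_1 = [True, True, False, False, True, False, False, False, True, False]
observer_2 = [True, False, False, False, True, False, True, False, True, False]
print(round(cohen_kappa(observer_1, observer_2), 2))  # 0.58
```

In this example the observers agree on 8 of 10 patients (80%), but because 52% agreement would be expected by chance alone, kappa is only 0.58.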

Results

Study 1

Patient characteristics

A total of 99 patients (68.7% male) were included (89 in Maastricht and ten in Utrecht), with a mean age of 62±15 years (range 28–87 years) (Table 3). The mean duration of mechanical ventilation at the time of bronchoscopy was 12.4±11.3 days (range 3–65 days). Based on quantitative cultures of BAL, the clinical suspicion of VAP was microbiologically confirmed in 69 patients (69.7%). Thirty-three (33.3%) patients received antibiotics at the time of bronchoscopy, all during >48 h and in all patients a new clinical suspicion of VAP developed while receiving antibiotics. The clinical suspicion was microbiologically confirmed in 26 of these 33 patients. Thirty-six (36.4%) patients died in the ICU.

Table 3. Patient characteristics (n =99)

Comparison of CPIS and quantitative cultures of BAL

Because the variable “quantities of secretions” was not available, the highest possible CPIS score was 10. CPIS scores varied from 3 to 10 (median 7; mean 6.9±1.4). BI values of BAL samples varied from 0 to 22.4 (mean 6.4±4.6) and log counts varied from 0 to 7.3 (mean 4±1.9). Correlations between CPIS values and BI or log count values appeared to be poor (r =0.178, P =0.079 for CPIS and BI; r =0.115, P =0.257 for CPIS and log count) (Fig. 1a and Fig. 1b). When using a log count ≥4 of BAL samples as gold standard, a CPIS value >7 had the largest area under the curve (AUC) of the receiver operating characteristic (ROC) curve (0.644). The AUCs for ROC curves with CPIS >6 and CPIS >8 and log count ≥4 were 0.541 and 0.640, respectively. Sensitivity and specificity of CPIS >7 were 41% and 77%, respectively, with a positive predictive value of 80% and a negative predictive value of 36%. Comparable results were obtained for ROC analysis of CPIS scores and BI values (Fig. 2). Exclusion of patients admitted with pneumonia or those receiving antibiotics at the time of bronchoscopy did not change the results (data not presented).
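The relation between the four test characteristics reported above can be sketched in Python from a 2×2 table. The counts used here are illustrative only: they were chosen to approximately match the reported percentages, assuming 69 patients with and 30 without microbiologically confirmed VAP, and they are not the study's published table. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Test characteristics from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with VAP
        "specificity": tn / (tn + fp),  # negatives among those without VAP
        "ppv": tp / (tp + fp),          # VAP among those testing positive
        "npv": tn / (tn + fn),          # no VAP among those testing negative
    }


# Illustrative counts (assumption): CPIS >7 versus BAL log count >=4.
metrics = diagnostic_metrics(tp=28, fp=7, fn=41, tn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

With a high proportion of confirmed VAP in the study population, a positive predictive value of about 80% can coexist with a sensitivity of only about 41%.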

Fig. 1a,b. a Scatter plot of comparison between log count of BAL fluid and Clinical Pulmonary Infection Score; b Scatter plot of comparison between Bacterial Index of BAL fluid and Clinical Pulmonary Infection Score

Fig. 2. ROC of Clinical Pulmonary Infection Score and quantitative culture of BAL fluid with Bacterial Index ≥5 and log count ≥4

Of the individual CPIS variables, only a positive microbiological culture of tracheal aspirate was significantly associated with VAP (defined as log count ≥4) (odds ratio 4.25, 95% CI 1.474–12.25; P =0.007) (Table 4). The variable chest X-ray was not evaluated in logistic regression analysis because, by definition of the CDC criteria, this variable was abnormal in all cases.

Table 4. Association between individual variables of Clinical Pulmonary Infection Score and Ventilator-Associated Pneumonia. (VAP defined as ≥10^4 cfu/ml in BAL)

Study 2

Agreement (kappa) between observers in scoring individual CPIS variables varied from 0.02 for culture results in the prospective analysis to 0.7 for temperature in the retrospective analysis (Fig. 3). Importantly, physicians frequently decided that variables could not be scored because data were not available at that time point. This could be due to pending culture results, to the inability to score a body temperature between 36.0 and 36.5 °C, as this range was not defined in the original CPIS, or to missing data, e.g., when a chest X-ray was not performed on that day. Although some physicians used the values of the previous day in case of missing data or gave their own interpretation of temperatures between 36.0 and 36.5 °C, others found that they could not use these data to score the CPIS.

Fig. 3. Inter-observer variability (kappa) for the different Clinical Pulmonary Infection Score-variables and for CPIS >6 measured prospectively (P_1) (n =42), prospectively with more defined criteria (P_2) (n =46), retrospectively (R) (n =10) with 95% CI

The CPIS scores (maximum score of 12) of 42 randomly chosen patients with or without a clinical suspicion of VAP varied from 0 to 8 (mean 3.26±1.91), with a level of agreement (kappa) for CPIS ≤6 and >6 of 0.16. This extremely low level of agreement is explained by the inclusion of missing data for individual scoring variables. When patients with missing variables were excluded, kappa for CPIS >6 and ≤6 was 0.6 (data not shown). Retrospectively calculated CPIS scores in patients with a clinical suspicion of VAP (who had undergone bronchoscopy) varied from 3 to 10 (mean 7.9±2.46), with a level of agreement for CPIS >6 and ≤6 of 0.55.

In an attempt to decrease inter-observer variability, a flow-chart (Table 2) was developed. With this flow-chart, kappa values varied from 0.13 for quantity of tracheal secretions to 0.6 for leukocyte counts, with a level of agreement (kappa) for CPIS ≤6 and >6 of 0.18 (Fig. 3). When patients with missing variables were excluded, kappa was 0.5 for CPIS >6 and ≤6.

Discussion

Associations between CPIS and quantitative cultures of BAL samples in patients with a clinical suspicion of VAP appeared to be poor and the level of agreement between different physicians for scoring individual variables varied greatly. These findings question the reliability of the CPIS as a diagnostic tool for VAP.

The absence of a true gold standard for VAP hampers the interpretation of any study investigating the diagnostic approach to this infection, and is also a limitation of the present study. We have chosen to use quantitative cultures of BAL as a surrogate gold standard, fully realizing that sensitivity and specificity of BAL cultures are not 100%. Moreover, since Pugin et al. validated the CPIS against the bacterial index derived from log counts, we have used both the log count of microorganisms per ml and the bacterial index for comparison with the CPIS. The bacterial index was introduced by Johanson et al. [17], but is probably not widely used nowadays.

Validation of CPIS has been attempted in seven studies, but due to differences in comparative diagnostic techniques that were used and modifications of the original CPIS criteria, direct comparison of these studies is difficult. In their original analysis Pugin and coworkers [6] found an excellent correlation between CPIS>6 and bacterial index >5 of quantitative cultures obtained by BAL. In their analysis of 40 BAL procedures in 28 patients, a CPIS >6 had a sensitivity of 93% and a specificity of 96% for diagnosing VAP, which corresponds to an area under the ROC curve of 0.95. In two other studies the original CPIS was compared to post mortem results with VAP diagnosed upon histological criteria [2, 8]. Using a CPIS>6 as cutoff point, sensitivities were 72% and 77%, and specificities 85% and 42%, respectively. The comparison of the unmodified CPIS to the consensus of two investigators in a study of 59 children with suspected VAP gave an area under the ROC curve of 0.81 [9].
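One way to read the correspondence between a single sensitivity/specificity pair and an area under the ROC curve is the trapezoidal area for a test with one cut-off point, which equals the mean of sensitivity and specificity. The Python sketch below applies this to the figures cited above; that the value of 0.95 was obtained in this way is an assumption made here for illustration, and the function name is hypothetical.

```python
def single_threshold_auc(sensitivity, specificity):
    """Area under a ROC curve defined by one cut-off point.

    The curve runs (0, 0) -> (1 - specificity, sensitivity) -> (1, 1);
    the area under these two segments is (sensitivity + specificity) / 2.
    """
    return (sensitivity + specificity) / 2


# Sensitivity 93% and specificity 96% for CPIS >6, as cited in the text.
auc = single_threshold_auc(0.93, 0.96)
print(f"{auc:.3f}")  # 0.945
```

An AUC derived from a full ROC curve over all CPIS thresholds need not equal this single-point value.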

In the remaining three studies, the original CPIS was modified. A’Court et al. [7, 11] deleted the variable ‘culture result’ and added three criteria: (1) clinical course on/off antibiotics consistent with pneumonia, (2) lack of evidence for an alternative source of sepsis, and (3) lung biopsy or post-mortem histology demonstrating pneumonia within a relevant time span. VAP was confirmed retrospectively either by CPIS >8 plus one out of three additional criteria, or by CPIS >6 plus two out of three additional criteria. Compared to quantitative cultures of BAL (cut off ≥10^4 cfu/ml), the sensitivity of this modified CPIS was 93%. Flanagan and colleagues [10] deleted the variable ‘culture result’ and used a different description of sputum quantity, which was divided into scanty (0 points), moderate/profuse but non-purulent (1 point) or moderate/profuse and purulent (2 points). The adjusted CPIS was compared to a clinical suspicion of VAP according to CDC criteria, which was considered confirmed by histopathological confirmation or by concurrent isolation of a pathogen from distal respiratory samples, pleural fluid culture or blood culture. In an analysis of 34 episodes of clinical suspicion of VAP in 32 patients, CPIS >7 as a cutoff point had a sensitivity of 85% and a specificity of 91%. In the most recent study, a modified CPIS (using five variables at day 1: temperature, leukocytes, secretions, PaO2/FiO2, chest X-ray) had a sensitivity of 60% and a specificity of 59% for diagnosing VAP when using a value >6 as cutoff. Sensitivity and specificity changed somewhat (to 78% and 56%, respectively) when Gram stain results of BAL fluid were added to the score [18].

How can we explain the poor correlations found in the present study, as compared to better associations in some other studies? Patient selection is a possibility. The pre-test probability of VAP may have been higher in our population, as only adult patients with VAP according to CDC criteria were included. In other studies, patients without a clinical suspicion [6] or children [9] have been included. Recently, Michaud and coworkers demonstrated in a meta-analysis of studies on diagnostic tests for VAP that patient selection (i.e., inclusion of patients with a clinical suspicion of VAP) had the largest effect on the measurement of a test’s performance [19]. Inclusion of patients without a clinical suspicion of VAP could lead to false-positive results. They also recommended using BAL volumes ≥140 ml and obtaining pulmonary secretions before the start of antibiotics to improve the performance of diagnostic tests. In ten patients included in our study (those studied in Utrecht), BAL was performed with <140 ml lavage fluid, which might have negatively influenced our results. Another explanation could be the “gold standard” that was used. In two studies, CPIS scores were not calculated at the time of infection, but at the time of death, with post-mortem examination as a gold standard [2, 8]. Finally, another explanation could be sample size, as all other studies included smaller patient populations, ranging from 25 to 59 patients [2, 6, 7, 8, 9, 10].

As compared to the original CPIS defined by Pugin, we had to introduce some adjustments. Since quantities of tracheal aspirates were not monitored routinely by our nursing staff, this variable was omitted, precluding an optimal comparison of CPIS test characteristics to those reported by Pugin. This being said, low inter-observer variability is also a prerequisite for general use of a diagnostic test. In our study, the agreement between observers for individual CPIS variables was low, especially for frequently measured variables, and the introduction of a flow-chart hardly improved agreement levels. However, the high number of missing data remains a limitation of our study.

Positive microbiological cultures of tracheal aspirates obtained on the day of the clinical suspicion of VAP appeared to be the best predictor of VAP. However, these results would only be available 24–48 h later, and this variable is, therefore, hardly suitable for prospective use. Although surveillance cultures of tracheal aspirates are frequently used for selection of empirical treatment, their benefit for the accuracy of empirical therapy in case of VAP has not been determined. The positive predictive value of surveillance cultures for identifying the microorganism causing VAP was only 18% in a recent study [20].

The CPIS has been successfully used to guide duration of therapy in some patients and as a tool to investigate the effects of infection prevention. Singh et al. [12] used an adjusted CPIS (counting five variables: temperature, leukocytes, secretions, PaO2/FiO2, chest X-ray) to safely discontinue antimicrobial therapy after 3 days in patients with persistently low CPIS scores (≤6). In addition, CPIS may be used for longitudinal analysis of patients in intervention studies to prevent VAP [21]. However, our findings on inter-observer variability suggest that, in such studies, CPIS scores should be calculated by as few persons as possible.

In conclusion, when compared to quantitative cultures obtained by BAL, specificity and sensitivity of CPIS appeared to be low. Moreover, inter-observer agreement for individual CPIS variables and for the overall CPIS score was low. These findings justify a cautious use of CPIS as a diagnostic tool for VAP.