Value of severity scales in predicting mortality from community-acquired pneumonia: systematic review and meta-analysis
Yoon K Loke (1, 2), Chun Shing Kwok (1), Alagaratnam Niruban (2), Phyo K Myint (1, 2)

  1. School of Medicine, Health Policy and Practice, University of East Anglia, Norwich, Norfolk, UK
  2. Directorate of Medicine, Norfolk and Norwich University Hospital, Norwich, Norfolk, UK

Correspondence to Dr Yoon K Loke, School of Medicine, Health Policy and Practice, Chancellors Drive, University of East Anglia, Norwich, Norfolk NR4 7TJ, UK; y.loke{at}


Abstract

Background Several scoring systems have been used to predict mortality in patients with community-acquired pneumonia. The properties of commonly used risk stratification scales were systematically reviewed.

Methods MEDLINE and EMBASE (January 1999–October 2009) were searched for prospective studies that reported mortality at 4–8 weeks in patients with radiographically-confirmed community-acquired pneumonia. The search focused on the Pneumonia Severity Index (PSI) and the three main iterations of the CURB (confusion, urea nitrogen, respiratory rate, blood pressure) scale (CURB-65, CURB, CRB-65), and test performance was evaluated based on ‘higher risk’ categories as follows: PSI class IV/V, CURB-65 (score ≥3), CURB (score ≥2) and CRB-65 (score ≥2). Random effects meta-analysis was used to generate summary statistics of test performance and receiver operating characteristic curves were used for predicting mortality.

Results 402 articles were screened and 23 studies involving 22 753 participants (average mortality 7.4%) were retrieved. The respective diagnostic odds ratios for mortality were 10.77 (PSI), 6.40 (CURB-65), 5.97 (CRB-65) and 5.75 (CURB). Overall, PSI had the highest sensitivity and lowest specificity for mortality, CRB-65 was the most specific (but least sensitive) test and CURB-65/CURB were between the two. Negative predictive values for mortality were similar among the tests, ranging from 0.94 (CRB-65) to 0.98 (PSI), whereas positive predictive values ranged from 0.14 (PSI) to 0.28 (CRB-65).

Conclusions The current risk stratification scales (PSI, CURB-65, CRB-65 and CURB) have different strengths and weaknesses. All four scales had good negative predictive values for mortality in populations with a low prevalence of death but were less useful with regard to positive predictive values.

  • Clinical epidemiology
  • respiratory infection


Introduction

Community-acquired pneumonia (CAP) is a common cause of hospital admission and a leading cause of death in the UK.1 The overall mortality of CAP varies, with higher rates of death seen in those judged to have severe pneumonia.2–5 A reliable method of assessing the severity of pneumonia may potentially improve the triage or initial management of patients by helping clinicians determine whether close monitoring and aggressive treatment are more appropriate than conservative management. However, accurately assessing the severity of pneumonia can be challenging, so a number of scales based on prognostic factors have been designed to identify patients at high risk of death, as well as those at low risk who may require less of a watchful eye.

The most notable scales are the CURB-65 (confusion, urea nitrogen, respiratory rate, blood pressure, age ≥65 years) which is recommended by the British Thoracic Society5 and the Pneumonia Severity Index (PSI) which originates from the USA.6 Different iterations of the CURB-65 are available, including the CRB-65 which may be particularly suited for community use as it relies on clinical history and examination without requiring blood urea measurements,7 and the CURB score which excludes age as part of its criteria, thus reducing emphasis on chronological age as a prognostic factor.8
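As an illustration of how the three CURB iterations relate to one another, the sketch below scores a single patient on each scale. The item cut-offs (urea >7 mmol/l, respiratory rate ≥30/min, systolic blood pressure <90 mm Hg or diastolic ≤60 mm Hg, age ≥65 years) are the commonly published CURB-65 definitions and are not stated in this article; the `Patient` class and the function names are hypothetical. The 'higher risk' cut-offs are those used in this review.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    confusion: bool      # new-onset mental confusion
    urea_mmol_l: float   # blood urea in mmol/l
    resp_rate: int       # breaths per minute
    systolic_bp: int     # mm Hg
    diastolic_bp: int    # mm Hg
    age: int             # years


def _crb_points(p: Patient) -> int:
    """Points for the items shared by all three scales (C, R and B)."""
    low_bp = p.systolic_bp < 90 or p.diastolic_bp <= 60
    return int(p.confusion) + int(p.resp_rate >= 30) + int(low_bp)


def curb(p: Patient) -> int:
    """CURB: shared items plus urea, without age (range 0-4)."""
    return _crb_points(p) + int(p.urea_mmol_l > 7)


def crb65(p: Patient) -> int:
    """CRB-65: shared items plus age, without urea (range 0-4)."""
    return _crb_points(p) + int(p.age >= 65)


def curb65(p: Patient) -> int:
    """CURB-65: shared items plus urea and age (range 0-5)."""
    return curb(p) + int(p.age >= 65)


# 'Higher risk' cut-offs used in this review
HIGHER_RISK_CUTOFF = {"CURB-65": 3, "CURB": 2, "CRB-65": 2}


def is_higher_risk(scale: str, score: int) -> bool:
    return score >= HIGHER_RISK_CUTOFF[scale]


# Hypothetical patient, for illustration only
example = Patient(confusion=False, urea_mmol_l=8.2, resp_rate=32,
                  systolic_bp=110, diastolic_bp=70, age=72)
for scale, fn in (("CURB-65", curb65), ("CURB", curb), ("CRB-65", crb65)):
    score = fn(example)
    print(scale, score, is_higher_risk(scale, score))
```

In this hypothetical case the patient scores 3 on CURB-65, 2 on CURB and 2 on CRB-65, so all three scales place the patient in the 'higher risk' category.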

In this review we aimed to systematically retrieve and appraise the available data on the PSI and the three main iterations of the CURB scoring system (CURB, CRB-65, CURB-65) in order to determine the ability of each test to correctly predict mortality in patients with pneumonia. Although we are aware of a multitude of individual studies in different settings, the comparative performance of these scales has yet to be summarised in a meta-analysis.


Methods

Search strategy

Ovid SP was used to search MEDLINE and EMBASE from January 1999 up to October 2009 using the terms (Pneumonia and (Sever* or Predict* or prognos*) and (scale or score or assessment or index) and (mortality or survival or death) and (community-acquired)).mp. The bibliographies of included studies were checked for any other relevant articles and the authors were contacted for further information where necessary.

Eligibility criteria

Two reviewers initially checked the titles and abstracts against the following inclusion and exclusion criteria.

Inclusion criteria

  • English language journal publication

  • At least 100 participants

  • Prospective studies

  • Reporting patient outcomes after the use of a pneumonia severity scale

  • Community-acquired pneumonia, which may include residents of nursing homes

  • Randomised controlled trials where any single intervention arm fulfilled inclusion criteria; we did not consider trials comparing different antibiotics.

Exclusion criteria

  • Fewer than 100 participants in the study

  • Studies based in the community (without radiological and laboratory tests), or studies in which recruitment was restricted to patients on the intensive care unit without considering other hospitalised patients. However, we accepted studies that reported fully on all hospitalised patients (in intensive care as well as general wards).

  • Hospital-acquired pneumonia

  • Studies that looked at outcomes of specific types of pneumonia such as legionella or viral rather than community-acquired pneumonia in general

  • Studies that relied entirely on specific biomarkers without using a clinical risk stratification scale

  • Retrospective studies.

From the above we then obtained full-text versions of potentially relevant articles and carried out a more detailed screening process. In conjunction with the above criteria, studies were included only if:

  • Pneumonia was defined by signs, symptoms and chest x-ray

  • Follow-up mortality data were available at 4–8 weeks after presentation

  • Mortality was reported according to the categories of the severity scale (PSI, CURB-65, CRB-65, CURB)

  • Patients were enrolled in the 10-year period spanning 1999–2009 (the CURB-65 and its different iterations only became available after 1999 and we wanted to ensure that comparisons of the PSI and CURB-65 scales were based on patients recruited within the same time period; this was aimed at reducing the possibility of confounding where any differences in the performance of the severity scales might have stemmed from time-related changes in the epidemiology, microbiology or treatment of pneumonia).

Validity assessment

Study validity was assessed using a checklist based on published opinion regarding the key components of prognostic studies.9 The list was based on clear reporting of the following items:

  • Patient selection criteria

  • Diagnostic criteria for pneumonia

  • Loss to follow-up

  • Methods used to ascertain outcome

  • Management protocol used in treating pneumonia (eg, antibiotic regimen)

Data abstraction

Two reviewers assessed the eligibility and extracted numerical outcomes data from the included studies. The reviewers obtained full consensus on inclusion of the studies and data extraction after resolving any discrepancies through discussion with team members. Authors were contacted if any items required clarification.

Study characteristics

Geographical location and setting, sample size, age and gender of participants, type of prognostic scale and mortality rate according to classification of severity were recorded.

Quantitative data synthesis and sensitivity analysis

Patients were classified according to risk score and data on mortality were extracted. For each study the numbers of patients and deaths were recorded, with ‘higher risk or severe’ dichotomised categories comprising PSI (class IV and V), CURB-65 (score ≥3), CURB (score ≥2) and CRB-65 (score ≥2). Statistical analysis was carried out using R-DiagMeta,10 MetaAnalyst11 and RevMan 5.024 (Nordic Cochrane Center, Copenhagen, Denmark). Pooled sensitivities, specificities, diagnostic odds ratios (OR) and positive and negative predictive values were calculated using the random effects model (which takes into account the variability between studies).12 The summary receiver operating characteristic curves (SROC) for each severity scale were generated with the bivariate random effects approach.10 Statistical heterogeneity was assessed using the I² statistic, with I² values >50% indicating a substantial level of heterogeneity.13
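To make the pooling step more concrete, the following is a minimal sketch of a univariate DerSimonian-Laird random effects pooling of log diagnostic odds ratios, together with the I² statistic. It is an assumption made purely for illustration: the analyses in this review were run in R-DiagMeta, MetaAnalyst and RevMan, and the SROC curves used a bivariate model, which this sketch does not reproduce. The function names, the 0.5 continuity correction and the study counts are all hypothetical.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return the log diagnostic odds ratio and its variance.

    Adds 0.5 to every cell (a continuity correction, assumed here) so that
    a study with an empty cell does not cause a division by zero.
    """
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance


def pool_random_effects(estimates, variances):
    """DerSimonian-Laird pooling; returns (pooled, se, tau2, i2_percent)."""
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = total_w - sum(w * w for w in weights) / total_w
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I2: share of total variability attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2, i2


# Hypothetical 2x2 counts per study: (TP, FP, FN, TN)
studies = [(45, 300, 5, 650), (30, 150, 10, 310), (80, 900, 20, 1500)]
pairs = [log_dor(*s) for s in studies]
pooled, se, tau2, i2 = pool_random_effects([p[0] for p in pairs],
                                           [p[1] for p in pairs])
print(f"pooled DOR = {math.exp(pooled):.2f}, I2 = {i2:.0f}%")
```

When the between-study variance is estimated as zero, the random effects weights reduce to the fixed effect (inverse variance) weights, which is why the two models agree for homogeneous data.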


Results

The study selection flowchart is shown in figure 1. Twenty-three studies were included in the analysis,14–33 with some studies reporting more than one risk stratification scale. Sixteen studies covered PSI, 12 CURB-65, 10 CRB-65 and 5 CURB. For direct comparisons of the severity scales in the same patient dataset, 7 tested PSI versus CURB-65, 4 tested PSI versus CRB-65 and 2 studied PSI versus CURB. Table 1 and online table show the characteristics of the studies included in the meta-analysis.

Figure 1

Flowchart showing study selection for pneumonia severity scales. CURB, confusion, urea nitrogen, respiratory rate, blood pressure; PSI, Pneumonia Severity Index.

Table 1

Study characteristics

The total sample size from the 23 studies was 22 753 participants with 1680 deaths, giving an average mortality rate of 7.4%. Sample sizes ranged from 134 to 3181 participants. Studies were carried out mainly in emergency department and hospital settings in Europe and North America, with a smaller proportion of data available from Australia, Hong Kong and Pakistan. The average age of participants in the studies was typically around 60 or 70 years. For the validity assessment we found the risk of bias to be a particular issue with ascertainment of mortality. Only 14 of the 23 studies gave details on the methods used in confirming whether participants had died during the follow-up period (see online table). Lack of information on patient management was another major potential source of bias as there was little information on the interventions used in treating pneumonia and whether antibiotic use was consistent both within a study and between different studies.

Indirect and direct comparisons of the ability to identify patients at risk of death

The paired sensitivities (proportion of patients who subsequently die who were correctly classified as ‘higher risk’) and specificities (proportion of survivors who were correctly classified as not being in the ‘higher risk’ category) of each study are shown with the SROC curves for the severity scale in figure 2A. Data on the studies with direct comparisons are shown in figure 2B and C, with 7 studies evaluating PSI versus CURB-65 and 4 studies comparing PSI with CRB-65 (the results for other direct comparisons are not shown owing to the small number of studies). Both the direct and indirect comparisons consistently indicate that PSI is more sensitive but less specific than CURB-65, CURB or CRB-65 in identifying those who subsequently die.
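The definitions above can be written out for a single study as a 2×2 table of risk category against outcome. The sketch below computes the measures reported in this review from such a table; the function name and the counts are hypothetical and are not taken from any included study.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Test performance of a 'higher risk' classification against mortality.

    tp: deaths classified as higher risk
    fn: deaths classified as lower risk
    fp: survivors classified as higher risk
    tn: survivors classified as lower risk
    """
    return {
        "sensitivity": tp / (tp + fn),   # deaths correctly flagged
        "specificity": tn / (tn + fp),   # survivors correctly not flagged
        "ppv": tp / (tp + fp),           # flagged patients who die
        "npv": tn / (tn + fn),           # unflagged patients who survive
        "dor": (tp * tn) / (fp * fn),    # diagnostic odds ratio
    }


# Hypothetical cohort of 1000 patients with 74 deaths (7.4% mortality)
metrics = accuracy_metrics(tp=60, fp=340, fn=14, tn=586)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The diagnostic odds ratio is undefined when a study has no false positives or no false negatives, so pooled analyses usually apply a continuity correction before combining studies.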

Figure 2

Receiver operating characteristics space illustrating performance of scoring systems at identifying patients at risk of death. (A) Paired sensitivity and specificity with SROC curves giving an indirect comparison of all four scoring systems. (B) Paired specificity and sensitivity of PSI versus CURB-65 in predicting mortality in head to head studies. (C) Paired specificity and sensitivity of PSI versus CRB-65 in predicting mortality in head to head studies. CRB-65, confusion, respiratory rate, blood pressure, age ≥65 years; CURB, confusion, urea nitrogen, respiratory rate, blood pressure; CURB-65, confusion, urea nitrogen, respiratory rate, blood pressure, age ≥65 years; PSI, Pneumonia Severity Index.

Forest plots of the raw data and estimated sensitivities and specificities from each study are shown in figure 3A–D. Pooled estimates of sensitivities, specificities, diagnostic ORs and positive and negative predictive values were measured for each scale and these results are shown in table 2. There was significant heterogeneity for the pooled estimates.

Figure 3

Forest plots of sensitivity and specificity according to study and prognostic scale. (A) Pneumonia Severity Index (PSI). (B) CURB-65 (confusion, urea nitrogen, respiratory rate, blood pressure, age ≥65 years). (C) CRB-65 (confusion, respiratory rate, blood pressure, age ≥65 years). (D) CURB (confusion, urea nitrogen, respiratory rate, blood pressure).

Table 2

Summary statistics* of test performance for the pneumonia risk stratification scales

As there was only a relatively small proportion of deaths, the negative predictive values of all four scales were similarly impressive, ranging from 0.94 (CRB-65) to 0.98 (PSI). The low prevalence of death also accounts for the somewhat low positive predictive values ranging from 0.14 (PSI) to 0.28 (CRB-65) (table 2).

Estimates of clinical impact of each severity scale

Measures of the clinical impact of risk stratification (such as positive and negative predictive values) depend on the underlying rate of pneumonia-related deaths. Figure 4A shows the absolute number of false negatives (number of patients wrongly classified as non-severe) according to severity scale per 1000 patients with pneumonia. The PSI had the lowest false negative rate, meaning that few patients who subsequently died were placed in its lower risk classes, so a lower risk PSI result is a reliable indicator of non-severe pneumonia and a low risk of death. Figure 4B shows the absolute number of false positives (number of patients misclassified as ‘high risk’) per 1000 patients with pneumonia, according to test. This illustrates that the PSI errs on the side of caution by judging relatively more survivors as being at ‘high risk’, whereas the CURB-65 and its iterations are more specific in correctly classifying patients who have a greater likelihood of death. These figures also confirm that the absolute impact of test performance varies with the mortality rate and that the differences between tests become more apparent in populations with high proportions of pneumonia-related deaths.
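The dependence of these absolute numbers on the mortality rate can be sketched as follows. The sensitivity and specificity values are hypothetical round numbers chosen to represent a more sensitive scale and a more specific scale; they are not the pooled estimates from table 2, and the function name is an assumption.

```python
def misclassified_per_1000(sensitivity, specificity, mortality):
    """Return (false negatives, false positives) per 1000 patients."""
    deaths = 1000 * mortality
    survivors = 1000 - deaths
    false_negatives = deaths * (1 - sensitivity)      # deaths rated non-severe
    false_positives = survivors * (1 - specificity)   # survivors rated severe
    return false_negatives, false_positives


# Hypothetical scales: (sensitivity, specificity)
scales = {"more sensitive scale": (0.90, 0.50),
          "more specific scale": (0.60, 0.80)}

for mortality in (0.02, 0.074, 0.20):
    for name, (sens, spec) in scales.items():
        fn, fp = misclassified_per_1000(sens, spec, mortality)
        print(f"mortality {mortality:.1%} | {name}: "
              f"FN {fn:.0f}, FP {fp:.0f} per 1000")
```

With these assumed values the gap in false negatives between the two scales is 6 per 1000 at 2% mortality and 60 per 1000 at 20% mortality, which is the pattern of widening differences described above.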

Figure 4

(A) Number of patients wrongly classified as non-severe as a function of incidence of mortality per 1000 patients with pneumonia. (B) Number of patients wrongly classified as severe as a function of incidence of mortality per 1000 patients with pneumonia. CRB-65, confusion, respiratory rate, blood pressure, age ≥65 years; CURB, confusion, urea nitrogen, respiratory rate, blood pressure; CURB-65, confusion, urea nitrogen, respiratory rate, blood pressure, age ≥65 years; PSI, Pneumonia Severity Index.


Discussion

To our knowledge, this is the first systematic review and meta-analysis covering the comparative test performance of pneumonia severity scales that are in common clinical use. Current risk stratification scales (PSI, CURB-65, CRB-65 and CURB) have different trade-offs and no single scale is clearly superior on all counts. The PSI is the most sensitive test with a low false negative rate (figure 4A), thus giving clinicians greater confidence in identifying patients who may not need hospital admission.34 Conversely, the CURB-65, CRB-65 and CURB scales are more specific and have higher positive predictive values than the PSI, which means that a greater proportion of patients in the ‘higher risk’ categories are correctly classified.

In theory, the poorer sensitivity of the CURB-65-based scales means that some patients may be incorrectly diagnosed and managed as non-severe even when they are actually at higher risk of death. However, as there was only a relatively small percentage of deaths (7.4%) among study participants, the negative predictive values of these CURB-65-based tests are very similar to those of PSI, indicating that the clinical differences between tests are likely to be small. In situations with a low prevalence of adverse outcomes, we can be more confident that a ‘non-severe’ test result reliably predicts that the patient will have a good outcome, but we are also less certain that a ‘severe’ rating genuinely predicts death.35 This is illustrated by the range of positive predictive values from 0.14 (PSI) to 0.28 (CRB-65).
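The relationship between prevalence and predictive values follows from Bayes' theorem. Writing Se for sensitivity, Sp for specificity and p for the proportion of patients who die (notation introduced here for illustration), the predictive values are:

```latex
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)},
\qquad
\mathrm{NPV} = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}
```

As p becomes small, the term (1 − Se)p shrinks and the NPV approaches 1 regardless of sensitivity, while the PPV falls because false positives among the many survivors dominate the denominator. For example, with hypothetical values Se = 0.60, Sp = 0.80 and p = 0.074, these expressions give PPV ≈ 0.19 and NPV ≈ 0.96.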

The practical aspects and resource implications of the chosen scoring system should also be considered. The PSI involves a detailed history, physical examination, venous blood sampling, arterial blood gas measurements and chest x-ray, thus requiring the physician to gather a total of 12 parameters from the history and examination as well as 7 parameters derived from further investigations.6 Although most of the parameters for the PSI are available in hospital settings and can be worked out with a web-based PSI calculator,36 busy clinicians in emergency departments may end up not bothering to estimate the PSI or may rush through the task inaccurately. However, the CURB-65-based scales do not incorporate potentially important parameters such as hypoxaemia and bilateral pneumonia in their scores, while the CRB-65 (a potentially useful scale for community use) omits the urea measurement, thus possibly reducing sensitivity.

The choice of test depends on the attitude of the health providers regarding healthcare and resources use as well as the rate of pneumonia-related mortality. The most sensitive and labour-intensive test (such as the PSI) may be preferred in resource-rich healthcare settings, particularly where pneumonia mortality is relatively high. In certain community settings with limited resources and where pneumonia mortality is relatively low, the lower sensitivity of the CRB-65 is not a major disadvantage and its ease of use and higher specificity may help clinicians to focus on those requiring more clinical attention.

Study limitations

There are a number of limitations in our review. There is clearly considerable heterogeneity in the performance of the severity scales, which may be related to the diverse populations evaluated and differences in microbiological spectrum and antibiotic sensitivity. This affects the validity of the pooled estimates, which we presented as a secondary analysis, even though we used a random effects model which incorporates study level variability into the meta-analysis. Other researchers have suggested that large degrees of heterogeneity are commonly seen in meta-analyses of diagnostic studies.37 Closer examination of our forest plots did not demonstrate any consistent source to account for the substantial heterogeneity, and we believe that the data points for each severity scale are actually fairly consistently clustered together in the SROC plane. We also specifically chose to include studies with direct head-to-head evaluation of PSI and the CURB-65-related scoring systems within the same patient population so that heterogeneity would be minimised when comparing different severity scales.

Although we used a comprehensive search strategy, some retrieved papers had to be excluded because raw mortality data were not given or there was insufficient information for us to work out the number of deaths from the reported sensitivity or specificity. Two notable studies had to be excluded for not fulfilling our eligibility criteria: the very first PSI-based article studied patients from 1989 to 1994 which is a very different time period from that of the CURB-65-related studies that began around 1999,6 while a recent German study of 388 406 patients reported only on inpatient deaths rather than mortality at 4–8 weeks of follow-up.7 We did not have the resources to evaluate non-English publications and we chose to include only peer-reviewed published literature.

We were unable to properly assess the quality of the included studies owing to the lack of detail in reporting of key areas such as the methods used in confirming the outcomes and the antibiotic regimens. Only a few studies provided information on the treatment pathways, and it is possible that different treatment regimens may have contributed to the substantial heterogeneity seen here. For instance, variations in antibiotic use and antibiotic resistance, availability of chest specialists and intensive care beds could have affected mortality outcomes and performance of the severity scale.


Conclusions

Our findings suggest that the PSI and CURB-65 scoring systems perform well at identifying patients with pneumonia who have a low risk of death. However, it is also clear that all four prognostic scales have limitations and should only be used in conjunction with careful clinical judgement when making a management decision. Further research should focus on identifying the individual characteristics that account for differences among the four scoring systems so that further refinements can be made to enable more accurate risk classification and treatment of patients with pneumonia.


Supplementary materials

  • Web Only Data thx.2009.134072

  • Linked articles 143297, 133280.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.
