Background: A short but sensitive questionnaire evaluating changes in respiratory symptoms and well being during the treatment of community acquired pneumonia (CAP) is needed. We developed such a questionnaire and evaluated its psychometric properties in 67 patients admitted with CAP.
Methods: The patients were asked to indicate the presence and severity of dyspnoea, coughing, coughing up sputum, coughing up sputum with ease, the colour of the sputum, fatigue, fitness, and their state of health. The item fatigue showed substantial overlap with fitness and was therefore excluded. The response of the patients to the remaining eight items was used to calculate a CAP score.
Results: The percentage of missing data (0.2–1.7%), floor and ceiling effects (0.2/5.5%), internal consistency (Cronbach α = 0.87), and the intraclass correlation coefficient for test-retest reproducibility (0.83) met predefined criteria, indicating good acceptability and reliability. Face and clinical validity were satisfactory. Effect sizes under treatment were large, indicating high responsiveness.
Conclusion: The newly developed CAP score is a simple, reliable, valid, and highly responsive instrument. This makes it scientifically sound and clinically relevant for measuring outcome when evaluating treatment strategies in CAP.
- CRP, C-reactive protein
- ESR, erythrocyte sedimentation rate
- community acquired pneumonia
- outcome assessment
- resolution of symptoms
In spite of significant progress, community acquired pneumonia (CAP) continues to be a life threatening disease. In the USA an average of 5.6 million cases occur annually.1 Mortality in CAP is estimated to be <1% for patients not admitted to hospital and 2–30% in hospitalised patients.2 Although effective antibiotic treatment for CAP is available, the rapid rise in antimicrobial drug resistance among common respiratory pathogens and the side effects of current drugs require the evaluation of new drugs.
In clinical trials comparing new drugs with standard treatments the impact of treatment is usually evaluated on the basis of clinical outcomes such as mortality, length of hospital stay, or time to return to usual activities. These are inaccurate measurements when identifying small but significant differences between different treatment strategies. Furthermore, these outcomes do not measure the resolution of respiratory symptoms and may reflect poorly the general state of well being of the patient.
Several more recent studies have included resolution of symptoms as an outcome measure.3–5 Unfortunately, there are no validated instruments for the assessment of CAP related symptoms. So far, the psychometric properties of the available instruments have been insufficiently evaluated.5–7 A recently validated questionnaire for CAP covered many items that are not very specific for CAP,8 making this instrument less responsive for the effect of treatment and therefore less useful as a disease specific outcome measure. A few CAP related studies have included quality of life in the evaluation of clinical outcome. Quality of life depends on many factors and may be insensitive to some of the changes in symptoms induced by effective treatment.9–13
We have developed a short disease specific questionnaire to measure the recovery of CAP related symptoms over time as well as the general state of well being of patients with CAP. A study was undertaken to evaluate the acceptability, reliability, validity, and responsiveness of this questionnaire.
Development of the questionnaire
Six items were identified from textbooks, literature, and experts’ opinions as the most specific symptoms that characterise the respiratory condition in CAP. The items were the presence of dyspnoea (graded as presence of dyspnoea at rest, while walking around, washing and dressing, going for a walk, showering, or walking up stairs), severity of dyspnoea in general, coughing, coughing up sputum, coughing up sputum with ease, and colour of the sputum. To these respiratory symptoms we added three items to cover the general state of well being: the general state of health, fatigue, and fitness. The resulting questionnaire therefore contained nine items. Dyspnoea was rated using yes/no response options. Fatigue and fitness were measured using a visual analogue scale. All other items were rated using a Likert scale (see Appendix 1 available online at www.thoraxjnl.com/supplemental).
We tested the items for clarity and comprehensiveness in a pilot study of 18 patients and made minor changes in wording where necessary.
The psychometric properties of the questionnaire were evaluated in a subset of patients enrolled in a randomised, double blind, multicentre trial comparing two durations of treatment of CAP. Eight hospitals participated in the main study, but this substudy reports data from four hospitals.
Eligibility was assessed according to the following criteria: temperature >38°C, clinical signs of pneumonia, a new infiltrate on the chest radiograph, and a pneumonia severity index (PSI)14 of <110. As the exclusion of afebrile patients may have excluded elderly patients, elderly patients who had evident clinical signs of pneumonia and chest radiograph abnormalities but a temperature of <38°C were also included. Patients who had received effective antibiotic treatment for more than 24 hours before admission, patients with another infection necessitating antibiotic treatment, and patients with an inadequate cognitive state were excluded from the study. Consenting patients with CAP who met the inclusion criteria were treated with an intravenous β-lactam antibiotic. After 3 days patients with significant clinical improvement were randomised to receive placebo or oral amoxicillin for 5 days. All randomised patients were followed until 28 days after the beginning of antibiotic treatment. At the end of the follow up period we evaluated clinical cure, which was defined as complete recovery or lessening of pneumonia related symptoms and lack of progression of chest radiographic abnormalities. The study was approved by the medical ethical committees of the participating hospitals.
Collection of data
The questionnaire was completed at baseline and on days 3, 7, 10, 14 and 28 by seven different interviewers who were instructed in advance. The interviewers used the questionnaire in a face to face interview, except for day 14 when it was completed in a telephone interview.
At baseline the medical history was taken and a physical examination was performed by the treating physician who was also asked to indicate the presence or absence and the severity of respiratory symptoms (dyspnoea, coughing, coughing up sputum, and the colour of the sputum) using a separate standardised form. Body temperature, oxygen saturation (finger cuff), and respiratory rate were recorded. Blood was taken for measurement of white blood cell count (WBC), C-reactive protein (CRP), and erythrocyte sedimentation rate (ESR). A chest radiograph was also performed. These clinical and laboratory parameters were re-evaluated at days 3 (randomisation day), 7, 10, 14 and 28. The chest radiograph was repeated at day 10 and, if at that time complete resolution was lacking, repeated again at day 28.
Psychometric evaluation of the questionnaire
Based on data generated by the questionnaire, a single scale score was constructed. This total score (CAP score) was examined for the four psychometric properties—acceptability, reliability, validity, and responsiveness. These properties were tested using standardised procedures and instrument review criteria developed by the Scientific Advisory Committee of the Medical Outcomes Trust.15
Construction of single scale score
To combine multiple items into a single scale score, the items should be internally consistent. This was examined using three indicators of internal consistency—corrected item–total correlations, mean inter-item correlation, and Cronbach α coefficient.
Corrected item–total correlations indicate the extent to which each item relates to the construct measured by the total score. Correcting the total score by removing the item of interest prevents spuriously high values due to item overlap.16 The recommended minimum value17 is 0.40. Inter-item correlation indicates the mutual relation between individual items of a rating scale. It is recommended18 that the mean inter-item correlation should exceed 0.3. Internal consistency was further assessed using the Cronbach α coefficient, for which the minimum recommended value19 is 0.7.
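The three indicators described above can be illustrated with a minimal sketch. It assumes numerically coded item scores and plain Pearson correlations, which is a simplification: the article does not state how the mixed yes/no, Likert, and visual analogue items were correlated. The function names and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cronbach_alpha(items):
    """items: one list per item, each holding one score per patient."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    summed_item_variance = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - summed_item_variance / pvariance(totals))


def corrected_item_total(items):
    """Correlate each item with the total of the remaining items."""
    rows = list(zip(*items))
    correlations = []
    for i, col in enumerate(items):
        rest = [sum(row) - row[i] for row in rows]
        correlations.append(pearson(col, rest))
    return correlations


def mean_inter_item(items):
    """Average Pearson correlation over all distinct item pairs."""
    return mean(pearson(a, b) for a, b in combinations(items, 2))


# Toy data (invented): three items scored for five patients.
toy_items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
print(cronbach_alpha(toy_items) >= 0.7)                        # alpha criterion
print(all(r >= 0.40 for r in corrected_item_total(toy_items)))  # item-total criterion
print(mean_inter_item(toy_items) > 0.3)                        # inter-item criterion
```

Removing the item of interest from the total before correlating is what makes the item–total correlation "corrected".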
The composite score was obtained using principal component analysis for categorical variables (PRINCALS). This procedure provides optimally scaled categories of the original items and subsequently calculates a composite score. This composite score was transformed to a 0–100 scale in order to allow manual assessment and calculation for practical use and is referred to as the CAP score.
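The transformation to a 0–100 scale is not specified here; the scoring details are given in the online appendices. Purely as an illustration, a linear rescaling between the lowest and highest attainable composite scores could look like the sketch below. The function name and the example bounds are assumptions, and the sketch does not reproduce the PRINCALS optimal scaling step.

```python
def rescale_0_100(score, lowest, highest):
    """Linearly map a composite score onto a 0-100 scale.

    lowest and highest are the minimum and maximum attainable
    composite scores (hypothetical inputs, not values from the study).
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    return 100.0 * (score - lowest) / (highest - lowest)


# A composite score midway between the assumed bounds maps to the midpoint.
print(rescale_0_100(0.0, -2.0, 2.0))
```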
Acceptability of the questionnaire was assessed from the percentage of missing data, which should not exceed 5%. Floor and ceiling effects (percentage of patients with the lowest and highest scores) were also determined, indicating the potential to detect variation at the extremes. It is recommended20 that they do not exceed 20%.
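The acceptability checks amount to simple percentages. The sketch below assumes that missing responses are coded as None and that the CAP score runs from 0 to 100; the function names and the toy values are hypothetical.

```python
MAX_MISSING_PCT = 5.0         # criterion for missing data stated in the text
MAX_FLOOR_CEILING_PCT = 20.0  # criterion for floor and ceiling effects


def percent_missing(responses):
    """Percentage of responses that are missing (coded as None)."""
    return 100.0 * sum(r is None for r in responses) / len(responses)


def floor_ceiling(scores, lowest=0, highest=100):
    """Percentages of patients at the lowest and at the highest score."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100.0 * sum(s == highest for s in scores) / n
    return floor_pct, ceiling_pct


# Toy data (invented, not from the study).
item_responses = [3, None, 2, 4]
cap_scores = [0, 50, 100, 100]
print(percent_missing(item_responses) <= MAX_MISSING_PCT)
print(all(p <= MAX_FLOOR_CEILING_PCT for p in floor_ceiling(cap_scores)))
```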
Reliability is the degree to which the instrument is free from random error. Internal consistency, as described above, is one indicator of the reliability of the instrument; the other is reproducibility, or stability of the instrument over time. For this purpose the test-retest reliability was examined in a subset of 27 patients by determining intra-rater and inter-rater agreement between CAP scores on different occasions, expressed as an intraclass correlation coefficient (ICC). By the recommended criterion21 the ICC should exceed 0.80.
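The form of the ICC is not stated in the text. As a sketch only, the one-way random effects, single measurement ICC is shown below; this choice is an assumption, and a two-way model may well have been used in the study. The function name and the toy scores are hypothetical.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random effects ICC for single measurements.

    ratings: one list per patient, each holding k repeated CAP scores.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-patient and within-patient mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy test-retest data (invented): two CAP scores for each of three patients.
retest = [[40, 44], [62, 60], [85, 88]]
print(icc_oneway(retest) > 0.80)  # criterion stated in the text
```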
The validity of an instrument is defined as the degree to which the instrument measures what it is intended to measure. We examined three types of validity:
Construct-related validity: evidence that supports a proposed interpretation of scores based on theoretical implications associated with the construct. This is assessed by obtaining evidence that a single construct is measured and that items can be combined to form a summary score. This was determined by calculating the internal consistency of the items as described above.
Face validity: the extent to which the CAP score shows the supposed change during the clinical course of pneumonia. We assessed face validity by determining the improvement at all assessment points.
Clinical validity: the extent to which evidence can be obtained that the scale is correlated with objective clinical measures. This was assessed by comparing the CAP score with external clinical criteria. We examined the correlation between the CAP score at baseline and the doctor’s judgement of the extent of the respiratory symptoms and the PSI on admission. We further examined over time the correlations between the CAP score and the objective clinical, laboratory and radiographic measures—that is, temperature, respiratory rate, oxygen saturation, WBC, CRP, ESR, and chest radiograph.
Responsiveness is the ability of a scale to detect a clinically significant change over time under treatment. This can be determined by calculating effect sizes from admission to end of treatment. Standardised response means (the mean change score divided by the SD of the change scores) are usually chosen for this purpose because these are the most relevant to clinical studies.22 Effect sizes for the CAP score were calculated from changes from the normal level to baseline (day 0) and from baseline to days 10 and 28 (end of follow up). Effect sizes are considered23 to be moderate when their absolute values exceed 0.50 and large when they exceed 0.80.
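The standardised response mean defined above can be computed as in the following sketch, which uses the sample SD of the paired change scores. The function names, the label for effect sizes of 0.50 or less, and the toy scores are assumptions for illustration only.

```python
from statistics import mean, stdev


def standardised_response_mean(before, after):
    """Mean of the paired change scores divided by their sample SD."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


def label_effect(effect_size):
    """Apply the thresholds from the text to the absolute effect size."""
    size = abs(effect_size)
    if size > 0.80:
        return "large"
    if size > 0.50:
        return "moderate"
    return "below moderate"  # hypothetical label, not used in the text


# Toy CAP scores (invented) at baseline and at day 10 for three patients.
baseline = [40, 50, 60]
day_10 = [70, 75, 95]
srm = standardised_response_mean(baseline, day_10)
print(srm, label_effect(srm))
```

Because the denominator is the SD of the change scores rather than the SD at baseline, a consistent improvement across patients yields a large value even when baseline scores vary widely.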
Baseline assessments were available for 67 patients enrolled in the CAP study between November 2000 and November 2002 (table 1). At the time of the analysis the trial had recruited 45 men (66%) and 22 women (34%) of mean (SD) age 56.6 (17.8) years (range 21–96).
After 3 days five patients (7.5%) had not sufficiently improved to be randomised. After randomisation one patient was found to have had another infection (endocarditis) at the time of inclusion and was withdrawn from the study. Data on a total of 61 patients were therefore available for the analysis, of whom 56 (84%) did not need an additional or alternative antibiotic treatment during follow up. The clinical status of five patients (7.5%) deteriorated while using the study drug and they were subsequently treated with an alternative antibiotic. These patients with treatment failure remained in the study for further analysis until the end of the follow up period. At day 28 all patients fulfilled the criteria for clinical cure.
The patients were interviewed by seven interviewers. Individual patients were often interviewed by two or more interviewers during follow up. In total, the 67 patients were interviewed 352 times. The median number of interviews per interviewer was 36 (range 2–120).
One item (fatigue) showed substantial overlap with fitness and was therefore excluded. The remaining items had a high internal consistency with corrected item–total correlations ranging from 0.49 to 0.73, a mean inter-item correlation of 0.51, and a Cronbach α of 0.87. All recommended criteria for scale construction were therefore satisfied and the items were combined to form a composite score reflecting 59% of the variance in the original items. This composite score was further simplified for practical use into a CAP score. This simplified CAP score (see details in Appendices 1 and 2 available online at www.thoraxjnl.com/supplemental) showed near perfect correlation with the original composite score (r = 0.99, p<0.01).
The percentage of missing data did not exceed the recommended maximum of 5%. The floor and ceiling effects of the CAP score were below the recommended maximum of 20% (table 2).
As shown in table 2, the CAP score satisfied the recommended criteria for internal consistency and test-retest reproducibility.
Construct validity: evidence of high internal consistency (table 2) supported the construct validity of our CAP questionnaire.
Face validity: the CAP score showed good improvement during follow up and also deterioration in the case of failure (fig 1A). We divided the CAP score into a respiratory section—subsequently referred to as the respiratory score—which consists of the dyspnoea symptoms, cough and sputum (fig 1B), and a well being section—subsequently referred to as the well being score—which consists of the items fitness and the general state of health (fig 1C). The respiratory score showed an excellent improvement during follow up. At the end of the follow up period the respiratory score approached the normal level of the patients—that is, 1 month before the development of pneumonia. The well being score showed less improvement during follow up and at the end of the follow up period it still had not reached the normal level.
Clinical validity: the CAP score at baseline was correlated with the doctors’ judgement on admission (r = 0.35, p = 0.04). Correlations of the CAP score with the objective clinical and laboratory criteria are shown in table 3. Temperature, respiratory rate, oxygen saturation, WBC, CRP, and ESR were all moderately correlated with the CAP score. No association was found between the CAP score and the PSI on admission or the findings on the chest radiograph.
Table 4 shows effect sizes for the change in CAP score between the normal level and baseline (day 0) and between baseline and days 10 and 28 (end of follow up).
As expected, the CAP score showed a substantial decline between the normal level and baseline and a gradual improvement between baseline and days 10 and 28. Effect sizes were large at all assessment points. We further distinguished changes in the respiratory and the well being sections. Both showed large effect sizes. The respiratory score still showed improvement between days 10 and 28 while the well being score remained stable between these time points.
We have developed a short, symptom based, valid and reliable questionnaire that can be used to evaluate different treatment strategies in CAP. The main reason for developing this questionnaire is the need for a disease specific instrument in clinical studies of CAP.
The CAP score consists of a compact set of items with a high internal consistency. The acceptability and reliability of the CAP score met the predefined criteria. Face validity was excellent for all items combined as well as for the respiratory and well being subscores separately. The CAP score was shown to be highly responsive to changes over time in the clinical course of pneumonia. This feature makes the instrument very appropriate as a treatment specific instrument for follow up in clinical trials. Furthermore, dividing the CAP score into a respiratory and a well being score offers clinicians and researchers a better understanding of the resolution of pneumonia related symptoms. The slow resolution of pneumonia related symptoms in patients with CAP is well known.3,5,6,24 The division of the CAP score shows that the well being symptoms are mainly responsible for this delay. The questionnaire takes less than 2 minutes to complete and is administered by interview, which results in few missing data.
This study has some limitations. Firstly, the patients enrolled in the study represent a select group of patients with CAP with a low to moderately high PSI. Patients who had not significantly improved could not be analysed because they were withdrawn from the study at day 3. As a result we do not know about the symptom resolution of patients with severe CAP, but we expect symptom resolution in these patients to take longer. Secondly, because of the small number of failures, limited data were available to show that the CAP score is sensitive enough to detect worsening of symptoms during follow up. Yet the CAP score did succeed in detecting the major deterioration between the normal level of the patient (1 month before the development of pneumonia) and baseline, as well as the worsening of symptoms in cases of failure on the study drug. Thirdly, at the time of admission the patients were asked to rate the severity of their symptoms 1 month before the onset of pneumonia to establish their pre-pneumonia level. This method is subject to recall bias and might result in an underestimation of their pre-pneumonia symptoms. However, the fact that after treatment the respiratory symptoms had returned to the normal level suggests that patients are able to describe their state of health in a realistic way. Finally, as the item age is incorporated in the PSI,14 this automatically led to the exclusion from the study of very old patients with underlying diseases and elderly patients with an inadequate cognitive state. Nonetheless, 40% of our substudy population was older than 65 years. We therefore believe that the CAP score is also valid in this population as an instrument to measure symptom resolution.
We acknowledge that there is no gold standard against which to examine the clinical validity of our CAP score. A comparison of the CAP score with several objective clinical and laboratory measures resulted in moderate correlations. High correlations between the CAP score and these indicators of disease severity were not expected, as the indicators resolve quickly during follow up while the resolution of symptoms is much slower. Furthermore, high or perfect correlations would render the CAP score redundant.
We observed no correlation between the CAP score and the presence or absence of chest radiographic abnormalities. As an underlying COPD component may cause a lack of improvement in the respiratory symptoms and the general state of well being, we eliminated this effect by excluding these patients from the analysis, but even then no association was found between the chest radiographic findings and our CAP score. This is consistent with previous studies reporting a lack of correlation between findings on the chest radiograph and patients’ symptoms.4,25 Defervescence or resolution of symptoms strongly supports a response to antibiotic therapy, yet in these cases it is not unusual for chest radiographic abnormalities to persist or even progress, especially within the first few days. There was also a discrepancy between the CAP score and the PSI at baseline. This was to be expected as the PSI indicates the risk of mortality while the CAP score is a measure of pneumonia related symptoms.
We believe that the CAP score can be helpful in trials of patients with more severe CAP and hospital acquired pneumonia, although it should be revalidated before it can be used as an outcome measure in this group of patients. In future studies the CAP score can also be used to establish the probable differences in symptom resolution in pneumonia caused by different microbial pathogens.
In conclusion, the CAP score described here is easy to administer, has been fully evaluated for its psychometric properties, and showed high responsiveness to the clinical course of pneumonia. We feel that this instrument can be considered as a scientifically sound and clinically relevant measure of outcome in evaluating treatment strategies in CAP.
The authors thank the patients who participated in the study, I Roede, M van der Zwaan, W van den Berg, M Versloot, H Kragten, L Engelfriet and A Pijlman for conducting the patient interviews and their collaboration in the data collection, and R Breet (Department of Clinical Epidemiology and Biostatistics) for data management.
The appendices are available as downloadable PDFs (printer friendly files).
Funding: Healthcare Insurance Board, Amstelveen, the Netherlands (grant OG99-038).