Assessment of the reproducibility of clinical coding in routinely collected hospital activity data: a study in two hospitals

J Public Health Med. 1998 Mar;20(1):63-9. doi: 10.1093/oxfordjournals.pubmed.a024721.

Abstract

Background: The aim of the study was to assess the reproducibility of clinical coding in two National Health Service hospitals within North West Thames region.

Methods: A retrospective audit of clinical coding in hospital episode statistics was carried out, comparing the codes assigned by local staff with those assigned by members of an external team who were unaware of the locally assigned codes. Where local and external coders disagreed, the records were reviewed a third time by a further independent coder. The subjects were a random sample of 1607 non-maternity, non-psychiatric admissions occurring between 1991 and 1993, stratified by year and type of disease (asthma, diabetes, appendicitis, fractured femur and 'general', a random selection of any diagnoses). The main outcome measures were the levels of exact agreement between local and external teams over codes for main diagnosis and procedure, and the level of approximate agreement (over the first three characters of the ICD-9 code for diagnosis and the letter and first two digits of the OPCS-4 code for procedure). For disagreements, the outcome measure was the level of agreement between the 'third' coder and the local and external coders.
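The two outcome measures described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names (`normalise`, `truncate`, `agreement_rates`), the decision to ignore dots and spaces before comparing, and the sample code pairs are all assumptions made for illustration. Approximate agreement is taken here as a match on the first three characters, which covers both the first three characters of an ICD-9 code and the letter plus first two digits of an OPCS-4 code.

```python
from typing import Iterable, Tuple


def normalise(code: str) -> str:
    """Strip dots and spaces and upper-case a code (an assumed convention)."""
    return code.replace(".", "").replace(" ", "").upper()


def truncate(code: str, n: int = 3) -> str:
    """Return the first n characters of the normalised code."""
    return normalise(code)[:n]


def agreement_rates(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (exact, approximate) agreement as proportions of all pairs.

    Each pair is (local_code, external_code) for one admission.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair of codes is required")
    exact = sum(normalise(a) == normalise(b) for a, b in pairs)
    approx = sum(truncate(a) == truncate(b) for a, b in pairs)
    return exact / len(pairs), approx / len(pairs)


# Hypothetical (local, external) ICD-9 main-diagnosis codes, not study data.
example_pairs = [
    ("493.9", "493.0"),  # same three-character category, different full code
    ("250.0", "250.0"),  # exact match
    ("820.8", "821.0"),  # differs within the first three characters
    ("540.9", "540.9"),  # exact match
]

print(agreement_rates(example_pairs))  # (0.5, 0.75)
```

By construction, every exact match is also an approximate match, so the approximate rate can never be lower than the exact rate, as in the figures reported for both hospitals.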

Results: For the main diagnosis in the 'general' group at hospital A, local and external coders agreed exactly in 43 per cent of the admissions examined and agreed 'approximately' in 55 per cent (kappa = 0.54). For hospital B the corresponding figures were 60 per cent and 72 per cent (kappa = 0.72). Approximate agreement was higher for the specific diseases considered, particularly for asthma (A: 86 per cent; B: 91 per cent) and fractured femur (A: 84 per cent; B: 89 per cent). For the main procedure at hospital A, there was exact agreement for 58 per cent and approximate agreement for 70 per cent (kappa = 0.66). For hospital B, the corresponding figures were 76 per cent and 83 per cent (kappa = 0.80). In cases of disagreement over the first three characters of the ICD-9 code for main diagnosis, the third coder disagreed with both local and external coders in 53 per cent of cases at hospital A and 38 per cent at hospital B. Agreement was slightly better for discharges in 1992-1993 than in 1991-1992.
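The kappa values quoted above correct observed agreement for the agreement expected by chance. The abstract does not say which variant was used; the sketch below assumes the standard unweighted Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), applied to three-character codes. The function name `cohen_kappa` and the sample codes are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters coding the same records."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of records")
    n = len(rater_a)

    # Observed agreement: proportion of records given the same code.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over codes of the product of each rater's
    # marginal proportion for that code.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical code throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical three-character codes for six admissions, not study data.
local = ["493", "250", "820", "540", "493", "250"]
external = ["493", "250", "821", "540", "250", "250"]

print(round(cohen_kappa(local, external), 3))  # 0.556
```

In this example the raw agreement is 4 of 6 (about 67 per cent), while kappa is lower because some of that agreement would be expected by chance alone.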

Conclusions: The full clinical codes in NHS hospital episode statistics (HES) data should be treated with caution. The first three characters of ICD-9 codes for diagnoses and the OPCS-4 codes for procedures were more reliable. For some specific conditions such as asthma and fractured femur, reliability of the first three characters is much higher (for example, 86 per cent and 91 per cent for asthma in the two hospitals), but reliability of the full codes can be worse. Secondary diagnoses or comorbidities may be significantly undercoded. A higher level of agreement in 1992-1993 than in 1991-1992 suggests that coding may be improving.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Abstracting and Indexing
  • Data Collection / standards*
  • Diagnosis-Related Groups / classification*
  • Diagnosis-Related Groups / standards
  • Hospitals / statistics & numerical data
  • Hospitals, Public / statistics & numerical data*
  • London
  • Medical Records / classification
  • Patient Discharge / statistics & numerical data
  • Reproducibility of Results
  • Retrospective Studies
  • State Medicine / statistics & numerical data
  • Utilization Review