Using routine health data for research: the devil is in the detail
  1. Hannah Whittaker (1),
  2. Jennifer K Quint (2)
  1. (1) NHLI, Imperial College London, London, UK
  2. (2) Respiratory Epidemiology, Occupational Medicine and Public Health, Imperial College London, London, UK
  1. Correspondence to Dr Jennifer K Quint, Respiratory Epidemiology, Occupational Medicine and Public Health, Imperial College London, London SW7 2BU, UK; j.quint{at}

Electronic healthcare records (EHRs) are increasingly being used for population-based studies globally. Despite their strengths, hidden pitfalls exist, and researchers must take extra care to ensure high-quality data and to minimise measurement error and biases. This article discusses the recent work by Kerkhof et al in relation to disease misdiagnosis and misclassification, the importance of linked data sources and the usability of test variables, all of which are important issues that researchers must be aware of when using EHRs. The devil is in the detail.

EHR databases systematically and routinely collect and store healthcare data electronically and can include data on routine processes in primary and secondary care (disease codes, prescriptions, procedures and tests). The information collected ranges from medical insurance claims to mortality data to specific disease registries, with each database coding and storing information differently. The original purpose of EHRs was simply to store medical information digitally, but they are increasingly being used for research and population-based studies globally, offering large sample sizes, a wide breadth of study variables and the inclusion of more generalisable populations.

However, nothing is ever perfect, and routinely collected EHR data can have issues; the devil is in the detail. Unlike studies which include purposeful prospective data collection, the original purpose of data collection for EHRs is not research, a controversial argument in data science.1 So, while EHRs allow researchers to study real-world populations seen in everyday clinical practice, extra care must be taken to ensure the quality of the …


  • Contributors HW and JKQ wrote and edited this work.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Commissioned; externally peer reviewed.
