Accounting for overlap? An application of Mezzich's kappa statistic to test interrater reliability of interview data on parental accident and emergency attendance

J Adv Nurs. 2001 Mar;33(6):784-90. doi: 10.1046/j.1365-2648.2001.01718.x.

Abstract

Study rationale: The number of interview studies with service users is rising with the growth of health services research. The level of agreement between multiple coders of interview data requires statistical calculation to support the results. Basic kappa statistics are often used, but these depend on mutually exclusive coding categories. Researchers should be aware that this assumption does not hold when an interview word or paragraph can be coded into more than one category. The 'proportional overlap' kappa extension by Mezzich et al. (1981, Journal of Psychiatric Research 16, 29-39) was investigated as a potential solution.

Objectives: To assess the level of agreement beyond chance between several raters of interview data by applying Mezzich et al.'s 'proportional overlap' kappa statistic to verbal interview data. The clinical area investigated was child attendance at an Accident and Emergency Department, where parents' experiences of attendance have been under-explored.

Methods: Two researchers coded a random sample of interview transcripts using a coding schedule. These data were applied to Mezzich's procedure: for example, coder 1 notes that a paragraph refers to categories A and B, while coder 2 notes A, B and C. The proportional agreement overlap in this case is 0.66, because two actual agreements were made out of three possible agreements. This was repeated for each paragraph and divided by the number of coding pairs. All agreement values were summed and then divided by the total number of paragraphs to give Po (the proportion of observed agreement) and by the total number of coding pairs to give Pe (the proportion of agreement expected by chance alone). Po and Pe were then entered into the basic kappa formula to assess the reliability of the interview coding.
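
The per-paragraph overlap described above behaves like an intersection-over-union measure on the sets of categories each coder assigned. The Python sketch below illustrates one way such a calculation could be implemented; it is not the authors' code, and the chance-agreement (Pe) estimate via cross-paragraph pairings is an assumption, since the abstract does not fully specify how Pe was derived.

```python
from itertools import product

def proportional_overlap(codes_1, codes_2):
    """Overlap for one paragraph: actual agreements divided by possible
    agreements (all categories used by either coder). {A,B} vs {A,B,C} -> 2/3."""
    possible = codes_1 | codes_2
    if not possible:
        return 1.0
    return len(codes_1 & codes_2) / len(possible)

def overlap_kappa(coder_1, coder_2):
    """coder_1, coder_2: lists of category sets, one set per paragraph.
    Returns (Po, Pe, kappa). Pe is approximated here by averaging the overlap
    over all mismatched paragraph pairings -- an assumption for illustration,
    not necessarily Mezzich's exact chance term."""
    n = len(coder_1)
    if n < 2:
        raise ValueError("need at least two paragraphs to estimate chance agreement")
    # Po: mean overlap across matched paragraphs
    po = sum(proportional_overlap(a, b) for a, b in zip(coder_1, coder_2)) / n
    # Pe: mean overlap when coder 1's paragraph i is paired with coder 2's paragraph j (i != j)
    cross = [proportional_overlap(coder_1[i], coder_2[j])
             for i, j in product(range(n), repeat=2) if i != j]
    pe = sum(cross) / len(cross)
    # Basic kappa formula: agreement beyond chance, scaled by its maximum possible value
    return po, pe, (po - pe) / (1 - pe)

# Worked example from the abstract: coder 1 codes {A, B}, coder 2 codes {A, B, C}
print(proportional_overlap({"A", "B"}, {"A", "B", "C"}))  # 0.666...
```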

Results: The overall mean Po was 0.61 and the mean Pe was 0.32, giving a kappa of 0.43, a moderate level of agreement that was statistically significant (t=4.8, P < 0.001, d.f.=23).
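
Substituting the reported values into the standard kappa formula (an arithmetic check, not material from the paper) reproduces the quoted coefficient:

$$\kappa = \frac{P_o - P_e}{1 - P_e} = \frac{0.61 - 0.32}{1 - 0.32} = \frac{0.29}{0.68} \approx 0.43$$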

Conclusion: Mezzich's procedure may be applied to interview data to calculate agreement levels between several coders.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Accidents / statistics & numerical data*
  • Adult
  • Anxiety / psychology
  • Attitude to Health
  • Child
  • Data Collection / methods*
  • Data Collection / standards
  • Data Interpretation, Statistical*
  • Emergency Service, Hospital / statistics & numerical data*
  • England
  • Health Services Research / methods*
  • Health Services Research / standards
  • Hospitals, University
  • Humans
  • Injury Severity Score
  • Interviews as Topic / standards*
  • Nursing Methodology Research / methods*
  • Nursing Methodology Research / standards
  • Observer Variation*
  • Parents / psychology*