Introduction Vocal cord dysfunction (VCD) typically involves abnormal vocal cord movement during inspiration. The recognised gold standard for diagnosis is fibreoptic laryngoscopy (FOL) during a symptomatic attack. Despite this, there are no reported VCD FOL assessment scales to facilitate agreement on presentation, disease severity and treatment monitoring. Our VCD tertiary airways clinic receives over 300 referrals a year. We run a weekly diagnostic FOL list and identified the need for a VCD FOL classification for optimal care.
Aims To gain consensus on a VCD FOL appearance scale and determine its interrater reliability.
Methods An expert consensus group was convened comprising two respiratory consultant physicians and two respiratory speech and language therapists (SLTs). All have significant experience in VCD FOL interpretation. The group met, discussed and agreed on the VCD FOL appearance scale (Table 1). Two assessment teams were identified, each comprising a respiratory physician and a respiratory SLT. Each team rated patients, referred for FOL with a clinical suspicion of VCD, in three consecutive diagnostic FOL lists. All procedures were recorded and then blindly re-rated during playback by the other assessment team.
Results Eighteen patients received ratings; the mean (range) age was 51 (19–80) years and 78% were female. The assessing teams agreed on the rating for seven patients. For nine patients the teams disagreed but chose adjacent classifications. Interrater agreement was assessed using a weighted kappa (weights: 1 = complete agreement in classification; 0.5 = disagreement but adjacent classifications; 0 = disagreement and non-adjacent classifications). There was moderate agreement between the teams: weighted kappa 0.44 (95% confidence interval 0.18–0.70). There was no systematic bias between the assessment teams, as each had a mean rating across all patients of 2.4.
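The weighting scheme described above can be illustrated with a minimal sketch of a weighted kappa calculation. This is not the analysis code used in the study: the function names, the number of scale categories and the example ratings are all hypothetical, and the calculation of the confidence interval is not shown.

```python
from collections import Counter


def agreement_weight(a, b):
    """Weight from the abstract: 1 = same class, 0.5 = adjacent, 0 = otherwise."""
    distance = abs(a - b)
    if distance == 0:
        return 1.0
    if distance == 1:
        return 0.5
    return 0.0


def weighted_kappa(ratings_a, ratings_b, categories):
    """Weighted kappa for two raters scoring the same patients.

    `categories` lists the ordinal scale points as integers; its length is
    an assumption here, because the scale itself (Table 1) is not reproduced.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed weighted agreement: mean weight over the paired ratings.
    observed = sum(agreement_weight(a, b) for a, b in zip(ratings_a, ratings_b)) / n

    # Expected weighted agreement if the raters were independent,
    # computed from each rater's marginal distribution.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(
        agreement_weight(i, j) * (counts_a[i] / n) * (counts_b[j] / n)
        for i in categories
        for j in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for illustration only (not study data).
live_ratings = [1, 2, 3, 1]
playback_ratings = [1, 2, 3, 2]
kappa = weighted_kappa(live_ratings, playback_ratings, categories=[1, 2, 3])
```

Because adjacent disagreements earn half credit, a pair of teams that mostly differ by one scale point scores higher than it would under an unweighted kappa, which treats every disagreement as complete.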
Conclusions The VCD FOL appearance scale is a promising clinical assessment tool for the VCD population. We expected greater interrater agreement; interestingly, the majority of disagreements would not have changed management, as the classification still yielded a positive diagnosis. The discrepancy may be attributed to whether ratings were performed live or on playback, and this should be investigated. With further development, standardisation of application and robust validation, it will be a useful assessment to direct appropriate management and facilitate accurate and consistent diagnosis.