Background Accurate antibody tests are essential to monitor the SARS-CoV-2 pandemic. Lateral flow immunoassays (LFIAs) can deliver testing at scale. However, reported performance varies, and sensitivity analyses have generally been conducted on serum from hospitalised patients. For use in community testing, evaluation of finger-prick self-tests, in non-hospitalised individuals, is required.
Methods Sensitivity analysis was conducted on 276 non-hospitalised participants. All had tested positive for SARS-CoV-2 by reverse transcription PCR and were ≥21 days from symptom onset. In phase I, we evaluated five LFIAs in clinic (with finger prick) and laboratory (with blood and sera) in comparison to (1) PCR-confirmed infection and (2) presence of SARS-CoV-2 antibodies on two ‘in-house’ ELISAs. Specificity analysis was performed on 500 prepandemic sera. In phase II, six additional LFIAs were assessed with serum.
Findings 95% (95% CI 92.2% to 97.3%) of the infected cohort had detectable antibodies on at least one ELISA. LFIA sensitivity was variable and was significantly inferior to ELISA for 8 of the 11 LFIAs assessed. Of LFIAs assessed in both clinic and laboratory, finger-prick self-test sensitivity varied from 21% to 92% versus PCR-confirmed cases and from 22% to 96% versus composite ELISA positives. Concordance between finger-prick and serum testing was at best moderate (kappa 0.56) and, at worst, slight (kappa 0.13). All LFIAs had high specificity (97.2%–99.8%).
Interpretation LFIA sensitivity and sample concordance are variable, highlighting the importance of evaluation in the setting of intended use. This rigorous approach to LFIA evaluation identified a test with high specificity (98.6% (95% CI 97.1% to 99.4%)), moderate sensitivity (84.4% with finger prick (95% CI 70.5% to 93.5%)) and moderate concordance, suitable for seroprevalence surveys.
- viral infection
- clinical epidemiology
- respiratory infection
This article is made freely available for use in accordance with BMJ’s website terms and conditions for the duration of the covid-19 pandemic or until otherwise determined by BMJ. You may use, download and print the article for any lawful, non-commercial purpose (including text and data mining) provided that all copyright notices and trade marks are retained. https://bmj.com/coronavirus/usage
What is the key question?
How well do lateral flow immunoassays perform in people who do not require hospitalisation, and how does finger-prick self-testing compare with performance in the laboratory with serum or laboratory-based ELISA?
What is the bottom line?
Lateral flow assays are highly specific, making many of them suitable for seroprevalence surveys, but their variable sensitivity and sample concordance mean they must be evaluated with both the sample type and the operator of intended use to characterise performance.
Why read on?
We describe a rigorous approach to lateral flow immunoassay evaluation which identified a suitable candidate for a national seroprevalence survey and characterised performance in a non-hospitalised population.
There are currently more commercially available antibody tests for SARS-CoV-2 than any other infectious disease. By May 2020, over 200 tests were available or in development.1 Accurate antibody tests are essential to monitor the COVID-19 pandemic at population level, to understand immune response and to assess individuals’ exposure and possible immunity from reinfection with SARS-CoV-2. Serology for national surveillance remains the fourth key pillar of the UK’s national testing response.2
Access to high-throughput laboratory testing to support clinical diagnosis in hospitals is improving. However, the use of serology for large-scale seroprevalence studies is limited by the need to take venous blood and transport it to centralised laboratories, as well as assay costs. Lateral flow immunoassays (LFIAs) offer the potential for relatively cheap tests that are easily distributed and can be either self-administered or performed by trained healthcare workers. However, despite manufacturers’ claims of high sensitivity and specificity, reported performance of these assays has been variable3–9 and their use is limited to date.
In the UK, the Medicines and Healthcare Products Regulatory Agency (MHRA) requires that clinical sensitivity and specificity be determined for each claimed specimen type, and that sample equivalence be shown.10 For antibody tests intended to determine whether an individual has had the virus, the MHRA recommends a sensitivity >98% (95% CI 96% to 100%) (on a minimum of 200 known positive specimens, collected 20 days or more after symptom onset) and specificity >98% on a minimum of 200 known negatives.10 To date, no LFIAs have been approved for use against these criteria. However, LFIAs with lower sensitivity can still play an important role in population seroprevalence surveys,11 in which individual results are not used to guide behaviour, provided specificity (and positive predictive value) is high. Such tests will need to have established performance characteristics for testing in primary care or community settings, including self-testing.
As part of the REACT (REal Time Assessment of Community Transmission) programme,12 we assessed LFIAs for their suitability for use in large seroprevalence studies. This study addresses the key questions of how well LFIAs perform in people who do not require hospitalisation, and how finger-prick self-testing compares with laboratory testing of serum on LFIAs and ELISA.
A STARD checklist (of essential items for reporting diagnostic accuracy studies) is provided in the online supplementary material.
Patient recruitment and selection of sera
Between 1 and 29 May 2020, adult NHS workers (clinical or non-clinical) who had previously tested positive for SARS-CoV-2 by PCR, but had not been hospitalised, were invited to enrol in a prospective rapid antibody testing study across four hospitals in two London NHS trusts. Participants were enrolled once they were at least 21 days from the onset of symptoms or from a positive swab test (whichever was earlier). Sera for specificity testing were collected from police personnel prior to August 2019 as part of the Airwave study.13
LFIAs were selected based on manufacturers’ performance data, published data where available, and the potential for supply to large-scale seroprevalence surveys. Initially, five LFIAs were assessed, with a view to using the highest-performing test in a national seroprevalence survey commencing in June 2020 (phase I). After selection of an initial candidate, further LFIAs were evaluated for consideration in future seroprevalence surveys (phase II, ongoing). For all LFIAs, sensitivity analysis was conducted on a minimum of 100 sera from the assembled cohort. LFIAs with >80% sensitivity underwent further specificity testing, and those with specificity >98% are being evaluated in clinic.
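The staged screening rule above can be expressed as a small decision function. This is a hypothetical sketch: the `LfiaScreen` record and `next_stage` function are illustrative names, not part of the study’s software, and the strict inequalities simply mirror the “>80%” and “>98%” wording.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; the names are hypothetical.
MIN_SENSITIVITY_SERA = 100    # minimum sera from the PCR-confirmed cohort
SENSITIVITY_THRESHOLD = 0.80  # proceed to specificity testing if sensitivity > 80%
SPECIFICITY_THRESHOLD = 0.98  # proceed to clinic evaluation if specificity > 98%


@dataclass
class LfiaScreen:
    """Hypothetical summary of one LFIA's laboratory results so far."""
    name: str
    n_positive_sera: int                 # cohort sera tested
    n_detected: int                      # sera the LFIA called positive
    specificity: Optional[float] = None  # None until specificity testing is done


def next_stage(test: LfiaScreen) -> str:
    """Return the next step for an LFIA under the staged evaluation."""
    if test.n_positive_sera < MIN_SENSITIVITY_SERA:
        return "continue sensitivity testing"
    sensitivity = test.n_detected / test.n_positive_sera
    if sensitivity <= SENSITIVITY_THRESHOLD:
        return "stop: sensitivity too low"
    if test.specificity is None:
        return "specificity testing"
    if test.specificity <= SPECIFICITY_THRESHOLD:
        return "stop: specificity too low"
    return "clinic evaluation"


# 95/110 sera detected (about 86%), specificity not yet measured.
print(next_stage(LfiaScreen("hypothetical test A", 110, 95)))  # specificity testing
```

Keeping the thresholds as named constants makes it easy to see that a test must clear each gate in order before it reaches clinic evaluation.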
Of the tests included in phase I, one detected combined immunoglobulin M (IgM) and immunoglobulin G (IgG) as a single band, three had separate bands for IgM and IgG, and one detected IgG only. This study set out to determine the sensitivity and specificity of tests in detecting IgG antibodies to SARS-CoV-2 at least 21 days from symptom onset. For consistency, in the three kits which had separate IgM and IgG bands, only IgG was counted as a positive result (ie, ‘MG’ or ‘G’ but not ‘M’, which differs from manufacturer guidance).
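The IgG-based reading rule can be written as a small classifier. This is a hypothetical sketch: `score_lfia` and its arguments are illustrative, and treating a missing control line as an invalid test is an assumption (the text states only that invalid tests were repeated). The `count_igm_alone` flag reproduces the alternative rule explored in the specificity results, in which an IgM-only band was counted as positive.

```python
from enum import Enum


class Reading(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    INVALID = "invalid"


def score_lfia(control: bool, igg: bool, igm: bool = False,
               count_igm_alone: bool = False) -> Reading:
    """Classify one LFIA read-out under the IgG-based rule.

    control: control line visible (assumption: no control line = invalid test)
    igg: IgG band visible; for the single-band kit, the combined IgM/IgG band
    igm: separate IgM band visible (kits with separate bands only)
    count_igm_alone: alternative rule in which an IgM-only result is positive
    """
    if not control:
        return Reading.INVALID       # invalid tests were repeated
    if igg:
        return Reading.POSITIVE      # 'MG' or 'G'
    if igm and count_igm_alone:
        return Reading.POSITIVE      # 'M' alone, alternative rule only
    return Reading.NEGATIVE          # 'M' alone under the primary rule, or no band


print(score_lfia(control=True, igg=False, igm=True))  # Reading.NEGATIVE
```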
Each participant performed one of five LFIA self-tests with finger-prick capillary blood, provided a venous blood sample for laboratory analysis, and completed a questionnaire regarding their NHS role and COVID-19 symptoms, onset and duration (see online supplementary table ii: flow of participants). Participants were asked to rate their illness as asymptomatic, mild, moderate or severe, based on its effect on daily life, and to record their symptoms using multiple-choice tick boxes. Baseline characteristics are shown in table 1 and in the supplement.
The LFIA self-tests were performed using instructions specific to each device (see online supplementary table i) observed by a member of the study team. Results were recorded at the times specified in the product insert. Participants were asked to grade intensity of the result band(s) from 0 (negative) to 6 according to a standardised scoring system on a visual guide (see online supplementary figure ii). Invalid tests were repeated. A photograph of the completed test was emailed to the study team.
The first 77 participants enrolled in the study all used the same device. Subsequent participants used different LFIAs according to the study site attended (ie, consecutive allocation). As new LFIAs became available, participants were invited for a second visit to perform an alternative LFIA. A simultaneous venous sample for laboratory analysis was taken at all visits.
To assess concordance, the LFIA used for each finger-prick self-test in the clinic was also tested with the same participant’s serum in the laboratory. Test evaluations were conducted according to manufacturers’ instructions by a technician blinded to the clinic result and patient details. Any invalid tests in the laboratory were repeated. Initially, scoring was performed independently by two individuals, but this practice ceased after inter-rater agreement was found to be almost perfect on the 7-point categorical score (0–6) (kappa=0.81)14 and perfect on the binary outcome (positive/negative) (online supplementary table 4).
Given uncertainties over the proportion of individuals who develop antibodies with non-hospitalised disease, additional serological testing was performed with two laboratory ELISAs: spike protein ELISA (S-ELISA) and a hybrid spike protein receptor binding domain double antigen bridging assay (hybrid DABA). Both ELISAs were shown to be highly specific. Details of these methodologies and their prior specificity testing are available in the supplementary section. Sensitivity of each LFIA in clinic and laboratory was assessed versus PCR-confirmed cases, versus S-ELISA and versus hybrid DABA.
Sample size for individual tests was calculated using exact methods for 90% power and a significance level α=0.05 (one sided). To detect an expected sensitivity of 90% with a minimal acceptable lower limit of 80%, a sample size of 124 was targeted. For specificity, a sample size of 361 is required based on an expected specificity of 98% and a lower limit of 95%.
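The sample-size reasoning can be sketched with an exact one-sided binomial test of H0: p ≤ p0 evaluated at the expected value p1. The helper names are hypothetical, and the exact procedure and software settings used by the authors are not stated, so the n returned here need not equal the 124 and 361 quoted. Because exact power is a sawtooth function of n, the sketch simply reports the first n that reaches the target power.

```python
import math
from typing import List


def binom_pmf(n: int, p: float) -> List[float]:
    """Binomial(n, p) probabilities for k = 0..n, computed in log space (0 < p < 1)."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nf = math.lgamma(n + 1)
    return [math.exp(log_nf - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * log_p + (n - k) * log_q) for k in range(n + 1)]


def upper_tails(pmf: List[float]) -> List[float]:
    """tails[c] = P(X >= c) for c = 0..n+1."""
    tails = [0.0] * (len(pmf) + 1)
    for c in range(len(pmf) - 1, -1, -1):
        tails[c] = tails[c + 1] + pmf[c]
    return tails


def exact_power(n: int, p0: float, p1: float, alpha: float) -> float:
    """Power at p1 of the one-sided exact binomial test of H0: p <= p0."""
    tails0 = upper_tails(binom_pmf(n, p0))
    # Reject H0 when X >= c, with c the smallest count keeping the size <= alpha.
    c = next(c for c in range(n + 2) if tails0[c] <= alpha)
    return upper_tails(binom_pmf(n, p1))[c]


def exact_sample_size(p0: float, p1: float, alpha: float = 0.05,
                      power: float = 0.90, n_max: int = 2000) -> int:
    """First n whose exact power reaches the target (power is sawtooth in n)."""
    for n in range(1, n_max + 1):
        if exact_power(n, p0, p1, alpha) >= power:
            return n
    raise ValueError("target power not reached for n <= n_max")


# Sensitivity: expected 90%, lower limit 80%; specificity: expected 98%, lower limit 95%.
print(exact_sample_size(0.80, 0.90), exact_sample_size(0.95, 0.98))
```

A more conservative convention is to require that every n above the chosen value also reaches the target power; either way, the specificity target needs far more samples than the sensitivity target because the margin between expected and minimum acceptable values is narrower.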
The primary outcome was the sensitivity and specificity of each rapid test. For sensitivity, tests were compared against two reference standards: (1) PCR-confirmed clinical disease (via swab testing) and (2) a positive result on S-ELISA and/or hybrid DABA in the laboratory.
LFIA performance was assessed with (1) finger-prick self-testing (participant interpretation); (2) finger-prick self-testing (trained observer interpretation); and (3) serum in the laboratory. Specificity of LFIAs was evaluated against the known negative samples, with all positive results counted as false positives. The analysis included all available data for the relevant outcome, and results are presented with the corresponding binomial exact 95% CI.
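A minimal sketch of the sensitivity calculation against the composite ELISA reference, with a binomial exact (Clopper-Pearson) 95% CI found by bisection using only the Python standard library. Function names are illustrative; the study’s analysis was run in Stata, not with this code.

```python
import math
from typing import Sequence, Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coef + k * math.log(p) + (n - k) * math.log1p(-p))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion x/n."""
    def root(f) -> float:
        # f is increasing in p on [0, 1]; locate f(p) = 0 by bisection.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def upper_tail(p: float) -> float:  # P(X >= x | p), increasing in p
        return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

    def lower_tail(p: float) -> float:  # P(X <= x | p), decreasing in p
        return sum(binom_pmf(k, n, p) for k in range(x + 1))

    lower = 0.0 if x == 0 else root(lambda p: upper_tail(p) - alpha / 2)
    upper = 1.0 if x == n else root(lambda p: alpha / 2 - lower_tail(p))
    return lower, upper


def sensitivity_vs_composite(lfia: Sequence[bool], s_elisa: Sequence[bool],
                             hybrid_daba: Sequence[bool]) -> Tuple[int, int]:
    """(LFIA positives, reference positives) among participants positive on
    S-ELISA and/or hybrid DABA (the composite laboratory reference)."""
    reference = [lf for lf, s, d in zip(lfia, s_elisa, hybrid_daba) if s or d]
    return sum(reference), len(reference)


# Toy data: 1 = positive, 0 = negative (not study data).
tp, n = sensitivity_vs_composite([1, 1, 0, 1, 0], [1, 0, 1, 0, 0], [1, 1, 0, 0, 1])
print(tp, n, clopper_pearson(tp, n))
```

Participants negative on both ELISAs drop out of the denominator, which is what distinguishes this reference standard from PCR confirmation.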
Positive predictive value (PPV) and negative predictive value (NPV) are calculated for a range of population seroprevalence values (from 0.1% to 20%). For the purposes of this calculation, we use LFIA sensitivity with serum in the laboratory (rather than finger prick) to ensure sample consistency with the prepandemic sera used for specificity analysis.
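PPV and NPV follow from Bayes’ theorem given sensitivity, specificity and seroprevalence. The sketch below is not the authors’ code; it uses illustrative inputs (serum sensitivity 0.88 and specificity 0.986, the values reported for the selected test) over an assumed grid of prevalences within the 0.1% to 20% range.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a positive result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a negative result is a true negative."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Illustrative inputs: serum sensitivity 0.88 and specificity 0.986.
for prevalence in (0.001, 0.01, 0.05, 0.10, 0.20):
    print(f"{prevalence:6.1%}  PPV {ppv(0.88, 0.986, prevalence):.3f}  "
          f"NPV {npv(0.88, 0.986, prevalence):.3f}")
```

With these inputs the PPV is about 0.875 at 10% seroprevalence but only about 0.06 at 0.1%, while the NPV stays above 0.97 across the range. This is why specificity, rather than sensitivity, dominates test choice for low-prevalence surveys.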
For comparison of individual test performance between clinic and laboratory, we compare cases where paired results from an individual were available from both settings. We calculate sensitivities and 95% CI and test differences using the McNemar test for dependent groups. Agreement between the testing methods was assessed using the Kappa statistic. Interpretation of kappa values is as follows: <0, poor agreement; 0.00–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and >0.8, almost perfect agreement.14
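The paired comparison can be sketched as follows: an exact McNemar test on the discordant pairs, plus Cohen’s kappa mapped to the Landis and Koch labels listed above. This is an illustrative sketch, not the study’s Stata code. The exact (binomial) form of McNemar’s test is a choice made here because discordant counts are small; a chi-squared version is also commonly reported.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mcnemar_exact(clinic: Sequence[bool], lab: Sequence[bool]) -> float:
    """Two-sided exact McNemar p value for paired binary results.

    Only discordant pairs carry information: b = positive in clinic only,
    c = positive in laboratory only. Under H0, b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for x, y in zip(clinic, lab) if x and not y)
    c = sum(1 for x, y in zip(clinic, lab) if y and not x)
    m = b + c
    if m == 0:
        return 1.0
    tail = sum(math.comb(m, k) for k in range(min(b, c) + 1)) / 2 ** m
    return min(1.0, 2 * tail)


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters (binary or categorical)."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in c1) / n ** 2
    if expected == 1:
        raise ValueError("kappa undefined: both raters used a single category")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Agreement label using the cut-offs listed in the text (upper bound inclusive)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy paired results for one LFIA (1 = positive), not study data.
clinic = [1, 0, 1, 1, 0, 1, 0, 1]
lab = [1, 1, 1, 1, 1, 1, 0, 1]
print(mcnemar_exact(clinic, lab), landis_koch(cohen_kappa(clinic, lab)))  # 0.5 fair
```

McNemar asks whether one setting detects more positives than the other, whereas kappa asks how often the two settings agree beyond chance; a test can show no significant McNemar difference yet only fair agreement, as in the toy example.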
All data were analysed using Stata (V.14.2, StataCorp, Texas, USA), and a p value<0.05 was considered significant.
Patient and public involvement
As part of the REACT programme, there has been extensive input into the study from a patients’ panel, identified through the Patient Experience Research Centre (PERC) of Imperial College and IPSOS/MORI. This has included feedback on study materials, methods and questionnaires, and extensive usability testing of LFIAs through patient panels. User-expressed difficulties interpreting results motivated us to investigate agreement between self-reported and clinician-reported results. Usability data from this public outreach will be published in an additional study. Results of the study, once published, will be disseminated to Imperial College Healthcare NHS Trust staff.
We assessed LFIA sensitivity on sera from 276 NHS workers with confirmed SARS-CoV-2 infection at a median 44 days from symptom onset (range 21–100 days). Seventy-two per cent reported no, mild or moderate symptoms, 28% reported severe symptoms and none were hospitalised (table 1). The most common symptoms described were lethargy (78%), loss of smell (66%), fever (61%), myalgia (61%) and headache (61%) (online supplementary table iii). Less than half reported persistent cough (46%) or dyspnoea (41%). Median symptom duration was 13 days.
Evidence of an antibody response was found in 94.5% (95% CI 91.4% to 96.8%) of sera assayed using the S-ELISA, 94.8% (95% CI 91.6% to 97.1%) on hybrid DABA, and 95.2% (95% CI 92.2% to 97.3%) using a composite of the two (table 2). Agreement between the two laboratory ELISAs was very high (online supplementary figure i). Seven of 11 LFIAs assessed with serum detected less than 85% of samples positive on either ELISA (<85% sensitivity vs laboratory standard). Four LFIAs detected >85% of positive sera. The most sensitive test identified antibodies in 93% (95% CI 86.3% to 96.5%) of positive samples from composite ELISA testing.
Of the five LFIAs tested in both laboratory and clinic, sensitivity of two was reduced in the clinical setting using finger-prick self-testing: they gave positive results for 21.9% (95% CI 13.1% to 33.1%) (80% in laboratory) and 61.2% (95% CI 46.2% to 74.8%) (71% in laboratory) of individuals whose sera tested positive on the ELISAs (figure 1). To explore whether this discrepancy was due to sample type (serum vs blood) or was influenced by the test operator (participant vs laboratory technician), we also tested four of the LFIAs with whole blood in the laboratory (online supplementary table iv). Against the composite laboratory ELISA standard, the least sensitive test had significantly lower sensitivity with whole blood (57.1% (95% CI 45.4% to 68.4%)) than with serum (79.8% (95% CI 70.2% to 87.4%)), but the other three LFIAs performed broadly similarly with whole blood and serum.
The two LFIAs that showed higher sensitivity with serum detected 95.6% (95% CI 84.9% to 99.5%) and 84.4% (95% CI 70.5% to 93.5%) of composite laboratory ELISA positives with finger-prick self-testing in clinic.
Findings from the matched clinic and laboratory results are presented in table 3. Concordance between LFIA performance in clinic, with finger prick, and in laboratory, with serum, on the same participants was variable: three tests showed ‘moderate’ agreement (kappa 0.41, 0.54, 0.56) according to the Landis and Koch interpretation,14 one showed ‘fair’ agreement (kappa 0.34) and one only ‘slight’ agreement (kappa 0.13) (table 3). Of the tests performed in the clinic, results reported by participants were consistent with those reported by a trained observer for four of the five LFIAs. For one LFIA, observer-read positive results were frequently reported as negative by study participants.
Specificity was high for all LFIAs assessed (table 2), ranging from 97.2% to 99.8% in phase I and from 97.8% to 99.8% in phase II. For the purposes of this evaluation, in the LFIAs that had separate IgM and IgG bands, IgM alone was counted as a negative result. Counting IgM alone (without IgG) as a positive result made no difference to performance for most LFIAs, with the exception of the Fortress and Biomerica tests, in both of which specificity fell to 96% when IgM alone was counted as positive.
PPV (probability that a positive test result is a true positive) was highest for the LFIAs with highest specificity and fell below 85% at 10% seroprevalence for two of the LFIAs tested in phase I (Menarini and Biosure/Mologic). NPV varied little between tests (online supplementary figure iv).
Any invalid tests were repeated. For one LFIA, 8 out of 508 (1.6%) results were invalid, two tests had 3 out of 503 (0.6%) invalid results, and the remaining six tests had no invalid results on specificity testing (table 3).
LFIAs offer an important tool for widespread community screening of immune responses to SARS-CoV-2. They have already been used for large regional and national seroprevalence surveys in the USA and Europe.15–17 However, to allow robust estimates of seroprevalence, a better understanding is needed of (1) the performance of LFIAs in the general population, where most infected patients have not been hospitalised (and may have lower antibody responses associated with asymptomatic or paucisymptomatic infection)18–20; (2) the performance of LFIAs in finger-prick self-testing; and (3) the reliability of LFIA user interpretation.
Specificity of the rapid tests was high. For six (of nine) LFIAs assessed, specificity exceeded 98% (the minimum standard recommended by MHRA for clinical use). All had sufficient specificity to be considered for seroprevalence studies. However, all 11 LFIAs assessed (in phase I and phase II) had lower sensitivity than reported in manufacturers’ instructions, in comparison with either PCR-confirmed cases or laboratory ELISAs.
Lower sensitivity than that reported by manufacturers could be explained by a number of factors. In contrast to previous studies,3 4 7 recruitment focused on non-hospitalised participants, the majority of whom did not have severe symptoms. Antibody responses in this group may be of lower titre.21 Of note, 5% of participants had no detectable antibody on either sensitive ‘in-house’ immunoassay. Therefore, positivity on these assays was also used as a reference standard, so that LFIAs were not judged against participants with no detectable antibody response. Recruiting patients at least 21 days after symptom onset may be expected to improve sensitivity.22 Median time from symptoms to recruitment here was 42 days. While it is possible that responses may be waning at this point, we did not see a difference in the mean strength of immune response in the ‘in-house’ immunoassays with increasing time since symptom onset (online supplementary figure v). This provides some reassurance that antibody responses may be stable for up to 3 months, although this will be informed by emerging longitudinal data from individual patients.21 23
Instructions for all the kits in this study advise that they are suitable for use with whole blood or serum. Two of the kits additionally recommend finger-prick testing. In general, the sensitivity of tests was similar when comparing results from sera or whole blood in the laboratory with those from finger-prick blood in clinic. However, this was not uniformly the case: one test had significantly higher sensitivity with serum (80%) than with whole blood (57%) or finger prick (22%) (online supplementary table iv). Such sample discordance has also been described in other infections.24 25
Overall, there was good agreement between self-reported results and those reported by an observer. The exception was one test whose design differs from that of the other LFIAs. It has a cylindrical plastic housing surrounding the lateral flow strip, within which very faint lines were common and sometimes not reported by participants. Inter-practitioner variation with this kit may have arisen because these results were not routinely read against a white card, which would normally be recommended. The data here support the use of the other tests, and potentially others like them, for self-administration if detailed instructions are provided. However, although many participants were healthcare workers (from a range of areas including both clinical and non-clinical staff), they may not be representative of the general population. Further work is underway to assess the tests in a study group that better represents the general population.
It is not possible to generalise these findings to all LFIAs, particularly as manufacturers continue to develop better assays and housings. However, these results emphasise the need to evaluate new tests in the population of intended use and demonstrate that laboratory performance cannot be assumed to be a surrogate for finger-prick testing.
In summary, this study describes a systematic approach to clinical testing of commercial LFIA kits. Based on a combination of kit usability, high specificity (98.6% (95% CI 97.1% to 99.4%)), moderate sensitivity (84% with finger prick (95% CI 70.5% to 93.5%), 88% with serum (95% CI 83.3% to 91.2%)), high PPV (87% (95% CI 76.9% to 93.5%)), moderate sample concordance (kappa 0.56 (95% CI 0.25 to 0.86)) and availability for testing at scale, the Fortress test was selected for a further validation study in over 5000 police force personnel (REACT Study 4) and for use in a large, nationally representative seroprevalence study. The REACT seroprevalence study commenced in England in June 2020. Further analysis of additional LFIAs from phase II will be used to inform subsequent rounds of seroprevalence studies, as test performance continues to improve.
We would like to thank all the participants who volunteered for finger-prick testing to help with this study. We extend our gratitude to Margaret-Anne Bevan, Helen Stockmann, Billy Hopkins, Miranda Cowen, Norman Madeja, Nidhi Gandhi, Vaishali Dave, Narvada Jugnee and Chloe Wood, who ran the antibody testing clinics.
BF and JCB contributed equally.
WSB and GSC contributed equally.
Contributors All listed authors made substantial contributions to the conception or design of the work; or the acquisition, analysis or interpretation of data for the work; and drafting the work or revising it critically for important intellectual content; and final approval of the version to be published; and agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding This work was supported by funding from The Department of Health and Social Care (DHSC) and NIHR Biomedical Research Centre of Imperial College NHS Trust. GC is supported by an NIHR Professorship. WB is the Action Medical Research Professor. AD is an NIHR senior investigator. DA is an Emeritus NIHR Senior Investigator. HW is an NIHR Senior Investigator. RC holds IPR on the hybrid DABA and this work was supported by UKRI/MRC grant (reference is MC_PC_19078). The sponsor is Imperial College London.
Disclaimer The funders had no role in the production of this manuscript.
Competing interests All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Patient consent for publication Not required.
Ethics approval The study’s conduct and reporting is fully compliant with the World Medical Association’s Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects. This work was undertaken as part of the REACT 2 study, with ethical approval from South Central–Berkshire B Research Ethics Committee (REC ref: 20/SC/0206; IRAS 283805). Samples for negative controls were taken from the Airwave study approved by North West–Haydock Research Ethics Committee (REC ref: 19/NW/0054).
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available on reasonable request. All data relevant to the study are included in the article or uploaded as supplementary information. Anonymised data with results of positive/negative individual tests can be provided on request through contact with study team. Email firstname.lastname@example.org; ORCID ID: 0000-0002-2659-544X.