Background: The role of tumour markers such as carbohydrate antigen (CA) 125, CA 15-3, CA 19-9 and CYFRA 21-1 (a fragment of cytokeratin 19) in differentiating malignant pleural effusions (MPE) from benign effusions is not yet clear.
Methods: After a systematic review of English language studies, sensitivity, specificity and other measures of accuracy of pleural concentrations of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 or their combinations in the diagnosis of MPE were pooled using random effects models. Summary receiver operating characteristic curves were used to summarise overall test performance.
Results: Twenty-nine studies met the inclusion criteria for the analysis. The summary estimates of the sensitivity and specificity of these tumour markers were as follows: CA 125, 0.48/0.85; CA 15-3, 0.51/0.96; CA 19-9, 0.25/0.96; CYFRA 21-1, 0.55/0.91 for diagnosing MPE. The estimated summary receiver operating characteristic curves showed that the performance of pleural CA 125 and CA 19-9 measurement in the diagnosis of MPE was limited, whereas that of CA 15-3 and CYFRA 21-1 was better. When two or more of these four tumour markers were combined with each other or with carcinoembryonic antigen, both sensitivity and specificity increased to varying extents.
Conclusions: The current evidence does not support using a single tumour marker for the diagnosis of MPE, but combinations of two or more tumour markers seem to be more sensitive. The results of tumour marker assays should be interpreted in parallel with clinical findings and the results of conventional tests.
Pleural effusions may occur in patients suffering from physical trauma or systemic disorders such as infection, inflammation or cancer.1 Malignancy is one of the main causes of pleural effusions, and >90% of malignant pleural effusions (MPE) are due to metastatic disease.2 Carcinoma of any organ can metastasise to the pleura; the most frequent primary tumours are lung and breast carcinomas and lymphomas, and less frequently digestive tract and ovarian carcinomas.3 Establishing the precise aetiology, and in particular differentiating MPE from benign effusions, is therefore important. Cytological examination is a standard method for the diagnosis of MPE, but its sensitivity is typically only 50–70%, although it can be increased by repeated thoracenteses.2 Blind pleural biopsy can be performed in addition. However, among cytology-negative cases, only 7–13% are proved to be MPE by an additional biopsy.4 5
To improve the diagnosis of MPE, a number of tumour markers have been intensively evaluated, but the search for a highly accurate tumour marker in pleural fluid that reliably confirms MPE has so far been fruitless.6 Carcinoembryonic antigen (CEA) is the most extensively studied marker and has been found to be of diagnostic significance. Carbohydrate antigen (CA) 125 is a tumour-associated antigen commonly expressed in ovarian carcinoma; it is used to assess the response to chemotherapy and for early detection of relapse.7 When cells in pleural fluid specimens were immunostained with an anti-CA 15-3 antibody, the sensitivity of CA 15-3 was 91% for breast carcinoma and 80% for all adenocarcinomas, and the specificity was 94% for both breast carcinoma and all adenocarcinomas.8 CA 19-9 is a tumour antigen whose level increases particularly in gastrointestinal tumours, although Molina and coworkers9 also reported high serum CA 19-9 levels in patients with lung cancer. CYFRA 21-1, a fragment of cytokeratin 19, is a useful marker for epithelial malignancies that reflects ongoing cell activity.10 Although the diagnostic accuracy of CA 125, CA 15-3, CA 19-9 and/or CYFRA 21-1 for MPE has been studied extensively, the exact role of these assays remains controversial. We performed a meta-analysis to establish the overall diagnostic accuracy of pleural CA 125, CA 15-3, CA 19-9, CYFRA 21-1 and CEA (either singly or in combination) for MPE.
Search strategy and study selection
Embase, Ovid, Web of Science, the Cochrane database and Medline (using PubMed as the search engine) were searched to identify suitable studies up to 30 December 2006; no start date limit was applied. Articles were also identified by use of the related articles function in PubMed. References of articles identified were also searched manually. The search terms were “tumour marker”, “carbohydrate antigen 125/CA 125”, “carbohydrate antigen 15-3/CA 15-3”, “carbohydrate antigen 19-9/CA 19-9”, “fragment of cytokeratin 19/CYFRA 21-1”, “lung cancer”, “malignant pleural mesothelioma”, “pleural effusion/pleural fluid”, “sensitivity and specificity” and “accuracy”. Although no language restrictions were imposed initially, for the full-text review and final analysis our resources only permitted review of English articles. Conference abstracts and letters to journal editors were excluded because of the limited data presented in them.
A study was included in the meta-analysis if it reported both the sensitivity and the specificity of pleural CA 125, CA 15-3, CA 19-9 and/or CYFRA 21-1 for the diagnosis of MPE. Only studies including at least 10 pleural fluid specimens were selected, since very small studies may be vulnerable to selection bias. Publications with evidence of a possible overlap of patients with other studies were discussed by QLL, HZS and XJQ, and only the best quality study was used. Two reviewers (QLL and HZS) independently judged study eligibility while screening the citations. Disagreements were resolved by consensus.
Data extraction and quality assessment
The final set of English articles was assessed independently by two reviewers (QLL and HZS). The reviewers were blinded to publication details and disagreements were resolved by consensus. Data retrieved from the reports included author, publication year, participant characteristics, test methods, sensitivity and specificity data, cut-off value and methodological quality.
The methodological quality of the studies was assessed using guidelines published by the STARD (standards for reporting diagnostic accuracy, maximum score 25) initiative11 (ie, guidelines that aim to improve the quality of reporting in diagnostic studies) and the QUADAS (quality assessment for studies of diagnostic accuracy, maximum score 14) tool12 (ie, appraisal by use of empirical evidence, expert opinion and formal consensus to assess the quality of primary studies of diagnostic accuracy). In addition, for each study the following characteristics of study design were retrieved: (1) cross-sectional design (versus case-control design); (2) consecutive or random sampling of patients; (3) blinded (single or double) interpretation of determination and reference standard results; (4) prospective data collection. If no data on the above criteria were reported in the primary studies, the information was requested from the authors. If the authors did not respond, the “unknown” items were treated as “No”.
Standard methods recommended for meta-analyses of diagnostic test evaluations were used.13 Analyses were performed using the following statistical software programs: Stata Version 8.2 (Stata Corporation, College Station, Texas, USA); Meta-Test Version 0.6 (New England Medical Center, Boston, Massachusetts, USA) and Meta-DiSc for Windows (XI Cochrane Colloquium; Barcelona, Spain). The following measures of test accuracy were computed for each study: sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and diagnostic odds ratio (DOR).
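For readers reproducing the per-study measures, the following is a minimal sketch that computes them from one study's 2×2 table (true positives, false positives, false negatives, true negatives). It assumes one common convention that the source does not state: 0.5 is added to every cell only when some cell is zero, and only for the ratio measures. The function name and the example counts are hypothetical.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int, cc: float = 0.5) -> dict:
    """Per-study diagnostic accuracy measures from a 2x2 table.

    tp/fn: patients with MPE whose marker result is positive/negative;
    fp/tn: patients without MPE whose marker result is positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Ratio measures are undefined when a cell is zero, so (assumed
    # convention) add a continuity correction to every cell in that case.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = tp + cc, fp + cc, fn + cc, tn + cc
    sens_c = tp / (tp + fn)
    spec_c = tn / (tn + fp)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": sens_c / (1 - spec_c),   # positive likelihood ratio
        "nlr": (1 - sens_c) / spec_c,   # negative likelihood ratio
        "dor": (tp * tn) / (fp * fn),   # diagnostic odds ratio
    }


# Hypothetical study with 100 MPE and 100 non-MPE patients.
m = accuracy_measures(tp=48, fp=15, fn=52, tn=85)
print({k: round(v, 3) for k, v in m.items()})
```

Computed from the same counts, the DOR always equals PLR/NLR, which is why it summarises sensitivity and specificity in a single number.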
The analysis was based on a summary receiver operating characteristic (SROC) curve.13 14 The sensitivity and specificity for the single test threshold identified for each study were used to plot an SROC curve14 15 (see statistical methods given in the Appendix available online only). A random-effects model was used to calculate the average sensitivity, specificity and the other measures across studies.16 17
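The statistical details are given in the online Appendix. Purely as an illustration, the sketch below pools log-transformed DORs with the DerSimonian-Laird random-effects estimator, one widely used method; whether it matches the exact estimator applied by Meta-DiSc or Meta-Test for each measure is an assumption. It also returns Cochran's Q, the heterogeneity statistic. Function names and study counts are hypothetical.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn, cc=0.5):
    """ln(DOR) and its approximate (Woolf) variance for one study."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = tp + cc, fp + cc, fn + cc, tn + cc
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    w = [1 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero (zero for a single study).
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"pooled": pooled, "se": se, "q": q, "df": df, "tau2": tau2}


# Three hypothetical studies given as (tp, fp, fn, tn).
studies = [(30, 5, 20, 45), (12, 2, 28, 58), (40, 10, 10, 40)]
y, v = zip(*(log_dor_and_variance(*s) for s in studies))
res = dersimonian_laird(list(y), list(v))
lo = math.exp(res["pooled"] - 1.96 * res["se"])
hi = math.exp(res["pooled"] + 1.96 * res["se"])
print(f"pooled DOR {math.exp(res['pooled']):.2f} (95% CI {lo:.2f} to {hi:.2f}), "
      f"Q = {res['q']:.2f} on {res['df']} df")
```

Pooling is done on the log scale because ln(DOR) is approximately normally distributed; the pooled value and its confidence limits are exponentiated back at the end.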
The χ2 and Fisher exact tests were used to detect statistically significant heterogeneity across the studies. To assess the effects of STARD and QUADAS scores on the diagnostic ability of CA 125, CA 15-3, CA 19-9, and CYFRA 21-1, we included them and the study design characteristics as covariates in univariate meta-regression analysis (inverse variance weighted). The relative DOR (RDOR) was calculated according to standard methods to analyse the change in diagnostic precision in the study per unit increase in the covariate.18 19 Since publication bias is of concern for meta-analyses of diagnostic studies, we tested for the potential presence of this bias using funnel plots and the Egger test.20
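The RDOR is the exponentiated regression coefficient of a study-level covariate. The sketch below is a simplified, assumed version: an inverse-variance weighted least-squares regression of ln(DOR) on a single covariate, here a hypothetical quality indicator (for example 1 for STARD ⩾13, 0 otherwise). It omits the threshold term S = logit(TPR) + logit(FPR) that some implementations include and ignores residual between-study variance, so its confidence interval will be narrower than a random-effects analysis would give. All names are hypothetical.

```python
import math


def weighted_meta_regression(log_dor, variances, covariate):
    """Inverse-variance weighted regression of ln(DOR) on one covariate.

    Returns the relative DOR per unit increase in the covariate with a
    Wald-type 95% CI (variances treated as known; no residual heterogeneity).
    """
    w = [1 / v for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, log_dor)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, log_dor))
    slope = sxy / sxx
    se = math.sqrt(1 / sxx)  # SE of the slope when study variances are known
    return {
        "rdor": math.exp(slope),
        "ci95": (math.exp(slope - 1.96 * se), math.exp(slope + 1.96 * se)),
        "intercept": y_bar - slope * x_bar,
    }


# Hypothetical: four studies, covariate 1 = higher quality score.
print(weighted_meta_regression([1.0, 1.2, 2.1, 1.9], [0.2, 0.3, 0.25, 0.4], [0, 0, 1, 1]))
```

An RDOR above 1 means the marker discriminates better in studies with the covariate; a 95% CI that includes 1 corresponds to a non-significant effect of the covariate.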
After independent review, 59 non-English publications were excluded from the meta-analysis (publication list available on request); 46 publications dealing with pleural concentrations of CA 125, CA 15-3, CA 19-9 and/or CYFRA 21-1 for diagnosis of MPE were considered to be eligible for inclusion in the analysis (Appendix references A1–A46 available in the online supplement). Of these publications, two were excluded because they recruited fewer than 10 patients in one of the study groups (references A30 and A31), three were excluded because the tumour marker concentrations were determined only in cases of MPE (references A32–A34), eight were excluded because they included MPE and malignant peritoneal effusions as a single group (references A35–A42), one was excluded because it did not allow the calculation of sensitivity or specificity (reference A43), and three were excluded because the same authors published several reports on the same patients and only the best quality study was considered (references A44–A46). A total of 29 articles (A1–A29) were therefore available for analysis of the diagnostic accuracy of CA 125, CA 15-3, CA 19-9 and/or CYFRA 21-1 in MPE.
Quality of reporting and study characteristics
As shown in Appendix table 1 (available in online supplement), 9 of the 29 studies (31.0%) had a cross-sectional design; in 17 studies (58.6%) the samples were collected from consecutive patients; 14 studies (48.3%) reported blinded interpretation of tumour marker assays independent of the reference standard; and 13 studies (44.8%) had a prospective study design. The clinical characteristics, together with STARD and QUADAS scores of studies of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1, are outlined in Appendix tables 2, 3, 4 and 5, respectively (available in the online supplement). The average sample size of the CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 studies was 131 (range 41–416), 162 (range 39–416), 138 (range 61–336) and 127 (range 35–416), respectively. In all included studies, MPE was confirmed against the conventional "gold standard": cytological examination, pleural biopsy specimens or necropsy.
Figure 1 shows the forest plots of sensitivities and specificities for pleural concentrations of the four tumour markers in the diagnosis of MPE. Figure 2 shows the SROC curves for the four tumour markers, plotting true positive rates against false positive rates from individual studies. Pooled results of the diagnostic accuracy of each tumour marker in MPE are shown in table 1.
The sensitivity of pleural CA 125 measurement in the diagnosis of MPE varied between 0.17 and 1.00, and the specificity varied between 0.05 and 1.00; the pooled PLR was 5.96 (range 1.06–85.67), NLR was 0.54 (range 0.02–0.83) and DOR was 19.61 (range 2.14–731.00). For CA 15-3, the sensitivity ranged from 0.30 to 0.80, while specificity ranged from 0.75 to 1.00; PLR was 11.69 (range 2.05–151.80), NLR was 0.52 (range 0.22–0.70) and DOR was 24.74 (range 3.15–224.00). For CA 19-9, the sensitivity ranged from 0.13 to 0.89, while specificity ranged from 0.73 to 1.00; PLR was 10.42 (range 1.42–58.50), NLR was 0.70 (range 0.11–0.87) and DOR was 19.88 (range 1.69–348.50). For CYFRA 21-1, the sensitivity ranged from 0.20 to 0.91, while specificity ranged from 0.08 to 1.00; PLR was 6.55 (range 0.99–103.43), NLR was 0.43 (range 0.10–1.18) and DOR was 16.24 (range 0.83–130.02).
As shown in table 1, the Q values for sensitivity, specificity, PLR, NLR and DOR of the studies of all four tumour markers were high, with all p values <0.001, indicating significant heterogeneity across studies.
The SROC curve and its area under the curve (AUC) present an overall summary of test performance and display the trade-off between sensitivity and specificity. All 29 studies included in the meta-analysis reported sensitivity and specificity directly, and in each study the investigators selected and reported a single optimal cut-off value. The mean values of the maximum joint sensitivity and specificity of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 were 0.81, 0.68, 0.72 and 0.76, respectively, and their mean AUCs were 0.88, 0.73, 0.78 and 0.83, respectively, indicating that the overall accuracy was not as high as expected.
In the publications included in the meta-analysis, some studies evaluated the simultaneous determination of two or more pleural tumour markers in the diagnosis of MPE. The pooled results of the diagnostic accuracy of a combination of two or more tumour markers of CA 125, CA 15-3, CA 19-9, CYFRA 21-1 and CEA are shown in table 2. The results indicate that some combinations of tumour markers have a greater diagnostic role than one tumour marker alone.
Meta-regression analysis and publication bias
The scores of both STARD and QUADAS were used in the meta-regression analysis to assess the effect of study quality on the RDOR of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 in the diagnosis of MPE. Table 3 shows the findings of the meta-regression analysis of the RDOR (dependent variable) between studies with higher and lower study quality scores. CA 125 studies of higher quality (STARD ⩾13) did not produce significantly higher RDORs than studies of lower quality (STARD <13), and studies with QUADAS ⩾10 did not perform better than those with QUADAS <10. Similarly, studies of CA 15-3, CA 19-9 and CYFRA 21-1 of higher quality did not have a better test performance than those of lower quality. Differences between studies with and without blinded, cross-sectional, consecutive/random and prospective designs did not reach statistical significance (data not shown). These results indicate that study quality and design did not substantially affect the accuracy of pleural tumour markers in the diagnosis of MPE.
Evaluation of publication bias showed that the Egger tests for studies of CA 125 (p = 0.014), CA 15-3 (p = 0.015), CA 19-9 (p = 0.009) and CYFRA 21-1 (p = 0.043) in the diagnosis of MPE were all significant, and all four funnel plots showed some asymmetry (see fig 1 in the Appendix, available online only). These results indicate potential publication bias.
Making a differential diagnosis between MPE and non-MPE is a critical clinical problem, and conventional tests are not always diagnostic.21 Determination of tumour markers in pleural fluid has been proposed as an alternative non-invasive way of establishing a diagnosis of MPE.6
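For reference, a minimal sketch of the classical Egger regression test follows. Here effects would typically be each study's ln(DOR) and std_errors its standard error; that this is the effect measure tested in the source is an assumption. To stay dependency-free, the function returns the t statistic and its degrees of freedom rather than a p-value. The function name is hypothetical.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardised effect (effect / SE) on precision (1 / SE)
    by ordinary least squares. An intercept far from zero suggests
    small-study effects such as publication bias. Compare the returned
    t statistic with a t distribution on n - 2 degrees of freedom.
    """
    z = [y / se for y, se in zip(effects, std_errors)]
    x = [1 / se for se in std_errors]
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = rss / (n - 2)  # residual variance of the regression
    se_intercept = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return {"intercept": intercept, "se": se_intercept,
            "t": intercept / se_intercept, "df": n - 2}


# Hypothetical ln(DOR) values and their standard errors for five studies.
print(egger_test([2.9, 2.4, 3.1, 1.8, 2.2], [0.9, 0.6, 0.8, 0.3, 0.4]))
```

In a symmetric funnel plot, small imprecise studies scatter evenly around the pooled effect and the intercept is close to zero; a consistent excess of large effects among small studies pulls it away from zero.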
The overall specificity of CA 15-3, CA 19-9 and CYFRA 21-1, but not CA 125, was greater than 0.90. The summary estimates of sensitivity for the four tumour markers, however, were all quite low and were more variable than those of specificity. These data suggest a potential role for determination of these tumour markers in confirming (ruling in) MPE. However, these tests maximise specificity at the cost of sensitivity, and this trade-off has significant clinical implications. Because their sensitivities are low, a pleural tumour marker concentration below the cut-off value is not sufficient to exclude MPE. A negative test therefore does not mean absence of MPE, and patients with negative tumour marker results still have a considerable chance of having MPE.
The SROC curve presents a global summary of test performance and shows the trade-off between sensitivity and specificity. As a global measure of test efficacy, we used the maximum joint sensitivity and specificity (the point at which the SROC curve intersects the diagonal running from the upper left to the lower right corner of the SROC space), which corresponds to the highest common value of sensitivity and specificity for the test.14 This point does not indicate the only (or even the best) combination of sensitivity and specificity for a particular clinical setting, but it represents an overall measure of the discriminatory power of a test. Our data showed that the values of the maximum joint sensitivity and specificity of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 were 0.81, 0.68, 0.72 and 0.76, respectively, and their mean AUCs were 0.88, 0.73, 0.78 and 0.83, respectively. Together, these data suggest that the overall accuracy of tumour markers in diagnosing MPE is not as high as expected.
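Both the maximum joint sensitivity and specificity (often written Q*) and the AUC are read off the fitted SROC curve. The sketch below assumes the Moses-Littenberg model; the source's actual fitting method is described in its online Appendix and may differ, for example by weighting studies. It fits D = a + bS by ordinary least squares, takes Q* from the intercept and computes the AUC by numerical integration. All function names are illustrative.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def moses_littenberg(tpr, fpr):
    """Unweighted Moses-Littenberg fit of D = a + b*S.

    D = logit(TPR) - logit(FPR) is the log DOR; S = logit(TPR) + logit(FPR)
    is a proxy for the positivity threshold. Rates must lie strictly
    between 0 and 1 (apply a continuity correction to the counts first).
    """
    d = [logit(t) - logit(f) for t, f in zip(tpr, fpr)]
    s = [logit(t) + logit(f) for t, f in zip(tpr, fpr)]
    s_bar, d_bar = sum(s) / len(s), sum(d) / len(d)
    sxx = sum((si - s_bar) ** 2 for si in s)
    b = sum((si - s_bar) * (di - d_bar) for si, di in zip(s, d)) / sxx
    return d_bar - b * s_bar, b


def sroc_tpr(fpr, a, b):
    """Expected TPR on the SROC curve at a given FPR (requires b < 1)."""
    lt = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-lt))


def q_star(a):
    """Where sensitivity = specificity, S = 0, so D = a = 2 * logit(Q*)."""
    return 1 / (1 + math.exp(-a / 2))


def sroc_auc(a, b, n=10_000):
    """Area under the SROC curve by the midpoint rule over FPR in (0, 1)."""
    h = 1 / n
    return h * sum(sroc_tpr((i + 0.5) * h, a, b) for i in range(n))


# Hypothetical (TPR, FPR) pairs from five studies.
a, b = moses_littenberg([0.45, 0.60, 0.72, 0.80, 0.91], [0.04, 0.08, 0.15, 0.22, 0.40])
print(f"a = {a:.2f}, b = {b:.2f}, Q* = {q_star(a):.2f}, AUC = {sroc_auc(a, b):.2f}")
```

For a symmetric curve (b = 0) the intercept a equals ln(DOR), which links Q* to the DOR: with DOR = 16, for instance, Q* = 1/(1 + 16^(-1/2)) = 0.8.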
The DOR is a single indicator of test accuracy22 that combines the data from sensitivity and specificity into a single number. The DOR of a test is the ratio of the odds of a positive test result in a subject with the disease relative to the odds of a positive test result in a subject without the disease. The value of a DOR ranges from 0 to infinity, with higher values indicating better discriminatory test performance (higher accuracy). A DOR of 1.0 indicates that a test does not discriminate between patients with the disorder and those without it. DOR values <1.00 suggest improper test interpretation (a greater proportion of negative test results in the group with disease). In the present meta-analysis, we found that the mean DOR values for CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 were 19.61, 24.74, 19.88 and 16.24, respectively, indicating that, although not as good as expected, measurement of these four tumour markers could be helpful in the diagnosis of MPE.
Since the SROC curve and the DOR are not easy to interpret and use in clinical practice,23 and since likelihood ratios are considered more clinically meaningful,23 24 we also presented both PLR and NLR as measures of diagnostic accuracy for the tumour markers. Likelihood ratios of >10 or <0.1 generate large and often conclusive shifts from pre-test to post-test probability (indicating high accuracy).24 Our data showed that the overall PLR values of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 were 5.96, 11.69, 10.42 and 6.55, respectively; that is, a positive test result is nearly 6, nearly 12, about 10 and nearly 7 times more likely, respectively, in patients with MPE than in patients without MPE. On the other hand, the overall NLR values of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1 were 0.54, 0.52, 0.70 and 0.43, respectively, so a negative assay result only reduces the pre-test odds of MPE to about 54%, 52%, 70% and 43% of their original value, respectively. These NLRs are far from the <0.1 needed for a conclusive shift, so a negative result cannot rule out MPE.
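Likelihood ratios act on odds rather than probabilities. The following minimal sketch converts a pre-test probability into a post-test probability using the pooled CA 15-3 ratios reported above; the 50% pre-test probability is purely hypothetical and in practice would depend on the clinical setting.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical pre-test probability of MPE (50%) with the pooled CA 15-3 ratios.
for label, lr in [("positive (PLR 11.69)", 11.69), ("negative (NLR 0.52)", 0.52)]:
    print(label, round(post_test_probability(0.5, lr), 3))
```

At this assumed pre-test probability, a positive CA 15-3 result raises the probability of MPE to about 92%, whereas a negative result still leaves it at about 34%.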
In addition to the four tumour markers analysed in the present meta-analysis, other biomarkers such as CEA,25 neuron-specific enolase,26 CA 549 and CA 72-4 have been evaluated for their use in the diagnosis of MPE. In a recent meta-analysis we found that the summary estimates for CEA in the diagnosis of MPE were: sensitivity 0.54 (95% CI 0.52 to 0.55), specificity 0.94 (95% CI 0.93 to 0.95), PLR 9.52 (95% CI 6.97 to 13.01), NLR 0.49 (95% CI 0.44 to 0.54) and DOR 22.5 (95% CI 15.6 to 32.5) (unpublished results). In the present meta-analysis we found that the combination of two or more of CA 125, CA 15-3, CA 19-9 and CYFRA 21-1, as well as CEA, resulted in a greater sensitivity than that of any one of these tumour markers alone.
An exploration of the reasons for heterogeneity, rather than computation of a single summary measure, is an important goal of meta-analysis.27 The regression coefficients for the variables measure the difference in the diagnostic accuracy of tumour markers between the two groups, with positive coefficients indicating better discriminatory power and negative coefficients corresponding to reduced discriminatory ability. In our meta-analysis, both STARD and QUADAS scores were used in the meta-regression analysis for all four tumour markers to assess the effect of study quality on RDOR. Studies of higher quality did not have a better test performance than those of lower quality, even though there was significant heterogeneity in sensitivity, specificity, PLR, NLR and DOR among these studies. Likewise, differences between studies with and without blinded, cross-sectional, consecutive/random and prospective designs did not reach statistical significance.
Our meta-analysis had some limitations. First, the exclusion of conference abstracts, letters to journal editors and non-English language studies may have led to publication bias, that is, an inflation of accuracy estimates due to preferential acceptance of papers reporting favourable results; consistent with this, potential publication bias was observed among the studies included in the present meta-analysis. Second, we did not address issues such as cost-effectiveness, reliability, the incremental benefit of adding tumour marker assays to other tests, and the net effect of tumour marker assays on clinical care and patient outcomes. Also, because the required data were not reported in the original publications, we could not analyse the effect of factors such as laboratory infrastructure, expertise with tumour marker assay technology, patient spectrum and setting on the accuracy of the tumour marker measurements.
The accuracy of tumour marker determinations for MPE seems to be similar to that of conventional tests such as cytological examination with a high specificity and low sensitivity. This similarity might make tumour markers less useful in practice because they do not have test properties that complement the properties of conventional tests. Based on the findings in our meta-analysis, we cannot recommend using any one tumour marker alone for the diagnosis of MPE. However, it should be pointed out that, to date, there are insufficient related studies to evaluate the diagnostic accuracy of the combination of two or more tumour markers in MPE.
In conclusion, current evidence suggests that CA 15-3, CA 19-9 and CYFRA 21-1 are highly specific but insufficiently sensitive to diagnose MPE, and the combination of two or more tumour markers seems to be more sensitive. Based on our data, we think that every patient with unexplained pleural effusion should undergo thoracocentesis with measurement of tumour markers. Patients with negative cytological examinations and positive tumour marker levels should undergo further invasive procedures, and management decisions should depend on positive cytological or biopsy results of the pleura.
The authors are grateful to the following authors who sent additional information on their primary studies: D Jurman, D Shitrit, F Alatas, H Satoh, K Shimokata, J M Porcel, M Miédougé, M Paganuzzi, N V Lyubimova, P Sthaneshwar, S M Ghayumi, V Villena, W C Su, W Dejsomritrutai.
Funding: This study was supported in part by research grant 30660064 from National Natural Science Foundation of China, in part by Program NCET-04-0835 for New Century Excellent Talents in Chinese Universities, and in part by research grant 0639044 from Natural Science Foundation of Guangxi Zhuang Autonomous Zone, China.
Competing interests: None.