Review Article
Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review

https://doi.org/10.1016/j.jclinepi.2004.10.016

Abstract

Objective

To determine whether adjusting for confounder bias in observational studies using propensity scores gives different results than using traditional regression modeling.

Methods

Medline and Embase were used to identify studies that described at least one association between an exposure and an outcome using both traditional regression and propensity score methods to control for confounding. From 43 studies, 78 exposure–outcome associations were found. Measures of the quality of propensity score implementation were determined. The statistical significance of each association using both analytical methods was compared. The odds or hazard ratios derived using both methods were compared quantitatively.

Results

Statistical significance differed between regression and propensity score methods for only 8 of the associations (10%), κ = 0.79 (95% CI = 0.65–0.92). In all cases, the regression method gave a statistically significant association not observed with the propensity score method. The odds or hazard ratio derived using propensity scores was, on average, 6.4% closer to unity than that derived using traditional regression.

Conclusions

Observational studies had similar results whether using traditional regression or propensity scores to adjust for confounding. Propensity scores gave slightly weaker associations; however, many of the reviewed studies did not implement propensity scores well.

Introduction

In observational studies, patient assignment to the exposure of interest is not under the investigators' control. Therefore, there are likely to be important differences in confounding factors between the exposure groups, so any differences in outcome may be caused by the exposure itself, by differences in the measured and unmeasured confounders, or by both.

Multivariate regression is often used to lessen the bias caused by measured confounders, although it cannot adjust for unmeasured confounders. Investigators, however, frequently seek parsimonious regression models that predict the outcome with as few covariates as possible, and they rarely add interaction or nonlinear terms. The best possible adjustment for bias may thus be sacrificed to make the model easier to understand. Furthermore, regression modeling may not alert investigators to situations where the confounders do not adequately overlap between exposure groups, which threatens the validity of conclusions drawn from the data. This problem can be exacerbated when small differences in each of a large number of confounders combine to produce marked separation between the exposure groups, and hence selection bias that cannot be resolved.

To circumvent these difficulties, Rosenbaum and Rubin [1] proposed “propensity scores” in 1983 as a method of controlling for confounding in observational studies. An individual's propensity score is defined as his or her conditional probability of a particular exposure versus another, given the observed confounders. It can be estimated with logistic regression, modeling the exposure as the dependent variable and the potential confounders as the independent variables. Because the model itself is not the focus of the study, it need not be parsimonious or easy to understand, so it can include numerous covariates (including those with statistically nonsignificant coefficients) as well as interactions and nonlinear terms. Two patients with the same propensity score have an equal estimated probability of exposure. If one was exposed and the other was not, the exposure allocation can be considered random, conditional on the observed confounders. Therefore, akin to a randomized trial, the observed confounders are balanced between exposure groups after adjusting for the propensity score. Such balance can be assessed by comparing the distribution of confounders between exposure groups within propensity score strata or within cohorts matched on propensity score. The final propensity score model selected should maximize confounder balance between the groups. Inability to balance important confounders alerts investigators that the exposure groups are inadequately overlapping and that there is selection bias that cannot be resolved. Like regression modeling, propensity score methods cannot control for unknown confounders, but the sensitivity of the model to unknown confounders can be estimated [2].
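The estimation and balance-checking steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in any of the reviewed studies: the cohort is simulated, the two confounders (`age`, `severity`) and all function names are hypothetical, the logistic model is fitted by plain gradient ascent to keep the example free of external libraries, and the standardized difference is used as one common balance diagnostic (the text above says only that confounder distributions are compared).

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_propensity_model(X, exposed, n_iter=200, lr=0.5):
    """Logistic regression of exposure on confounders, fitted by gradient
    ascent on the log-likelihood. Returns [intercept, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, z in zip(X, exposed):
            err = z - sigmoid(linear_predictor(beta, row))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def standardized_difference(values, exposed):
    """Difference in group means divided by the pooled standard deviation."""
    g1 = [v for v, z in zip(values, exposed) if z == 1]
    g0 = [v for v, z in zip(values, exposed) if z == 0]
    if len(g1) < 2 or len(g0) < 2:
        return None  # too few patients in one group to compare
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    v1 = sum((v - m1) ** 2 for v in g1) / (len(g1) - 1)
    v0 = sum((v - m0) ** 2 for v in g0) / (len(g0) - 1)
    pooled_sd = math.sqrt((v1 + v0) / 2)
    return (m1 - m0) / pooled_sd if pooled_sd > 0 else 0.0

# Simulated cohort (hypothetical): two standardized confounders, with
# exposure more likely at higher age and higher severity.
rng = random.Random(0)
n = 1000
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
exposed = [1 if rng.random() < sigmoid(0.8 * a + 0.5 * s) else 0 for a, s in X]

# Step 1: model exposure (not outcome) and take the fitted probabilities.
beta = fit_propensity_model(X, exposed)
ps = [sigmoid(linear_predictor(beta, row)) for row in X]

# Step 2: compare confounder balance before and after stratifying on the score.
age = [row[0] for row in X]
crude = standardized_difference(age, exposed)

order = sorted(range(n), key=lambda i: ps[i])
within = []
for s in range(5):  # five equal-sized propensity score strata
    idx = order[s * n // 5:(s + 1) * n // 5]
    d = standardized_difference([age[i] for i in idx], [exposed[i] for i in idx])
    if d is not None:
        within.append(d)
mean_abs_within = sum(abs(d) for d in within) / len(within)

print("crude standardized difference in age:", round(crude, 2))
print("mean absolute difference within strata:", round(mean_abs_within, 2))
```

In a sketch like this, a within-stratum difference that stays large for an important confounder would be the signal, noted above, that the exposure groups do not overlap adequately.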

Propensity scores can be applied in several ways. Exposed and unexposed cohorts matched on the propensity score can be formed, and the outcomes can be compared between them. Alternatively, patients can be stratified by the propensity score, and pooled stratum-specific estimates of the outcome can be computed. The former method results in well-balanced but smaller groups for comparison; the latter method retains a larger sample size, but the exposure groups are more heterogeneous within each stratum. In yet another application, the propensity score itself, representing a summary of all the other potential confounders, can be included with exposure as a covariate in a multivariate regression model predicting outcome, with or without inclusion of other potential confounders as additional covariates.
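The first two applications can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the eight patients are invented, the function names are hypothetical, matching is greedy 1:1 nearest-neighbour with a caliper of 0.05 (one of several possible matching rules), and the effect measure is a simple risk difference rather than the odds or hazard ratios reported by the reviewed studies. The third application, covariate adjustment, is not shown.

```python
def greedy_match(ps, exposed, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement. Returns (exposed_index, unexposed_index) pairs."""
    treated = [i for i, z in enumerate(exposed) if z == 1]
    controls = {i for i, z in enumerate(exposed) if z == 0}
    pairs = []
    for i in treated:
        if not controls:
            break
        # Closest remaining unexposed patient; ties broken by index.
        j = min(controls, key=lambda c: (abs(ps[c] - ps[i]), c))
        if abs(ps[j] - ps[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs

def risk_difference(outcome, exposed_idx, unexposed_idx):
    """Outcome risk among exposed minus risk among unexposed."""
    r1 = sum(outcome[i] for i in exposed_idx) / len(exposed_idx)
    r0 = sum(outcome[i] for i in unexposed_idx) / len(unexposed_idx)
    return r1 - r0

def stratified_risk_difference(ps, exposed, outcome, n_strata=5):
    """Pool stratum-specific risk differences, weighted by stratum size.
    Strata that lack either exposure group are skipped."""
    n = len(ps)
    order = sorted(range(n), key=lambda i: ps[i])
    total, weight = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        e = [i for i in idx if exposed[i] == 1]
        u = [i for i in idx if exposed[i] == 0]
        if e and u:
            total += len(idx) * risk_difference(outcome, e, u)
            weight += len(idx)
    return total / weight if weight else None

# Invented data: propensity score, exposure (1/0), outcome (1/0) per patient.
ps      = [0.20, 0.22, 0.50, 0.52, 0.80, 0.81, 0.95, 0.10]
exposed = [1,    0,    0,    1,    1,    0,    1,    0]
outcome = [1,    0,    0,    1,    1,    1,    1,    0]

pairs = greedy_match(ps, exposed)
matched_rd = risk_difference(outcome, [i for i, _ in pairs], [j for _, j in pairs])
stratified_rd = stratified_risk_difference(ps, exposed, outcome, n_strata=2)

print(pairs)          # three matched pairs; two patients stay unmatched
print(matched_rd)     # risk difference in the matched cohort
print(stratified_rd)  # pooled risk difference across two strata
```

The toy data show the trade-off described above: matching discards the exposed patient with a score of 0.95 and the unexposed patient with a score of 0.10 because neither has a close counterpart, whereas stratification keeps all eight patients but compares less similar patients within each stratum.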

There has been an explosion of observational studies using propensity scores; however, whether this method yields different results from traditional regression modeling has not been determined.

Searching

We performed a systematic review of published observational studies that used both traditional regression and propensity score methodology to control for confounding. Citations indexed up to June 2003 were sought from both Medline and Embase. The search strategy selected articles containing “propensity scor$” as a textword, or those containing “propensity” as a textword and also indexed with the exploded MeSH subject headings “regression analysis” or “multivariate analysis.”

Selection

To be selected for

Trial flow

The search strategy (Fig. 1) identified 536 potentially relevant citations. Of the 130 retrieved for further consideration, 87 did not meet the selection criteria, leaving 43 studies for inclusion.

Study characteristics

Forty-three articles that reported both traditional regression and propensity score methods to control for confounding in the same exposure–outcome association were selected for review. The majority of studies were from the cardiology and cardiac surgery literature. Nearly two thirds were published in

Discussion

Braitman and Rosenbaum [46] discussed two strategies to adjust for overt biases in observational studies. One focuses on the relationship between prognostic variables and outcomes and models the response directly, using traditional regression methods. The other focuses on the relationship between prognostic variables and exposures without any consideration of outcome and uses propensity scores to emulate randomization. Propensity score methods offer theoretical advantages over traditional

Acknowledgments

We are grateful to Drs. Thérèse Stukel and Muhammad Mamdani for their helpful comments on this manuscript. Dr. Shah is supported by a Clinician-Scientist award and Dr. Austin by a New Investigator award from the Canadian Institutes of Health Research (CIHR). Dr. Laupacis is a Senior Scientist of the CIHR.

References (50)

  • R.L. Mehta et al. Nephrology consultation in acute renal failure: does timing matter? Am J Med (2002)
  • N. Moazami et al. Stage III non-small cell lung cancer and metachronous brain metastases. J Thorac Cardiovasc Surg (2002)
  • W.O. Myers et al. Coronary Artery Surgery Study. Time to first new myocardial infarction in patients with mild angina and three-vessel disease comparing medicine and early surgery: a CASS registry study of survival. Ann Thorac Surg (1987)
  • Y. Nakamura et al. Long-term nitrate use may be deleterious in ischemic heart disease: a study using the databases from two large-scale postinfarction studies. Am Heart J (1999)
  • C.R. Regueiro et al. Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatment. A comparison of generalist and pulmonologist care for patients hospitalized with severe chronic obstructive pulmonary disease: resource intensity, hospital costs, and survival. Am J Med (1998)
  • D.M. Shavelle et al. National Registry of Myocardial Infarction 2. Is there a benefit to early angiography in patients with ST-segment depression myocardial infarction? An observational study. Am Heart J (2002)
  • M.H. Shishehbor et al. Association of educational status with heart rate recovery: a population-based propensity analysis. Am J Med (2002)
  • S.C. Stamou et al. Stroke after conventional versus minimally invasive coronary artery bypass. Ann Thorac Surg (2002)
  • U. Stenestrand et al. Early revascularisation and 1-year survival in 14-day survivors of acute myocardial infarction: a prospective cohort study. Lancet (2002)
  • K. Hayashi et al. Effects of angiotensin-converting enzyme inhibitors on the treatment of anemia with erythropoietin. Kidney Int (2001)
  • N.P. Jenkins et al. Beta-blockers are associated with lower C-reactive protein concentrations in patients with coronary artery disease. Am J Med (2002)
  • P.R. Rosenbaum et al. The central role of the propensity score in observational studies for causal effects. Biometrika (1983)
  • P.R. Rosenbaum et al. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc (1983)
  • D. Abramov et al. The influence of cardiopulmonary bypass flow characteristics on the clinical outcome of 1820 coronary bypass patients. Can J Cardiol (2003)
  • A.W. Chan et al. Early and sustained survival benefit associated with statin therapy at the time of percutaneous coronary intervention. Circulation (2002)