Regression standardization with the R package stdReg

  • METHODS
  • European Journal of Epidemiology

Abstract

When studying the association between an exposure and an outcome, it is common to use regression models to adjust for measured confounders. The most common models in epidemiologic research are logistic regression and Cox regression, which estimate conditional (on the confounders) odds ratios and hazard ratios. When the model has been fitted, one can use regression standardization to estimate marginal measures of association. If the measured confounders are sufficient for confounding control, then the marginal association measures can be interpreted as population causal effects. In this paper we describe a new R package, stdReg, that carries out regression standardization with generalized linear models (e.g. logistic regression) and Cox regression models. We illustrate the package with several examples, using real data that are publicly available.

References

  1. Rothman K, Greenland S, Lash T. Modern epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.

  2. Gail M, Byar D. Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biom J. 1986;28(5):587–99.

  3. Sjölander A. stdReg: Regression Standardization. R package version 0.1. 2016.

  4. Dahlqwist E, Sjölander A. AF: Model-based estimation of confounder-adjusted attributable fractions. R package version 0.1. 2015.

  5. Stefanski L, Boos D. The calculus of M-estimation. Am Stat. 2002;56(1):29–38.

  6. Breslow N, Day N. Statistical methods in cancer research. The analysis of case–control studies, vol. 1. Lyon: IARC/WHO; 1980.

  7. van der Laan M. Estimation based on case–control designs with known prevalence probability. Int J Biostat. 2008;4(1):a17.

  8. De Jong U, Breslow N, Hong G, Ewe J, Sridharan M, Shanmugaratnam K. Aetiological factors in oesophageal cancer in Singapore Chinese. Int J Cancer. 1974;13(3):291–303.

  9. Sjölander A, Vansteelandt S, Humphreys K. A principal stratification approach to assess the differences in prognosis between cancers caused by hormone replacement therapy and by other factors. Int J Biostat. 2010;6(1):a20.

  10. Breslow N. Discussion of the paper by D. R. Cox. J R Stat Soc B. 1972;34(2):216–7.

  11. Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J. 2007;49(3):453–73.

  12. Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. 2003;14(6):680–6.

  13. Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Progr Biomed. 2004;75(1):45–9.

  14. Robins J. Robust estimation in sequentially ignorable missing data and causal inference models. Proc Am Stat Assoc. 2000;1999:6–10.

  15. Bai X, Tsiatis A, O’Brien S. Doubly-robust estimators of treatment-specific survival distributions in observational studies with stratified sampling. Biometrics. 2013;69(4):830–9.

  16. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7(9):1393–512.

Author information

Corresponding author

Correspondence to Arvid Sjölander.

Appendix 1: Asymptotic distribution for standardized measures

For generalized linear models, let \(x_0\) and \(x_1\) be fixed constants. Let \(\psi =g\{\theta (x_0),\theta (x_1)\}\) be a function of \(\theta (x_0)\) and \(\theta (x_1)\), e.g. \(\theta (x_1)-\theta (x_0)\). Define \(\nu =\{\beta ,\theta (x_0),\theta (x_1),\psi \}\). The estimator \(\hat{\nu }=[\hat{\beta },\hat{\theta }(x_0),\hat{\theta }(x_1),g\{\hat{\theta } (x_0),\hat{\theta } (x_1)\}]\) is an M-estimator [5] that solves the estimating equation

$$\begin{aligned} \sum _{i=1}^nU_{\nu ,i}(\nu )=\sum _{i=1}^n \left[ \begin{array}{l} U_{\beta ,i}(\beta )\\ U_{\theta (x_0),i}\{\beta ,\theta (x_0)\}\\ U_{\theta (x_1),i}\{\beta ,\theta (x_1)\}\\ U_{\psi ,i}\{\theta (x_0),\theta (x_1),\psi \} \end{array}\right] =0, \end{aligned}$$

where \(U_{\beta ,i}(\beta )\) is the contribution to the maximum likelihood score function from subject i, \(U_{\theta (x),i}\{\beta ,\theta (x)\}=\eta ^{-1}\{h(X=x,Z_i;\beta )\}-\theta (x)\) for \(x=x_1\) and \(x=x_0\), and \(U_{\psi ,i}\{\theta (x_0),\theta (x_1),\psi \}=g\{\theta (x_0),\theta (x_1)\}-\psi\).
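Setting the sum of \(U_{\theta (x),i}\) to zero shows that \(\hat{\theta }(x)\) is simply the sample average of the fitted means at \(X=x\). The following is a minimal sketch of that step, assuming a logistic model (so that \(\eta ^{-1}\) is the expit function) and a hypothetical linear predictor \(h(X,Z;\beta )=\beta _0+\beta _XX+\beta _ZZ\). The coefficient values and confounder data are made up for illustration; in practice \(\hat{\beta }\) would come from the fitted regression model.

```python
import math

def expit(u):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-u))

def fitted_mean(beta, x, z):
    """eta^{-1}{h(X=x, Z=z; beta)} for a hypothetical logistic model."""
    b0, bx, bz = beta
    return expit(b0 + bx * x + bz * z)

def standardized_mean(beta, x, z_values):
    """theta_hat(x): the root of sum_i [fitted_mean(beta, x, Z_i) - theta] = 0."""
    return sum(fitted_mean(beta, x, z) for z in z_values) / len(z_values)

# Illustrative inputs (assumptions, not taken from the paper).
beta_hat = (-1.0, 0.8, 0.5)
z_obs = [0.0, 1.0, 2.0, 1.0, 0.0, 3.0]
x0, x1 = 0, 1

theta0 = standardized_mean(beta_hat, x0, z_obs)
theta1 = standardized_mean(beta_hat, x1, z_obs)
psi_hat = theta1 - theta0  # g chosen as the difference theta(x1) - theta(x0)

# The theta(x1) block of the estimating equation sums to (numerically) zero.
u_theta1_sum = sum(fitted_mean(beta_hat, x1, z) - theta1 for z in z_obs)
print(theta0, theta1, psi_hat, u_theta1_sum)
```

Other choices of \(g\), such as a ratio, only change the line that computes `psi_hat`.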

For Cox regression models, let \(x_0\), \(x_1\) and t be fixed constants. Let \(\psi =g\{\theta (t,x_0),\theta (t,x_1)\}\) be a function of \(\theta (t,x_0)\) and \(\theta (t,x_1)\), e.g. \(\theta (t,x_1)-\theta (t,x_0)\). Define \(\nu =\{\beta ,{\varLambda }_0(t),\theta (t,x_0),\theta (t,x_1),\psi \}\). The estimator \(\hat{\nu }=[\hat{\beta },\hat{{\varLambda }}_0(t),\hat{\theta }(t,x_0),\hat{\theta }(t,x_1),g\{\hat{\theta } (t,x_0),\hat{\theta } (t,x_1)\}]\) is an M-estimator [5] that solves the estimating equation

$$\begin{aligned} \sum _{i=1}^nU_{\nu ,i}(\nu )=\sum _{i=1}^n\left[ \begin{array}{l}U_{\beta ,i}(\beta )\\ U_{{\varLambda }_0(t),i}\{\beta ,{\varLambda }_0(t)\}\\ U_{\theta (t,x_0),i}\{\beta ,{\varLambda }_0(t),\theta (t,x_0)\}\\ U_{\theta (t,x_1),i}\{\beta ,{\varLambda }_0(t),\theta (t,x_1)\}\\ U_{\psi ,i}\{\theta (t,x_0),\theta (t,x_1),\psi \} \end{array}\right] =0, \end{aligned}$$

where \(U_{\beta ,i}(\beta )\) is the contribution to the Cox partial likelihood score function from subject i, \(U_{{\varLambda }_0(t),i}\{\beta ,{\varLambda }_0(t)\}\) is the contribution to the estimating function for Breslow’s estimator of the cumulative baseline hazard from subject i, \(U_{\theta (t,x),i}\{\beta ,{\varLambda }_0(t),\theta (t,x)\}=\text {exp}[-{\varLambda }_0(t)\text {exp}\{h(X=x,Z_i;\beta )\}]-\theta (t,x)\) for \(x=x_1\) and \(x=x_0\), and \(U_{\psi ,i}\{\theta (t,x_0),\theta (t,x_1),\psi \}=g\{\theta (t,x_0),\theta (t,x_1)\}-\psi\).
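As a rough illustration of the Cox case, the sketch below computes Breslow's estimate of \({\varLambda }_0(t)\) for a given coefficient vector and then averages \(\text {exp}[-{\varLambda }_0(t)\text {exp}\{h(X=x,Z_i;\beta )\}]\) over the observed confounders, which is the root of the \(\theta (t,x)\) block. It assumes a hypothetical linear predictor \(h(X,Z;\beta )=\beta _XX+\beta _ZZ\); the data and coefficient values are invented, and \(\hat{\beta }\) would normally be the partial likelihood estimate rather than a fixed input.

```python
import math

def breslow_cumhaz(t, times, events, lin_pred):
    """Breslow estimate of Lambda_0(t): for every event time T_i <= t add
    1 / sum_{j: T_j >= T_i} exp(lin_pred_j)."""
    total = 0.0
    for t_i, d_i in zip(times, events):
        if d_i == 1 and t_i <= t:
            at_risk = sum(math.exp(lp)
                          for t_j, lp in zip(times, lin_pred) if t_j >= t_i)
            total += 1.0 / at_risk
    return total

def standardized_survival(t, x, beta, times, events, x_obs, z_obs):
    """theta_hat(t, x): average of exp[-Lambda_0(t) exp{h(x, Z_i; beta)}]."""
    bx, bz = beta
    lin_pred = [bx * xi + bz * zi for xi, zi in zip(x_obs, z_obs)]
    cumhaz0 = breslow_cumhaz(t, times, events, lin_pred)
    surv = [math.exp(-cumhaz0 * math.exp(bx * x + bz * zi)) for zi in z_obs]
    return sum(surv) / len(surv)

# Illustrative data (assumptions, not taken from the paper).
times = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
events = [1, 0, 1, 1, 0, 1]          # 1 = event, 0 = censored
x_obs = [1, 0, 1, 0, 1, 0]
z_obs = [0.5, 1.2, 0.3, 2.0, 1.1, 0.7]
beta_hat = (0.6, 0.3)                # hypothetical coefficients
t = 6.0

theta0 = standardized_survival(t, 0, beta_hat, times, events, x_obs, z_obs)
theta1 = standardized_survival(t, 1, beta_hat, times, events, x_obs, z_obs)
psi_hat = theta1 - theta0            # g chosen as the survival difference
print(theta0, theta1, psi_hat)
```

With all coefficients set to zero, `breslow_cumhaz` reduces to the Nelson-Aalen estimator, which gives a quick way to check the function.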

For both generalized linear models and Cox regression models it now follows from standard theory for M-estimators [5] that \(n^{1/2}(\hat{\nu }-\nu )\) is asymptotically normal with mean 0 and variance given by the ‘sandwich formula’

$$\begin{aligned} {\varSigma }=E^{\prime}\left\{ \frac{\partial U_{\nu ,i}(\nu )}{\partial \nu }\right\} ^{-1}\text {var}\{U_{\nu ,i}(\nu )\}E\left\{ \frac{\partial U_{\nu ,i}(\nu )}{\partial \nu }\right\} ^{-1}. \end{aligned}\tag{5}$$

A consistent estimate of the variance of \(\hat{\nu }\) is obtained by replacing \(\nu\) in (5) with \(\hat{\nu }\), and the population moments in (5) by their sample counterparts.
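The plug-in step can be sketched as follows for a deliberately simple case: an identity link with hypothetical linear predictor \(h(X,Z;\beta )=\beta _0+\beta _1X+\beta _2Z\), \(x_0=0\), \(x_1=1\), and \(g\) equal to the difference. Several choices here are assumptions made for illustration only: the data are invented, the derivative matrix is taken as the Jacobian with rows indexed by the components of \(U\) and is approximated by central differences, and a small Gauss-Jordan solver stands in for a linear algebra library.

```python
def solve(a, b):
    """Solve a x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def u_i(nu, y, x, z, x0=0.0, x1=1.0):
    """Stacked estimating function U_{nu,i} for one subject (identity link)."""
    b0, b1, b2, th0, th1, psi = nu
    r = y - (b0 + b1 * x + b2 * z)
    return [r, r * x, r * z,                 # score for beta
            b0 + b1 * x0 + b2 * z - th0,     # theta(x0)
            b0 + b1 * x1 + b2 * z - th1,     # theta(x1)
            th1 - th0 - psi]                 # psi = g{theta(x0), theta(x1)}

def sandwich(nu, data, eps=1e-5):
    """Estimated variance of nu_hat: A^{-1} B A^{-T} / n, with A the average
    Jacobian of U (central differences) and B the average outer product of U."""
    n, k = len(data), len(nu)
    A = [[0.0] * k for _ in range(k)]
    B = [[0.0] * k for _ in range(k)]
    for row in data:
        u = u_i(nu, *row)
        for a in range(k):
            up, dn = nu[:], nu[:]
            up[a] += eps
            dn[a] -= eps
            d = [(p - q) / (2 * eps) for p, q in zip(u_i(up, *row), u_i(dn, *row))]
            for j in range(k):
                A[j][a] += d[j] / n
                B[j][a] += u[j] * u[a] / n
    eye = [[float(i == j) for j in range(k)] for i in range(k)]
    Ainv = solve(A, eye)
    return [[sum(Ainv[j][a] * B[a][b] * Ainv[l][b]
                 for a in range(k) for b in range(k)) / n
             for l in range(k)] for j in range(k)]

# Illustrative data (assumptions, not taken from the paper).
y = [1.0, 2.1, 1.9, 3.2, 2.4, 3.1, 3.6, 4.4]
x = [0, 1, 0, 1, 0, 1, 0, 1]
z = [0.2, 0.5, 1.1, 0.9, 1.8, 1.4, 2.5, 2.2]
data = list(zip(y, x, z))
n = len(data)

# Solve the beta block (normal equations), then the theta and psi blocks.
design = [[1.0, xi, zi] for xi, zi in zip(x, z)]
xtx = [[sum(r[a] * r[b] for r in design) for b in range(3)] for a in range(3)]
xty = [[sum(r[a] * yi for r, yi in zip(design, y))] for a in range(3)]
beta = [v[0] for v in solve(xtx, xty)]
th0 = sum(beta[0] + beta[2] * zi for zi in z) / n
th1 = sum(beta[0] + beta[1] + beta[2] * zi for zi in z) / n
nu_hat = beta + [th0, th1, th1 - th0]

var_hat = sandwich(nu_hat, data)
print("psi_hat:", nu_hat[5], "estimated variance:", var_hat[5][5])
```

Because the link is the identity, \(\hat{\psi }\) equals \(\hat{\beta }_1\) in this toy example, so the entries `var_hat[5][5]` and `var_hat[1][1]` should agree, which serves as a sanity check on the computation.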

The sandwich formula assumes that \(U_{\nu ,i}(\nu )\) and \(U_{\nu ,i^{\prime}}(\nu )\) are independent, for \(i\ne i^{\prime}\). When data are clustered, as in the example in the section ‘Standardization with generalized linear models’, we may define \(U_{\nu ,i}(\nu )=\sum _{j=1}^{n_i}U_{\nu ,ij}(\nu )\), where \(U_{\nu ,ij}(\nu )\) is the contribution to the estimating equation from subject j within cluster i, and \(n_i\) is the total number of subjects in cluster i. Provided that the clusters are independent, \(U_{\nu ,i}(\nu )\) and \(U_{\nu ,i^{\prime}}(\nu )\) are independent as well, for \(i\ne i^{\prime}\), so that the sandwich formula still applies, with i now indexing clusters rather than subjects.
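The cluster-level aggregation can be sketched in a few lines. The helper names `cluster_sums` and `meat` are hypothetical, and the per-subject values of \(U_{\nu ,ij}\) are invented numbers that merely sum to zero, as they would at \(\hat{\nu }\). The only change relative to the independent case is that contributions are summed within clusters before the outer products are averaged, and that n becomes the number of clusters.

```python
def cluster_sums(u_rows, cluster_ids):
    """Sum per-subject estimating-function vectors U_{nu,ij} within each cluster."""
    sums = {}
    for u, c in zip(u_rows, cluster_ids):
        if c in sums:
            sums[c] = [a + b for a, b in zip(sums[c], u)]
        else:
            sums[c] = list(u)
    return list(sums.values())

def meat(u_rows):
    """Average outer product (1/n) sum_i U_i U_i^T, the sample version of var{U}."""
    n, k = len(u_rows), len(u_rows[0])
    return [[sum(u[a] * u[b] for u in u_rows) / n for b in range(k)]
            for a in range(k)]

# Illustrative per-subject contributions (assumptions, not from the paper).
u_subject = [[0.5, -1.0], [0.3, 0.4], [-0.8, 0.6], [0.1, -0.2], [-0.1, 0.2]]
cluster = ["a", "a", "b", "c", "c"]

u_cluster = cluster_sums(u_subject, cluster)   # one row per cluster
meat_clustered = meat(u_cluster)               # n = number of clusters
meat_independent = meat(u_subject)             # n = number of subjects
print(meat_clustered, meat_independent)
```

The derivative term of the sandwich formula would be averaged over the same cluster-level sums; it is omitted here to keep the sketch short.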

Cite this article

Sjölander, A. Regression standardization with the R package stdReg. Eur J Epidemiol 31, 563–574 (2016). https://doi.org/10.1007/s10654-016-0157-3
