Regression standardization with the R package stdReg

Sjölander, Arvid

doi:10.1007/s10654-016-0157-3

METHODS
Published: 14 May 2016

Volume 31, pages 563–574, (2016)
European Journal of Epidemiology

Arvid Sjölander ORCID: orcid.org/0000-0001-5226-6685¹

Abstract

When studying the association between an exposure and an outcome, it is common to use regression models to adjust for measured confounders. The most common models in epidemiologic research are logistic regression and Cox regression, which estimate conditional (on the confounders) odds ratios and hazard ratios. When the model has been fitted, one can use regression standardization to estimate marginal measures of association. If the measured confounders are sufficient for confounding control, then the marginal association measures can be interpreted as population causal effects. In this paper we describe a new R package, stdReg, that carries out regression standardization with generalized linear models (e.g. logistic regression) and Cox regression models. We illustrate the package with several examples, using real data that are publicly available.

References

Rothman K, Greenland S, Lash T. Modern epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
Gail M, Byar D. Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biom J. 1986;28(5):587–99.
Sjölander A. stdReg: Regression Standardization. R package version 0.1. 2016.
Dahlqwist E, Sjölander A. AF: Model-based estimation of confounder-adjusted attributable fractions. R package version 0.1. 2015.
Stefanski L, Boos D. The calculus of M-estimation. Am Stat. 2002;56(1):29–38.
Breslow N, Day N. Statistical methods in cancer research. The analysis of case–control studies, vol. 1. Lyon: IARC/WHO; 1980.
van der Laan M. Estimation based on case–control designs with known prevalence probability. Int J Biostat. 2008;4(1):a17.
De Jong U, Breslow N, Hong G, Ewe J, Sridharan M, Shanmugaratnam K. Aetiological factors in oesophageal cancer in Singapore Chinese. Int J Cancer. 1974;13(3):291–303.
Sjölander A, Vansteelandt S, Humphreys K. A principal stratification approach to assess the differences in prognosis between cancers caused by hormone replacement therapy and by other factors. Int J Biostat. 2010;6(1):a20.
Breslow N. Discussion of the paper by D. R. Cox. J R Stat Soc B. 1972;34(2):216–7.
Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J. 2007;49(3):453–73.
Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. 2003;14(6):680–6.
Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Progr Biomed. 2004;75(1):45–9.
Robins J. Robust estimation in sequentially ignorable missing data and causal inference models. Proc Am Stat Assoc. 2000;1999:6–10.
Bai X, Tsiatis A, O’Brien S. Doubly-robust estimators of treatment-specific survival distributions in observational studies with stratified sampling. Biometrics. 2013;69(4):830–9.
Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7(9):1393–512.

Author information

Authors and Affiliations

Nobels väg 12 A, 171 77, Stockholm, Sweden
Arvid Sjölander

Corresponding author

Correspondence to Arvid Sjölander.

Appendix 1: Asymptotic distribution for standardized measures

For generalized linear models, let $x_0$ and $x_1$ be fixed constants. Let $\psi =g\{\theta (x_0),\theta (x_1)\}$ be a function of $\theta (x_0)$ and $\theta (x_1)$, e.g. $\theta (x_1)-\theta (x_0)$. Define $\nu =\{\beta ,\theta (x_0),\theta (x_1),\psi \}$. The estimator $\hat{\nu }=[\hat{\beta },\hat{\theta }(x_0),\hat{\theta }(x_1),g\{\hat{\theta } (x_0),\hat{\theta } (x_1)\}]$ is an M-estimator [5] that solves the estimating equation

$$\begin{aligned} \sum _{i=1}^nU_{\nu ,i}(\nu )=\sum _{i=1}^n \left[ \begin{array}{l} U_{\beta ,i}(\beta )\\ U_{\theta (x_0),i}\{\beta ,\theta (x_0)\}\\ U_{\theta (x_1),i}\{\beta ,\theta (x_1)\}\\ U_{\psi ,i}\{\theta (x_0),\theta (x_1),\psi \} \end{array}\right] =0, \end{aligned}$$

where $U_{\beta ,i}(\beta )$ is the contribution to the maximum likelihood score function from subject i, $U_{\theta (x),i}\{\beta ,\theta (x)\}=\eta ^{-1}\{h(X=x,Z_i;\beta )\}-\theta (x)$ for $x=x_1$ and $x=x_0$, and $U_{\psi ,i}\{\theta (x_0),\theta (x_1),\psi \}=g\{\theta (x_0),\theta (x_1)\}-\psi$.
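As a rough illustration of the estimating function for $\theta (x)$ in the generalized linear model case, the following Python sketch computes a standardized mean under a logistic model by averaging inverse-link predictions over the observed confounder values. The coefficient values, the confounder sample and the form of the linear predictor $h$ are made up for illustration; this is a sketch under those assumptions, not the stdReg implementation.

```python
import math


def expit(u):
    """Inverse logit link, eta^{-1}(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def linear_predictor(x, z, beta):
    """Hypothetical h(X=x, Z=z; beta) = b0 + b1*x + b2*z."""
    b0, b1, b2 = beta
    return b0 + b1 * x + b2 * z


def standardized_mean(x, z_values, beta):
    """Solve sum_i [expit{h(x, Z_i; beta)} - theta(x)] = 0 for theta(x).

    The solution is the sample average of the predictions with the
    exposure set to x for every subject.
    """
    preds = [expit(linear_predictor(x, z, beta)) for z in z_values]
    return sum(preds) / len(preds)


# Made-up fitted coefficients and confounder values (assumptions).
beta_hat = (-1.0, 0.8, 0.5)
z_obs = [-1.2, -0.3, 0.0, 0.4, 1.1, 2.0]

theta0 = standardized_mean(0, z_obs, beta_hat)  # theta(x_0) with x_0 = 0
theta1 = standardized_mean(1, z_obs, beta_hat)  # theta(x_1) with x_1 = 1
psi = theta1 - theta0  # g{theta(x_0), theta(x_1)} chosen as the difference

print(theta0, theta1, psi)
```

Here $g$ is taken to be the difference; a ratio or any other smooth contrast could be substituted in the last step without changing the averaging step.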

For Cox regression models, let $x_0$, $x_1$ and t be fixed constants. Let $\psi =g\{\theta (t,x_0),\theta (t,x_1)\}$ be a function of $\theta (t,x_0)$ and $\theta (t,x_1)$, e.g. $\theta (t,x_1)-\theta (t,x_0)$. Define $\nu =\{\beta ,{\varLambda }_0(t),\theta (t,x_0),\theta (t,x_1),\psi \}$. The estimator $\hat{\nu }=[\hat{\beta },\hat{{\varLambda }}_0(t),\hat{\theta }(t,x_0),\hat{\theta }(t,x_1),g\{\hat{\theta } (t,x_0),\hat{\theta } (t,x_1)\}]$ is an M-estimator [5] that solves the estimating equation

$$\begin{aligned} \sum _{i=1}^nU_{\nu ,i}(\nu )=\sum _{i=1}^n\left[ \begin{array}{l}U_{\beta ,i}(\beta )\\ U_{{\varLambda }_0(t),i}\{\beta ,{\varLambda }_0(t)\}\\ U_{\theta (t,x_0),i}\{\beta ,{\varLambda }_0(t),\theta (t,x_0)\}\\ U_{\theta (t,x_1),i}\{\beta ,{\varLambda }_0(t),\theta (t,x_1)\}\\ U_{\psi ,i}\{\theta (t,x_0),\theta (t,x_1),\psi \} \end{array}\right] =0, \end{aligned}$$

where $U_{\beta ,i}(\beta )$ is the contribution to the Cox partial likelihood score function from subject i, $U_{{\varLambda }_0(t),i}\{\beta ,{\varLambda }_0(t)\}$ is the contribution to the estimating function for Breslow’s estimator of the cumulative baseline hazard from subject i, $U_{\theta (t,x),i}\{\beta ,{\varLambda }_0(t),\theta (t,x)\}=\text {exp}[-{\varLambda }_0(t)\text {exp}\{h(X=x,Z_i;\beta )\}]-\theta (t,x)$ for $x=x_1$ and $x=x_0$, and $U_{\psi ,i}\{\theta (t,x_0),\theta (t,x_1),\psi \}=g\{\theta (t,x_0),\theta (t,x_1)\}-\psi$.
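The estimating function for $\theta (t,x)$ in the Cox case can be sketched in Python along the same lines: for a fixed $t$, the standardized survival probability is the sample average of $\text{exp}[-{\varLambda }_0(t)\text{exp}\{h(X=x,Z_i;\beta )\}]$. The coefficients, the value of the cumulative baseline hazard at $t$, the confounder sample and the linear form of $h$ below are all made up for illustration; in practice $\hat{\beta }$ and $\hat{{\varLambda }}_0(t)$ would come from a fitted Cox model and Breslow's estimator.

```python
import math


def conditional_survival(cum_base_hazard, x, z, beta):
    """exp[-Lambda_0(t) * exp{h(x, z; beta)}] with hypothetical h = b_x*x + b_z*z."""
    b_x, b_z = beta
    return math.exp(-cum_base_hazard * math.exp(b_x * x + b_z * z))


def standardized_survival(cum_base_hazard, x, z_values, beta):
    """Solve sum_i [S(t | x, Z_i) - theta(t, x)] = 0 for theta(t, x)."""
    vals = [conditional_survival(cum_base_hazard, x, z, beta) for z in z_values]
    return sum(vals) / len(vals)


# Made-up inputs (assumptions, not estimates from real data).
beta_hat = (0.7, 0.3)   # log hazard ratios for exposure and confounder
Lambda0_t = 0.25        # cumulative baseline hazard at the fixed time t
z_obs = [-1.2, -0.3, 0.0, 0.4, 1.1, 2.0]

theta0 = standardized_survival(Lambda0_t, 0, z_obs, beta_hat)  # theta(t, x_0)
theta1 = standardized_survival(Lambda0_t, 1, z_obs, beta_hat)  # theta(t, x_1)
psi = theta1 - theta0  # difference in standardized survival at time t

print(theta0, theta1, psi)
```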

For both generalized linear models and Cox regression models it now follows from standard theory for M-estimators [5] that $n^{1/2}(\hat{\nu }-\nu )$ is asymptotically normal with mean 0 and variance given by the ‘sandwich formula’

$$\begin{aligned} {\varSigma }=E^{\prime}\left\{ \frac{\partial U_{\nu ,i}(\nu )}{\partial \nu }\right\} ^{-1}\text {var}\{U_{\nu ,i}(\nu )\}E\left\{ \frac{\partial U_{\nu ,i}(\nu )}{\partial \nu }\right\} ^{-1}. \end{aligned}$$

(5)

A consistent estimate of the variance of $\hat{\nu }$ is obtained by replacing $\nu$ in (5) with $\hat{\nu }$, and the population moments in (5) by their sample counterparts.
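To make the plug-in recipe concrete, here is a toy Python sketch of the sandwich estimate for a two-component stacked estimating function, $U_i=(Y_i-\mu ,\ \text{log}\,\mu -\psi )$, which mimics the structure above in that the last component defines $\psi$ through a function $g$. The data and the choice $g=\text{log}$ are made up for illustration. The derivative matrix in the code has rows indexed by the components of $U$, so the transpose falls on the last factor; with the transposed layout the prime moves to the first factor, and the resulting matrix is the same.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Plain matrix product of nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    return [list(row) for row in zip(*m)]


y = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9]  # made-up outcomes
n = len(y)

# Solve the stacked estimating equation: mu_hat is the mean, psi_hat = log(mu_hat).
mu_hat = sum(y) / n
psi_hat = math.log(mu_hat)

# Per-subject contributions U_i evaluated at nu_hat = (mu_hat, psi_hat).
U = [[yi - mu_hat, math.log(mu_hat) - psi_hat] for yi in y]

# A[j][k]: sample mean of dU_j / dnu_k, evaluated at nu_hat.
A = [[-1.0, 0.0],
     [1.0 / mu_hat, -1.0]]

# B: sample counterpart of var(U_i); U_i has sample mean zero at nu_hat.
B = [[sum(u[j] * u[k] for u in U) / n for k in range(2)] for j in range(2)]

A_inv = inv2(A)
Sigma = matmul(matmul(A_inv, B), transpose(A_inv))

# Sigma is the asymptotic variance of n^{1/2}(nu_hat - nu); divide by n for nu_hat.
var_nu_hat = [[s / n for s in row] for row in Sigma]
print(var_nu_hat)
```

In this toy case the lower-right entry reduces to the usual delta-method variance of $\text{log}\,\hat{\mu }$, namely the sample variance of $Y$ divided by $n\hat{\mu }^2$.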

The sandwich formula assumes that $U_{\nu ,i}(\nu )$ and $U_{\nu ,i^{\prime}}(\nu )$ are independent, for $i\ne i^{\prime}$. When data are clustered, as in the example in ‘Standardization with generalized linear models’ section, we may define $U_{\nu ,i}(\nu )=\sum _{j=1}^{n_i}U_{\nu ,ij}(\nu )$, where $U_{\nu ,ij}(\nu )$ is the contribution to the estimating equation from subject j within cluster i, and $n_i$ is the total number of subjects in cluster i. Provided that the clusters are independent we thus have that $U_{\nu ,i}(\nu )$ and $U_{\nu ,i^{\prime}}(\nu )$ are independent as well, for $i\ne i^{\prime}$, so that the sandwich formula still applies.
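The cluster adjustment can be sketched in Python for the simplest possible estimating function, $U_{\nu ,ij}(\mu )=Y_{ij}-\mu$: subject-level contributions are first summed within clusters, and the sandwich formula is then applied to the cluster-level sums, with $n$ counting clusters rather than subjects. The cluster labels and outcomes are made up for illustration.

```python
from collections import defaultdict

# Made-up clustered data: one cluster label and one outcome per subject.
cluster_id = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
y = [2.0, 2.4, 2.2, 3.1, 2.9, 1.5, 1.8, 1.6, 1.7, 2.6]
N = len(y)

# The estimating equation sum_i sum_j (Y_ij - mu) = 0 gives the overall mean.
mu_hat = sum(y) / N

# U_i = sum_j U_ij: add up subject-level contributions within each cluster.
cluster_sum = defaultdict(float)
cluster_size = defaultdict(int)
for cid, yi in zip(cluster_id, y):
    cluster_sum[cid] += yi - mu_hat
    cluster_size[cid] += 1

n = len(cluster_sum)  # number of independent clusters

# Sample counterparts of E{dU_i/dmu} = -E(n_i) and var(U_i).
A = -sum(cluster_size.values()) / n
B = sum(u * u for u in cluster_sum.values()) / n

Sigma = B / (A * A)     # scalar sandwich: A^{-1} B A^{-1}
var_mu_hat = Sigma / n  # variance estimate for mu_hat

# For comparison: the same formula with every subject as its own cluster.
naive_var = sum((yi - mu_hat) ** 2 for yi in y) / N ** 2
print(var_mu_hat, naive_var)
```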

About this article

Cite this article

Sjölander, A. Regression standardization with the R package stdReg. Eur J Epidemiol 31, 563–574 (2016). https://doi.org/10.1007/s10654-016-0157-3

Received: 09 February 2016
Accepted: 30 April 2016
Published: 14 May 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10654-016-0157-3
