Original Article
Analytical results in longitudinal studies depended on target of inference and assumed mechanism of attrition

https://doi.org/10.1016/j.jclinepi.2015.03.011

Abstract

Objectives

To compare methods for analysis of longitudinal studies with missing data due to participant dropout and follow-up truncated by death.

Study Design and Setting

We analyzed physical functioning in an Australian longitudinal study of elderly women where the missing data mechanism could be either missing at random (MAR) or missing not at random (MNAR). We assumed either an immortal cohort, where deceased participants are implicitly included after death, or a mortal cohort, where the target of inference is surviving participants at each survey wave. A covariate was included to illustrate the methods. Simulation was used to assess the effect of the assumptions.

Results

Ignoring attrition or restricting analysis to participants with complete follow-up led to biased estimates. A linear mixed model was appropriate for an immortal cohort under MAR but not MNAR. The linear increment model and joint modeling of longitudinal outcome and time to death were the most robust to MNAR. For a mortal cohort, inverse probability weighting and multiple imputation could be used, but care is needed in specifying dropout and imputation models, respectively.

Conclusion

Appropriate analysis methodology to deal with attrition in longitudinal studies depends on the target of inference and the missing data mechanism.

Introduction

What is new?

  • Researchers may not be aware that for some methods of analysis, such as likelihood-based methods, deceased participants are implicitly included after death (i.e., cohorts are immortal).

  • Using a longitudinal study of 12,432 Australian women aged over 70, we illustrated that appropriate methods for dealing with attrition depend on the target of inference and the missing data mechanism.

  • Linear mixed models are appropriate for data missing at random for an immortal cohort but not for a mortal cohort, whereas multiple imputation and inverse probability weighting can be used for a mortal cohort.

  • The linear increment model and joint modeling of longitudinal outcome and time to death are both appropriate for an immortal cohort when data are missing not at random (MNAR), but more research is needed for methods robust to MNAR for a mortal cohort.

The prevention and treatment of missing data in clinical trials was discussed recently in the New England Journal of Medicine [1], and an accompanying article suggests the issues raised also apply to observational studies [2]. The recommendations were that missing data should not be ignored; methods such as complete case analysis or single imputation should only be used in the minority of cases where a simple approach is justified; and model-based methods or methods that include appropriate weighting are generally preferred. These high-profile publications reflect the increasing awareness of the importance of dealing appropriately with missing data in health research studies.

Longitudinal studies where participants are repeatedly measured over time are essential for estimating changes in health-related variables in populations because they estimate within-cohort change, which is not possible with repeated cross-sectional studies. Within-cohort change may represent the natural progression of health in a population and provide valuable information for clinical management of patients and governmental policy making. For these purposes, it is critical that information provided is accurate and representative of the population of interest. In this regard, a significant limitation of many longitudinal studies is that a proportion of the participants fail to provide data at all waves of data collection. Nonresponse may be transitory, with the participant later returning to the study, or it may be due to participant dropout or death, in which case no further response is provided. This latter type of nonresponse is the focus of this article.

Participant dropout leads to missing data, whereas participant death results in truncation of follow-up. This situation is particularly problematic if the participants who drop out or die are systematically different with respect to the outcomes of interest from the participants who remain. Common sense would suggest that participants who drop out or die are likely to have poorer health on average than participants who remain in the study. Therefore, ignoring the potential impact of this type of missing data is likely to lead to biased estimates, especially at later stages of the study where greater proportions of participants have either dropped out or died. In addition, the direction of the bias is likely to be toward showing better health over time than is true.

There is an important philosophical aspect of nonresponse due to death. How should deaths be taken into account when estimating mean levels of health-related variables over time? For example, how is a dead person's quality of life to be represented on a quality of life scale? Is their quality of life zero? Should their quality of life even be estimated in an analysis after they have died? If not then how should deaths be handled in the analysis? Clearly, there are no simple answers to these questions [3]. Moreover, the way in which deaths are handled in the analysis will depend on the study aims [4]. For example, are we describing the population as defined at the initiation of the study? Or are we describing the population as it existed at each stage of the study? The former scenario implies an immortal population and a cohort where participants who die continue to be implicitly included after death, whereas the latter scenario implies a mortal population where deceased participants are excluded after death leading to a cohort of diminishing size over time. Alternatively, we could report estimates separately for dropouts, deaths, and those participants who remain in the study.
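As a rough illustration of how the two targets of inference differ, the following Python sketch computes wave-specific means for an invented four-person cohort. The data, the function names, and the `value_after_death` parameter are all hypothetical and do not come from the study. Assigning a fixed value after death is only one possible convention; likelihood-based methods instead implicitly extrapolate a trajectory for deceased participants.

```python
# Hypothetical toy data: scores[i][w] is participant i's outcome at wave w.
# None marks a wave after death. Values are invented for illustration only.
scores = [
    [80, 75, 70],
    [60, 50, None],
    [40, None, None],
    [90, 88, 85],
]


def mortal_cohort_means(data):
    """Mean at each wave among participants still alive at that wave."""
    means = []
    for wave in range(len(data[0])):
        alive = [row[wave] for row in data if row[wave] is not None]
        means.append(sum(alive) / len(alive))
    return means


def immortal_cohort_means(data, value_after_death):
    """Mean at each wave over the full baseline cohort.

    Waves after death receive an analyst-chosen value, which is an
    assumption of this sketch rather than an observed quantity.
    """
    means = []
    for wave in range(len(data[0])):
        filled = [value_after_death if row[wave] is None else row[wave]
                  for row in data]
        means.append(sum(filled) / len(filled))
    return means


mortal = mortal_cohort_means(scores)
immortal_zero = immortal_cohort_means(scores, value_after_death=0)
print("mortal cohort:  ", mortal)
print("immortal cohort:", immortal_zero)
```

In this toy example the mortal-cohort mean rises over time because the participants with the lowest scores die first, whereas the immortal-cohort mean with a value of zero after death falls. The two summaries answer different questions about the same data.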

Missing data can be classified into one of three types according to Rubin and Little [5]. Missing completely at random (MCAR) refers to situations where the missing data are a random subset of the total data and do not depend on observed or unobserved measurements. In this scenario, an analysis that ignores the missing data will lead to unbiased estimates and valid inference. Missing at random (MAR) occurs when the missing data are not a random sample of the total data; however, given the observed data, the missingness mechanism does not depend on the unobserved data. In this case, there is sufficient information in the data collected to enable unbiased estimates and valid inference provided an appropriate analysis such as a likelihood-based method is performed. Missing not at random (MNAR) is where the missing data are systematically different from the nonmissing data, and even after accounting for the observed information, the reason for the missing data still depends on the unobserved values. In this situation, there is insufficient information contained in the observed data to ensure unbiased estimates and valid inference.
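The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the two-wave data-generating model, the response probabilities, and all names are assumptions, not the study's simulation design. It compares a naive mean of the observed follow-up scores with a regression-adjusted mean that conditions on the always-observed baseline score.

```python
import math
import random
from statistics import fmean


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n=20000, seed=0):
    """Return the true follow-up mean and two estimates per mechanism."""
    rng = random.Random(seed)
    # y1: baseline score, always observed. y2: follow-up score, may be missing.
    y1 = [rng.gauss(0, 1) for _ in range(n)]
    y2 = [0.7 * a + rng.gauss(0, 1) for a in y1]

    # Probability that y2 is observed under each (assumed) mechanism.
    response_prob = {
        "MCAR": [0.6] * n,                      # depends on nothing
        "MAR": [logistic(2 * a) for a in y1],   # depends on observed y1 only
        "MNAR": [logistic(2 * b) for b in y2],  # depends on y2 itself
    }

    out = {"true": fmean(y2)}
    for name, probs in response_prob.items():
        seen = [(a, b) for a, b, p in zip(y1, y2, probs) if rng.random() < p]
        xs = [a for a, _ in seen]
        ys = [b for _, b in seen]
        x_bar, y_bar = fmean(xs), fmean(ys)
        # Least-squares regression of y2 on y1 among responders.
        slope = (sum((x - x_bar) * (y - y_bar) for x, y in seen)
                 / sum((x - x_bar) ** 2 for x in xs))
        intercept = y_bar - slope * x_bar
        # Predict y2 for everyone from y1, then average the predictions.
        adjusted = fmean([intercept + slope * a for a in y1])
        out[name] = {"naive": y_bar, "adjusted": adjusted}
    return out


results = simulate()
print("true mean:", round(results["true"], 3))
for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, {k: round(v, 3) for k, v in results[mechanism].items()})
```

Under these assumptions the naive mean is close to the truth only under MCAR, the adjusted mean also recovers it under MAR, and neither does under MNAR, because the observed data carry no information about the extra selection on the follow-up score itself.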

There have been many analytical methods proposed for dealing with missing data due to participant dropout or death in longitudinal studies [6], [7], [8]. These methods include: joint models [9], [10], inverse probability weighting (IPW) [11], [12], multiple imputation (MI) [13], [14], [15], and linear increment (LI) models [16], [17]. All these methods rely on specific assumptions being met to ensure valid inference. Although these assumptions often cannot be formally tested, their plausibility can be carefully considered in the context of the study, the data collected, and the aims of the analysis. In addition, sensitivity analyses and/or simulation studies can be performed to determine the robustness of the conclusions obtained.
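To give a flavor of one of these methods, here is a minimal sketch of inverse probability weighting for a single follow-up mean. It assumes dropout depends only on one fully observed binary baseline covariate, so response probabilities can be estimated as simple proportions. The variable names, effect sizes, and response rates are invented, and the dropout models used in the study itself are not shown in this excerpt.

```python
import random
from statistics import fmean


def ipw_demo(n=20000, seed=1):
    """Compare a naive responder mean with an IPW mean (hypothetical data)."""
    rng = random.Random(seed)
    # Hypothetical binary baseline covariate, e.g. poor health at baseline.
    poor = [rng.random() < 0.4 for _ in range(n)]
    # Follow-up outcome is lower on average in the poor-health group.
    y = [rng.gauss(50 if p else 70, 10) for p in poor]
    # Response depends only on the observed covariate (a MAR assumption).
    responded = [rng.random() < (0.5 if p else 0.9) for p in poor]

    # Estimate the response probability within each covariate level.
    rate = {}
    for level in (True, False):
        flags = [r for r, p in zip(responded, poor) if p == level]
        rate[level] = sum(flags) / len(flags)

    values = [v for v, r in zip(y, responded) if r]
    # Each responder is weighted by 1 / P(respond | covariate level).
    weights = [1.0 / rate[p] for p, r in zip(poor, responded) if r]

    naive = fmean(values)
    ipw = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return {"true": fmean(y), "naive": naive, "ipw": ipw}


result = ipw_demo()
print({k: round(v, 2) for k, v in result.items()})
```

Responders from the group that drops out more often receive larger weights, so the weighted sample resembles the full cohort. The correction holds only if the response model is correctly specified, which is why care in specifying the dropout model matters.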

The aim of this article was to illustrate and compare a number of methods of analysis for longitudinal studies with significant participant dropout or death. In Section 2, we describe the example data used for the analysis. The methods of analysis are then described in Section 3 (with the more technical details included in an Appendix at www.jclinepi.com). Simulation studies are described in Section 4, and analysis results are presented in Section 5. We conclude with a discussion of the results in Section 6, where we also provide some general guidelines for presenting results of longitudinal studies with dropout and death.

Example data

Data were obtained from the Australian Longitudinal Study on Women's Health (ALSWH) [18]. The ALSWH began in 1996 when around 40,000 adult women in three age group cohorts were recruited by randomly sampling the nationally representative Medicare database. The study was approved by Ethics Committees at the University of Queensland and University of Newcastle. We use data from the older cohort of 12,432 participants born between 1921 and 1926. These women have been surveyed at approximately

Methods for comparison

The methods compared are described below. Methods 1 and 2 are often used in practice but would only be valid in a minority of cases. They have been included to illustrate the potential problem with using simplistic methods. Methods 3–5 assume an immortal cohort where responses are implicitly included for missing data after death. This may not be an appropriate assumption in many cases. Therefore, we have also included two methods that assume a mortal cohort (Methods 6 and 7). In the case of a

Simulation studies

We conducted simulation studies to evaluate each of the analytical methods based on MAR and MNAR missing data mechanisms. However, we did not include complete case analysis (Method 2) in the evaluation because the direction of bias is clear from the analysis results alone. For the simulation studies, we identified the participants with complete data for all five waves for the variables of interest. This resulted in a total of 3,799 participants. For each simulation, we randomly chose a sample

Analysis results

Fig. 1 shows the decline in PF stratified by completers, dropouts, and deaths. As expected, the dropouts and participants who died had poorer PF than completers at each wave. Trends over time are similar for the groups; however, there appears to be a slightly steeper decline for the participants between the two surveys before death. Other notable features are that dropouts had higher PF than the participants who died and PF was worst for the participants who died earlier in the study. This

Discussion

The results obtained from our comparative analysis and simulation study suggest that ignoring missing data leads to biased estimates of PF. In addition, the apparent bias increased over time as greater proportions of participants had either dropped out or died. Restricting the analysis to just the completers also produced biased estimates at all waves. Bias was in the expected direction, that is, the estimates suggest better average PF than was actually the case. These results are consistent

Conclusion

The assumptions of any method of analysis need to be carefully considered before implementation in studies with missing data due to participant dropout. In addition, the method should appropriately reflect the study aims and target of inference when follow-up has been truncated by death. Sensitivity analysis can be useful to determine the robustness of results to varying plausible missing data mechanisms. We suggest the primary analysis should be performed using a method that is robust to the

References

  • A. Donders et al. Review: a gentle introduction to imputation of missing data. J Clin Epidemiol (2006)
  • S. Crawford et al. A comparison of analytic methods for non-random missingness of outcome data. J Clin Epidemiol (1995)
  • R. Little et al. The prevention and treatment of missing data in clinical trials. N Engl J Med (2012)
  • J. Ware et al. Missing data. N Engl J Med (2012)
  • B. Chaix et al. Weighing up the dead and missing: reflections on inverse-probability weighting and principal stratification to address truncation by death. Epidemiology (2012)
  • B. Kurland et al. Longitudinal data with follow up truncated by death: match the analysis to research aims. Stat Sci (2009)
  • D. Rubin et al. Statistical analysis with missing data (2002)
  • R. Little. Modelling the dropout mechanism in repeated measures studies. J Am Stat Assoc (1995)
  • J. Matthews et al. Dropout in crossover and longitudinal studies: is complete case so bad? Stat Methods Med Res (2014)
  • J. Hogan et al. Tutorial in Biostatistics: handling drop-out in longitudinal studies. Stat Med (2004)
  • R. Henderson et al. Joint modelling of longitudinal measurements and event time data. Biostatistics (2000)
  • G. Verbeke et al. The analysis of multivariate longitudinal data: a review. Stat Methods Med Res (2014)
  • S. Seaman et al. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res (2013)
  • C. Dufouil et al. Analysis of longitudinal studies with death and drop-out: a case study. Stat Med (2004)
  • J. Schafer. Analysis of incomplete multivariate data (1997)

Funding: The Australian Longitudinal Study on Women's Health is funded by the Australian Department of Health. M.J. is funded by the Australian National Health and Medical Research Council (APP1000986) and G.D.M. is funded by an Australian Research Council Future Fellowship. The funding sources had no involvement in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

Conflict of interest: None.
