Elsevier

Atmospheric Environment

Volume 77, October 2013, Pages 598-606

Comparison of performance of land use regression models derived for Catalunya, Spain

https://doi.org/10.1016/j.atmosenv.2013.05.054

Highlights

  • We validate three overlapping land use regression models with external datasets.

  • Despite differences in sampling protocols, validation R2s are reasonably high.

  • Predictions from different models at cohort addresses are relatively well correlated.

Abstract

Land use regression (LUR) models have become a popular tool to capture small-scale variations in air pollution exposures in epidemiological analyses. They have been developed with a variety of approaches, with no clear indication of which is the most efficient and appropriate. We evaluated the performance of the NO2 LUR model developed for Catalunya, Spain, within the European Study of Cohorts for Air Pollution Effects (ESCAPE), compared to two other LUR models derived locally and independently for two cohort studies (INMA-Sabadell and REGICOR-Girona) in different sub-areas of the ESCAPE domain. We used the sampling campaigns from the three studies as independent sets of measurements with which to evaluate each model. We compared changes in R2 and measures of bias when applying each model to its own dataset vs. the independent datasets from the other studies. The three studies differed principally in their scale of analysis (from an urban area only for INMA-Sabadell to large areas covering both urban and rural zones for ESCAPE-Catalunya and REGICOR-Girona) and in their sampling protocol (e.g. site selection).

The LUR models performed similarly well in terms of model adjusted R2 and cross-validation R2: adjusted R2 ranged from 0.62 to 0.75 and cross-validation R2 from 0.63 to 0.73. The ESCAPE model performed well at the ESCAPE sites in Sabadell (R2 = 0.69) and moderately well at the ESCAPE sites in Girona province (R2 = 0.53). It predicted the external sites less well, with R2 values of 0.51 in Sabadell and 0.36 in Girona province. The INMA-Sabadell and REGICOR-Girona models showed a similar pattern: the R2 for the INMA model dropped from 0.69 at INMA sites to 0.50 at ESCAPE sites in Sabadell, while the R2 for the REGICOR model dropped from 0.63 at REGICOR sites to 0.44 at ESCAPE sites in Girona province. The drop in performance at external sites is likely due to a combination of overfitting and differences between the sampling campaigns (years, site selection). Agreement between models in classifying the air pollution levels predicted at cohort addresses as low, medium, or high ranged from 53% to 74%. Despite the drop in performance, the three models still explained a substantial fraction of the variation at independent sites, especially in Sabadell, supporting their use in epidemiological studies.

Introduction

Near-road traffic-related air pollution has been associated in many epidemiological studies with a multitude of health effects, ranging from reproductive outcomes to all-cause mortality (HEI, 2010). While these studies indicate that living close to busy roads is associated with adverse health effects, the findings are not easily translated into policies, and generalization from one city to another is limited because “proximity” does not necessarily reflect the same types and levels of pollution. Thus, alternative objective measures of local traffic-related pollution are warranted. Land Use Regression (LUR) modelling is often the exposure assessment method of choice to capture small-scale differences in air pollution concentrations, particularly from traffic sources, at moderate implementation cost (Jerrett et al., 2005). Numerous epidemiological studies use LUR models to investigate the health effects of air pollution. Nitrogen dioxide (NO2) has most frequently been used as a marker of near-road traffic-related pollutants. A landmark activity of the ESCAPE project (European Study of Cohorts for Air Pollution Effects) is the adoption of uniform air pollution exposure assessment methodologies to assess the spatial variability of traffic-related air pollutants (Beelen et al., 2013, Cyrys et al., 2012). The same measurement and modelling methods were applied to 36 study areas across Europe, in places where pre-existing cohort studies were available to provide health outcome data. NO2 was one of the pollutants chosen as an indicator of traffic-related exposure; LUR estimates were developed for it and then used to assign exposures to the participants of the local studies.
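As a rough illustration of the regression step at the core of LUR, the sketch below fits measured NO2 at monitoring sites to GIS-derived predictor variables by ordinary least squares and reports R2 and adjusted R2. It is a minimal sketch under stated assumptions, not the ESCAPE procedure: any variable-selection step is omitted, and the function names (`fit_lur`, `solve`) and example predictors (a traffic-load and a population-density variable with made-up values) are hypothetical.

```python
# Minimal LUR sketch: ordinary least squares of measured NO2 on GIS-derived
# predictors. Variable selection is omitted; all names are hypothetical.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_lur(predictors, no2):
    """Fit NO2 = b0 + b1*x1 + ... + bp*xp via the normal equations.

    predictors: one row per measurement site, each a list of p predictor values.
    no2: measured NO2 concentration at each site.
    Returns (coefficients, r2, adjusted_r2); coefficients[0] is the intercept.
    """
    n, p = len(no2), len(predictors[0])
    x = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    xtx = [[sum(x[k][i] * x[k][j] for k in range(n)) for j in range(p + 1)]
           for i in range(p + 1)]
    xty = [sum(x[k][i] * no2[k] for k in range(n)) for i in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in x]
    mean_y = sum(no2) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(no2, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in no2)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R2 penalizes each extra predictor: 1 - (1 - R2)(n - 1)/(n - p - 1).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


if __name__ == "__main__":
    # Hypothetical sites: [traffic load in a buffer, population density]; NO2 in ug/m3.
    sites = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 2]]
    no2 = [18.5, 16.0, 28.2, 27.5, 34.1, 27.0]
    beta, r2, adj_r2 = fit_lur(sites, no2)
    print("coefficients:", [round(b, 2) for b in beta])
    print(f"R2 = {r2:.2f}, adjusted R2 = {adj_r2:.2f}")
```

Adjusted R2 is reported because, with few sites, adding predictors always raises the unadjusted R2 even when they carry no real signal.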

A major challenge in ESCAPE LUR development was selecting a feasible number of measurement sites for a large number of European study areas within a limited budget. As previously shown, the number of measurement sites used to develop LUR models affects the performance of those models (Basagana et al., 2012, Wang et al., 2013). These studies have shown that for models developed on a small number of sites, the model and cross-validation R2 overestimate predictive ability at independent test sites. This highlights the importance of validating models with independent datasets. A second, partly related, challenge was the ambitious spatial coverage required across Europe for regions of heterogeneous size, ranging from intra-city to large regional domains. One example of a single large but geographically diverse region is Catalunya, Spain, where ESCAPE provided a LUR model for several cohort studies in different towns and cities. For two cohorts, REGICOR (Girona Heart Registry) in the region of Girona (Rivera et al., 2012) and INMA (Environment and Childhood) in the city of Sabadell (Aguilera et al., 2008), independent exposure assessment studies were undertaken to build NO2 LUR models for both study areas prior to the start of ESCAPE. The existence of parallel measurement campaigns and LUR models offers an interesting opportunity to compare the performance of the regional ESCAPE LUR model for NO2 with LUR models derived locally for each of these cohorts from a higher density of measurement sites.
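The gap between model R2 and predictive ability at independent sites can be made concrete with leave-one-out cross-validation (LOOCV). The sketch below assumes a single predictor for brevity and, as a further assumption, keeps the predictor set fixed while re-estimating only the coefficients in each fold; because variable selection is not repeated, this is one reason LOOCV can remain optimistic when sites are few. The names `fit_simple`, `model_r2`, and `loocv_r2` and the example data are hypothetical.

```python
# LOOCV sketch with one predictor; the predictor set is held fixed and only the
# coefficients are re-estimated in each fold. Names are hypothetical.


def fit_simple(x, y):
    """Closed-form least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_mse(observed, predicted):
    """MSE-based R2: 1 - (sum of squared errors) / (total sum of squares)."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1 - sse / sst


def model_r2(x, y):
    """In-sample R2 of the model fitted to all sites."""
    a, b = fit_simple(x, y)
    return r2_mse(y, [a + b * xi for xi in x])


def loocv_r2(x, y):
    """Leave each site out in turn, refit on the rest, predict the left-out site."""
    preds = []
    for i in range(len(x)):
        a, b = fit_simple(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return r2_mse(y, preds)


if __name__ == "__main__":
    # Hypothetical predictor (e.g. traffic load) and measured NO2 at six sites.
    traffic = [10, 20, 30, 40, 50, 60]
    no2 = [22, 27, 37, 38, 50, 51]
    print(f"model R2 = {model_r2(traffic, no2):.3f}")
    print(f"LOOCV R2 = {loocv_r2(traffic, no2):.3f}")
```

In ordinary least squares each left-out residual is at least as large as the corresponding in-sample residual, so the LOOCV R2 computed this way never exceeds the in-sample (unadjusted) model R2.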

Comparisons between LUR models developed for the same area are scant. Dijkema et al. (2011) compared two LUR models developed at different scales (large area and city-specific, encompassing the same core-area of Amsterdam) and using different monitoring campaigns. They found in both cases a drop in model performance in terms of adjusted R2 when applying the model to the other model's monitoring sites, and highlighted in their conclusions the importance of a sampling location strategy purposefully designed to reflect locations where models are to be applied.

The first objective of our study was to evaluate the performance of the three LUR models by comparing their predictions with observed values at locations that were not used for model development.
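Validation at sites not used for model development amounts to comparing a model's predictions with measurements at an independent set of sites. The sketch below computes two common forms of validation R2 (the squared Pearson correlation and an MSE-based R2) together with a simple mean bias. The excerpt does not specify which R2 definition or bias measures the study used, so these choices, the names `validate` and `pearson_r`, and the example values are assumptions.

```python
import math

# External validation sketch: compare one model's predictions with measurements
# at independent sites. Metric choices and names are assumptions.


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def validate(observed, predicted):
    """Return validation R2 (two definitions) and bias at independent sites."""
    n = len(observed)
    mean_o = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    bias = sum(p - o for o, p in zip(observed, predicted)) / n  # predicted - observed
    return {
        "r2_corr": pearson_r(observed, predicted) ** 2,  # insensitive to offsets
        "r2_mse": 1 - sse / sst,  # penalizes systematic bias as well as scatter
        "mean_bias": bias,
        "relative_bias": bias / mean_o,
    }


if __name__ == "__main__":
    # Hypothetical NO2 (ug/m3) at four independent sites; predictions are 5 units high.
    measured = [20.0, 30.0, 40.0, 50.0]
    predicted = [25.0, 35.0, 45.0, 55.0]
    print(validate(measured, predicted))
```

The constant offset in this example shows why a bias measure is worth reporting alongside R2: the squared correlation stays at 1 and the mean bias is 5 ug/m3, while the MSE-based R2 falls to 0.8.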

Because the ultimate goal of LUR modelling in epidemiology is to assign estimates of air pollution exposure to participants of health studies, our second objective was to compare the different model predictions at the residential addresses of the two local cohort studies.
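Comparing predictions at residential addresses can be summarized, as in the abstract, by assigning each address to a low, medium, or high class under each model and counting the addresses on which two models agree. The sketch below assumes that the classes are tertiles of each model's own predictions, with ties broken by input order; the excerpt does not state the exact cutpoints, and the names `tertile_classes` and `percent_agreement` and the example values are hypothetical.

```python
# Classification-agreement sketch: tertiles of each model's own predictions at
# the same cohort addresses. Cutpoint choice and names are assumptions.


def tertile_classes(values):
    """Assign each value a class by rank: 0 = low, 1 = medium, 2 = high.

    Ties are broken by input order, so equal values may land in different classes.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = rank * 3 // n  # always 0..2 because rank < n
    return classes


def percent_agreement(pred_a, pred_b):
    """Percentage of addresses placed in the same class by both models."""
    ca, cb = tertile_classes(pred_a), tertile_classes(pred_b)
    same = sum(1 for x, y in zip(ca, cb) if x == y)
    return 100.0 * same / len(ca)


if __name__ == "__main__":
    # Hypothetical NO2 predictions (ug/m3) from two models at six addresses.
    model_a = [21.0, 25.5, 30.2, 34.8, 40.1, 47.9]
    model_b = [23.4, 31.9, 29.0, 35.0, 39.8, 44.2]
    print(f"agreement = {percent_agreement(model_a, model_b):.0f}%")
```

Using each model's own tertiles compares how the models rank addresses rather than their absolute levels, so a model with a constant bias can still agree perfectly.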


Methods

Three distinct LUR models were developed within the region of Catalunya for three different epidemiological studies: ESCAPE (Cyrys et al., 2012), INMA-Sabadell (Aguilera et al., 2008), and REGICOR-Girona (Rivera et al., 2012). The model domains, methodologies, and underlying concepts of model development differed among the three studies. As shown in Fig. 1, the ESCAPE domain encompasses the domains of the other two studies, as it was developed to assess exposures for participants of three European

Comparison of model performances at measurement sites (analyses 1 and 2)

The ESCAPE model performed well, explaining 69% of the variability in NO2 concentration at ESCAPE sites in Sabadell (Fig. 2a). The R2 for the ESCAPE model dropped to 51% when it was applied to INMA sites in Sabadell (Fig. 2b). The ESCAPE model performed less well in Girona province, with an R2 of 0.53 for ESCAPE sites in the province and 0.36 for REGICOR sites (Fig. 2c and d). The R2 values obtained by applying the ESCAPE model at independent INMA and REGICOR sites in Sabadell and Girona were

Summary and comparison with other studies

We evaluated the performance of the LUR model developed for the Catalunya region in Spain for the ESCAPE project and two other LUR models based on independent monitoring campaigns derived locally for the sub-regions of the INMA-Sabadell and REGICOR-Girona cohort studies (both included in the ESCAPE domain). The ESCAPE model employed relatively few sites for a large spatial area in comparison to the denser monitoring campaigns and different site selection protocols for INMA-Sabadell and

Conclusion

Three land use regression models, developed at different scales and with different philosophies for overlapping regions, were evaluated by comparing their predictions with measurements from their own and the other models' sampling datasets. In all three models, the validation R2 derived from the independent datasets was lower than the leave-one-out cross-validation R2, but predictions remained reasonable. The three models still explained a substantial fraction of the variation at independent sites,

References

  • I. Aguilera et al. Estimation of outdoor NOx, NO2, and BTEX exposure in a cohort of pregnant women using land use regression modeling. Environmental Science & Technology (2008).