Comparison of performance of land use regression models derived for Catalunya, Spain

doi:10.1016/j.atmosenv.2013.05.054

Atmospheric Environment

Volume 77, October 2013, Pages 598-606

https://doi.org/10.1016/j.atmosenv.2013.05.054

Highlights

• We validate three overlapping land use regression models with external datasets.
• Despite differences in sampling protocols, validation R² values are reasonably high.
• Predictions from different models at cohort addresses are relatively well correlated.

Abstract

Land use regression (LUR) models have become a popular tool for capturing small-scale variations in air pollution exposure in epidemiological analyses, and they have been developed with a variety of approaches, with no clear indication of which is the most efficient and appropriate. We evaluated the performance of the NO₂ LUR model developed for Catalunya, Spain, within the European Study of Cohorts for Air Pollution Effects (ESCAPE), and compared it with two other LUR models derived locally and independently for two cohort studies (INMA-Sabadell and REGICOR-Girona) in different sub-areas of the ESCAPE domain. We used the sampling campaigns of the three studies as independent measurement sets with which to evaluate each model. We compared changes in R² and in measures of bias when applying each model to its own dataset vs. the independent datasets of the other studies. The three studies differed principally in their scale of analysis (from a single urban area for INMA-Sabadell to large provinces covering urban and rural areas for ESCAPE-Catalunya and REGICOR-Girona) and in their sampling protocol (e.g. site selection).
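The comparison described in this paragraph (R² and bias when a model is applied to its own sites versus another study's sites) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes validation R² is the squared Pearson correlation between observed and predicted NO₂, one common convention in LUR validation, and it uses mean bias and RMSE as stand-ins because the study's exact bias measures are not given here. The function name `validation_metrics` and all numbers are hypothetical.

```python
import numpy as np


def validation_metrics(observed, predicted):
    """Compare LUR predictions with measured NO2 at a set of sites.

    Returns the squared Pearson correlation (assumed definition of
    validation R^2), the mean bias (predicted minus observed) and the
    root-mean-square error, in the units of the inputs.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if obs.shape != pred.shape or obs.size < 3:
        raise ValueError("need matching arrays with at least 3 sites")
    r = np.corrcoef(obs, pred)[0, 1]
    bias = float(np.mean(pred - obs))
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))
    return {"r2": float(r ** 2), "bias": bias, "rmse": rmse}


# Hypothetical example: one model applied to its own sites and to the
# sites of an independent campaign (made-up values in ug/m3).
own = validation_metrics([20, 35, 50, 28, 61], [23, 33, 47, 30, 58])
external = validation_metrics([18, 40, 55, 25, 44], [26, 31, 49, 33, 40])
print(own)
print(external)
```

A squared correlation ignores systematic offsets between predictions and measurements, which is why it is usually reported alongside a bias measure; 1 − SSE/SST is an alternative R² definition that penalizes such offsets directly.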

The LUR models performed similarly well in terms of model adjusted R² (0.62 to 0.75) and cross-validation R² (0.63 to 0.73). The ESCAPE model performed well at the ESCAPE sites in Sabadell (R² = 0.69) and moderately well at the ESCAPE sites in Girona province (R² = 0.53), but predicted the external sites less well, with R² values of 0.51 in Sabadell and 0.36 in Girona province. The INMA-Sabadell and REGICOR-Girona models showed a similar pattern: the R² for the INMA model dropped from 0.69 at INMA sites to 0.50 at ESCAPE sites in Sabadell, while the R² for the REGICOR model dropped from 0.63 at REGICOR sites to 0.44 at ESCAPE sites in Girona province. The drop in performance at external sites is likely due to a combination of overfitting and differences between the sampling campaigns (years, site selection). Agreement between models in classifying predicted air pollution at cohort addresses as low, medium, or high was 53%–74%. Despite the drop in performance, the three models still explained a substantial fraction of the variation at independent sites, especially in Sabadell, supporting their use in epidemiological studies.
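The agreement figures in this paragraph refer to classifying each model's predicted concentrations at cohort addresses into low, medium, and high levels. The sketch below assumes those levels are tertiles cut from each model's own predicted distribution, which the abstract does not state; the function names and the nine predictions are hypothetical. It also reports the Pearson correlation between the two models' predictions.

```python
import numpy as np


def tertile_classes(values):
    """Label each value 0 (low), 1 (medium) or 2 (high) using the
    33.3rd and 66.7th percentiles of the same set of values."""
    v = np.asarray(values, dtype=float)
    cuts = np.percentile(v, [100 / 3, 200 / 3])
    # np.digitize returns 0 below the first cut, 1 between the cuts, 2 above
    return np.digitize(v, cuts)


def compare_predictions(pred_a, pred_b):
    """Agreement in tertile class and Pearson r between two models'
    predictions at the same set of addresses."""
    a = np.asarray(pred_a, dtype=float)
    b = np.asarray(pred_b, dtype=float)
    agreement = float(np.mean(tertile_classes(a) == tertile_classes(b)))
    pearson_r = float(np.corrcoef(a, b)[0, 1])
    return {"agreement": agreement, "pearson_r": pearson_r}


# Hypothetical NO2 predictions (ug/m3) from two models at nine addresses
model_a = [10, 20, 30, 40, 50, 60, 70, 80, 90]
model_b = [12, 25, 45, 28, 41, 65, 85, 60, 95]
result = compare_predictions(model_a, model_b)
print(result)  # agreement is 5/9 for these made-up values
```

As a reference point, two unrelated classifications into three equal-sized groups would agree for about one third of addresses by chance, so agreement well above 33% indicates that the models rank addresses similarly.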

Introduction

Near-road traffic-related air pollution has been associated in many epidemiological studies with a multitude of health effects, ranging from reproductive outcomes to all-cause mortality (HEI, 2010). While these studies indicate that living close to busy roads is associated with adverse health effects, the findings are not easily translated into policy, and generalization from one city to another is limited because “proximity” does not necessarily reflect the same types and levels of pollution. Alternative objective measures of local traffic-related pollution are therefore warranted. Land use regression (LUR) modelling is often the exposure assessment method of choice for capturing small-scale differences in air pollution concentrations, particularly from traffic sources, at moderate implementation cost (Jerrett et al., 2005). Numerous epidemiological studies use LUR models to investigate the health effects of air pollution, and nitrogen dioxide (NO₂) has most frequently been used as a marker of near-road traffic-related pollutants. A landmark activity of the ESCAPE project (European Study of Cohorts for Air Pollution Effects) is the adoption of uniform exposure assessment methods to assess the spatial variability of traffic-related air pollutants (Beelen et al., 2013, Cyrys et al., 2012). The same measurement and modelling methods were applied in 36 study areas across Europe where pre-existing cohort studies could provide health outcome data. NO₂ was one of the pollutants chosen as an indicator of traffic-related exposure; LUR estimates were developed for it and then used to assign exposures to the participants of the local studies.

A major challenge in ESCAPE LUR development was selecting a feasible number of measurement sites for a large number of European study areas within a limited budget. As previously shown, the number of measurement sites used to develop LUR models also affects their performance (Basagana et al., 2012, Wang et al., 2013). These studies showed that for models developed on a small number of sites, the model and cross-validation R² overestimate predictive ability at independent test sites, which highlights the importance of validating models with independent datasets. A second, partly related, challenge was the ambitious spatial coverage required across Europe for regions of heterogeneous size, ranging from intra-city to large regional domains. One example of a single large but geographically diverse region is Catalunya, Spain, where ESCAPE provided a LUR model for several cohort studies in different towns and cities. For two cohorts, namely REGICOR (Girona Heart Registry) in the region of Girona (Rivera et al., 2012) and INMA (Environment and Childhood) in the city of Sabadell (Aguilera et al., 2008), independent exposure assessment studies had been undertaken to build NO₂ LUR models for both study areas before the start of ESCAPE. These parallel measurement campaigns and LUR models offer an opportunity to compare the performance of the regional ESCAPE LUR model for NO₂ with LUR models derived locally for each cohort from a higher density of measurement sites.
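The point about small numbers of sites can be illustrated with a short simulation. The sketch below is a hypothetical example, not the procedure used by any of the three studies: it fits an ordinary least squares LUR model with an intercept to simulated sites that have one informative predictor (a made-up traffic variable) and several pure-noise predictors, then reports the in-sample R², the adjusted R², and a leave-one-out cross-validation (LOOCV) R² computed as 1 − PRESS/SST, which is one of several conventions. All function and variable names are illustrative assumptions.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]


def model_r2(X, y):
    """In-sample R^2 and adjusted R^2 for an OLS fit with p predictors."""
    n, p = X.shape
    resid = y - predict(fit_ols(X, y), X)
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return float(r2), float(adj)


def loocv_r2(X, y):
    """Leave-one-out R^2 = 1 - PRESS/SST: refit without site i, predict site i.
    The predictor set is fixed; no variable selection is repeated per fold."""
    n = len(y)
    loo_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        loo_pred[i] = predict(coef, X[i:i + 1])[0]
    press = np.sum((y - loo_pred) ** 2)
    return float(1 - press / np.sum((y - y.mean()) ** 2))


# Simulated sites: one informative (hypothetical) traffic predictor plus
# several pure-noise predictors; NO2 values in ug/m3 are made up.
rng = np.random.default_rng(0)
n_sites, n_noise = 20, 4
traffic = rng.uniform(0, 1, n_sites)
noise = rng.normal(size=(n_sites, n_noise))
X = np.column_stack([traffic, noise])
y = 20 + 30 * traffic + rng.normal(0, 5, n_sites)

r2, adj_r2 = model_r2(X, y)
cv_r2 = loocv_r2(X, y)
print(f"R2={r2:.2f}  adjusted R2={adj_r2:.2f}  LOOCV R2={cv_r2:.2f}")
```

For OLS, each leave-one-out residual is at least as large in magnitude as the corresponding in-sample residual, so this LOOCV R² can never exceed the in-sample R². Because the predictor set here is fixed in advance, the loop repeats no variable selection inside each fold; when predictors are chosen from many candidates using all sites, LOOCV R² computed this way can itself be optimistic, and neither quantity captures differences in site selection between campaigns, which only independent validation sites can reveal.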

Comparisons between LUR models developed for the same area are scant. Dijkema et al. (2011) compared two LUR models developed at different scales (a large-area model and a city-specific model, both encompassing the same core area of Amsterdam) and based on different monitoring campaigns. In both cases they found a drop in adjusted R² when a model was applied to the other model's monitoring sites, and they concluded that a sampling strategy purposefully designed to reflect the locations where models are to be applied is important.

The first objective of our study was to evaluate the performance of each LUR model by comparing its predictions with observed values at locations that were not used for model development.

Because the ultimate goal of LUR modelling in epidemiology is to assign estimates of air pollution exposure to participants of health studies, our second objective was to compare the different model predictions at the residential addresses of the two local cohort studies.

Section snippets

Methods

Three distinct LUR models were developed within the region of Catalunya for three different epidemiological studies: ESCAPE (Cyrys et al., 2012), INMA-Sabadell (Aguilera et al., 2008), and REGICOR-Girona (Rivera et al., 2012). The model domain, methodology, and concepts of model development differed among the three models. As shown in Fig. 1, the ESCAPE domain encompasses the domains of the other two studies, as it was developed to assess exposures for participants of three European

Comparison of model performances at measurement sites (analyses 1 and 2)

The ESCAPE model performed well, explaining 69% of the variability in NO₂ concentrations at ESCAPE sites in Sabadell (Fig. 2a). The R² for the ESCAPE model dropped to 51% when it was applied to INMA sites in Sabadell (Fig. 2b). The ESCAPE model performed less well in Girona province, with an R² of 0.53 for ESCAPE sites in the province and 0.36 for REGICOR sites (Fig. 2c and d). The R² values obtained by applying the ESCAPE model at independent INMA and REGICOR sites in Sabadell and Girona were

Summary and comparison with other studies

We evaluated the performance of the LUR model developed for the Catalunya region of Spain for the ESCAPE project and of two other LUR models, based on independent monitoring campaigns, derived locally for the sub-regions of the INMA-Sabadell and REGICOR-Girona cohort studies (both included in the ESCAPE domain). The ESCAPE model employed relatively few sites for a large spatial area, compared with the denser monitoring campaigns and different site selection protocols of INMA-Sabadell and

Conclusion

Three land use regression models developed at different scales and with different philosophies for overlapping regions were evaluated by comparing their predictions with measurements from their own and the other models' sampling datasets. For all three models, validation R² values derived from the independent datasets were lower than the leave-one-out cross-validation R², but the models still provided reasonable predictions. The three models still explained a substantial fraction of the variation at independent sites,

References (23)

X. Basagana et al.
Effect of the number of measurement sites on land use regression models in estimating local air pollution
Atmospheric Environment
(2012)
R. Beelen et al.
Development of NO₂ and NOₓ land use regression models for estimating air pollution exposure in 36 study areas in Europe – the ESCAPE project
Atmospheric Environment
(2013)
R. Beelen et al.
Comparison of the performances of land use regression modelling and dispersion modelling in estimating small-scale variations in long-term air pollution concentrations in a Dutch urban area
Atmospheric Environment
(2010)
D.J. Briggs et al.
A regression-based method for mapping traffic-related air pollution: application and testing in four contrasting urban environments
The Science of The Total Environment
(2000)
J. Cyrys et al.
Variation of NO₂ and NOₓ concentrations between and within 36 European study areas: results from the ESCAPE study
Atmospheric Environment
(2012)
M. Johnson et al.
Evaluation of land-use regression models used to predict air quality concentrations in an urban area
Atmospheric Environment
(2010)
J.D. Marshall et al.
Within-urban variability in ambient air pollution: comparison of estimation methods
Atmospheric Environment
(2008)
M. Rivera et al.
Spatial distribution of ultrafine particles in urban settings: a land use regression model
Atmospheric Environment
(2012)
J.R. Stedman et al.
New high resolution maps of estimated background ambient NOₓ and NO₂ concentrations in the U.K.
Atmospheric Environment
(1997)
R. Wang et al.
Temporal stability of land use regression models for traffic-related air pollution
Atmospheric Environment
(2013)
I. Aguilera et al.
Estimation of outdoor NOₓ, NO₂, and BTEX exposure in a cohort of pregnant women using land use regression modeling
Environmental Science & Technology
(2008)

