143-8 Hazy Data, Clearer Cities: Modeling Urban PM2.5 Pollution with Machine Learning and Multisource Data
Session: A Showcase of Student Research in Geoinformatics and Data Science (Posters)
Poster Booth No.: 31
Presenting Author:
Jada Macharie
Authors:
Macharie, Jada¹; Nath, Bibhash²; Ni-Meister, Wenge³
Abstract:
Urban residents face heightened exposure to air pollution because of dense populations, complex land use, and transportation emissions. Fine particulate matter (PM2.5), a major contributor to degraded urban air quality, poses severe risks to human health, especially in marginalized communities. Accurate prediction of PM2.5 concentrations is therefore essential for public health and environmental policy.
This study evaluates how the quantity and temporal variability of input data influence the performance of machine learning models for PM2.5 prediction. We applied Random Forest (RF) and Extreme Gradient Boosting (XGB) algorithms in three major U.S. cities (New York City, Washington, D.C., and Boston), using a combination of satellite and ground-based datasets. Specifically, we integrated Aerosol Optical Depth (AOD) derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) with surface meteorological variables, including relative humidity, barometric pressure, wind speed and direction, and outdoor temperature.
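The data-integration step described above can be sketched as a join of satellite and surface records on city and date. This is a minimal, dependency-free illustration under assumed inputs, not the study's pipeline: the record layout, the field names (`rh`, `pressure_hpa`, `wind_dir_deg`, and so on), the helper names `merge_sources` and `feature_vector`, and the choice to drop rather than impute days with missing AOD are all hypothetical. The `include_met` flag lets an AOD-only feature set be compared with AOD plus meteorology. Wind direction is encoded as sine and cosine components because it is circular (359° and 1° are nearly the same direction), whereas raw degrees would place them far apart.

```python
import math
from dataclasses import dataclass


@dataclass
class FeatureRow:
    # Hypothetical schema: one row per (city, day) with all sources aligned.
    city: str
    date: str
    aod: float           # MODIS-derived AOD (unitless)
    rel_humidity: float  # percent
    pressure_hpa: float  # barometric pressure
    wind_speed: float    # m/s
    wind_dir_deg: float  # direction the wind blows from, degrees
    temp_c: float        # outdoor temperature
    pm25: float          # ground-monitor PM2.5 target, ug/m3


def merge_sources(aod, met, pm25):
    """Join daily AOD, meteorology, and observed PM2.5 on (city, date) keys.

    aod:  {(city, date): float or None}; None marks a retrieval gap (e.g. cloud)
    met:  {(city, date): dict with keys rh, pressure_hpa, wind_speed,
           wind_dir_deg, temp_c}
    pm25: {(city, date): float}
    Days missing AOD or meteorology are dropped (an assumed policy), not imputed.
    """
    rows = []
    for key in sorted(pm25):
        a, m = aod.get(key), met.get(key)
        if a is None or m is None:
            continue
        city, date = key
        rows.append(FeatureRow(city, date, a, m["rh"], m["pressure_hpa"],
                               m["wind_speed"], m["wind_dir_deg"], m["temp_c"],
                               pm25[key]))
    return rows


def feature_vector(row, include_met=True):
    """Return AOD alone, or AOD plus meteorology with wind direction as sin/cos."""
    if not include_met:
        return [row.aod]
    theta = math.radians(row.wind_dir_deg)
    return [row.aod, row.rel_humidity, row.pressure_hpa, row.wind_speed,
            math.sin(theta), math.cos(theta), row.temp_c]
```

In practice, MODIS AOD retrievals are often missing on cloudy days, so the drop-or-impute decision directly controls how much training data each city contributes.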
Our findings indicate that MODIS AOD alone has limited predictive power because of resolution constraints and spatiotemporal inconsistencies. However, when combined with meteorological data, it significantly improves PM2.5 predictions in all three cities, highlighting the importance of multisource data integration. Additionally, we evaluated the models’ ability to generalize regionally by training on data from two cities and predicting PM2.5 in the third. These experiments identified data variability as a key factor in both model accuracy and geographic transferability.
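The cross-city experiment (train on two cities, predict the third) amounts to leave-one-group-out validation with cities as the groups. The sketch below assumes each row is a `(city, features, pm25)` tuple and that `fit` is any callable that takes `(X, y)` and returns a single-sample predictor; the names `leave_one_city_out` and `fit_mean_baseline`, and the choice of RMSE and R² as scores, are illustrative and not the authors' reported setup. A mean-only baseline stands in for RF and XGB so the example runs with the Python standard library alone.

```python
import math
from statistics import fmean

# Rows are (city, feature_vector, observed_pm25). `fit` takes (X, y) and
# returns a function that predicts PM2.5 for one feature vector.


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination; NaN if the held-out targets are constant."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


def leave_one_city_out(rows, fit):
    """Train on every city but one, score on the held-out city, repeat."""
    scores = {}
    for held_out in sorted({city for city, _, _ in rows}):
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        predict = fit([x for _, x, _ in train], [y for _, _, y in train])
        y_true = [y for _, _, y in test]
        y_pred = [predict(x) for _, x, _ in test]
        scores[held_out] = {"rmse": rmse(y_true, y_pred), "r2": r2(y_true, y_pred)}
    return scores


def fit_mean_baseline(X, y):
    """Hypothetical placeholder model: always predicts the training-set mean."""
    mu = fmean(y)
    return lambda x: mu
```

To evaluate RF instead of the baseline, `fit` could wrap scikit-learn's `RandomForestRegressor` (call `.fit(X, y)`, then return a function that calls `.predict([x])[0]`); xgboost's `XGBRegressor` exposes the same fit/predict interface. A negative R² on a held-out city signals that the model transfers worse than simply predicting that city's own mean.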
This research offers useful insights for urban geoscientists and planners by describing how remote sensing and environmental monitoring data can be operationalized through machine learning to fill air quality information gaps in underserved communities.
Ultimately, this work contributes to the growing field of urban geoinformatics by providing a reproducible approach to PM2.5 modeling that supports equitable and evidence-based solutions to environmental hazards in cities. The integration of satellite and surface data using interpretable machine learning not only advances our understanding of urban atmospheric dynamics but also supports decision-making for healthier, more resilient cities.
Geological Society of America Abstracts with Program. Vol. 57, No. 6, 2025
doi: 10.1130/abs/2025AM-10893
© Copyright 2025 The Geological Society of America (GSA), all rights reserved.
Category: Topical Sessions
Session Format: Poster
Presentation Date: 10/20/2025
Presentation Room: HBGCC, Hall 1
Author Availability: 3:30–5:30 p.m.