38-9 Integrating Machine Learning with Geochemical Modeling: A Hybrid Random Forest–Gaussian Processes Approach
Session: Geoscience and Hydrogeology in the AI Era: From Predictive Models to Real-Time Applications
Presenting Author:
Javier Samper
Authors:
Samper-Pilar, Javier (1), Yang, Changbing (2), Samper, Javier (3), Mon, Alba (4)
(1) Civil Engineering Department & School. Interdisciplinary Center for Biology and Chemistry (CICA), Universidade da Coruña, A Coruña, Galicia, Spain
(2) Civil Engineering Department & School. Interdisciplinary Center for Biology and Chemistry (CICA), Universidade da Coruña, A Coruña, Galicia, Spain
(3) Civil Engineering Department & School. Interdisciplinary Center for Biology and Chemistry (CICA), Universidade da Coruña, A Coruña, Galicia, Spain
(4) Civil Engineering Department & School. Interdisciplinary Center for Biology and Chemistry (CICA), Universidade da Coruña, A Coruña, Galicia, Spain
Abstract:
Recent advances in numerical methods and computing power have accelerated the use of artificial intelligence (AI) and machine learning (ML) across scientific domains, including geochemistry for radioactive waste management. ML is increasingly applied to accelerate reactive transport simulations, enhance multiscale and multiphysics model couplings, and support uncertainty quantification and sensitivity analyses. This paper presents a hybrid machine learning approach combining Random Forest (RF) and Gaussian Processes (GP), referred to as RF-GP, to solve two benchmark problems recently introduced by Prasianakis et al. (2025). The RF-GP method operates in two stages: first, an RF algorithm classifies geochemical systems into sub-groups based on factors such as pH and mineral composition; second, a GP regressor models geochemical behavior within each sub-group. The RF classifier, implemented using the scikit-learn 1.2.2 Python library, is trained with 80% of the data using a randomized shuffle-split strategy. Given input features, the trained RF model predicts the sub-group. Once the sub-group is determined, the corresponding GP model is applied to predict the geochemical outputs. A key advantage of the RF-GP method is its small number of parameters, which helps reduce the risk of overfitting. However, computational challenges remain, particularly with respect to CPU time and memory when processing large datasets (N > 10,000). In the cement-based benchmark case, RF-GP demonstrated strong performance, achieving low root mean square error (RMSE) values across most geochemical outputs. The hybrid approach outperformed standalone GP by leveraging both classification and regression strengths. RF-GP yielded especially accurate predictions for aqueous, sorbed, exchanged, and precipitated uranium species. However, its performance declined for relative error metrics and for outputs with very low aqueous and sorbed uranium concentrations.
Overall, RF-GP shows strong potential for improving the efficiency and accuracy of reactive transport modeling in complex geochemical systems. While its dual-model architecture provides enhanced predictive capabilities, further improvements are needed to address computational scalability for large-scale applications.
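The two-stage workflow described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation: the synthetic data, the pH-like threshold that defines the two sub-groups, the kernel choice (RBF plus white noise), the number of trees, and all function names (make_synthetic_data, fit_rf_gp, predict_rf_gp, rmse) are assumptions made only for illustration. The only details taken from the abstract are the RF classifier followed by one GP regressor per sub-group, and the 80% shuffle-split training fraction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import ShuffleSplit


def make_synthetic_data(n=200, seed=0):
    """Hypothetical data: a pH-like feature, a second feature, two sub-groups."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.uniform(6.0, 13.0, n), rng.uniform(0.0, 1.0, n)])
    groups = (X[:, 0] > 10.0).astype(int)  # assumed sub-group rule
    # A different (made-up) response in each sub-group.
    y = np.where(groups == 1, np.sin(X[:, 0]) + X[:, 1], 5.0 + 0.5 * X[:, 1])
    return X, y, groups


def fit_rf_gp(X, y, groups, seed=0):
    """Stage 1: RF classifier on sub-groups. Stage 2: one GP per sub-group."""
    # Randomized shuffle-split with 80% of the samples used for training.
    splitter = ShuffleSplit(n_splits=1, train_size=0.8, random_state=seed)
    train_idx, test_idx = next(splitter.split(X))
    X_tr, y_tr, g_tr = X[train_idx], y[train_idx], groups[train_idx]

    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_tr, g_tr)

    gps = {}
    for g in np.unique(g_tr):
        mask = g_tr == g
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                      random_state=seed)
        gp.fit(X_tr[mask], y_tr[mask])
        gps[g] = gp
    return rf, gps, test_idx


def predict_rf_gp(rf, gps, X):
    """Predict the sub-group with the RF, then apply that sub-group's GP."""
    labels = rf.predict(X)
    y_pred = np.empty(len(X))
    for g in np.unique(labels):
        mask = labels == g
        y_pred[mask] = gps[g].predict(X[mask])
    return y_pred


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    X, y, groups = make_synthetic_data()
    rf, gps, test_idx = fit_rf_gp(X, y, groups)
    y_hat = predict_rf_gp(rf, gps, X[test_idx])
    print("held-out RMSE:", rmse(y[test_idx], y_hat))
```

In this sketch a sample misclassified by the RF is passed to the wrong GP, so classification errors near the sub-group boundary feed directly into the regression error. Exact GP training also scales roughly cubically with the number of training samples per sub-group, which is consistent with the CPU time and memory limits noted in the abstract for large datasets.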
Acknowledgements. This research was funded by ENRESA within the Work Packages DONUT of EURAD (European Joint Programme on Radioactive Waste Management of the European Union) (Grant Agreement No. 84759) and HERMES of EURAD-2 (Grant Agreement No. 101166718), and by the PID2023-153202OB-I00 Project from the Spanish Ministry of Science and Innovation.
Geological Society of America Abstracts with Program. Vol. 57, No. 6, 2025
doi: 10.1130/abs/2025AM-9388
© Copyright 2025 The Geological Society of America (GSA), all rights reserved.
Category: Topical Sessions
Description
Session Format: Oral
Presentation Date: 10/19/2025
Presentation Start Time: 04:05 PM
Presentation Room: HBGCC, 210AB