143-3 Testing the efficacy of machine learning regression algorithms in predicting geochemical marine redox changes from the mid-Paleoproterozoic to the Holocene.
Session: A Showcase of Student Research in Geoinformatics and Data Science (Posters)
Poster Booth No.: 26
Presenting Author:
Maya Thompson
Authors:
Stockey, Richard G.1, Lui, Timmy C.C.2, Farrell, Úna C.3, Trace Metal Working Group, SGP4, Thompson, Maya5, Sperling, Erik A.6
Abstract:
Historical changes to Earth’s marine redox state are recorded in various geochemical proxies such as molybdenum (Mo), uranium (U), and vanadium (V). Using statistical analyses and mass balance modelling, researchers have attempted to quantify global redox changes through time using sedimentary abundances of these redox-sensitive trace metals. As we build bigger sedimentary geochemical databases through the Sedimentary Geochemistry and Paleoenvironments Project (SGP), it is important to continue developing increasingly sophisticated statistical tools for analysis.
This research builds on Stockey et al. (2024; Nature Geoscience), who developed stratigraphy-based age modelling and applied a Monte Carlo random forest analysis to sedimentary trace metal data spanning 1000–300 Ma. By adapting this framework to Python (v3.12.1) with the scikit-learn (v1.5.1) and xgboost (v2.1.3) packages, we can determine feature importance at a finer scale via numerical encoding of categorical variables. Using the larger SGP Phase 2 dataset, we test the predictive efficacy of seven machine learning algorithms across an expanded temporal range (2200–15 Ma). The ultimate goals are to rank algorithm performance by root mean squared error and to identify the most important predictor variables.
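As a rough illustration of ranking regressors by root mean squared error, the sketch below compares three scikit-learn models with 5-fold cross-validation. Everything in it is an assumption made for illustration: the data are synthetic, the predictor names (`toc`, `latitude`, `age_ma`, `lithology_code`) are hypothetical, `GradientBoostingRegressor` stands in for xgboost, and the study's seven algorithms, hyperparameter grids, and Monte Carlo resampling are not reproduced.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-ins for predictors (hypothetical names and ranges).
toc = rng.uniform(0.1, 10.0, n)        # total organic carbon, wt%
latitude = rng.uniform(-60.0, 60.0, n)  # site latitude, degrees
age_ma = rng.uniform(15.0, 2200.0, n)   # age-model estimate, Ma
lithology_code = rng.integers(0, 3, n)  # categorical variable as integer codes

X = np.column_stack([toc, latitude, age_ma, lithology_code])
# Synthetic response loosely tied to TOC and one lithology class.
y = 2.0 * np.log1p(toc) + 0.5 * (lithology_code == 2) + rng.normal(0.0, 0.3, n)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # Stand-in for XGBoost so the sketch needs only scikit-learn.
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse = {}
for name, model in models.items():
    # scikit-learn reports negated RMSE so that larger is better; flip the sign.
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    rmse[name] = -scores.mean()

# Model names ordered from lowest to highest mean cross-validated RMSE.
ranking = sorted(rmse, key=rmse.get)
for name in ranking:
    print(f"{name}: RMSE = {rmse[name]:.3f}")
```

Repeating this comparison over many resampled versions of the data and summarizing each model's RMSE by its median is one way to arrive at a ranking of the kind reported in the abstract.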
In the trace metal analyses, eXtreme Gradient Boosting (XGB) has the best median score. Random forest almost always scores second best and is computationally economical, owing to its smaller hyperparameter grid search and shorter computation time. Among the predictor variables, total organic carbon (TOC, in wt%) almost always ranked as the most important, with site latitude and the stratigraphic age model occasionally ranking first. Additionally, we tested the effects of preparatory, experimental, and analytical geochemical methods on model accuracy. Based on permutation importance, only a few methods scored higher than the two dummy predictors, and none scored highly enough to be significant. This suggests that geochemical methodology does not have an important effect on predicting trace metal patterns through time in large datasets.
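The comparison against dummy predictors can be sketched with scikit-learn's `permutation_importance`. This is a minimal example under stated assumptions, not the study's pipeline: the data are synthetic, the column names (`toc`, `method_code`) are hypothetical, and the two dummy predictors are assumed here to be one random continuous column and one random categorical column, since the abstract does not say how they were constructed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400

toc = rng.uniform(0.1, 10.0, n)      # informative predictor (synthetic)
method_code = rng.integers(0, 4, n)  # hypothetical method code, unrelated to y
dummy_continuous = rng.normal(size=n)     # assumed dummy: pure noise
dummy_categorical = rng.integers(0, 4, n)  # assumed dummy: random categories

names = ["toc", "method_code", "dummy_continuous", "dummy_categorical"]
X = np.column_stack([toc, method_code, dummy_continuous, dummy_categorical])
y = 2.0 * np.log1p(toc) + rng.normal(0.0, 0.3, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X_train, y_train)

# Importance = mean drop in held-out score when one column is shuffled.
result = permutation_importance(
    model,
    X_test,
    y_test,
    scoring="neg_root_mean_squared_error",
    n_repeats=10,
    random_state=1,
)
importance = dict(zip(names, result.importances_mean))

# A real predictor is only interesting if it beats both noise columns.
dummy_ceiling = max(
    importance["dummy_continuous"], importance["dummy_categorical"]
)
above_dummies = [
    name
    for name in names
    if not name.startswith("dummy") and importance[name] > dummy_ceiling
]
print(importance)
print("Predictors above the dummy ceiling:", above_dummies)
```

Beating the dummy ceiling is a necessary check rather than a significance test; a predictor that narrowly exceeds random noise may still be unimportant, which matches the abstract's distinction between scoring above the dummies and scoring highly enough to matter.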
Geological Society of America Abstracts with Program. Vol. 57, No. 6, 2025
doi: 10.1130/abs/2025AM-7008
© Copyright 2025 The Geological Society of America (GSA), all rights reserved.
Category: Topical Sessions
Session Format: Poster
Presentation Date: 10/20/2025
Presentation Room: HBGCC, Hall 1
Author Availability: 3:30–5:30 p.m.