209-6 Large Language Models in Geoscience Research: Emerging Tools, Geological Applications, and Future Directions
Session: Deep-Time Earth and the AI Revolution
Presenting Author: Yitian Xiao
Authors: Xiao, Yitian (1); Ye, Jieping (2); Boquet, Grant (3); Ogg, James (4); Stephenson, Michael (5); Yu, Ting (6); Jiang, Ting (7); Bah, Mohamed Jaward (8); Li, Yilin (9); Chen, Hongyang (10); Zhou, Ying (11)
Affiliations: (1, 2, 3, 6, 7, 8, 9, 10, 11) Zhejiang Laboratory, Hangzhou, China; (4) Purdue University, West Lafayette, Indiana, USA; (5) Stephenson Geoscience Consulting Ltd, London, United Kingdom
Abstract:
Large language models (LLMs) are beginning to influence every branch of the geological sciences, from core description to global tectonic synthesis. This contribution provides the first systematic survey aimed at a GSA audience, cataloguing more than twenty domain‑focused LLMs and foundation models (FMs) released or announced by mid‑2025. We classify them into five functional categories: (1) geospatial vision transformers trained on continental‑scale orthomosaics (e.g., IBM–NASA Prithvi; USGS Landsat–Sentinel prototypes); (2) climate and weather FMs (Microsoft Aurora, NOAA HRRR‑Cast) that supply boundary conditions and hazard scenarios for geomorphic studies; (3) Earth‑system megamodels (ORBIT) that embed stratigraphic and paleoclimate memory; (4) retrieval‑augmented geological LLMs—GeoGPT, EarthLM—that integrate licensed literature, well logs, and thin‑section images to support lithologic interpretation, basin analysis, and mineral prospectivity; and (5) integrated platforms (NASA AI4Science, Microsoft Planetary Computer Pro) that expose curated data APIs and turnkey fine‑tuning pipelines. We compare training corpora (100–1000 GB), parameter counts (7–113 B), and benchmark performance on tasks directly relevant to GSA disciplines: automated geologic map legend harmonization, sequence‑stratigraphic boundary picking, ore‑deposit type prediction, and earthquake‑triggered landslide susceptibility. Enabling trends—radically open literature repositories, self‑supervised learning on 5 m Sentinel‑2 composites, and federated learning across geological surveys—are highlighted. Finally, we outline unresolved scientific and ethical challenges: incorporating physical constraints (e.g., mass balance, rheology) into generative architectures; quantifying epistemic uncertainty; ensuring Indigenous and multilingual data representation; and establishing governance frameworks that balance openness with intellectual‑property stewardship.
The survey offers GSA members a concise roadmap of current capabilities and a research agenda for responsibly deploying LLMs to advance geologic discovery, resource management, and geohazard mitigation.
Geological Society of America Abstracts with Program. Vol. 57, No. 6, 2025
doi: 10.1130/abs/2025AM-7751
© Copyright 2025 The Geological Society of America (GSA), all rights reserved.
Category: Topical Sessions
Session Format: Oral
Presentation Date: 10/21/2025
Presentation Start Time: 03:10 PM
Presentation Room: HBGCC, 301C