209-10 LLM-Powered Domain Knowledge Graph Construction: Applications in Hydrocarbon Source Rock Analysis
Session: Deep-Time Earth and the AI Revolution
Presenting Author:
Jianhao Wang
Authors:
Hu, Huiting(1); Liu, Huili(2); Feng, Zhiqiang(3); Wang, Haixue(4); Liu, Bo(5); Yu, Ting(6); Jiang, Ting(7); Li, Yilin(8); Wang, Jianhao(9)
(1) College of Geosciences, Northeast Petroleum University, Daqing, Heilongjiang, China
(2) College of Geosciences, Northeast Petroleum University, Daqing, China
(3) Zhejiang Lab, Hangzhou, Zhejiang, China
(4) College of Geosciences, Northeast Petroleum University, Daqing, Heilongjiang, China
(5) Northeast Petroleum University, Daqing, Heilongjiang, China
(6) Zhejiang Lab, Hangzhou, Zhejiang, China
(7) Zhejiang Lab, Hangzhou, Zhejiang, China
(8) Northeast Petroleum University, Daqing, Heilongjiang, China
(9) College of Geosciences, Northeast Petroleum University, Daqing, Heilongjiang, China
Abstract:
The integration of large language models (LLMs) with knowledge graphs (KGs) offers a novel approach for building scalable, high-quality domain KGs. This paper presents an LLM-driven methodology for domain KG construction and demonstrates its application in hydrocarbon source rock analysis.
The method begins by analyzing the domain's knowledge structure and data characteristics to construct, and then iteratively expand, a foundational ontology that defines hierarchies, rules, and relationships. Multimodal domain data (documents, charts, databases) are screened and preprocessed. Guided by the ontology, LLMs (specifically GeoGPT, a geoscience LLM) extract knowledge triples and rule conclusions from these data. To ensure accuracy and mitigate LLM hallucinations, the extracted knowledge undergoes rigorous verification that combines model confidence scoring with expert review of sampled results. Only knowledge that is both high-confidence and relevant is used to enrich the ontology.
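The verification step can be sketched as a filter that keeps a triple only if it conforms to the ontology's relation constraints and its confidence clears a threshold, then draws a random sample of the accepted triples for expert review. This is a minimal sketch under stated assumptions: the schema, the entity names (`Formation_A`, `Basin_X`), the 0.8 threshold, and the 10% sample rate are all hypothetical, and the extractor is assumed to report one confidence score per triple. The sketch does not call GeoGPT or reproduce the authors' actual scoring.

```python
import random
from dataclasses import dataclass

# Hypothetical ontology fragment: relation -> (allowed subject class, allowed object class).
ONTOLOGY_SCHEMA = {
    "deposited_in": ("SourceRock", "SedimentaryEnvironment"),
    "located_in": ("SourceRock", "Basin"),
}

# Hypothetical entity typing, as it might be derived from the ontology's class hierarchy.
ENTITY_CLASS = {
    "Formation_A": "SourceRock",
    "lacustrine": "SedimentaryEnvironment",
    "Basin_X": "Basin",
}


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    confidence: float  # model-reported confidence in [0, 1] (assumed to be available)


def conforms_to_ontology(t: Triple) -> bool:
    """Reject triples whose relation or argument classes the schema does not allow."""
    if t.relation not in ONTOLOGY_SCHEMA:
        return False
    subj_cls, obj_cls = ONTOLOGY_SCHEMA[t.relation]
    return ENTITY_CLASS.get(t.subject) == subj_cls and ENTITY_CLASS.get(t.obj) == obj_cls


def verify(triples, threshold=0.8, sample_rate=0.1, seed=0):
    """Split triples into accepted and rejected, plus a random sample for expert review."""
    rng = random.Random(seed)
    accepted, rejected = [], []
    for t in triples:
        if conforms_to_ontology(t) and t.confidence >= threshold:
            accepted.append(t)
        else:
            rejected.append(t)
    # Expert sampling: at least one accepted triple (if any) goes to a human reviewer.
    k = min(len(accepted), max(1, round(sample_rate * len(accepted))))
    expert_sample = rng.sample(accepted, k)
    return accepted, rejected, expert_sample


extracted = [
    Triple("Formation_A", "deposited_in", "lacustrine", 0.93),
    Triple("Formation_A", "located_in", "Basin_X", 0.55),   # below threshold
    Triple("Basin_X", "deposited_in", "lacustrine", 0.97),  # violates subject class
]
accepted, rejected, sample = verify(extracted)
print(len(accepted), len(rejected), len(sample))  # 1 2 1
```

Checking ontology conformance before confidence means that even a confidently stated triple is rejected if it contradicts the schema, which is one plausible way to catch fluent but wrong LLM output.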
Knowledge reasoning and completion techniques, including rule-based inference and representation learning for link prediction, then infer potential relationships between nodes, improving the connectivity and cohesion of the graph. The cycle of extraction, verification, enrichment, and reasoning repeats until a comprehensive domain KG is obtained, automating construction through human-AI collaboration.
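Rule-based inference of this kind is often implemented as forward chaining, where rules are applied to the fact set repeatedly until no new triples appear (a fixpoint). The sketch below assumes two hypothetical rules (transitivity of `part_of`, and inheritance of `located_in` from a containing unit) and hypothetical stratigraphic names; the abstract does not specify which rules the authors use.

```python
from typing import Callable, Iterable, Set, Tuple

Fact = Tuple[str, str, str]
Rule = Callable[[Set[Fact]], Iterable[Fact]]


def transitive_part_of(facts: Set[Fact]) -> Iterable[Fact]:
    # Hypothetical rule: (a part_of b) and (b part_of c)  =>  (a part_of c)
    for a, r1, b in facts:
        if r1 != "part_of":
            continue
        for b2, r2, c in facts:
            if r2 == "part_of" and b2 == b and a != c:
                yield (a, "part_of", c)


def inherit_location(facts: Set[Fact]) -> Iterable[Fact]:
    # Hypothetical rule: (m part_of a) and (a located_in x)  =>  (m located_in x)
    for m, r1, a in facts:
        if r1 != "part_of":
            continue
        for a2, r2, x in facts:
            if r2 == "located_in" and a2 == a:
                yield (m, "located_in", x)


def forward_chain(facts: Set[Fact], rules: Iterable[Rule]) -> Set[Fact]:
    """Apply every rule repeatedly until no new facts are produced."""
    kb = set(facts)
    rules = list(rules)
    while True:
        # Evaluate all rules against the current snapshot, keep only unseen facts.
        new = {f for rule in rules for f in rule(kb)} - kb
        if not new:
            return kb
        kb |= new


base = {
    ("Member_1", "part_of", "Formation_A"),
    ("Formation_A", "part_of", "Group_G"),
    ("Group_G", "located_in", "Basin_X"),
}
inferred = forward_chain(base, [transitive_part_of, inherit_location]) - base
for fact in sorted(inferred):
    print(fact)
# ('Formation_A', 'located_in', 'Basin_X')
# ('Member_1', 'located_in', 'Basin_X')
# ('Member_1', 'part_of', 'Group_G')
```

The third fact only appears on the second pass, after the first pass has derived `Formation_A located_in Basin_X`, which is why the loop runs to a fixpoint rather than applying each rule once.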
Applied to source rock analysis, the method constructs a KG covering source rock properties, location, and sedimentary environment. Starting from "source rock" as the core concept, a top-down ontology is built with standards serving as axioms. GeoGPT extracts entities and rules from documents, patents, and charts. Confidence-based filtering and expert verification ensure data quality before the ontology is expanded. Rule reasoning and representation learning then infer potential relationships between nodes, and iteration yields a comprehensive source rock KG.
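For the representation-learning side of link prediction, one widely used model is TransE, which learns vectors so that head + relation lies close to tail. The abstract does not name the model the authors use, so TransE is a stand-in here. This minimal pure-Python sketch assumes squared L2 distance, tail-only negative sampling, and toy hypothetical triples; the hyperparameters are illustrative, and with so little data the ranking demonstrates the mechanics rather than a meaningful prediction.

```python
import math
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]


def train_transe(triples: List[Triple], dim: int = 8, epochs: int = 200,
                 lr: float = 0.05, margin: float = 1.0, seed: int = 0):
    """Minimal TransE with a margin ranking loss and SGD."""
    rng = random.Random(seed)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    ent = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}

    def dist2(h, r, t):
        # Squared L2 distance ||h + r - t||^2 (squared form keeps gradients simple).
        return sum((h[i] + r[i] - t[i]) ** 2 for i in range(dim))

    def step(h, r, t, sign):
        # Gradient of ||h + r - t||^2 is 2(h + r - t) for h and r, and its negative for t.
        # sign=+1 pulls a true triple together; sign=-1 pushes a corrupted one apart.
        for i in range(dim):
            g = 2 * (h[i] + r[i] - t[i])
            h[i] -= sign * lr * g
            r[i] -= sign * lr * g
            t[i] += sign * lr * g

    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = rng.choice([e for e in entities if e != t])  # corrupt the tail
            loss = margin + dist2(ent[h], rel[r], ent[t]) - dist2(ent[h], rel[r], ent[t_neg])
            if loss > 0:  # update only when the margin is violated
                step(ent[h], rel[r], ent[t], +1)
                step(ent[h], rel[r], ent[t_neg], -1)
        for v in ent.values():  # keep entity vectors on the unit sphere, as in TransE
            n = math.sqrt(sum(x * x for x in v)) or 1.0
            v[:] = [x / n for x in v]
    return ent, rel


def rank_tails(ent, rel, head: str, relation: str) -> List[str]:
    """Rank candidate tails by distance; a smaller distance means a more plausible link."""
    h, r = ent[head], rel[relation]
    scores = []
    for e, t in ent.items():
        if e == head:
            continue
        d = math.sqrt(sum((h[i] + r[i] - t[i]) ** 2 for i in range(len(h))))
        scores.append((d, e))
    return [e for _, e in sorted(scores)]


kg = [
    ("Formation_A", "deposited_in", "lacustrine"),
    ("Formation_B", "deposited_in", "lacustrine"),
    ("Formation_A", "located_in", "Basin_X"),
    ("Formation_B", "located_in", "Basin_X"),
    ("Formation_C", "located_in", "Basin_X"),
    ("Formation_D", "deposited_in", "marine"),
]
ent, rel = train_transe(kg)
print(rank_tails(ent, rel, "Formation_C", "deposited_in"))
```

In an iterative pipeline like the one described, highly ranked candidate links could be treated like extracted triples and passed through the same confidence and expert checks before being added to the graph.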
This KG integrates complex domain knowledge and draws on the semantic understanding of LLMs for intelligent association and deep reasoning. It consolidates large volumes of data into a queryable repository, substantially improving source rock data retrieval and the discovery of new insights. Applied further in agent systems, it supports source rock evaluation and prediction. The method offers a new research paradigm for source rock studies and a reusable framework for building and applying KGs across diverse domains.
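Querying the consolidated repository amounts to matching triple patterns against the graph. The sketch below assumes an in-memory list of triples and a small conjunctive pattern matcher in which terms beginning with `?` are variables, similar in spirit to a SPARQL basic graph pattern. The abstract does not say what storage or query language the authors use, and the data are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one pattern with one triple, extending an existing variable binding."""
    b = dict(binding)
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            if b.setdefault(p, v) != v:  # variable already bound to a different value
                return None
        elif p != v:
            return None
    return b


def query(kg: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Answer a conjunction of triple patterns by joining matches pattern by pattern."""
    kg = list(kg)
    bindings: List[Binding] = [{}]
    for pat in patterns:
        next_bindings = []
        for b in bindings:
            for t in kg:
                b2 = match(pat, t, b)
                if b2 is not None:
                    next_bindings.append(b2)
        bindings = next_bindings
    return bindings


kg = [
    ("Formation_A", "located_in", "Basin_X"),
    ("Formation_B", "located_in", "Basin_X"),
    ("Formation_A", "deposited_in", "lacustrine"),
    ("Formation_B", "deposited_in", "marine"),
]
# Which source rocks in Basin_X were deposited in a lacustrine environment?
rows = query(kg, [("?rock", "located_in", "Basin_X"),
                  ("?rock", "deposited_in", "lacustrine")])
print(rows)  # [{'?rock': 'Formation_A'}]
```

Shared variables such as `?rock` act as join keys across patterns, which is what lets a single query combine location and sedimentary-environment facts.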
Geological Society of America Abstracts with Program. Vol. 57, No. 6, 2025
doi: 10.1130/abs/2025AM-11226
© Copyright 2025 The Geological Society of America (GSA), all rights reserved.
Category: Topical Sessions
Session Format: Oral
Presentation Date: 10/21/2025
Presentation Start Time: 04:15 PM
Presentation Room: HBGCC, 301C