A Methodology for Constructing a Geological Prospecting Knowledge Graph and Question Answering Model Based on GeoGPT and LightRAG
-
Abstract
Objective: Using massive volumes of geological text for mineral prospecting prediction is hindered by difficulties in semantic recognition, integration, and utilization. Large Language Models (LLMs) offer a new approach, but they show significant bottlenecks in understanding specialized geological terminology and in constructing knowledge graphs.
Methods: This paper proposes a methodology for constructing a geological prospecting knowledge graph and a knowledge question-answering model by integrating the domain-specialized GeoGPT model with the lightweight LightRAG framework. The approach uses GeoGPT's prior geological knowledge for autonomous ontology definition, entity recognition, and relation extraction, and combines it with a post-processing module for association enhancement and degradation to build the knowledge graph. A geological knowledge question-answering model is then built on LightRAG's dual-level retrieval and incremental update mechanisms, which complete the answer context with knowledge retrieved from external geological knowledge bases.
Results: In recognizing key entities such as minerals and ore deposits, GeoGPT improved the F1-score by 17%–28% over general-purpose LLMs such as DeepSeek-V3 and Qwen2.5-72B. Compared with GraphRAG, LightRAG substantially improved retrieval efficiency because it avoids computationally expensive community summarization and global graph reconstruction. The question-answering model built on LightRAG with GeoGPT at its core outperformed the general-purpose DeepSeek-V3 and Qwen2.5-72B models, with win rates 8%–28% higher in the geochemistry domain and 52%–78% higher in the remote sensing geology domain.
Conclusion: This study introduces a method that integrates GeoGPT and LightRAG to construct a geological knowledge graph and a corresponding knowledge question-answering model. The approach substantially improves retrieval efficiency, domain-specific question answering, and incremental updating of the knowledge base, and offers a new paradigm for the efficient use of textual data in geological prospecting.
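The dual-level retrieval and incremental update mechanisms named in the Methods can be pictured with a toy, in-memory sketch. It is not LightRAG's API: the classes and methods (`KnowledgeGraph`, `merge`, `retrieve`) are hypothetical, exact keyword matching stands in for the embedding-based similarity search LightRAG performs, and the query keywords are supplied by hand rather than extracted by an LLM. Low-level keywords select specific entities, high-level keywords select relations by their thematic keywords, and the one-hop neighbours of the hits are added as context. An incremental update only unions new nodes and edges into the existing graph. The geological entities are invented examples.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    description: str


@dataclass
class Relation:
    source: str
    target: str
    keywords: set[str]  # high-level (thematic) keywords attached to the edge
    description: str


@dataclass
class KnowledgeGraph:
    """Hypothetical toy graph for illustration; not the LightRAG library API."""
    entities: dict[str, Entity] = field(default_factory=dict)
    relations: dict[tuple[str, str], Relation] = field(default_factory=dict)

    def merge(self, entities: list[Entity], relations: list[Relation]) -> None:
        """Incremental update: union new nodes and edges into the existing graph.

        Nothing already stored is rebuilt; existing entries are only extended.
        """
        for e in entities:
            old = self.entities.get(e.name)
            if old is None:
                self.entities[e.name] = e
            elif e.description not in old.description:
                old.description += " " + e.description
        for r in relations:
            old_r = self.relations.get((r.source, r.target))
            if old_r is None:
                self.relations[(r.source, r.target)] = r
            else:
                old_r.keywords |= r.keywords

    def neighbors(self, name: str) -> set[str]:
        """One-hop neighbours of an entity (edges treated as undirected)."""
        return {t if s == name else s for (s, t) in self.relations if name in (s, t)}

    def retrieve(self, low_level: set[str], high_level: set[str]):
        """Dual-level retrieval; exact matching is a stand-in for vector search.

        Low-level keywords match entity names; high-level keywords match relation keywords.
        """
        low = {k.lower() for k in low_level}
        high = {k.lower() for k in high_level}
        hit_entities = {n for n in self.entities if n.lower() in low}
        hit_relations = {
            key for key, r in self.relations.items()
            if {k.lower() for k in r.keywords} & high
        }
        # Expand the hits with one-hop neighbours to gather related context.
        context = set(hit_entities)
        for n in hit_entities:
            context |= self.neighbors(n)
        for s, t in hit_relations:
            context |= {s, t}
        return sorted(context), sorted(hit_relations)


kg = KnowledgeGraph()
kg.merge(
    [Entity("porphyry copper deposit", "Cu-Mo mineralization related to intrusions"),
     Entity("potassic alteration", "K-feldspar-biotite alteration")],
    [Relation("porphyry copper deposit", "potassic alteration",
              {"alteration zoning"}, "core alteration zone of the deposit")],
)
# A later document arrives: only its new nodes and edges are merged in.
kg.merge(
    [Entity("Cu geochemical anomaly", "soil Cu anomaly")],
    [Relation("Cu geochemical anomaly", "porphyry copper deposit",
              {"geochemical exploration"}, "anomaly indicates the deposit")],
)
entities, relations = kg.retrieve({"porphyry copper deposit"}, {"geochemical exploration"})
print(entities)
print(relations)
```

Because `merge` touches only the nodes and edges in the incoming batch, adding a document does not trigger any global recomputation, which is the property the Results credit for LightRAG's efficiency advantage over GraphRAG's community summarization and global reconstruction.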
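The Results report entity-recognition F1-scores and pairwise win rates without stating how they are computed. The following sketch assumes entity-level exact matching on (mention, entity type) pairs for F1, and a win rate defined as the share of pairwise judgments in which a system's answer was preferred, with ties counted as non-wins. The function names and example entities are hypothetical and only illustrate the arithmetic.

```python
def entity_f1(predicted, gold) -> float:
    """Entity-level F1 with exact matching on (mention, entity type) pairs."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


def win_rate(judgments, system: str) -> float:
    """Share of pairwise judgments that preferred `system`; ties count as non-wins."""
    if not judgments:
        raise ValueError("no judgments")
    return sum(1 for winner in judgments if winner == system) / len(judgments)


gold = [("chalcopyrite", "Mineral"), ("pyrite", "Mineral"),
        ("porphyry copper deposit", "OreDeposit"), ("granodiorite", "Rock")]
predicted = [("chalcopyrite", "Mineral"), ("pyrite", "Mineral"), ("magnetite", "Mineral")]
print(round(entity_f1(predicted, gold), 3))  # P = 2/3, R = 2/4, F1 = 4/7

judgments = ["A", "B", "A", "A", "tie"]  # preferred system in each pairwise comparison
print(win_rate(judgments, "A"))
```

Under these assumptions, a reported gap of, say, 8% in win rate is a difference between two such ratios computed over the same question set.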