Abstract:
Objective The integration of massive geological text data into mineral prospecting is challenged by difficulties in semantic recognition and synthesis. While Large Language Models (LLMs) offer new possibilities, they face significant bottlenecks in understanding domain-specific terminology and constructing knowledge graphs.
Methods This paper proposes a methodology for constructing a geological prospecting knowledge graph and a question-answering (QA) model by integrating GeoGPT with the LightRAG framework. Leveraging GeoGPT’s domain-specific prior knowledge, the approach enables autonomous ontology definition, entity recognition, and relation extraction. A post-processing module for association enhancement and degradation is employed to refine the knowledge graph. Furthermore, a QA model is developed using LightRAG’s distinctive dual-layer retrieval and incremental update mechanisms to achieve contextual knowledge completion from external geological databases.
Results In key entity recognition (e.g., minerals, ore deposits), GeoGPT’s F1-score exceeded those of general-purpose LLMs (DeepSeek-V3, Qwen2.5-72B) by 17%–28%. Compared with GraphRAG, LightRAG significantly improved retrieval efficiency by bypassing high-cost community summarization and global reconstruction. The GeoGPT-based QA model achieved win rates 8%–28% higher in geochemistry and 52%–78% higher in remote sensing geology than the general-purpose models.
Conclusions This study provides an efficient method for constructing geological knowledge graphs and lightweight QA models. By substantially improving retrieval efficiency and incremental updating, this methodology offers a robust new paradigm for the intelligent utilization of geological text data.