A Methodology for Constructing a Geological Prospecting Knowledge Graph and Question-Answering Model Based on GeoGPT and LightRAG

Abstract: [Objective] Applying massive volumes of geological text to mineral prospecting prediction is hindered by difficulties in semantic recognition, integration, and utilization. The rise of general-purpose large language models (LLMs) offers a new route, but these models show significant bottlenecks in understanding specialized geological terminology and in constructing knowledge graphs. [Methods] This paper proposes a method for building a geological prospecting knowledge graph and a knowledge question-answering model from the GeoGPT geological large model and the lightweight LightRAG framework. The method uses GeoGPT's prior knowledge of the geological domain to perform ontology definition, entity recognition, and relation extraction autonomously, and combines these with post-processing modules for association enhancement and association degradation to build the knowledge graph. With the knowledge graph at its core, a geological knowledge question-answering model is then built on LightRAG's dual-layer retrieval and incremental update mechanisms; the model completes contextual knowledge by retrieving from external geological knowledge bases. [Results] In recognizing key entities such as minerals and ore deposits, GeoGPT improved the F1-score by 17%–28% over the general-purpose LLMs DeepSeek-V3 and Qwen2.5-72B. Compared with GraphRAG, LightRAG avoids costly community summarization and global reconstruction, which substantially improves retrieval efficiency. The question-answering model built on the LightRAG framework with GeoGPT at its core achieved win rates 8%–28% higher than DeepSeek-V3 and Qwen2.5-72B in the geochemistry domain and 52%–78% higher in the remote sensing geology domain. [Conclusion] This study presents a method that uses GeoGPT to construct a geological prospecting knowledge graph efficiently and exploits the lightweight design of LightRAG to build a geological knowledge question-answering model quickly. Performance improves substantially in retrieval efficiency, specialized question answering, and incremental updating of the knowledge base. The method offers a new reference for the efficient use of textual data in geological prospecting.
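For readers unfamiliar with the metric cited in the results, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of the standard definition only; the function name and the counts in the usage line are hypothetical and are not taken from the paper's evaluation.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall) for one entity type."""
    predicted = true_positives + false_positives   # entities the model extracted
    actual = true_positives + false_negatives      # entities in the reference annotation
    if predicted == 0 or actual == 0:
        return 0.0                                 # precision or recall is undefined; report 0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0                                 # no correct extractions at all
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a single entity type, for illustration only.
print(round(f1_score(true_positives=8, false_positives=2, false_negatives=2), 3))
```

Under this definition, a model that extracts many spurious entities (low precision) or misses many annotated ones (low recall) is penalized in either case, which is why F1 is a common single-number summary for entity recognition.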

       
