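Research on a Method for Constructing a Geological Prospecting Knowledge Graph and Knowledge Question-Answering Model Based on GeoGPT and LightRAG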

    The methodology for constructing a geological prospecting knowledge graph and question-answering model based on GeoGPT and LightRAG

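    • Abstract:
      Objective Applying massive volumes of geological text data to mineral prospecting prediction raises difficulties in semantic recognition, integration, and utilization. Although the rise of general-purpose large language models (LLMs) offers a new route, these models face significant bottlenecks in understanding specialized geological terminology and in constructing knowledge graphs.
      Methods A new method is proposed for constructing a geological prospecting knowledge graph and a knowledge question-answering (QA) model based on the GeoGPT geological large model and the lightweight LightRAG technique. The method uses GeoGPT's prior knowledge of the geological domain to perform ontology definition, entity recognition, and relation extraction autonomously, and combines these with a post-processing module for association enhancement and association degradation to build the geological prospecting knowledge graph. With the knowledge graph at its core, a geological knowledge QA model is then built using LightRAG's distinctive dual-layer retrieval and incremental update mechanisms; the model completes contextual knowledge by retrieving from an external geological knowledge base.
      Results In recognizing key entities such as minerals and ore deposits, GeoGPT achieved F1 scores 17%–28% higher than the general-purpose LLMs DeepSeek-V3 and Qwen2.5-72B. Compared with GraphRAG, LightRAG avoids costly community summarization and global graph reconstruction, substantially improving retrieval efficiency. Compared with DeepSeek-V3 and Qwen2.5-72B, the geological knowledge QA model built on the LightRAG framework with GeoGPT at its core achieved win rates 8%–28% higher in geochemistry and 52%–78% higher in remote sensing geology.
      Conclusions This study efficiently constructs a geological prospecting knowledge graph with GeoGPT and, by exploiting LightRAG's lightweight design, establishes a new method for rapidly building a geological knowledge QA model. The method substantially improves retrieval efficiency, domain-specific question answering, and incremental updating of the knowledge base, offering a new reference for the efficient use of geological prospecting text data.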
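
The F1 comparison for key entities such as minerals and ore deposits is typically computed per entity type. The sketch below assumes exact matching on (entity text, entity type) pairs; the matching criterion, the function name `entity_f1`, and the toy entities are illustrative assumptions, not the study's evaluation code or data.

```python
def entity_f1(gold, pred):
    """Per-type precision, recall and F1 under exact (text, type) matching.

    gold, pred: sets of (entity_text, entity_type) tuples (assumed format).
    """
    scores = {}
    for etype in sorted({t for _, t in gold | pred}):
        g = {e for e in gold if e[1] == etype}
        p = {e for e in pred if e[1] == etype}
        tp = len(g & p)  # true positives: predictions that exactly match a gold entity
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[etype] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical toy annotations, not data from the study.
gold = {("pyrite", "Mineral"), ("chalcopyrite", "Mineral"),
        ("porphyry Cu deposit", "Deposit")}
pred = {("pyrite", "Mineral"), ("quartz", "Mineral"),
        ("porphyry Cu deposit", "Deposit")}

scores = entity_f1(gold, pred)
for etype, s in scores.items():
    print(etype, round(s["f1"], 3))
```

Here one of two predicted minerals is correct and one of two gold minerals is found, so precision and recall for `Mineral` are both 0.5.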


      Abstract:
      Objective Applying massive volumes of geological text data to mineral prospecting prediction is hampered by difficulties in semantic recognition, integration, and utilization. While general-purpose Large Language Models (LLMs) offer new possibilities, they face significant bottlenecks in understanding domain-specific terminology and constructing knowledge graphs.
      Methods This paper proposes a methodology for constructing a geological prospecting knowledge graph and a question-answering (QA) model by integrating GeoGPT with the LightRAG framework. Leveraging GeoGPT's domain-specific prior knowledge, the approach enables autonomous ontology definition, entity recognition, and relation extraction. A post-processing module for association enhancement and degradation is employed to refine the knowledge graph. Furthermore, a QA model is developed using LightRAG's distinctive dual-layer retrieval and incremental update mechanisms to achieve contextual knowledge completion from external geological databases.
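
The dual-layer retrieval and incremental update ideas can be illustrated with a toy index. This is a hypothetical sketch loosely modeled on how LightRAG is usually described, not the LightRAG API or the authors' code: the class `ToyGraphIndex`, its methods, and the sample entities are invented, and keyword-set overlap stands in for the embedding similarity a real retriever would use. Low-level retrieval matches specific entities plus their one-hop relations; high-level retrieval matches relations tagged with broader theme keywords; `insert` merges new extractions into the existing graph rather than rebuilding it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraphIndex:
    """Hypothetical toy index, not the LightRAG API.

    Keyword-set overlap stands in for the vector similarity a real system would use.
    """
    entities: dict = field(default_factory=dict)   # entity name -> set of descriptions
    relations: dict = field(default_factory=dict)  # (source, target) -> set of relation keywords

    def insert(self, entities, relations):
        """Incremental update: merge new extractions into the existing graph, no rebuild."""
        for name, desc in entities.items():
            self.entities.setdefault(name, set()).add(desc)
        for pair, keywords in relations.items():
            for node in pair:  # make sure both endpoints exist as entities
                self.entities.setdefault(node, set())
            self.relations.setdefault(pair, set()).update(keywords)

    def low_level(self, entity_keywords):
        """Low level: match specific entities, then add their one-hop relations and neighbours."""
        hits = {name for name in self.entities if name in entity_keywords}
        edges = {pair for pair in self.relations if hits & set(pair)}
        return hits | {n for pair in edges for n in pair}, edges

    def high_level(self, theme_keywords):
        """High level: match relations whose keywords overlap broader themes."""
        edges = {pair for pair, kws in self.relations.items() if kws & theme_keywords}
        return {n for pair in edges for n in pair}, edges

    def retrieve(self, entity_keywords, theme_keywords):
        """Dual-layer retrieval: union of the low-level and high-level results."""
        nodes_lo, edges_lo = self.low_level(entity_keywords)
        nodes_hi, edges_hi = self.high_level(theme_keywords)
        return nodes_lo | nodes_hi, edges_lo | edges_hi


index = ToyGraphIndex()
index.insert(
    {"chalcopyrite": "copper sulfide mineral", "porphyry Cu deposit": "deposit type"},
    {("chalcopyrite", "porphyry Cu deposit"): {"ore mineral"}},
)
# A later document is merged into the same graph; earlier nodes and edges are untouched.
index.insert(
    {"potassic alteration": "alteration zone"},
    {("potassic alteration", "porphyry Cu deposit"): {"alteration zoning"}},
)
nodes, edges = index.retrieve({"chalcopyrite"}, {"alteration zoning"})
print(sorted(nodes))
```

Because `insert` only adds to the existing dictionaries, the cost of adding a document scales with that document's own extractions, which is the kind of saving the abstract attributes to avoiding global reconstruction.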
      Results In key entity recognition (e.g., minerals, ore deposits), GeoGPT's F1-score outperformed the general LLMs DeepSeek-V3 and Qwen2.5-72B by 17%–28%. Compared to GraphRAG, LightRAG significantly improved retrieval efficiency by bypassing high-cost community summarization and global reconstruction. Compared with DeepSeek-V3 and Qwen2.5-72B, the GeoGPT-based QA model built on the LightRAG framework achieved win rates 8%–28% higher in geochemistry and 52%–78% higher in remote sensing geology.
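
Win rates of this kind come from pairwise comparisons in which a judge picks the better of two answers to each question. A minimal sketch, assuming one two-way verdict per question with no ties and reading the reported difference as the gap between the two systems' win rates; the function name `win_rates` and the verdict list are made-up illustrations, not the study's protocol or data.

```python
from collections import Counter


def win_rates(verdicts):
    """Fraction of questions on which each system's answer was preferred.

    verdicts: list of "A" or "B", one judge decision per question (no ties assumed).
    """
    if not verdicts:
        raise ValueError("no verdicts")
    counts = Counter(verdicts)
    unknown = set(counts) - {"A", "B"}
    if unknown:
        raise ValueError(f"unexpected verdicts: {unknown}")
    total = len(verdicts)
    return {side: counts[side] / total for side in ("A", "B")}


# Hypothetical example: system A vs system B judged on 20 questions.
verdicts = ["A"] * 16 + ["B"] * 4
rates = win_rates(verdicts)
gap = rates["A"] - rates["B"]  # difference between the two win rates
print(f"A: {rates['A']:.0%}, B: {rates['B']:.0%}, gap: {gap:.0%}")
```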
      Conclusions This study provides an efficient method for constructing geological knowledge graphs and lightweight QA models. By substantially improving retrieval efficiency, domain-specific question answering, and incremental updating of the knowledge base, the methodology offers a new approach to the efficient use of geological prospecting text data.
