Citation: XIE Xuejing, XIE Zhong, MA Kai, CHEN Jianguo, QIU Qinjun, LI Hu, PAN Shengyong, TAO Liufeng. 2023: Geological named entity recognition combined BERT and BiGRU-Attention-CRF model. Geological Bulletin of China, 42(5): 846-855. DOI: 10.12097/j.issn.1671-2552.2023.05.014

    Geological named entity recognition combined BERT and BiGRU-Attention-CRF model

    Abstract: Extracting geological named entities from geological texts is of great significance for the deep mining and application of geological big data. This paper defines the concept of a geological named entity, formulates an annotation specification, and designs an object-oriented representation model for geological entities. Geological texts contain many long entities and complex nested entities, which make geological named entity recognition more challenging. To address these problems, (1) the BERT model is introduced to generate high-quality word vector representations that take contextual information into account; (2) a bidirectional gated recurrent unit, attention mechanism, and conditional random field model (BiGRU-Attention-CRF) performs sequence labeling and decoding on the semantic encoding output by the previous layer. The model achieves an F1 score of 84.02%, outperforming the mainstream deep learning models it was compared with, and it performs well on a small-scale geological corpus.
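The decoding step in the final CRF layer can be illustrated with a small example. The abstract does not describe the authors' implementation, so the following is only a minimal sketch of Viterbi decoding, the usual inference procedure for a linear-chain CRF. The tag names (`B-ROCK`, `I-ROCK`, `O`), the emission scores, and the transition scores are all made up for illustration; in the described pipeline the emission scores would come from the BERT and BiGRU-Attention layers.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag]      : score of `tag` at position t (from the encoder)
    transitions[prev][tag] : score of moving from `prev` to `tag`
    """
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] = best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][tag])
            scores[tag] = best[-1][prev] + transitions[prev][tag] + emissions[t][tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen only to show the effect of the CRF.
tags = ["O", "B-ROCK", "I-ROCK"]
transitions = {p: {t: 0.0 for t in tags} for p in tags}
transitions["O"]["I-ROCK"] = -100.0  # an entity cannot continue after "O"

emissions = [
    {"O": 2.0, "B-ROCK": 1.5, "I-ROCK": 0.0},
    {"O": 0.0, "B-ROCK": 0.5, "I-ROCK": 2.0},
    {"O": 2.0, "B-ROCK": 0.0, "I-ROCK": 0.1},
]

# Picking the best tag per token independently gives an invalid sequence.
greedy = [max(tags, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, tags)
print(greedy)
print(decoded)
```

In this toy case the per-token choice yields `O, I-ROCK, O`, which breaks the tagging scheme, while the decoded path is `B-ROCK, I-ROCK, O`. This is the kind of label-consistency constraint a CRF layer adds on top of per-token scores.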

       
