Abstract:
Extracting geological named entities from geological texts is of great significance for the deep mining and application of geological big data. In this paper, we define the concept of geological named entities, formulate annotation specifications, and design an object-oriented representation model for geological entities. Geological texts contain a large number of long entities and complex nested entities, which make geological named entity recognition more challenging. To address these problems, (1) the BERT model is introduced to generate high-quality word vector representations that take contextual information into account; and (2) a BiGRU-Attention-Conditional Random Field (BiGRU-Attention-CRF) network performs sequence labeling and decoding on the semantic encoding output by the previous layer. The proposed model achieves an F1 score of 84.02%, outperforming the mainstream deep learning models it is compared with, and it recognizes entities effectively on small-scale geological corpora.