Big data application architecture and key technologies for intelligent geological survey

    Abstract: Geological survey data consist mainly of diverse structured and unstructured data. Reports made up of unstructured files (Word, PDF, Excel, graphics, pictures, video, PPT and others) have, for technical reasons, long been stored in traditional file directories. This storage mode makes querying, statistics, and updating inefficient, and it hinders retrieval, search, and data-mining applications, so data service capability is extremely low. By incorporating the Hadoop ecosystem into the cloud platform architecture of the China Geological Survey, the authors established a storage organization model for a basic content library of unstructured geological data, built on the HDFS and HBase storage architecture. They also built an index-file mechanism for fast random access using the Lucene full-text search engine and a geological domain ontology lexicon. This changes how diverse, fragmented, and complex unstructured geological survey data are stored, read, searched, and applied, and lays a foundation for accurate and fast services in intelligent geological survey.
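The indexing idea in the abstract (a content store plus a lexicon-driven index for fast lookup) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Hadoop, HBase, or Lucene: plain Python dictionaries stand in for the content library and the index, and the lexicon terms, class name, and function names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical lexicon: the abstract mentions a geological domain ontology
# lexicon but lists no terms, so these entries are made up for the sketch.
LEXICON = {"copper deposit", "granite", "fault zone"}


def extract_terms(text, lexicon=LEXICON):
    """Return the lexicon terms found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {term for term in lexicon if term in lowered}


class ContentLibrary:
    """Toy stand-in: one dict plays the content store, another the inverted index."""

    def __init__(self):
        self.docs = {}                  # doc_id -> full text
        self.index = defaultdict(set)   # lexicon term -> set of doc_ids

    def add(self, doc_id, text):
        """Store the document and index it under every lexicon term it contains."""
        self.docs[doc_id] = text
        for term in extract_terms(text):
            self.index[term].add(doc_id)

    def search(self, *terms):
        """Return the sorted ids of documents that contain every given term."""
        if not terms:
            return []
        postings = [self.index.get(term.lower(), set()) for term in terms]
        return sorted(set.intersection(*postings))


library = ContentLibrary()
library.add("r1", "Granite intrusions along the fault zone.")
library.add("r2", "A copper deposit hosted in granite.")
print(library.search("granite", "fault zone"))
```

A query reads only the posting sets for its terms rather than scanning every file, which is the property that directory-based storage lacks. A production system would delegate tokenization, ranking, and persistence to a real search engine and a distributed store.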

       
