Abstract:
Geological survey data comprise both structured and unstructured data. Reports and their constituent files (Word, PDF, Excel, graphics, pictures, video, PPT, and other unstructured data) have traditionally been stored in file directories for technical reasons. This storage mode is inefficient for data retrieval, statistical analysis, updating, and similar operations, and it is poorly suited to search, query, and data-mining applications, which results in extremely low data service capability. By incorporating the Hadoop ecosystem into the cloud platform architecture of the China Geological Survey, the authors established a storage organization model for an unstructured content library of geological data, based on an HDFS and HBase storage architecture. They also set up an indexing mechanism for fast random access, using the full-text search engine Lucene together with a lexicon of core geological domain terms. The aim is to replace the diverse, complex, and fragmented modes of storing, reading, retrieving, and applying unstructured geological survey data, and to provide an accurate basis for fast, smart geological survey services.
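
The idea of a lexicon-aware full-text index can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use Lucene, HDFS, or HBase: it is a plain-Python stand-in that assumes a tiny hypothetical lexicon (`LEXICON`) and hypothetical document keys such as `report-001`, which in a real system might correspond to row keys in the content store. Multi-word domain terms are kept as single index tokens, and an inverted index maps each token to the documents that contain it.

```python
from collections import defaultdict

# Hypothetical domain lexicon; a real one would hold many geological terms.
LEXICON = {"iron ore", "granite", "fault zone"}
MAX_WORDS = max(len(term.split()) for term in LEXICON)


def tokenize(text):
    """Greedy longest-match tokenization against LEXICON.

    Falls back to single whitespace-separated words. Punctuation is not
    stripped in this sketch.
    """
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in LEXICON:
                tokens.append(candidate)
                i += n
                break
    return tokens


def build_index(docs):
    """Map each token to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for token in tokenize(text):
            index[token].add(key)
    return index


def search(index, query):
    """Return the keys of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))
```

In this sketch, a query for a lexicon term such as "fault zone" is matched as one token rather than as two unrelated words, which is the kind of precision a domain lexicon is meant to add to full-text retrieval.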