Bulletin of Surveying and Mapping ›› 2017, Vol. 0 ›› Issue (11): 96-100. doi: 10.13474/j.cnki.11-2246.2017.0356

• Technical Exchange •

Spatial Data Parallel Partitioning Algorithm Based on MapReduce

FU Yanli1,2, WU Yanmin3, ZHANG Jinbiao4, ZHENG Kun2, ZHAO Changhong3, ZHENG Kang2,5, FANG Falin2,5

    1. Jinan Geotechnical Investigation and Surveying Institute, Jinan 250013, China;
    2. School of Information Engineering, China University of Geosciences (Wuhan), Wuhan 430074, China;
    3. Beijing Create Space-time Science and Technology Limited Company, Beijing 100083, China;
    4. Guangdong Meteorological Observation Data Center, Guangzhou 510610, China;
    5. Wuhan Trillion Map Technology Limited Company, Wuhan 430070, China
  • Received: 2017-05-16 Online: 2017-11-25 Published: 2017-12-07
  • Corresponding author: ZHANG Jinbiao. E-mail: zhangjb@grmc.gov.cn
  • About the author: FU Yanli (1988-), female, master's degree, engineer, mainly engaged in GIS development. E-mail: fuyli@126.com
  • Funding:
    National Key Research and Development Program of China (2016YFB0502603); Natural Science Foundation of Hubei Province (ZRY2015001543); Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (1610491B20)

Abstract: Spatial data partitioning plays an important role in the distributed storage of spatial data; its key problem is how to partition spatial data across distributed storage nodes in a network environment. This paper reviews existing strategies for partitioning massive spatial data and analyzes their shortcomings: they do not take spatial object size or spatial proximity into account, and they suffer from uneven distribution and data skew. To address these problems, the paper studies hierarchical decomposition of the Hilbert space-filling curve and a node-capacity-aware method under the MapReduce parallel programming model, and proposes a hierarchically decomposed parallel partitioning strategy for spatial data that balances storage by means of a threshold test. Experimental analysis shows that the proposed algorithm achieves better storage load balance in a distributed storage system while preserving the spatial locality of data objects after partitioning.

Key words: MapReduce, Hilbert space-filling curve, spatial data parallel partitioning
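
The abstract does not spell out the hierarchical decomposition or the exact threshold rule, so the following is only a minimal single-process sketch of the general idea, not the authors' algorithm. It assumes objects are already snapped to cells of a 2^order × 2^order grid, that each storage node has a relative capacity weight, and that a node's threshold is simply its capacity-proportional share of the total data size. The names `hilbert_index` and `partition_by_hilbert` are hypothetical. The key computation corresponds to what a map step could emit, and the contiguous assignment to what a reduce step could perform.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# (object id, grid x, grid y, size); the layout is an assumption for this sketch
SpatialObject = Tuple[Hashable, int, int, float]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve keeps its standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def partition_by_hilbert(
    objects: Sequence[SpatialObject],
    capacities: Sequence[float],
    order: int,
) -> Dict[int, List[Hashable]]:
    """Assign objects to nodes in Hilbert order, respecting per-node thresholds."""
    # "Map"-like step: key every object by its Hilbert code and sort by that key,
    # so spatially close objects end up next to each other in the sequence.
    keyed = sorted(objects, key=lambda o: hilbert_index(order, o[1], o[2]))

    total_size = sum(o[3] for o in keyed)
    total_capacity = sum(capacities)
    # Assumed threshold: each node's capacity-proportional share of the data.
    thresholds = [total_size * c / total_capacity for c in capacities]

    assignment: Dict[int, List[Hashable]] = {i: [] for i in range(len(capacities))}
    node, load = 0, 0.0
    # "Reduce"-like step: cut the sorted sequence into contiguous runs.
    for obj_id, _x, _y, size in keyed:
        over = load + size > thresholds[node]
        # Move on once the threshold would be exceeded; the last node takes the rest.
        if over and load > 0 and node < len(capacities) - 1:
            node += 1
            load = 0.0
        assignment[node].append(obj_id)
        load += size
    return assignment
```

For example, calling `partition_by_hilbert(objs, [1, 3], order=2)` on sixteen unit-size objects covering a 4 × 4 grid gives the first node one quarter of the objects and the second node the rest, each as a contiguous stretch of the curve. Cutting the curve into contiguous runs is what keeps neighbouring objects on the same node; the capacity-weighted threshold is what counters skew.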

CLC Number: