测绘通报 ›› 2017, Vol. 0 ›› Issue (11): 96-100. doi: 10.13474/j.cnki.11-2246.2017.0356


Spatial Data Parallel Partitioning Algorithm Based on MapReduce

FU Yanli1,2, WU Yanmin3, ZHANG Jinbiao4, ZHENG Kun2, ZHAO Changhong3, ZHENG Kang2,5, FANG Falin2,5   

    1. Jinan Geotechnical Investigation and Surveying Institute, Jinan 250013, China;
    2. School of Information Engineering, China University of Geosciences(Wuhan), Wuhan 430074, China;
    3. Beijing Create Space-time Science and Technology Limited Company, Beijing 100083, China;
    4. Guangdong Meteorological Observation Data Center, Guangzhou 510610, China;
    5. Wuhan Trillion Map Technology Limited Company, Wuhan 430070, China
  • Received: 2017-05-16  Online: 2017-11-25  Published: 2017-12-07

Abstract: Spatial data partitioning plays an important role in the distributed storage of spatial data; its key problem is how to partition spatial data across distributed storage nodes in a network environment. This paper discusses existing partitioning strategies for massive spatial data and analyses their shortcomings: these methods do not take spatial object size or spatial proximity into account. To address these problems, this paper proposes a new parallel spatial data partitioning strategy based on MapReduce and a capacity-aware method, which improves load balance and avoids unevenly distributed storage and data skew. Experimental analysis shows that the presented spatial data parallel partitioning algorithm not only achieves better storage load balance in a distributed storage system, but also preserves the spatial locality of data objects after partitioning.

Key words: MapReduce, Hilbert space filling curve, spatial data parallel partitioning
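
The general idea named in the abstract and key words (ordering objects along a Hilbert space-filling curve, then cutting that order into partitions by data volume rather than by object count) can be illustrated with a minimal single-process sketch. This is not the authors' algorithm: the function names `hilbert_index` and `partition_by_capacity`, the `(id, x, y, size_bytes)` tuple layout, the unit-square coordinates, and the equal-share cut rule are all assumptions made for illustration. In a MapReduce setting, the map phase would presumably emit the Hilbert key and the shuffle would sort by it; here a plain `sorted` call stands in for that step.

```python
from typing import List, Tuple

# Hypothetical record layout: (object_id, x, y, size_bytes), with x, y in [0, 1).
SpatialObject = Tuple[str, float, float, int]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance along the Hilbert curve of cell (x, y) on a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def partition_by_capacity(
    objects: List[SpatialObject], num_partitions: int, order: int = 8
) -> List[List[str]]:
    """Assign objects to partitions as contiguous Hilbert-order runs of similar total size."""
    n = 1 << order

    def key(obj: SpatialObject) -> int:
        # Snap the object's coordinates to a grid cell, clamped to the grid.
        cx = min(max(int(obj[1] * n), 0), n - 1)
        cy = min(max(int(obj[2] * n), 0), n - 1)
        return hilbert_index(order, cx, cy)

    ordered = sorted(objects, key=key)  # stands in for the MapReduce shuffle/sort
    total = sum(obj[3] for obj in ordered)
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    cumulative = 0
    for obj in ordered:
        # The object goes to the partition whose byte-range contains its start offset.
        if total > 0:
            idx = min(cumulative * num_partitions // total, num_partitions - 1)
        else:
            idx = 0
        partitions[idx].append(obj[0])
        cumulative += obj[3]
    return partitions


if __name__ == "__main__":
    data = [("a", 0.1, 0.1, 30), ("b", 0.1, 0.9, 10),
            ("c", 0.9, 0.9, 10), ("d", 0.9, 0.1, 10)]
    print(partition_by_capacity(data, 2))
```

Because the cut points follow cumulative size, one large object can fill a partition on its own while several small neighbours share another, and each partition remains a contiguous stretch of the curve, so nearby objects tend to stay together.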
