Bulletin of Surveying and Mapping, 2024, Vol. 0, Issue (10): 25-31. doi: 10.13474/j.cnki.11-2246.2024.1005

• Water Environment Monitoring •

Research on distributed processing framework for big data analysis of real-time water environment monitoring

CHEN Jianglong1, SONG Weiwei1, LI Jiahao1, WEI Qunlan1, WANG Jinxia1, DAI Bolan2

  1. Faculty of Land Resource Engineering, Kunming University of Science and Technology, Kunming 650093, China;
  2. 722 Research Institute of China State Shipbuilding Corporation, Wuhan 430200, China
  • Received: 2024-01-29 Published: 2024-11-02
  • Corresponding author: SONG Weiwei, E-mail: asong@vip.163.com
  • About the author: CHEN Jianglong (born 2000), male, master's student; main research interest: spatiotemporal data mining. E-mail: 20222201082@stu.kust.edu.cn
  • Funding:
    Yunnan Province Major Science and Technology Special Project (202202AD080010)

Abstract: The rapid accumulation and growth of water quality monitoring data bring new opportunities and challenges to water quality research. To address the poor timeliness, unintuitive data display, and low data processing efficiency of current water quality monitoring and analysis, this paper builds WaterSpark, a distributed processing framework for real-time water environment monitoring and big data analysis, on big data and data visualization technologies. The framework is applied to water quality monitoring data from the nine plateau lakes of Yunnan Province, using an improved Canadian Council of Ministers of the Environment water quality index (CCME-WQI) and the Spark machine learning library (MLlib). The results show that WaterSpark performs well in real-time water quality data transmission, cleaning and archiving, and efficient computation and analysis, and that it captures and analyzes large-scale water quality data in a timely and accurate manner. Its distributed datasets and cluster can cope with growing volumes of water quality data, keeping performance scalable and supporting more water quality indicators and larger-scale water quality monitoring.

Key words: water quality monitoring, distributed processing framework, water quality index, nine plateau lakes in Yunnan province

CLC Number:
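
The abstract refers to an improved CCME-WQI but does not state its formula. For orientation only, the sketch below computes the standard (unmodified) CCME-WQI from its three factors: scope F1 (share of variables that fail), frequency F2 (share of tests that fail), and amplitude F3 (normalized sum of excursions), combined as WQI = 100 - sqrt(F1² + F2² + F3²) / 1.732. It is not the paper's improved variant and does not use Spark. The function name `ccme_wqi`, the variable names `DO` and `NH3N`, and the threshold values are hypothetical and chosen for illustration.

```python
import math


def ccme_wqi(samples, objectives):
    """Standard CCME-WQI (not the improved variant used in the paper).

    samples:    list of dicts mapping variable name -> measured value
    objectives: dict mapping variable name -> (kind, limit), where kind is
                "max" (value must not exceed limit) or "min" (value must
                not fall below limit). Measured values are assumed positive.
    """
    tested_vars = set()
    failed_vars = set()
    total_tests = 0
    failed_tests = 0
    excursion_sum = 0.0

    for sample in samples:
        for var, value in sample.items():
            if var not in objectives:
                continue  # no objective defined for this variable
            kind, limit = objectives[var]
            tested_vars.add(var)
            total_tests += 1
            if kind == "max" and value > limit:
                excursion = value / limit - 1.0
            elif kind == "min" and value < limit:
                excursion = limit / value - 1.0
            else:
                continue  # test passed
            failed_vars.add(var)
            failed_tests += 1
            excursion_sum += excursion

    if total_tests == 0:
        raise ValueError("no tests could be evaluated against the objectives")

    f1 = 100.0 * len(failed_vars) / len(tested_vars)   # scope
    f2 = 100.0 * failed_tests / total_tests            # frequency
    nse = excursion_sum / total_tests                  # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                     # amplitude
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732


# Hypothetical objectives and samples, for illustration only.
example_objectives = {"DO": ("min", 5.0), "NH3N": ("max", 1.0)}
example_samples = [
    {"DO": 6.0, "NH3N": 0.5},
    {"DO": 4.0, "NH3N": 2.0},
]
print(round(ccme_wqi(example_samples, example_objectives), 1))
```

In a distributed setting the same counts (tested variables, failed tests, excursion sum) are simple aggregates, so they could in principle be computed per lake with a grouped aggregation before applying the final formula; how WaterSpark actually does this is not described in the abstract.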