A Large Scale Spectral Clustering Algorithm Using Sampling Improved Weighted Kernel K-means

Bulletin of Surveying and Mapping (测绘通报), 2018, Issue 11: 78-82. doi: 10.13474/j.cnki.11-2246.2018.0354

JIN Hai¹, ZHANG Jinsong², WU Rui¹,³

1. Shenzhen Polytechnic, Shenzhen 518055, China;
2. Zhejiang University of Technology, Hangzhou 310014, China;
3. Xi'an Jiaotong University, Xi'an 710061, China

Received: 2018-06-11  Revised: 2018-08-26  Online: 2018-11-25  Published: 2018-11-29
About the author: JIN Hai (b. 1979), male, master's degree, associate professor; research interests: industrial design procedures and methods, interaction design. E-mail: wurui198312@163.com
Funding:
National Natural Science Foundation of China (61501337); Shenzhen Polytechnic institutional research fund project (601522S25007)

Abstract:

Classical spectral clustering transforms data clustering into a weighted graph partitioning problem. Building on the equivalence between the Normalized Cut objective function and the weighted kernel K-means objective function, this paper designs a spectral clustering algorithm for large-scale data based on a sampling-improved weighted kernel K-means algorithm. The algorithm first runs the Leaders algorithm as an initial clustering preprocessing step, which controls both the size of the subsequent random sample and its coverage of the classes in the original data. Weighted kernel K-means is then optimized iteratively within the sampled subset, which avoids the heavy resource cost of eigendecomposing the Laplacian matrix; because only part of the kernel matrix is used, the time and space complexity of working with the full kernel matrix is avoided as well. Experimental results show that the improved algorithm greatly improves clustering efficiency while maintaining clustering accuracy close to that of the classical algorithm.

Key words: big data spectral clustering, weighted kernel K-means, data sampling, kernel matrix
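
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the function names, the Leaders threshold `tau`, the sampling fraction `frac`, the Gaussian affinity with width parameter `gamma`, and the rule that each non-sampled point copies the label of its nearest sampled point. The Normalized Cut connection uses the known result of Dhillon, Guan and Kulis: weights equal to node degrees and kernel `sigma * D^-1 + D^-1 A D^-1`.

```python
import numpy as np


def leaders(X, tau):
    """One-pass Leaders clustering: a point joins its nearest leader if that
    leader lies within tau, otherwise the point becomes a new leader."""
    leader_idx, labels = [], np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        if leader_idx:
            d = np.linalg.norm(X[leader_idx] - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= tau:
                labels[i] = j
                continue
        labels[i] = len(leader_idx)
        leader_idx.append(i)
    return np.array(leader_idx), labels


def sample_per_group(labels, frac, rng):
    """Draw a random fraction (0 < frac <= 1) of every Leaders group, at least
    one point per group, so that small groups stay represented."""
    picks = []
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        m = max(1, int(round(frac * len(members))))
        picks.append(rng.choice(members, size=m, replace=False))
    return np.concatenate(picks)


def ncut_kernel(A, sigma=0.0):
    """Weights and kernel under which weighted kernel K-means matches
    Normalized Cut: w = degrees, K = sigma * D^-1 + D^-1 A D^-1."""
    d = A.sum(axis=1)
    return d, A / np.outer(d, d) + sigma * np.diag(1.0 / d)


def weighted_kernel_kmeans(K, w, init_labels, k, n_iter=50):
    """Batch weighted kernel K-means on kernel matrix K with point weights w."""
    labels = np.asarray(init_labels)
    n = K.shape[0]
    for _ in range(n_iter):
        Z = np.zeros((n, k))
        Z[np.arange(n), labels] = w                # weighted cluster indicators
        s = Z.sum(axis=0)                          # total weight per cluster
        empty = s == 0
        s = np.where(empty, 1.0, s)
        KZ = K @ Z                                 # sum_j w_j K_ij for j in c
        third = np.einsum("ic,ic->c", Z, KZ) / s**2
        dist = np.diag(K)[:, None] - 2 * KZ / s + third[None, :]
        dist[:, empty] = np.inf                    # never move into an empty cluster
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels


def cluster_large(X, k, tau, frac=0.2, gamma=1.0, seed=0):
    """Leaders preprocessing, per-group sampling, kernel K-means on the sample."""
    rng = np.random.default_rng(seed)
    _, groups = leaders(X, tau)
    idx = sample_per_group(groups, frac, rng)
    S = X[idx]
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    w, K = ncut_kernel(np.exp(-gamma * sq))        # kernel on the sample only
    init = rng.integers(k, size=len(idx))
    sample_labels = weighted_kernel_kmeans(K, w, init, k)
    # Assumption: every point copies the label of its nearest sampled point.
    d2 = ((X[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    return sample_labels[d2.argmin(axis=1)]
```

Only an m × m kernel matrix over the m sampled points is ever built, which is where the saving over the full n × n matrix comes from. The random initialization can converge to a local optimum, so in practice several restarts or a better initial partition may be needed.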

CLC number: P208

Cite this article: JIN Hai, ZHANG Jinsong, WU Rui. A Large Scale Spectral Clustering Algorithm Using Sampling Improved Weighted Kernel K-means[J]. Bulletin of Surveying and Mapping, 2018(11): 78-82.

