测绘通报 ›› 2018, Vol. 0 ›› Issue (11): 78-82.doi: 10.13474/j.cnki.11-2246.2018.0354

• 学术研究 • 上一篇    下一篇

一种基于抽样改进加权核K-means的大数据谱聚类算法

金海1, 张劲松2, 吴睿1,3   

  1. 1. 深圳职业技术学院, 广东 深圳 518055;
    2. 浙江工业大学, 浙江 杭州 310014;
    3. 西安交通大学, 陕西 西安 710061
  • 收稿日期:2018-06-11 修回日期:2018-08-26 出版日期:2018-11-25 发布日期:2018-11-29
  • 作者简介:金海(1979-),男,硕士,副教授,研究方向为工业设计程序与方法、交互设计。E-mail:wurui198312@163.com
  • 基金资助:

    国家自然科学基金(61501337);深圳职业技术学院校级基金课题(601522S25007)

A Large Scale Spectral Clustering Algorithm Using Sampling Improved Weighted Kernel K-means

JIN Hai1, ZHANG Jinsong2, WU Rui1,3   

  1. 1. Shenzhen Polytechnic, Shenzhen 518055, China;
    2. Zhejiang University of Technology, Hangzhou 310014, China;
    3. Xi'an Jiaotong University, Xi'an 710061, China
  • Received:2018-06-11 Revised:2018-08-26 Online:2018-11-25 Published:2018-11-29

摘要:

经典谱聚类将数据聚类转化为加权图划分问题,在分析Normalized Cut目标函数与加权核K-means函数等价基础上,设计了一种基于抽样改进加权核K-means算法的大规模数据谱聚类算法。算法通过Leaders进行初始聚类预处理,以控制后续随机抽样的数据规模及对原始数据类别的覆盖,通过抽样子集内加权核K-means迭代优化,避免Laplacian矩阵特征分解的大量资源占用,从而以部分核矩阵的使用避免全部核矩的时间、空间复杂度。试验结果表明,改进算法在保持与经典算法相近聚类精度基础上,大幅提高了聚类效率。

关键词: 大规模数据集谱聚类, 加权核K-means算法, 数据抽样, 核矩阵

Abstract:

Classical spectral clustering algorithm transforms data clustering into graph partitioning problems, so, based on analyzing the equivalence between its Normalized Cut objective function and the weighted nuclear K-means function, a large-scale data spectrum based on sampling improved weighted nuclear K-means is designed, in which initial clustering preprocessing by Leaders is used to control the size of subsequent random sampling data and coverage of the original data categories, and the weighted kernel K-means iterative optimization is used to avoid the large resource consumption of Laplacian matrix feature decomposition of classical spectral clustering algorithm, thereby avoiding the time-space complexity of all nuclear moments by using of partial kernel matrices. Experimental results show that, the improved algorithm can greatly improve the clustering efficiency on the basis of maintaining similar clustering accuracy with the classic algorithm.

Key words: big data spectral clustering, weighted kernel K-means, data sampling, kernel matrix

中图分类号: