Cite this article: LIANG Shao-jun, ZHANG Shi-rong, SUN Lan-qiong. Optimal density direction based isometric mapping dimensionality reduction algorithm[J]. Control Theory & Applications, 2021, 38(4): 467-478.
Optimal density direction based isometric mapping dimensionality reduction algorithm
Received: 2020-07-14  Revised: 2021-03-22
DOI  10.7641/CTA.2020.00454
  2021,38(4):467-478
Keywords  ISOMAP  manifold learning  natural neighbor  optimal density
Funding  Supported by the National Natural Science Foundation of China (51475337) and Army internal research projects (LJ20182B050054, LJ20191C040483, LJ20202C020416, LJ20202C050412).
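Author  Affiliation  E-mail
LIANG Shao-jun  Ordnance NCO Academy, Army Engineering University of PLA  sjliang@whu.edu.cn
ZHANG Shi-rong*  School of Electrical Engineering and Automation, Wuhan University  srzhang@whu.edu.cn
SUN Lan-qiong  Ordnance NCO Academy, Army Engineering University of PLA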
Abstract
      Isometric mapping (ISOMAP) is a typical nonlinear manifold dimensionality reduction algorithm. It reduces dimensionality while preserving, as far as possible, the correspondence between geodesic distances in the high-dimensional data and spatial distances in the low-dimensional data. However, ISOMAP is sensitive to noise, so the reduced data may fail to preserve the high-dimensional topology. To address this problem, an optimal density direction based ISOMAP (ODD–ISOMAP) algorithm is proposed. The algorithm first screens the natural neighbors of each data point to determine its optimal density direction along the manifold. It then forms the vectors from each point to its neighbors and scales the local neighborhood distances according to the angle, direction and length of each vector's projection relative to the optimal density direction. In this way, the geodesic distances are guided to follow the manifold, which reduces the sensitivity to noise. To verify the effectiveness of the algorithm, 2 types of artificial data sets and 5 types of measured data sets are selected as test cases. ISOMAP, LLE, HLLE, LTSA, LEIGS, PCA and ODD–ISOMAP are each applied to the data sets, and K-medoids clustering is then performed on the reduced data. The dimensionality reduction performance of each algorithm is evaluated by comparing the clustering accuracy and the extent to which noise of different amplitudes affects that accuracy. Experimental results show that ODD–ISOMAP significantly improves on the other 6 common algorithms and is more resistant to noise.
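For readers unfamiliar with the baseline, the following is a minimal sketch of classical ISOMAP only, not of ODD–ISOMAP: the abstract does not give the formulas for the optimal density direction or the distance scaling, so none are reproduced here. The function name `isomap`, its parameters, and the half-circle test data are illustrative assumptions. It assumes NumPy is available and uses a plain Floyd-Warshall pass, which is only practical for small data sets.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Classical ISOMAP sketch: kNN graph -> geodesic distances -> classical MDS.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances in the input space.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Symmetric k-nearest-neighbour graph; np.inf marks "no edge".
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[rows, cols]

    # Floyd-Warshall: shortest graph paths approximate geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    Y = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    return Y, G


# Noise-free half circle of radius 1: the end points are 2 apart in a straight
# line, but the distance along the curve is close to pi.
theta = np.linspace(0.0, np.pi, 50)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
Y, G = isomap(arc, n_neighbors=2, n_components=1)
```

In this sketch a noisy point lying between two folds of a manifold could create a short-circuit edge in the neighbourhood graph and distort every shortest path through it. That is the kind of sensitivity the abstract says ODD–ISOMAP reduces by rescaling neighbourhood distances before the geodesic step.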