Cite this article: SUN Bo, ZHOU Qian, CHEN Hai-Yan. Imbalanced data learning approach with class overlap degree as the optimization goal[J]. Control Theory & Applications, 2024, 41(11): 2139-2146.
Imbalanced data learning approach with class overlap degree as the optimization goal
Received: 2022-02-20  Revised: 2024-08-11
DOI  10.7641/CTA.2023.20123
  2024,41(11):2139-2146
Keywords  classification  class imbalance  undersampling  class overlap degree  data complexity  machine learning
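Funding  Supported by the Natural Science Foundation of Shandong Province (ZR2023MF098, ZR2018QF002) and the Major Scientific and Technological Innovation Project of Shandong Province (2019JZZY010706).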
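Author  Affiliation  E-mail
SUN Bo*  Shandong Agricultural University  sunbo87@126.com
ZHOU Qian  Shandong Agricultural University
CHEN Hai-Yan  Nanjing University of Aeronautics and Astronautics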
Abstract
      Classification is an important learning task in machine learning: a classifier learned on a set of training examples is used to predict the class label of a test example. However, in many practical applications the collected training sets have an imbalanced class distribution, which usually hinders the classification performance of most classifier learning algorithms. To alleviate this problem, this paper proposes an imbalanced data learning approach with class overlap degree as the optimization goal (COA-RBU). It uses the mutual class potential to evaluate the utility of each majority class example, and adaptively determines a proper undersampling ratio according to the class overlap degree of the training set, aiming to decrease the data complexity of the imbalanced training set. Experimental results indicate that the class overlap degree reflects the learning difficulty of an imbalanced dataset well, and that the proposed approach COA-RBU is both effective and efficient. This work therefore provides a novel idea for determining a proper undersampling ratio from the perspective of class overlap data complexity.
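The two ingredients named in the abstract, a potential-based ranking of majority examples and an undersampling ratio chosen by an overlap score, can be illustrated with a minimal sketch. It is not the COA-RBU algorithm itself, whose definitions the abstract does not give. Everything concrete here is an assumption made for illustration: the Gaussian form of the potential and its `gamma` parameter, the `overlap_proxy` function (a nearest-neighbour stand-in for the class overlap degree), the choice to drop the lowest-potential majority examples first, and the candidate `ratios`.

```python
import math

def mutual_class_potential(x, majority, minority, gamma=1.0):
    """Assumed form: Gaussian potential of the majority class at x minus that of the minority class."""
    def rbf_sum(points):
        return sum(math.exp(-(math.dist(x, p) / gamma) ** 2) for p in points)
    return rbf_sum(majority) - rbf_sum(minority)

def overlap_proxy(majority, minority):
    """Hypothetical stand-in for class overlap degree:
    share of examples whose nearest neighbour belongs to the other class."""
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    wrong = 0
    for i, (p, label) in enumerate(labelled):
        nearest = min((q for j, q in enumerate(labelled) if j != i),
                      key=lambda q: math.dist(p, q[0]))
        wrong += nearest[1] != label
    return wrong / len(labelled)

def adaptive_undersample(majority, minority, gamma=1.0,
                         ratios=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Try each candidate ratio and keep the smallest one that minimises the overlap proxy."""
    # Assumption: majority examples with the lowest potential (closest to
    # minority examples) are treated as least useful and dropped first.
    ranked = sorted(majority,
                    key=lambda p: mutual_class_potential(p, majority, minority, gamma))
    best = None
    for r in ratios:
        kept = ranked[round(r * len(ranked)):]
        score = overlap_proxy(kept, minority)
        if best is None or score < best[0]:  # strict '<' keeps the smaller ratio on ties
            best = (score, r, kept)
    return best

# Toy data: a majority cluster near the origin plus two majority points
# that sit inside the minority region around (3, 3).
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5),
            (0.5, 0), (1, 0.5), (3.0, 3.1), (3.1, 3.0)]
minority = [(3, 3), (3.2, 3.2), (2.9, 3.2)]

score, ratio, kept = adaptive_undersample(majority, minority)
print(ratio, len(kept), score)
```

On this toy set the search stops at the ratio that removes exactly the two majority points embedded in the minority region, since removing more does not lower the proxy any further.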