Cite this article: YANG Xiao, GUO Yi-nan, JI Jian-jiao, LIU Xu. PPO multi-objective task allocation method for heterogeneous crowd sensing[J]. Control Theory & Applications, 2024, 41(6): 1056-1066.
PPO multi-objective task allocation method for heterogeneous crowd sensing
Received: 2022-10-29; Revised: 2023-05-14
DOI: 10.7641/CTA.2023.20950
Keywords: heterogeneous crowd sensing; multi-objective optimization; reinforcement learning; proximal policy optimization
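Funding: Supported by the National Natural Science Foundation of China (61973305, U23A20340, 52121003) and the National Key Research and Development Program of China (2022YFB4703700).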
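Authors, affiliations and e-mail (* corresponding author):
YANG Xiao, China University of Mining and Technology, yangxiao_cumt@cumt.edu.cn
GUO Yi-nan*, China University of Mining and Technology, nanfly@126.com
JI Jian-jiao, China University of Mining and Technology
LIU Xu, China University of Mining and Technology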
Abstract
      Existing task allocation methods for mobile crowd sensing systems mainly target a single type of mobile user, and task allocation for heterogeneous crowd sensing, in which several types of mobile users coexist, has received comparatively little attention. To address this gap, we define the area accessibility of heterogeneous mobile users and classify the sensing sub-regions by type. Then, accounting for the time-varying number of sensing tasks and the time-varying number of mobile users, we build a multi-objective constrained optimization model for task allocation in dynamic heterogeneous crowd sensing systems. The model maximizes sensing quality and minimizes sensing cost, subject to constraints such as the maximum number of tasks each user can perform and the limited working time of unmanned aerial vehicles (UAVs). To solve this problem, we propose a multi-objective evolutionary optimization algorithm based on proximal policy optimization (PPO). According to the current evolutionary state of the population, PPO selects the evolutionary operator with the highest reward value, which is then used to generate the offspring population. Comparative experiments against several algorithms on different heterogeneous crowd sensing instances show that the Pareto-optimal solution set obtained by the proposed algorithm has the best convergence and distribution, and that the evolutionary operator selection strategy effectively improves adaptability to time-varying factors and thereby improves the algorithm's performance.
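The abstract describes PPO choosing, from the population's current evolutionary state, the evolutionary operator with the highest reward. The Python sketch below illustrates that idea under heavy simplifying assumptions and is not the authors' implementation. The operator names in `OPERATORS`, the two-feature state `[1 - progress, progress]` (where `progress` is the fraction of the evolutionary run completed), the reward `toy_reward`, and the helpers `OperatorPolicy`, `ppo_update`, `phase_state` and `train` are all hypothetical. The policy is a linear softmax over operators trained with PPO's clipped surrogate objective; for brevity it omits the value network and uses the batch-mean reward as the advantage baseline.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical operator pool: the abstract does not name the paper's operators.
OPERATORS = ["sbx_crossover", "de_rand_1_bin", "polynomial_mutation"]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class OperatorPolicy:
    """Linear softmax policy pi(op | state) over the operator pool."""

    def __init__(self, n_features: int, n_ops: int) -> None:
        self.w = [[0.0] * n_features for _ in range(n_ops)]

    def probs(self, state: Sequence[float]) -> List[float]:
        logits = [sum(w * s for w, s in zip(row, state)) for row in self.w]
        return softmax(logits)

    def sample(self, state: Sequence[float], rng: random.Random) -> Tuple[int, float]:
        p = self.probs(state)
        op = rng.choices(range(len(p)), weights=p)[0]
        return op, p[op]

    def greedy(self, state: Sequence[float]) -> int:
        # Deployment-time choice: the operator with the highest probability.
        p = self.probs(state)
        return max(range(len(p)), key=p.__getitem__)


def ppo_update(policy: OperatorPolicy, batch, clip_eps=0.2, lr=0.5, epochs=4) -> None:
    """Gradient ascent on the PPO clipped surrogate min(r*A, clip(r)*A).

    batch holds (state, op, old_prob, advantage) tuples collected with the
    policy as it was before this update, so old_prob stays fixed.
    """
    for _ in range(epochs):
        grads = [[0.0] * len(row) for row in policy.w]
        for state, op, old_p, adv in batch:
            p = policy.probs(state)
            ratio = p[op] / old_p
            # When the clipped branch is the minimum, the objective is flat in w.
            if (adv > 0 and ratio > 1 + clip_eps) or (adv < 0 and ratio < 1 - clip_eps):
                continue
            for k, g_row in enumerate(grads):
                # d ratio / d w[k][j] = ratio * (1[k == op] - p[k]) * state[j]
                coef = adv * ratio * ((1.0 if k == op else 0.0) - p[k])
                for j, s_j in enumerate(state):
                    g_row[j] += coef * s_j
        for row, g_row in zip(policy.w, grads):
            for j, g in enumerate(g_row):
                row[j] += lr * g / len(batch)


def phase_state(progress: float) -> List[float]:
    # Hypothetical encoding of the population's "current evolutionary state".
    return [1.0 - progress, progress]


def toy_reward(op: int, progress: float) -> float:
    # Hypothetical reward: operator 0 pays off early in the run, operator 2 late.
    best = 0 if progress < 0.5 else 2
    return 1.0 if op == best else 0.0


def train(iterations: int = 200, batch_size: int = 32, seed: int = 0) -> OperatorPolicy:
    rng = random.Random(seed)
    policy = OperatorPolicy(n_features=2, n_ops=len(OPERATORS))
    for _ in range(iterations):
        rollouts = []
        for _ in range(batch_size):
            progress = rng.random()
            state = phase_state(progress)
            op, old_p = policy.sample(state, rng)
            rollouts.append((state, op, old_p, toy_reward(op, progress)))
        baseline = sum(r[3] for r in rollouts) / batch_size
        batch = [(s, op, p, rew - baseline) for s, op, p, rew in rollouts]
        ppo_update(policy, batch)
    return policy


if __name__ == "__main__":
    policy = train()
    for progress in (0.1, 0.9):
        print(progress, OPERATORS[policy.greedy(phase_state(progress))])
```

After training, `greedy()` returns the operator with the highest policy probability for a given state, mirroring the "highest reward" selection step. In a real multi-objective evolutionary algorithm the reward would come from a quality measure of the generated offspring (for example, a hypervolume improvement), which is likewise an assumption here rather than something the abstract specifies.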