Cite this article: | YANG Xiao, GUO Yi-nan, JI Jian-jiao, LIU Xu. PPO multi-objective task allocation method for heterogeneous crowd sensing [J]. Control Theory & Applications (控制理论与应用), 2024, 41(6): 1056-1066. |
|
PPO multi-objective task allocation method for heterogeneous crowd sensing |
Received: 2022-10-29 Revised: 2023-05-14 |
DOI: 10.7641/CTA.2023.20950 |
2024,41(6):1056-1066 |
Keywords: heterogeneous crowd sensing; multi-objective optimization; reinforcement learning; proximal policy optimization |
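Funding: Supported by the National Natural Science Foundation of China (61973305, U23A20340, 52121003) and the National Key Research and Development Program of China (2022YFB4703700). |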
|
Abstract |
Task allocation in existing mobile crowd sensing systems mainly targets a single type of mobile user, and task allocation for heterogeneous crowd sensing, in which multiple types of mobile users coexist, has received comparatively little attention. To address this, we define the area accessibility of heterogeneous mobile users and classify the sensing sub-regions by type. Taking into account the time-varying number of sensing tasks and of mobile users, we then construct a multi-objective constrained optimization model for task allocation in dynamic heterogeneous crowd sensing systems. The model maximizes sensing quality and minimizes sensing cost, subject to constraints such as the maximum number of tasks each user can perform and the limited working time of UAVs. To solve this problem, a multi-objective evolutionary optimization algorithm based on proximal policy optimization (PPO) is proposed. PPO is used to select, according to the current evolutionary state of the population, the evolutionary operator with the highest reward value, which then generates the offspring population. Comparative experiments against several algorithms on different heterogeneous crowd sensing instances show that the Pareto-optimal solution set obtained by the proposed algorithm has the best convergence and distribution, and that the operator selection strategy effectively improves adaptability to time-varying factors and improves algorithm performance. |
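As a rough illustration of two building blocks named in the abstract, the sketch below shows a Pareto non-dominated filter for the two stated objectives (maximize sensing quality, minimize sensing cost) and the standard PPO clipped surrogate term for a single sample. It is not the authors' implementation: the function names, the `(quality, cost)` tuple layout, and the clipping value `eps=0.2` are assumptions made here for illustration, and the paper's constraints, state definition, and evolutionary operators are not modeled.

```python
def dominates(a, b):
    """Return True if solution a Pareto-dominates solution b.

    Each solution is a hypothetical (quality, cost) pair:
    quality is maximized, cost is minimized.
    """
    qa, ca = a
    qb, cb = b
    no_worse = qa >= qb and ca <= cb
    strictly_better = qa > qb or ca < cb
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the non-dominated solutions, preserving input order."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions)]


def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); the value returned is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    eps=0.2 is an assumed value, not one reported by the paper.
    """
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Made-up candidate allocations as (sensing quality, sensing cost).
    candidates = [(0.9, 5.0), (0.8, 3.0), (0.7, 4.0), (0.9, 6.0)]
    print(pareto_front(candidates))
    print(clipped_surrogate(1.5, 2.0))
```

In an operator-selection setting, the "action" whose probability ratio enters `clipped_surrogate` would be the choice of evolutionary operator, and the advantage would be derived from a reward measuring how much the resulting offspring improve the population; how the paper defines that reward is not stated in the abstract.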
|
|
|
|
|