Cite this article: ZHANG Yu-yao, KUANG Sen. Quantum reinforcement learning control based on entropy and unequal probability[J]. 控制理论与应用 (Control Theory & Applications), 2024, 41(12): 2277-2285.
Quantum reinforcement learning control based on entropy and unequal probability
Received: 2023-01-05  Revised: 2024-07-10
DOI: 10.7641/CTA.2023.30007
Keywords: reinforcement learning; action selection strategy; entropy; unequal probability; quantum state preparation
Funding: Supported by the National Natural Science Foundation of China (62373342).
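Author  Affiliation  E-mail
ZHANG Yu-yao  University of Science and Technology of China  zhang98@mail.ustc.edu.cn
KUANG Sen*  University of Science and Technology of China  skuang@ustc.edu.cn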
Abstract
      High-precision control of complex quantum systems is one of the key technologies for realizing quantum computing and quantum information processing. Deep reinforcement learning algorithms have been applied to quantum control problems, where they can design optimal strategies for different quantum systems. To achieve fast, high-precision quantum state preparation, this paper proposes a deep reinforcement learning algorithm based on entropy and unequal probability, in which the notion of entropy from information theory is introduced to improve the action selection strategy. The entropy of the current state is computed from its action values, and the agent chooses between “exploration” and “exploitation” according to that entropy; during exploitation, actions are selected randomly with unequal probabilities. The agent thus focuses on exploitation in states that have been learned sufficiently and on exploration in states that have not, until the task is accomplished. Numerical simulations on qubit systems show that, compared with conventional reinforcement learning algorithms, the proposed algorithm prepares eigenstates and entangled states with faster convergence and higher fidelity.
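The entropy-driven action selection summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption rather than the authors' method: action values are mapped to probabilities with a softmax, the entropy is the Shannon entropy of that distribution, a fixed `entropy_threshold` separates the two modes, exploration draws uniformly, and exploitation draws with the (unequal) softmax probabilities. The function names and the `temperature` parameter are hypothetical.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Map action values to probabilities (assumed mapping, not from the paper)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_action(q_values, entropy_threshold, rng=random, temperature=1.0):
    """Return (action_index, mode) for one state.

    High entropy: action values are nearly flat, so the state is treated as
    insufficiently learned and an action is drawn uniformly ("explore").
    Low entropy: the state is treated as sufficiently learned and an action is
    drawn with unequal probabilities given by the softmax ("exploit").
    The threshold rule itself is an assumption made for this sketch.
    """
    probs = softmax(q_values, temperature)
    if shannon_entropy(probs) > entropy_threshold:
        return rng.randrange(len(q_values)), "explore"
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return action, "exploit"


if __name__ == "__main__":
    rng = random.Random(0)
    flat = [0.0, 0.0, 0.0, 0.0]    # entropy = ln 4, the maximum for 4 actions
    peaked = [5.0, 0.0, 0.0, 0.0]  # one action dominates, so entropy is low
    print(select_action(flat, entropy_threshold=1.0, rng=rng)[1])    # explore
    print(select_action(peaked, entropy_threshold=1.0, rng=rng)[1])  # exploit
```

In this sketch, a state with uniform action values has the maximum entropy ln 4 and is explored, while a state with one dominant action value falls below the threshold and is exploited, with the dominant action the most likely but not the only possible choice.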