Adaptive gradient descent optimization algorithm with improved differential term

GE Quan-bo; ZHANG Jian-chao; YANG Qin-min; LI Hong

Cite this article: GE Quan-bo, ZHANG Jian-chao, YANG Qin-min, LI Hong. Adaptive gradient descent optimization algorithm with improved differential term[J]. Control Theory & Applications, 2022, 39(4): 623-632.


Received: 2021-01-18 Revised: 2021-05-19


DOI: 10.7641/CTA.2021.10061

2022,39(4):623-632

Keywords: convolutional neural networks; gradient descent algorithm; differential term; weight update; adaptive learning rate; regret bound

Funding: Supported by the Aeronautical Science Foundation of China (2019460T5001).

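Author	Affiliation	Postal code
GE Quan-bo	Tongji University	201804
ZHANG Jian-chao	Hangzhou Dianzi University
YANG Qin-min^*	Zhejiang University	310027
LI Hong	Chinese Flight Test Establishment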

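Chinese abstract (translated)

Gradient descent is a commonly used optimization algorithm for training convolutional neural networks, and its performance directly affects the convergence of network training. This paper analyzes two problems in current gradient-based optimization algorithms: overshoot that degrades convergence, and the adaptivity of the learning rate. It proposes an adaptive gradient optimization algorithm with a differential term, which aims to improve the convergence of the network optimization process while also increasing the convergence rate. First, to address the large overshoot in the optimization process, the iterative algorithm is reformulated and a differential term is introduced, drawing on classical control theory, to overcome the lag of the weight update behind the actual change in the gradient. Next, an adaptive mechanism is introduced to address the poor convergence and slow convergence rate caused by an ill-suited learning rate. Then, the Cauchy-Schwarz and Young's inequalities are used to prove that the worst-case performance bound (regret bound) of the new algorithm is O(√T). Finally, simulation experiments on the MNIST and CIFAR-10 benchmark datasets verify the effectiveness of the new algorithm. The results show that combining the differential term with the adaptive mechanism effectively improves the convergence performance of gradient descent, yielding a clear improvement in algorithm performance.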
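For readers unfamiliar with the term, the regret bound quoted above is usually stated in the online convex optimization setting as follows. The notation (loss functions f_t, iterates θ_t, feasible set F, regret R(T)) is assumed here for illustration and is not taken from the paper itself.

```latex
R(T) \;=\; \sum_{t=1}^{T} f_t(\theta_t) \;-\; \min_{\theta \in \mathcal{F}} \sum_{t=1}^{T} f_t(\theta),
\qquad
R(T) = O(\sqrt{T}) \;\Longrightarrow\; \frac{R(T)}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \to 0 \quad (T \to \infty).
```

Under this definition, an O(√T) bound means the average regret per iteration vanishes as the number of iterations grows.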

English abstract

Gradient descent algorithms are commonly used to train neural networks, and their performance directly affects the convergence of network training. This article analyzes two problems in current gradient optimization algorithms: overshoot, which degrades convergence, and the adaptivity of the learning rate. An adaptive gradient optimization algorithm with a differential term is proposed, which aims to improve the convergence of the network optimization process as well as the convergence rate. Firstly, to address the large overshoot in the optimization process, the iterative algorithm is reformulated and a differential term is introduced, drawing on classical control theory, so that the lag of the weight update behind the actual gradient change is overcome. Secondly, an adaptive mechanism is introduced to remedy the poor convergence and slow convergence rate caused by an unsuitable learning rate. Thirdly, the Cauchy-Schwarz and Young’s inequalities are used to prove that the algorithm achieves a regret bound of O(√T). Finally, the effectiveness of the proposed method is verified by experiments on the MNIST and CIFAR-10 benchmark datasets. The results show that introducing the differential term together with the adaptive mechanism significantly improves the convergence performance of gradient descent methods.
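The abstract does not give the update rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a momentum term (an exponential moving average of gradients), a derivative term built from the smoothed change in the gradient, and an Adam-style second-moment estimate for the adaptive step size. The function names, the gain `kd`, the default hyperparameters, and the quadratic test problem are all hypothetical.

```python
import math


def make_state(n):
    """Allocate per-parameter buffers for the hypothetical optimizer."""
    return {"t": 0, "m": [0.0] * n, "d": [0.0] * n,
            "v": [0.0] * n, "prev": [0.0] * n}


def update(theta, grad, state, lr=0.05, kd=0.5,
           beta1=0.9, beta2=0.999, eps=1e-8):
    """One step: momentum plus derivative term, scaled by an adaptive step size.

    All names and default values are illustrative assumptions.
    """
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (w, g) in enumerate(zip(theta, grad)):
        # Change of the gradient since the previous step (zero on the first step).
        diff = g - state["prev"][i] if t > 1 else 0.0
        state["prev"][i] = g
        # Momentum: exponential moving average of past gradients.
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        # Derivative term: smoothed gradient change, intended to damp overshoot.
        state["d"][i] = beta1 * state["d"][i] + (1.0 - beta1) * diff
        # Second-moment estimate that sets a per-coordinate step size.
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        m_hat = state["m"][i] / (1.0 - beta1 ** t)  # bias correction
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        direction = m_hat + kd * state["d"][i]
        new_theta.append(w - lr * direction / (math.sqrt(v_hat) + eps))
    return new_theta


def loss(theta):
    """Toy ill-conditioned quadratic used only for this demonstration."""
    x, y = theta
    return 0.5 * (x * x + 10.0 * y * y)


def grad(theta):
    """Analytic gradient of the toy quadratic."""
    x, y = theta
    return [x, 10.0 * y]


def run_demo(steps=500, kd=0.5):
    """Minimize the toy quadratic and return (initial loss, final loss)."""
    theta = [3.0, -2.0]
    state = make_state(len(theta))
    initial = loss(theta)
    for _ in range(steps):
        theta = update(theta, grad(theta), state, kd=kd)
    return initial, loss(theta)


if __name__ == "__main__":
    start, end = run_demo()
    print(f"loss: {start:.3f} -> {end:.3e}")
```

Calling `run_demo(kd=0.0)` switches the derivative term off, which reduces the sketch to an Adam-style update and gives a simple baseline for comparison on the same toy problem.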