Cite this article: | GE Rui, WANG Zhao-hui, XU Xin, JI Yi, LIU Chun-ping, GONG Sheng-rong. Action recognition with hierarchical convolutional neural networks features and bi-directional long short-term memory model[J]. Control Theory and Technology, 2017, 34(6): 790-796. |
|
Action recognition with hierarchical convolutional neural networks features and bi-directional long short-term memory model |
Received: 2016-08-12 Revised: 2017-05-26 |
DOI 10.7641/CTA.2017.60607 |
2017,34(6):790-796 |
Keywords action recognition; convolutional neural networks; recurrent neural networks; bi-directional recurrent neural networks |
Funding National Natural Science Foundation of China; Provincial Natural Science Foundation |
|
Abstract |
Robust action recognition in videos is a challenging task because of its complexity, and effectively capturing robust spatio-temporal features is key to solving it. In this paper, we propose using a bi-directional long short-term memory (Bi-LSTM) model as the main framework to capture bi-directional spatio-temporal features from video sequences. First, to strengthen the feature representation, we replace traditional hand-crafted descriptors with hierarchical convolutional neural network features. These multi-layer convolutional features fuse low-level shape information with high-level semantic information and thus capture rich spatial information. Then, the extracted convolutional features are fed into the Bi-LSTM, which contains two LSTM layers running in opposite directions: the forward layer captures the evolution of the video from beginning to end, and the backward layer models the evolution in the reverse direction. Finally, the representations from the two directions are fused and passed to a Softmax layer to obtain the final classification result. Experiments on the UCF101 and HMDB51 datasets show that our method achieves performance comparable with state-of-the-art methods for action recognition. |
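The pipeline described in the abstract (per-frame convolutional features, a forward and a backward LSTM pass, fusion, Softmax) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which CNN layers are used, the feature or hidden sizes, or how the two directions are fused. The sketch assumes the per-frame features are already extracted (random vectors stand in for them), assumes fusion by concatenating the final hidden states of the two directions, and uses untrained random weights. All function names and dimensions are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_lstm(input_dim, hidden_dim, rng):
    """Random, untrained parameters for the four LSTM gates (i, f, o, g)."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {name: (mat(), [0.0] * hidden_dim) for name in "ifog"}

def run_lstm(seq, params, hidden_dim):
    """Run one LSTM layer over seq and return its final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in seq:
        z = list(x) + h  # current frame feature joined with previous hidden state
        pre = {name: [a + b for a, b in zip(matvec(W, z), bias)]
               for name, (W, bias) in params.items()}
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell update
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bi_lstm_classify(seq, fwd, bwd, W_out, b_out, hidden_dim):
    """Forward and backward LSTM passes, fused by concatenation (an assumption)."""
    h_fwd = run_lstm(seq, fwd, hidden_dim)        # first frame to last frame
    h_bwd = run_lstm(seq[::-1], bwd, hidden_dim)  # last frame to first frame
    fused = h_fwd + h_bwd                         # list concatenation
    logits = [a + b for a, b in zip(matvec(W_out, fused), b_out)]
    return softmax(logits)

# Hypothetical sizes: 5 frames, 8-dim features, 4 hidden units, 3 classes.
rng = random.Random(0)
T, D, H, C = 5, 8, 4, 3
frames = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]  # stand-in for CNN features
fwd = init_lstm(D, H, rng)
bwd = init_lstm(D, H, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * H)] for _ in range(C)]
b_out = [0.0] * C
probs = bi_lstm_classify(frames, fwd, bwd, W_out, b_out, H)
print(probs)
```

In practice, each entry of `frames` would be replaced by a vector built from several convolutional layers of a pretrained network, and the weights would be learned by backpropagation rather than drawn at random.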
|
|
|
|
|