Cite this article: XIE Yu-min, ZHANG Lang-wen, YU Xiao-yuan, XIE Wei. YOLOv5 object detection algorithm with visible-infrared feature interaction and fusion [J]. Control Theory & Applications, 2024, 41(5): 914-922.
|
YOLOv5 object detection algorithm with visible-infrared feature interaction and fusion
Received: 2022-05-30; Revised: 2022-12-12
DOI: 10.7641/CTA.2023.20475
2024, 41(5): 914-922
Keywords: visible images; infrared images; feature fusion; interaction; YOLOv5
Funding: National Natural Science Foundation of China (61803161); Natural Science Foundation of Guangdong Province (2022A1515011887, 2023A1515030119); Science and Technology Program of Qingyuan City (2023DZX006); Key-Area Science and Technology Research Program of Foshan City (2020001006812); Core Technology Research Program of Shunde District (2030218000174); Science and Technology Program of Guangzhou City (202102020379); Basic and Applied Basic Research Program of Jiangmen City (2020030103080008999)
|
Abstract
Object detection is a key technology in autonomous driving systems, but detectors that rely on RGB input alone often perform poorly at night and in severe weather, so object detection algorithms that fuse visible and infrared information have attracted considerable research attention. Existing methods, however, usually rely on complex fusion structures and overlook the importance of information exchange between modalities. Taking YOLOv5 as the basic framework, this paper proposes an object detection algorithm with visible-infrared feature interaction and fusion. It uses a new cross stage partial backbone, CSPDarknet53-F, whose dual-branch structure extracts visible and infrared features separately; a feature interaction module then reconstructs the information components and proportions of each modality, strengthening the exchange between modalities so that visible and infrared features can be fused more thoroughly. Extensive experiments on the FLIR-aligned and M³FD datasets show that CSPDarknet53-F exploits visible and infrared information more effectively in combination, improving the model's detection accuracy while remaining robust to sudden changes in illumination.
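The abstract does not describe the internals of CSPDarknet53-F or its feature interaction module, so the sketch below is only a plausible PyTorch illustration of the general idea it states: two modality-specific branches whose features are reweighted using shared global context before fusion. All class names, the channel-attention-style gating, and the additive fusion are assumptions introduced here for illustration, not the paper's actual design.

```python
# Minimal sketch of one dual-branch backbone stage with a feature
# interaction module, assuming a channel-attention-style information
# exchange. Everything below is illustrative, not the paper's method.
import torch
import torch.nn as nn


class FeatureInteraction(nn.Module):
    """Reweights each modality's channels using pooled context from BOTH
    branches, so visible and infrared features exchange information
    before fusion (assumed gating design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # One gate per modality, each conditioned on both modalities.
        self.gate_vis = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())
        self.gate_ir = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, f_vis, f_ir):
        b, c, _, _ = f_vis.shape
        # Joint global context from both modalities.
        ctx = torch.cat([self.pool(f_vis), self.pool(f_ir)], dim=1).view(b, 2 * c)
        w_vis = self.gate_vis(ctx).view(b, c, 1, 1)
        w_ir = self.gate_ir(ctx).view(b, c, 1, 1)
        # "Reconstruct the components and proportions" of each modality.
        return f_vis * w_vis, f_ir * w_ir


class DualBranchStage(nn.Module):
    """One backbone stage: separate conv blocks per modality, followed by
    interaction and a simple additive fusion feeding the detection neck."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()

        def block():
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.SiLU(),
            )

        self.vis_branch, self.ir_branch = block(), block()
        self.interact = FeatureInteraction(out_ch)

    def forward(self, x_vis, x_ir):
        f_vis, f_ir = self.vis_branch(x_vis), self.ir_branch(x_ir)
        f_vis, f_ir = self.interact(f_vis, f_ir)
        return f_vis, f_ir, f_vis + f_ir  # fused map for the YOLOv5 neck


# Usage: aligned visible/infrared pairs (IR replicated to 3 channels here).
vis = torch.randn(2, 3, 256, 256)
ir = torch.randn(2, 3, 256, 256)
f_v, f_i, fused = DualBranchStage(3, 64)(vis, ir)
print(fused.shape)  # torch.Size([2, 64, 128, 128])
```

Gating each branch on the concatenated context of both modalities is one simple way to realize "interaction before fusion": when one modality is degraded (e.g., RGB at night), its pooled statistics shrink and the gates can shift weight toward the other branch, which is consistent with the robustness to illumination changes claimed in the abstract.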
|
|
|
|
|