Cite this article: LV Zheng, ZHAO Jun, LIU Ying, WANG Wei. Missing data imputation based on maximal variance weight information coefficient for gas flow in steel industry[J]. Control Theory & Applications, 2015, 32(5): 646-654.
Missing data imputation based on maximal variance weight information coefficient for gas flow in steel industry
Received: 2014-09-06  Revised: 2014-12-31
DOI  10.7641/CTA.2015.40828
  2015,32(5):646-654
Keywords  energy system of steel industry; data imputation; sample selection; maximal variance weight information coefficient
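Funding  Supported by the National High Technology Research and Development Program of China ("863" Program) (2013AA040703), the National Natural Science Foundation of China (61034003, 61304213, 61104157, 61273037, 61473056), and the Fundamental Research Funds for the Central Universities (DUT13RC203).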
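Author  Affiliation  E-mail
LV Zheng  School of Control Science and Engineering, Dalian University of Technology  lvzheng@mail.dlut.edu.cn
ZHAO Jun*  School of Control Science and Engineering, Dalian University of Technology  zhaoj@dlut.edu.cn
LIU Ying  School of Control Science and Engineering, Dalian University of Technology
WANG Wei  School of Control Science and Engineering, Dalian University of Technology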
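Abstract (translated from the Chinese)
      In data-based mining, modeling, and optimization, the completeness and accuracy of the data are the foundation of the research. Because the metallurgical energy system is complex and on-site data acquisition is easily disturbed, data are frequently lost during acquisition, so that models cannot be built and implicit information cannot be mined accurately. This paper addresses missing data in the generation and consumption flows of byproduct gas in steel enterprises. By analyzing the correlation characteristics of energy flow data under similar operating conditions, it proposes a flow data imputation method for the byproduct gas system of metallurgical enterprises based on the maximal variance weight information coefficient. For the two types of missing data that frequently occur on site, namely intermittently missing data points and long-term continuously missing data, the method uses the maximal variance weight information coefficient as the sample selection criterion and fills in the missing data with a kernel-learning-based method. To validate the proposed method, experiments were run on actual production data from the blast furnaces, coke ovens, and cold and hot rolling users of Shanghai Baosteel. The results show that the method has a substantial advantage in imputation accuracy over other methods.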
English Abstract
      In data-driven modeling and optimization, the completeness and accuracy of the data are the foundation for further research tasks. Since the energy system of the steel industry is rather complicated and its data-acquisition process is frequently affected by malfunctions in data transportation, storage, and transformation, missing data occur often, which may cause model building or accurate information discovery to fail. In this study, by analyzing the correlation of the energy data with respect to corresponding operating conditions in manufacturing, a data imputation method for missing data in the byproduct gas flow is proposed. In this method, the proposed maximal variance weight information coefficient (MVWIC) is adopted as the sample selection criterion, and the imputation itself is carried out with a kernel-learning-based method. To validate the proposed method, two types of missing-data modes that frequently occur in the steel industry are considered, i.e., intermittent missing and long-term continuous missing. A series of experiments using practical energy data indicates that the proposed method achieves better imputation accuracy than other methods.
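
The two-stage idea in the abstract (select similar samples, then fill gaps with a kernel learner) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not define MVWIC, so the Pearson correlation is swapped in as a stand-in selection score, and kernel ridge regression with an RBF kernel is assumed as the kernel-learning method. The function names (`select_similar`, `rbf_kernel`, `impute`) and the parameters `k`, `gamma`, and `lam` are hypothetical, and the reference series are assumed to be complete.

```python
import numpy as np

def select_similar(target, candidates, k=2):
    """Pick the k candidate series most correlated with the target.

    Pearson correlation is a stand-in score here; the paper uses MVWIC,
    whose definition is not given in the abstract.
    """
    observed = ~np.isnan(target)
    scores = [np.corrcoef(target[observed], c[observed])[0, 1] for c in candidates]
    best = np.argsort(scores)[::-1][:k]
    return [candidates[i] for i in best]

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=2))

def impute(target, candidates, k=2, gamma=0.5, lam=1e-3):
    """Fill NaNs in target by kernel ridge regression on the selected series."""
    refs = np.column_stack(select_similar(target, candidates, k))
    observed = ~np.isnan(target)
    x_obs, y_obs = refs[observed], target[observed]
    # Solve (K + lam * I) alpha = y for the dual coefficients.
    gram = rbf_kernel(x_obs, x_obs, gamma)
    alpha = np.linalg.solve(gram + lam * np.eye(len(y_obs)), y_obs)
    filled = target.copy()
    if (~observed).any():
        # Predict each missing point from the reference values at that time.
        filled[~observed] = rbf_kernel(refs[~observed], x_obs, gamma) @ alpha
    return filled
```

Both missing modes named in the abstract go through the same call: scattered NaNs (intermittent missing) and a contiguous run of NaNs (long-term continuous missing) are filled from the values of the selected reference series at the same time points.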