Memory mechanisms have recently been widely adopted in target tracking. However, existing trackers struggle to balance the stability of long-term memory with the plasticity of short-term memory in an elegant and efficient way. We propose a residual memory inference network (RMIT) that exploits both the history of target states and the latest visual features. Specifically, RMIT consists of a base layer and a residual memory layer that together synergize short- and long-term memories. The base layer can be regarded as a reformulation of the Discriminative Correlation Filter (DCF); it maintains the short-term memory to accommodate rapid appearance changes. The residual memory layer extends residual learning from the spatial domain to the spatio-temporal domain via ConvLSTM to obtain a long-term memory of the target appearance. To avoid model degradation caused by sample imbalance, we introduce a weighted gradient harmonized loss that improves the discrimination of the tracker. Response scores then serve as the basis of an adaptive learning strategy that ensures the reliability of memory updates. The proposed method has been extensively validated on six benchmark datasets (OTB-50/100, TC-128, UAV-123, and VOT-2016/2018) and performs favorably against several advanced methods.
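The idea of using response scores to decide when memory may be updated can be illustrated with a minimal sketch. It is not the RMIT update rule, which the abstract does not specify. The class name `GatedMemory`, the parameters `base_rate` and `threshold`, and the linear blending step are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GatedMemory:
    """Hypothetical score-gated template memory (not the RMIT update rule)."""
    template: List[float]
    base_rate: float = 0.1   # assumed learning rate for a fully reliable frame
    threshold: float = 0.5   # assumed minimum peak score that permits an update
    skipped: int = 0         # number of frames rejected as unreliable

    def update(self, features: List[float], response: List[float]) -> float:
        """Blend new features into the template; return the rate actually used."""
        peak = max(response)
        if peak < self.threshold:
            # Low peak response: the frame may be occluded or drifting,
            # so the stored template is left untouched.
            self.skipped += 1
            return 0.0
        # Scale the learning rate by confidence, capped at base_rate.
        rate = self.base_rate * min(peak, 1.0)
        self.template = [
            (1.0 - rate) * t + rate * f
            for t, f in zip(self.template, features)
        ]
        return rate
```

In this sketch a frame whose peak response falls below `threshold` leaves the template unchanged, while a confident frame is blended in with a rate proportional to its peak score.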
Bibliographic note
Publisher Copyright: © 2022 Elsevier Inc.