Effective Fusion of Deep Multitasking Representations for Robust Visual Tracking

Seyed Mojtaba Marvasti Zadeh, Hossien Ghanei-Yakhdan*, Shohreh Kasaei, Kamal Nasrollahi, Thomas B. Moeslund

*Corresponding author

Publication: Contribution to journal / Journal article / Research / Peer-reviewed

1 Citation (Scopus)
214 Downloads (Pure)

Abstract

Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs based on different residual neural networks (ResNets) has not previously been investigated. This paper aims to evaluate the performance of 12 state-of-the-art ResNet-based FENs in a DCF-based framework to determine the best for visual tracking purposes. First, it ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method. Then, the proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters. Finally, it introduces a new and efficient semantic weighting method (using semantic segmentation feature maps on each video frame) to reduce the drift problem. Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128, UAV-123 and VOT-2018 visual tracking datasets demonstrate that the proposed method effectively outperforms state-of-the-art methods in terms of precision and robustness of visual tracking.

Original language: English
Journal: Visual Computer
Volume: 38
Issue number: 12
Pages (from-to): 4397-4417
Number of pages: 21
ISSN: 0178-2789
Status: Published - Dec. 2022
