Attention in Multimodal Neural Networks for Person Re-identification

Aske Rasch Lejbølle; Benjamin Krogh; Kamal Nasrollahi; Thomas B. Moeslund

doi:10.1109/CVPRW.2018.00055

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

13 Citations (Scopus)

449 Downloads (Pure)

Abstract

In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
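
The abstract does not specify how the correlation-based attention or the feature fusion is computed, so the following is only a minimal sketch of one possible reading, not the MAT architecture itself. It assumes each modality's feature map is given as a list of per-location channel vectors, uses cosine similarity between the RGB and depth vectors at each location as the "correlation", and uses plain concatenation for fusion. All function names (`correlation_attention`, `multilevel_feature`, `fuse_rgbd`) and the toy inputs are hypothetical.

```python
import math

def cosine(u, v, eps=1e-8):
    # Cosine similarity between two channel vectors (assumed correlation measure).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def correlation_attention(rgb, depth):
    # One attention weight per spatial location; weights sum to 1.
    return softmax([cosine(r, d) for r, d in zip(rgb, depth)])

def pool(features, weights):
    # Weighted sum of the per-location vectors, channel by channel.
    channels = len(features[0])
    return [sum(w * f[c] for w, f in zip(weights, features)) for c in range(channels)]

def multilevel_feature(features, attn):
    # Global descriptor (uniform average) concatenated with local descriptor (attention-weighted).
    uniform = [1.0 / len(features)] * len(features)
    return pool(features, uniform) + pool(features, attn)

def fuse_rgbd(rgb, depth):
    # Joint RGB-D descriptor: concatenate the multilevel features of both modalities.
    attn = correlation_attention(rgb, depth)
    return multilevel_feature(rgb, attn) + multilevel_feature(depth, attn)

# Toy feature maps: 3 spatial locations, 2 channels each (hypothetical values).
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth = [[1.0, 0.0], [1.0, 0.0], [-1.0, -1.0]]

attn = correlation_attention(rgb, depth)
joint = fuse_rgbd(rgb, depth)
print(len(joint))
```

In this toy input the two modalities agree at the first location, are orthogonal at the second and opposed at the third, so the attention weights decrease in that order. In a real network the feature maps would come from the convolutional layers and the attention would be learned; the sketch only illustrates the data flow from correlation to a joint descriptor.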

Original language	English
Title	2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Number of pages	9
Publisher	IEEE
Publication date	Jun 2018
Pages	292-300
ISBN (Print)	978-1-5386-6101-7
ISBN (Electronic)	978-1-5386-6100-0
DOI	https://doi.org/10.1109/CVPRW.2018.00055
Status	Published - Jun 2018
Event	IEEE Conference on Computer Vision and Pattern Recognition, 2018 - Salt Lake City, USA. Duration: 18 Jun 2018 → 22 Jun 2018

Conference

Conference	IEEE Conference on Computer Vision and Pattern Recognition, 2018
Country/Territory	USA
City	Salt Lake City
Period	18/06/2018 → 22/06/2018

Name	IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
ISSN	2160-7516

Access to Document

10.1109/CVPRW.2018.00055

attention_multimodal: Accepted manuscript, 3.96 MB

http://openaccess.thecvf.com/content_cvpr_2018_workshops/w6/html/Lejbolle_Attention_in_Multimodal_CVPR_2018_paper.html

Vision-based Person Re-identification in a Queue
Lejbølle, A. R.
01/01/2017 → 31/12/2019
Projects: Project › Research

Citation formats

@inproceedings{a4dcd5c2975943e09fd3e973628ad4c6,

title = "Attention in Multimodal Neural Networks for Person Re-identification",

abstract = "In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.",

author = "Lejb{\o}lle, {Aske Rasch} and Benjamin Krogh and Kamal Nasrollahi and Moeslund, {Thomas B.}",

year = "2018",

month = jun,

doi = "10.1109/CVPRW.2018.00055",

language = "English",

isbn = "978-1-5386-6101-7",

series = "IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",

publisher = "IEEE",

pages = "292--300",

booktitle = "2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",

address = "United States",

note = "IEEE Conference on Computer Vision and Pattern Recognition, 2018, IEEE CVPR 2018 ; Conference date: 18-06-2018 Through 22-06-2018",

}

Lejbølle, AR, Krogh, B, Nasrollahi, K & Moeslund, TB 2018, Attention in Multimodal Neural Networks for Person Re-identification. in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 292-300, IEEE Conference on Computer Vision and Pattern Recognition, 2018, Salt Lake City, Utah, USA, 18/06/2018. https://doi.org/10.1109/CVPRW.2018.00055

Attention in Multimodal Neural Networks for Person Re-identification. / Lejbølle, Aske Rasch; Krogh, Benjamin; Nasrollahi, Kamal et al.
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2018. p. 292-300 (IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)).

TY - GEN

T1 - Attention in Multimodal Neural Networks for Person Re-identification

AU - Lejbølle, Aske Rasch

AU - Krogh, Benjamin

AU - Nasrollahi, Kamal

AU - Moeslund, Thomas B.

PY - 2018/6

Y1 - 2018/6

N2 - In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.

U2 - 10.1109/CVPRW.2018.00055

DO - 10.1109/CVPRW.2018.00055

M3 - Article in proceeding

SN - 978-1-5386-6101-7

T3 - IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

SP - 292

EP - 300

BT - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

PB - IEEE

T2 - IEEE Conference on Computer Vision and Pattern Recognition, 2018

Y2 - 18 June 2018 through 22 June 2018

ER -
