Attention in Multimodal Neural Networks for Person Re-identification

Aske Rasch Lejbølle; Benjamin Krogh; Kamal Nasrollahi; Thomas B. Moeslund

doi:10.1109/CVPRW.2018.00055

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

13 Citations (Scopus)

443 Downloads (Pure)

Abstract

In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
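
The abstract does not specify how the attention module is built, so the following is only a rough NumPy sketch of the general idea it describes: attention weights derived from correlation between RGB and depth feature maps, attention-pooled local features fused with globally pooled features, and a final RGB-D concatenation. The function name `attention_fused_feature`, the use of per-location cosine similarity as the correlation measure, the softmax normalisation, and the random inputs standing in for CNN feature maps are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_fused_feature(rgb_map, depth_map):
    """Fuse two (C, H, W) feature maps into one RGB-D descriptor.

    Hypothetical helper: the correlation measure and fusion order are
    assumptions, not the design reported in the paper.
    """
    C, H, W = rgb_map.shape
    rgb = rgb_map.reshape(C, H * W)
    depth = depth_map.reshape(C, H * W)

    # Assumed correlation: cosine similarity between the RGB and depth
    # channel vectors at each spatial location.
    num = (rgb * depth).sum(axis=0)
    den = np.linalg.norm(rgb, axis=0) * np.linalg.norm(depth, axis=0) + 1e-8
    corr = num / den

    # Attention weights over spatial locations (sum to 1).
    weights = softmax(corr)

    # Local features: attention-weighted pooling of each modality.
    rgb_local = rgb @ weights
    depth_local = depth @ weights

    # Global features: plain average pooling of each modality.
    rgb_global = rgb.mean(axis=1)
    depth_global = depth.mean(axis=1)

    # Joint descriptor: local and global features of both modalities.
    feature = np.concatenate([rgb_local, rgb_global, depth_local, depth_global])
    return feature, weights.reshape(H, W)


# Random arrays stand in for the outputs of two CNN branches.
rng = np.random.default_rng(0)
rgb_map = rng.standard_normal((8, 4, 4))
depth_map = rng.standard_normal((8, 4, 4))
feature, weights = attention_fused_feature(rgb_map, depth_map)
print(feature.shape, weights.shape)
```

In this sketch the descriptor has four times the channel count of a single branch, since it concatenates one local and one global vector per modality.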

Original language	English
Title of host publication	2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Number of pages	9
Publisher	IEEE
Publication date	Jun 2018
Pages	292-300
ISBN (Print)	978-1-5386-6101-7
ISBN (Electronic)	978-1-5386-6100-0
DOIs	https://doi.org/10.1109/CVPRW.2018.00055
Publication status	Published - Jun 2018
Event	IEEE Conference on Computer Vision and Pattern Recognition, 2018 - Salt Lake City, United States
Duration	18 Jun 2018 → 22 Jun 2018

Conference

Conference	IEEE Conference on Computer Vision and Pattern Recognition, 2018
Country/Territory	United States
City	Salt Lake City
Period	18/06/2018 → 22/06/2018

Series	IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
ISSN	2160-7516

Access to Document

10.1109/CVPRW.2018.00055

attention_multimodal: Accepted author manuscript, 3.96 MB

http://openaccess.thecvf.com/content_cvpr_2018_workshops/w6/html/Lejbolle_Attention_in_Multimodal_CVPR_2018_paper.html

Vision-based Person Re-identification in a Queue
Lejbølle, A. R.
01/01/2017 → 31/12/2019
Project: Research

Cite this

@inproceedings{a4dcd5c2975943e09fd3e973628ad4c6,
title = "Attention in Multimodal Neural Networks for Person Re-identification",
abstract = "In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.",
author = "Lejb{\o}lle, {Aske Rasch} and Benjamin Krogh and Kamal Nasrollahi and Moeslund, {Thomas B.}",
year = "2018",
month = jun,
doi = "10.1109/CVPRW.2018.00055",
language = "English",
isbn = "978-1-5386-6101-7",
series = "IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",
publisher = "IEEE",
pages = "292--300",
booktitle = "2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",
address = "United States",
note = "IEEE Conference on Computer Vision and Pattern Recognition, 2018, IEEE CVPR 2018 ; Conference date: 18-06-2018 Through 22-06-2018",
}

Lejbølle, AR, Krogh, B, Nasrollahi, K & Moeslund, TB 2018, Attention in Multimodal Neural Networks for Person Re-identification. in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 292-300, IEEE Conference on Computer Vision and Pattern Recognition, 2018, Salt Lake City, Utah, United States, 18/06/2018. https://doi.org/10.1109/CVPRW.2018.00055

Attention in Multimodal Neural Networks for Person Re-identification. / Lejbølle, Aske Rasch; Krogh, Benjamin; Nasrollahi, Kamal et al.
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2018. p. 292-300 (IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)).

TY - GEN
T1 - Attention in Multimodal Neural Networks for Person Re-identification
AU - Lejbølle, Aske Rasch
AU - Krogh, Benjamin
AU - Nasrollahi, Kamal
AU - Moeslund, Thomas B.
PY - 2018/6
Y1 - 2018/6
N2 - In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
AB - In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
U2 - 10.1109/CVPRW.2018.00055
DO - 10.1109/CVPRW.2018.00055
M3 - Article in proceeding
SN - 978-1-5386-6101-7
T3 - IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
SP - 292
EP - 300
BT - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
PB - IEEE
T2 - IEEE Conference on Computer Vision and Pattern Recognition, 2018
Y2 - 18 June 2018 through 22 June 2018
ER -
