Abstract

In spite of increasing interest from the research community,
person re-identification remains an unsolved problem.
Correctly deciding on a true match by comparing images
of a person, captured by several cameras, requires extraction
of discriminative features to counter challenges such as
changes in lighting, viewpoint and occlusion. Besides devising
novel feature descriptors, the setup can be changed
to capture persons from an overhead viewpoint rather than
a horizontal. Furthermore, additional modalities can be
considered that are not affected by similar environmental
changes as RGB images. In this work, we present a Multimodal
ATtention network (MAT) based on RGB and depth
modalities. We combine a Convolution Neural Network with
an attention module to extract local and discriminative features
that are fused with globally extracted features. Attention
is based on correlation between the two modalities
and we finally also fuse RGB and depth features to generate
a joint multilevel RGB-D feature. Experiments conducted
on three datasets captured from an overhead view show the
importance of attention, increasing accuracies by 3.43%,
2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
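
The abstract does not specify the layers involved, so the following is only a minimal sketch of the general idea: attention weights derived from the correlation between RGB and depth features, attended local features fused with global ones, and a final RGB-D fusion. Every name here is hypothetical, and the concrete choices (dot product as the correlation, softmax normalisation, mean pooling as the "global" feature, concatenation as fusion) are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product, used here as an assumed stand-in for 'correlation'."""
    return sum(a * b for a, b in zip(u, v))


def attend(local_feats, weights):
    """Attention-weighted sum of per-location feature vectors."""
    dim = len(local_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, local_feats))
            for d in range(dim)]


def mean_pool(local_feats):
    """Average over locations, an assumed proxy for a global feature."""
    n = len(local_feats)
    dim = len(local_feats[0])
    return [sum(f[d] for f in local_feats) / n for d in range(dim)]


def joint_rgbd_feature(rgb_local, depth_local):
    """Build a joint feature from per-location RGB and depth features.

    Both inputs are lists of equal-length vectors, one per spatial location.
    Returns (joint_feature, attention_weights).
    """
    # One score per location: how strongly the two modalities agree there.
    scores = [dot(r, d) for r, d in zip(rgb_local, depth_local)]
    weights = softmax(scores)

    # Per modality: attended local feature concatenated with a global feature.
    rgb_feat = attend(rgb_local, weights) + mean_pool(rgb_local)
    depth_feat = attend(depth_local, weights) + mean_pool(depth_local)

    # Final fusion of the two modalities by concatenation (an assumption).
    return rgb_feat + depth_feat, weights


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    depth = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
    feature, weights = joint_rgbd_feature(rgb, depth)
    print(len(feature), weights)
```

In this toy input the third location has the largest RGB-depth agreement, so it receives the largest attention weight; a real network would learn these features with convolutional layers rather than take them as fixed lists.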
Original language: English
Title of host publication: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Number of pages: 9
Publisher: IEEE
Publication date: Jun 2018
Pages: 292-300
ISBN (Print): 978-1-5386-6101-7
ISBN (Electronic): 978-1-5386-6100-0
DOIs: https://doi.org/10.1109/CVPRW.2018.00055
Publication status: Published - Jun 2018
Event: IEEE Conference on Computer Vision and Pattern Recognition, 2018 - Salt Lake City, United States
Duration: 18 Jun 2018 to 22 Jun 2018

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition, 2018
Country: United States
City: Salt Lake City
Period: 18/06/2018 to 22/06/2018
Series: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
ISSN: 2160-7516

Fingerprint

Electric fuses
Convolution
Lighting
Cameras
Neural networks
Experiments

Cite this

Lejbølle, A. R., Krogh, B., Nasrollahi, K., & Moeslund, T. B. (2018). Attention in Multimodal Neural Networks for Person Re-identification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 292-300). IEEE. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) https://doi.org/10.1109/CVPRW.2018.00055
Lejbølle, Aske Rasch ; Krogh, Benjamin ; Nasrollahi, Kamal ; Moeslund, Thomas B. / Attention in Multimodal Neural Networks for Person Re-identification. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2018. pp. 292-300 (IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)).
@inproceedings{a4dcd5c2975943e09fd3e973628ad4c6,
title = "Attention in Multimodal Neural Networks for Person Re-identification",
abstract = "In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43{\%}, 2.01{\%} and 2.13{\%} on OPR, DPI-T and TVPR, respectively.",
author = "Lejb{\o}lle, {Aske Rasch} and Benjamin Krogh and Kamal Nasrollahi and Moeslund, {Thomas B.}",
year = "2018",
month = "6",
doi = "10.1109/CVPRW.2018.00055",
language = "English",
isbn = "978-1-5386-6101-7",
pages = "292--300",
booktitle = "2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)",
publisher = "IEEE",
address = "United States",

}

Lejbølle, AR, Krogh, B, Nasrollahi, K & Moeslund, TB 2018, Attention in Multimodal Neural Networks for Person Re-identification. in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 292-300, IEEE Conference on Computer Vision and Pattern Recognition, 2018, Salt Lake City, United States, 18/06/2018. https://doi.org/10.1109/CVPRW.2018.00055

Attention in Multimodal Neural Networks for Person Re-identification. / Lejbølle, Aske Rasch; Krogh, Benjamin; Nasrollahi, Kamal; Moeslund, Thomas B.

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2018. p. 292-300.

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

TY - GEN

T1 - Attention in Multimodal Neural Networks for Person Re-identification

AU - Lejbølle, Aske Rasch

AU - Krogh, Benjamin

AU - Nasrollahi, Kamal

AU - Moeslund, Thomas B.

PY - 2018/6

Y1 - 2018/6

N2 - In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.

AB - In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal. Furthermore, additional modalities can be considered that are not affected by similar environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolution Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.

U2 - 10.1109/CVPRW.2018.00055

DO - 10.1109/CVPRW.2018.00055

M3 - Article in proceeding

SN - 978-1-5386-6101-7

SP - 292

EP - 300

BT - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

PB - IEEE

ER -

Lejbølle AR, Krogh B, Nasrollahi K, Moeslund TB. Attention in Multimodal Neural Networks for Person Re-identification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE. 2018. p. 292-300. (IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)). https://doi.org/10.1109/CVPRW.2018.00055