Abstract
In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal one. Furthermore, additional modalities can be considered that are not affected by the same environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolutional Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities, and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
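
The fusion idea described in the abstract can be illustrated with a small sketch. This is not the authors' MAT implementation: the abstract does not specify how the correlation is computed or how features are fused, so the cosine-similarity score, the softmax weighting, the concatenation order, and all function names below (`cosine`, `softmax`, `pool`, `fuse_rgbd`) are assumptions chosen only for illustration. Each modality is represented as a list of per-location feature vectors, as might come out of a CNN feature map.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool(features, weights):
    """Weighted sum of per-location feature vectors."""
    channels = len(features[0])
    return [sum(w * f[c] for w, f in zip(weights, features)) for c in range(channels)]


def fuse_rgbd(rgb, depth):
    """Return (attention weights, joint feature) for two lists of N per-location vectors.

    Assumption: locations where RGB and depth features correlate get more weight.
    """
    n = len(rgb)
    attn = softmax([cosine(r, d) for r, d in zip(rgb, depth)])
    uniform = [1.0 / n] * n
    joint = []
    for feats in (rgb, depth):
        joint += pool(feats, attn)     # local, attention-weighted feature
        joint += pool(feats, uniform)  # global, average-pooled feature
    return attn, joint


# Toy example: two spatial locations, two channels per modality.
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[1.0, 0.0], [1.0, 0.0]]
attn, joint = fuse_rgbd(rgb, depth)
print(attn)        # the first location, where the modalities agree, gets the larger weight
print(len(joint))  # 8: (local + global) x (RGB + depth) x 2 channels
```

In this toy setup the joint vector simply concatenates local and global features of both modalities; a trained network would learn the attention and fusion parameters rather than use fixed similarity scores.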
Original language | English |
---|---|
Title of host publication | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
Number of pages | 9 |
Publisher | IEEE |
Publication date | Jun 2018 |
Pages | 292-300 |
ISBN (Print) | 978-1-5386-6101-7 |
ISBN (Electronic) | 978-1-5386-6100-0 |
Publication status | Published - Jun 2018 |
Event | IEEE Conference on Computer Vision and Pattern Recognition, 2018, Salt Lake City, United States (18 Jun 2018 → 22 Jun 2018) |
Conference
Conference | IEEE Conference on Computer Vision and Pattern Recognition, 2018 |
---|---|
Country/Territory | United States |
City | Salt Lake City |
Period | 18/06/2018 → 22/06/2018 |
Series | IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
ISSN | 2160-7516 |
Fingerprint
Research topics of 'Attention in Multimodal Neural Networks for Person Re-identification'.
Projects
- 1 Finished
- Vision-based Person Re-identification in a Queue
  Lejbølle, A. R.
  01/01/2017 → 31/12/2019
  Project: Research