Abstract
In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal one. Furthermore, additional modalities can be considered that are not affected by the same environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolutional Neural Network with an attention module to extract local and discriminative features that are fused with globally extracted features. Attention is based on correlation between the two modalities, and we finally also fuse RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, increasing accuracies by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
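
The fusion idea described in the abstract can be illustrated with a toy sketch. This is not the MAT architecture: the abstract does not say how the correlation between modalities is computed or how features are fused, so the cosine similarity, the softmax over locations, the average pooling, and the concatenation below are all assumptions, and every function name is hypothetical. Feature maps are represented as plain lists of per-location channel vectors rather than CNN outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two channel vectors (assumed correlation measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def attention_weights(rgb, depth):
    """Softmax over locations of the RGB/depth similarity at each location."""
    scores = [cosine(r, d) for r, d in zip(rgb, depth)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pooled(feats, weights):
    """Weighted sum of per-location channel vectors."""
    channels = len(feats[0])
    return [sum(w * f[c] for w, f in zip(weights, feats)) for c in range(channels)]

def fuse_rgbd(rgb, depth):
    """Concatenate global and attention-pooled (local) features of both modalities."""
    attn = attention_weights(rgb, depth)
    uniform = [1.0 / len(rgb)] * len(rgb)  # plain average pooling
    joint = []
    for feats in (rgb, depth):
        joint += pooled(feats, uniform)  # global feature
        joint += pooled(feats, attn)     # local, attention-weighted feature
    return joint

# Two locations, two channels per modality (made-up values).
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[1.0, 0.0], [1.0, 0.0]]
attn = attention_weights(rgb, depth)
joint = fuse_rgbd(rgb, depth)
```

In this toy input the two modalities agree at the first location and disagree at the second, so the first location receives the larger attention weight, and the joint feature has four times the channel count of a single map.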
Original language | English |
---|---|
Title | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
Number of pages | 9 |
Publisher | IEEE |
Publication date | Jun 2018 |
Pages | 292-300 |
ISBN (Print) | 978-1-5386-6101-7 |
ISBN (Electronic) | 978-1-5386-6100-0 |
DOI | |
Status | Published - Jun 2018 |
Event | IEEE Conference on Computer Vision and Pattern Recognition, 2018 - Salt Lake City, USA. Duration: 18 Jun 2018 → 22 Jun 2018 |
Conference
Conference | IEEE Conference on Computer Vision and Pattern Recognition, 2018 |
---|---|
Country/Territory | USA |
City | Salt Lake City |
Period | 18/06/2018 → 22/06/2018 |
Name | IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
---|---|
ISSN | 2160-7516 |
Fingerprint
Dive into the research topics of 'Attention in Multimodal Neural Networks for Person Re-identification'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Vision-based Person Re-identification in a Queue
  Lejbølle, A. R.
  01/01/2017 → 31/12/2019
  Project: Research