

Person re-identification requires extracting discriminative features to ensure a correct match, and this must be done independently of challenges such as occlusion, viewpoint, or lighting changes. While occlusion can be eliminated by changing the camera setup from a horizontal to a vertical (overhead) viewpoint, other challenges arise because the total visible surface area of each person is decreased. As a result, methods that focus on the most discriminative regions of persons must be applied, and different domains should also be considered to extract different semantics. To further increase feature discriminability, complementary features extracted at different abstraction levels should be fused. This fusion should be done intelligently, so that features at certain abstraction levels are emphasized depending on the input. This work considers multiple domains and feature discrimination by applying a multimodal convolutional neural network to fuse RGB and depth information. To extract multilocal discriminative features, two different attention modules are proposed: (1) a spatial attention module, which captures local information at different abstraction levels, and (2) a layer-wise attention module, which works as a dynamic weighting scheme that assigns weights to local abstraction-level features and fuses them depending on the input image. By fusing local and global features in a multimodal context, we show state-of-the-art accuracies on two publicly available datasets, DPI-T and TVPR, while increasing the state-of-the-art accuracy on a third dataset, OPR. Finally, through both visual and quantitative analysis, we show the ability of the proposed system to leverage multiple frames by adapting the feature weighting to the input.
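
The two attention ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the function names, the linear scoring vectors `w_spatial` and `w_layer`, and the assumption that every abstraction level has already been projected to the same channel dimension are all hypothetical choices made for illustration. The sketch only shows the general pattern of soft attention, where a softmax turns input-dependent scores into weights that sum to one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spatial_attention_pool(feature_map, w_spatial):
    """Pool per-location feature vectors into one local feature vector.

    feature_map: list of locations, each a list of channel values.
    w_spatial:   hypothetical scoring vector (would be learned in practice).
    """
    # One scalar score per location, normalised into a soft spatial mask.
    mask = softmax([dot(w_spatial, f) for f in feature_map])
    dim = len(feature_map[0])
    return [sum(m * f[d] for m, f in zip(mask, feature_map)) for d in range(dim)]


def layerwise_fuse(level_features, w_layer):
    """Fuse one feature vector per abstraction level with input-dependent weights.

    Assumes all levels share the same dimension (an illustrative simplification).
    Returns the fused vector and the per-level weights.
    """
    weights = softmax([dot(w_layer, f) for f in level_features])
    dim = len(level_features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, level_features))
             for d in range(dim)]
    return fused, weights


if __name__ == "__main__":
    # Toy input: two abstraction levels, each with three locations and two channels.
    levels = [
        [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
        [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]],
    ]
    local = [spatial_attention_pool(fm, [1.0, -1.0]) for fm in levels]
    fused, weights = layerwise_fuse(local, [0.5, 0.5])
    print("per-level weights:", weights)
    print("fused feature:", fused)
```

Because the level weights are computed from the features themselves, a different input image yields a different weighting, which is the dynamic behaviour the abstract attributes to the layer-wise module.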
Original language: English
Article number: 8826013
Journal: IEEE Transactions on Information Forensics and Security
Pages (from-to): 1216-1231
Number of pages: 16
Publication status: Published - 5 Sep 2019


  • Artificial neural networks
  • Dynamic feature fusion
  • Multimodal sensors
  • Person re-identification
  • Soft attention

Title: Person Re-identification Using Spatial and Layer-Wise Attention
