Abstract
In this work, multimodal fusion of RGB-D data is analyzed for action recognition, using scene flow for early fusion and integrating the results of all modalities in a late-fusion fashion. Recently, there has been a migration from traditional handcrafted features to deep learning. However, handcrafted features are still widely used owing to their high performance and low computational complexity. In this research, multimodal dense trajectories (MMDT) are proposed to describe RGB-D videos. Dense trajectories are pruned based on scene flow data. In addition, a 2D CNN is extended to a multimodal network (MM2DCNN) by adding one more stream (scene flow) as input and then fusing the outputs of all models. We evaluate and compare the results from each modality and their fusion on two action datasets. The experimental results show that the new representation improves accuracy. Furthermore, the fusion of handcrafted and learning-based features boosts the final performance, achieving state-of-the-art results.
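The abstract states that dense trajectories are pruned based on scene flow data. A minimal sketch of one plausible pruning criterion is shown below, assuming trajectories are kept when their mean scene-flow magnitude exceeds a threshold; the function name, data layout, and threshold are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def prune_trajectories(trajectories, scene_flow, threshold=0.1):
    """Keep dense trajectories whose mean scene-flow magnitude
    exceeds a threshold (hypothetical criterion; the paper's exact
    pruning rule may differ)."""
    kept = []
    for traj in trajectories:
        # traj: list of (frame, row, col) sample points along the trajectory
        mags = [np.linalg.norm(scene_flow[t, y, x]) for t, y, x in traj]
        if np.mean(mags) > threshold:
            kept.append(traj)
    return kept

# Toy example: scene flow field of shape (frames, height, width, 3)
flow = np.zeros((2, 4, 4, 3))
flow[:, 1, 1] = [0.0, 0.0, 1.0]       # strong 3D motion at pixel (1, 1)
trajs = [[(0, 1, 1), (1, 1, 1)],      # trajectory over the moving pixel
         [(0, 2, 2), (1, 2, 2)]]      # trajectory over a static pixel
print(len(prune_trajectories(trajs, flow)))  # → 1
```

The idea is that scene flow supplies a 3D motion cue unavailable to RGB-only dense trajectories, so near-static trajectories can be discarded before feature description.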
| Original language | English |
|---|---|
| Title | 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) |
| Number of pages | 10 |
| Publisher | IEEE Communications Society |
| Publication date | 29 Oct 2017 |
| Pages | 3179-3188 |
| Article number | 8265587 |
| ISBN (Print) | 978-1-5386-1035-0 |
| DOI | |
| Status | Published - 29 Oct 2017 |
| Published externally | Yes |
| Event | 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) - Venice, Italy. Duration: 22 Oct 2017 → 29 Oct 2017 |
Conference

| Conference | 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) |
|---|---|
| Location | Venice, Italy |
| Period | 22/10/2017 → 29/10/2017 |