Abstract
In this work, multimodal fusion of RGB-D data is analyzed for action recognition, using scene flow for early fusion and integrating the results of all modalities in a late-fusion fashion. Recently, there has been a migration from traditional handcrafted features to deep learning. However, handcrafted features are still widely used owing to their high performance and low computational complexity. In this research, multimodal dense trajectories (MMDT) are proposed to describe RGB-D videos. Dense trajectories are pruned based on scene flow data. In addition, a 2D CNN is extended to a multimodal network (MM2DCNN) by adding one more stream (scene flow) as input and then fusing the outputs of all models. We evaluate and compare the results from each modality and their fusion on two action datasets. The experimental results show that the new representation improves accuracy. Furthermore, the fusion of handcrafted and learning-based features boosts the final performance, achieving state-of-the-art results.
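The abstract states that dense trajectories are pruned based on scene flow data. A minimal sketch of one plausible pruning criterion is shown below, assuming trajectories are kept when their mean scene-flow magnitude exceeds a threshold; the function name, data layout, and threshold are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def prune_trajectories(trajectories, scene_flow, threshold=0.1):
    """Keep dense trajectories whose mean scene-flow magnitude
    exceeds a threshold (hypothetical criterion; the paper's exact
    pruning rule may differ)."""
    kept = []
    for traj in trajectories:
        # traj: list of (frame, row, col) sample points along the trajectory
        mags = [np.linalg.norm(scene_flow[t, y, x]) for t, y, x in traj]
        if np.mean(mags) > threshold:
            kept.append(traj)
    return kept

# Toy example: scene flow field of shape (frames, height, width, 3)
flow = np.zeros((2, 4, 4, 3))
flow[:, 1, 1] = [0.0, 0.0, 1.0]       # strong 3D motion at pixel (1, 1)
trajs = [[(0, 1, 1), (1, 1, 1)],      # trajectory over the moving pixel
         [(0, 2, 2), (1, 2, 2)]]      # trajectory over a static pixel
print(len(prune_trajectories(trajs, flow)))  # → 1
```

The idea is that scene flow supplies a 3D motion cue unavailable to RGB-only dense trajectories, so near-static trajectories can be discarded before feature description.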
| Original language | English |
|---|---|
| Title | 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) |
| Number of pages | 10 |
| Publisher | IEEE Communications Society |
| Publication date | 29 Oct 2017 |
| Pages | 3179-3188 |
| Article number | 8265587 |
| ISBN (Print) | 978-1-5386-1035-0 |
| DOI | |
| Status | Published - 29 Oct 2017 |
| Published externally | Yes |
| Event | 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) - Venice, Italy. Duration: 22 Oct 2017 → 29 Oct 2017 |
Conference

| Conference | 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) |
|---|---|
| Location | Venice, Italy |
| Period | 22/10/2017 → 29/10/2017 |