Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios

German Barquero; Johnny Núñez; Zhen Xu; Sergio Escalera; Wei Wei Tu; Isabelle Guyon; Cristina Palmero

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

Abstract

Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.
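The autoregressive strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the predictor below is a constant-velocity stand-in for a trained short-term model, and the names `constant_velocity_predictor`, `autoregressive_rollout`, and the `Pose` layout are hypothetical. The frame counts assume 25 frames per second (an assumption, not stated in this record), so 10 frames correspond to 400 ms and 50 frames to 2 s.

```python
from typing import Callable, List, Sequence

# One frame of flattened landmark coordinates (hypothetical layout).
Pose = List[float]


def constant_velocity_predictor(window: Sequence[Pose], steps: int) -> List[Pose]:
    """Stand-in short-term model: extrapolate the last observed velocity.

    A trained forecasting model would take its place in practice.
    """
    last, prev = window[-1], window[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [[x + v * (i + 1) for x, v in zip(last, velocity)] for i in range(steps)]


def autoregressive_rollout(
    predict: Callable[[Sequence[Pose], int], List[Pose]],
    observed: Sequence[Pose],
    short_horizon: int,
    long_horizon: int,
) -> List[Pose]:
    """Reach `long_horizon` frames by repeatedly calling a short-horizon model."""
    window = list(observed)
    window_len = len(window)
    future: List[Pose] = []
    while len(future) < long_horizon:
        chunk = predict(window, short_horizon)
        future.extend(chunk)
        # Feed the predictions back in as if they were observations,
        # keeping the input window at its original length.
        window = (window + chunk)[-window_len:]
    return future[:long_horizon]


# Two observed frames with two coordinates each (toy data).
observed = [[0.0, 0.0], [1.0, 2.0]]
# Assumed 25 fps: 10 frames = 400 ms per call, 50 frames = 2 s in total.
future = autoregressive_rollout(
    constant_velocity_predictor, observed, short_horizon=10, long_horizon=50
)
```

Because each call consumes earlier predictions, errors can accumulate over the rollout; the abstract reports that this strategy nevertheless outperforms the baselines up to 2 s.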

Original language	English
Book series	Proceedings of Machine Learning Research
Volume	173
Pages (from-to)	107-138
Number of pages	32
ISSN	2640-3498
Publication status	Published - 2021
Event	ChaLearn LAP Challenge on Understanding Social Behavior in Dyadic and Small Group Interactions Workshop, DYAD 2021, held in conjunction with the International Conference on Computer Vision, ICCV 2021 - Virtual, Online Duration: 16 Oct 2021 → …

Conference

Conference	ChaLearn LAP Challenge on Understanding Social Behavior in Dyadic and Small Group Interactions Workshop, DYAD 2021, held in conjunction with the International Conference on Computer Vision, ICCV 2021
City	Virtual, Online
Period	16/10/2021 → …

Bibliographical note

Funding Information:
Isabelle Guyon was supported by ANR Chair of Artificial Intelligence HUMANIA ANR-19-CHIA-0022. This work has been partially supported by the Spanish project PID2019-105093GB-I00 and by ICREA under the ICREA Academia programme.

Publisher Copyright:
© 2022 G. Barquero, J. Núñez, Z. Xu, S. Escalera, W.-W. Tu, I. Guyon & C. Palmero.

Keywords

Behavior forecasting
Dyadic interaction
Human motion prediction
Human pose forecasting
Multimodal forecasting

Cite this

@inproceedings{423cf98349114766aaaf26333709d090,

title = "Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios",

abstract = "Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.",

keywords = "Behavior forecasting, Dyadic interaction, Human motion prediction, Human pose forecasting, Multimodal forecasting",

author = "German Barquero and Johnny N{\'u}{\~n}ez and Zhen Xu and Sergio Escalera and Tu, {Wei Wei} and Isabelle Guyon and Cristina Palmero",

note = "Funding Information: Isabelle Guyon was supported by ANR Chair of Artificial Intelligence HUMANIA ANR-19-CHIA-0022. This work has been partially supported by the Spanish project PID2019-105093GB-I00 and by ICREA under the ICREA Academia programme. Publisher Copyright: {\textcopyright} 2022 G. Barquero, J. N{\'u}{\~n}ez, Z. Xu, S. Escalera, W.-W. Tu, I. Guyon \& C. Palmero.; ChaLearn LAP Challenge on Understanding Social Behavior in Dyadic and Small Group Interactions Workshop, DYAD 2021, held in conjunction with the International Conference on Computer Vision, ICCV 2021; Conference date: 16-10-2021",

year = "2021",

language = "English",

volume = "173",

pages = "107--138",

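booktitle = "ChaLearn LAP Challenge on Understanding Social Behavior in Dyadic and Small Group Interactions Workshop, DYAD 2021, held in conjunction with the International Conference on Computer Vision, ICCV 2021",

series = "Proceedings of Machine Learning Research",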

issn = "2640-3498",

}

TY - GEN

T1 - Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios

AU - Barquero, German

AU - Núñez, Johnny

AU - Xu, Zhen

AU - Escalera, Sergio

AU - Tu, Wei Wei

AU - Guyon, Isabelle

AU - Palmero, Cristina

N1 - Funding Information: Isabelle Guyon was supported by ANR Chair of Artificial Intelligence HUMANIA ANR-19-CHIA-0022. This work has been partially supported by the Spanish project PID2019-105093GB-I00 and by ICREA under the ICREA Academia programme. Publisher Copyright: © 2022 G. Barquero, J. Núñez, Z. Xu, S. Escalera, W.-W. Tu, I. Guyon & C. Palmero.

PY - 2021

Y1 - 2021

N2 - Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.

AB - Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.

KW - Behavior forecasting

KW - Dyadic interaction

KW - Human motion prediction

KW - Human pose forecasting

KW - Multimodal forecasting

UR - http://www.scopus.com/inward/record.url?scp=85163830540&partnerID=8YFLogxK

M3 - Conference article in Journal

AN - SCOPUS:85163830540

SN - 2640-3498

VL - 173

SP - 107

EP - 138

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

T2 - ChaLearn LAP Challenge on Understanding Social Behavior in Dyadic and Small Group Interactions Workshop, DYAD 2021, held in conjunction with the International Conference on Computer Vision, ICCV 2021

Y2 - 16 October 2021

ER -
