TY - GEN
T1 - Multi-modal gesture recognition challenge 2013
T2 - 2013 15th ACM International Conference on Multimodal Interaction, ICMI 2013
AU - Escalera, Sergio
AU - Gonzàlez, Jordi
AU - Baró, Xavier
AU - Reyes, Miguel
AU - Lopes, Oscar
AU - Guyon, Isabelle
AU - Athitsos, Vassilis
AU - Escalante, Hugo
PY - 2013
Y1 - 2013
N2 - The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of the visual cues involved (e.g., finger and lip movements, subtle facial expressions, body pose), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. To promote research in this field, we organized a challenge on multi-modal gesture recognition. We made available a large video database of 13,858 gestures from a lexicon of 20 Italian gesture categories, recorded with a Kinect camera and providing audio, a skeletal model, a user mask, and RGB and depth images. The focus of the challenge was on user-independent multiple-gesture learning. There are no resting positions, and the gestures are performed in continuous sequences lasting 1-2 minutes, each containing between 8 and 20 gesture instances. As a result, the dataset contains around 1,720,800 frames. In addition to the 20 main gesture categories, "distracter" gestures are included, meaning that additional audio and gestures outside the vocabulary appear in the recordings. The final evaluation of the challenge was defined in terms of the Levenshtein edit distance, where the goal was to indicate the true order of gestures within each sequence. 54 international teams participated in the challenge, and outstanding results were obtained by the top-ranked participants.
AB - The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of the visual cues involved (e.g., finger and lip movements, subtle facial expressions, body pose), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. To promote research in this field, we organized a challenge on multi-modal gesture recognition. We made available a large video database of 13,858 gestures from a lexicon of 20 Italian gesture categories, recorded with a Kinect camera and providing audio, a skeletal model, a user mask, and RGB and depth images. The focus of the challenge was on user-independent multiple-gesture learning. There are no resting positions, and the gestures are performed in continuous sequences lasting 1-2 minutes, each containing between 8 and 20 gesture instances. As a result, the dataset contains around 1,720,800 frames. In addition to the 20 main gesture categories, "distracter" gestures are included, meaning that additional audio and gestures outside the vocabulary appear in the recordings. The final evaluation of the challenge was defined in terms of the Levenshtein edit distance, where the goal was to indicate the true order of gestures within each sequence. 54 international teams participated in the challenge, and outstanding results were obtained by the top-ranked participants.
KW - computer vision
KW - gesture recognition
KW - multi-modal data analysis
UR - http://www.scopus.com/inward/record.url?scp=84892583619&partnerID=8YFLogxK
U2 - 10.1145/2522848.2532595
DO - 10.1145/2522848.2532595
M3 - Article in proceedings
AN - SCOPUS:84892583619
SN - 9781450321297
T3 - ICMI 2013 - Proceedings of the 2013 ACM International Conference on Multimodal Interaction
SP - 445
EP - 452
BT - ICMI 2013 - Proceedings of the 2013 ACM International Conference on Multimodal Interaction
Y2 - 9 December 2013 through 13 December 2013
ER -