TY - JOUR
T1 - Action detection fusing multiple Kinects and a WIMU
T2 - an application to in-home assistive technology for the elderly
AU - Clapés, Albert
AU - Pardo, Àlex
AU - Pujol Vila, Oriol
AU - Escalera, Sergio
N1 - Funding Information:
This work was partly supported by the Spanish project TIN2016-74946-P and the CERCA Programme / Generalitat de Catalunya. The work of Albert Clapés was supported by SUR-DEC of the Generalitat de Catalunya and FSE. We would also like to thank the SARQuavitae Claret elder home and all the people who volunteered for the recording of the dataset.
Publisher Copyright:
© 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - We present a vision-inertial system which combines two RGB-Depth devices together with a wearable inertial movement unit in order to detect activities of daily living. From multi-view videos, we extract dense trajectories enriched with a histogram-of-normals description computed from the depth cue and bag them into multi-view codebooks. During the later classification step, a multi-class support vector machine with an RBF-χ2 kernel combines the descriptions at the kernel level. To perform action detection from the videos, a sliding-window approach is utilized. In parallel, we extract accelerations, rotation angles, and jerk features from the inertial data collected by the wearable placed on the user’s dominant wrist. During gesture spotting, dynamic time warping is applied and the alignment costs to a set of pre-selected gesture sub-classes are thresholded to determine possible detections. The outputs of the two modules are combined in a late-fusion fashion. The system is validated in a real-case scenario with elderly residents of an elder home. Learning-based fusion results improve on those from the single modalities, demonstrating the success of such a multimodal approach.
AB - We present a vision-inertial system which combines two RGB-Depth devices together with a wearable inertial movement unit in order to detect activities of daily living. From multi-view videos, we extract dense trajectories enriched with a histogram-of-normals description computed from the depth cue and bag them into multi-view codebooks. During the later classification step, a multi-class support vector machine with an RBF-χ2 kernel combines the descriptions at the kernel level. To perform action detection from the videos, a sliding-window approach is utilized. In parallel, we extract accelerations, rotation angles, and jerk features from the inertial data collected by the wearable placed on the user’s dominant wrist. During gesture spotting, dynamic time warping is applied and the alignment costs to a set of pre-selected gesture sub-classes are thresholded to determine possible detections. The outputs of the two modules are combined in a late-fusion fashion. The system is validated in a real-case scenario with elderly residents of an elder home. Learning-based fusion results improve on those from the single modalities, demonstrating the success of such a multimodal approach.
KW - Assistive technology
KW - Computer vision
KW - Dense trajectories
KW - Dynamic time warping
KW - Inertial sensors
KW - Multimodal activity detection
UR - http://www.scopus.com/inward/record.url?scp=85046436864&partnerID=8YFLogxK
U2 - 10.1007/s00138-018-0931-1
DO - 10.1007/s00138-018-0931-1
M3 - Journal article
AN - SCOPUS:85046436864
SN - 0932-8092
VL - 29
SP - 765
EP - 788
JO - Machine Vision and Applications
JF - Machine Vision and Applications
IS - 5
ER -