Multi-modal RGB–Depth–Thermal Human Body Segmentation

Cristina Palmero, Albert Clapés, Chris Bahnsen, Andreas Møgelmose, Thomas B. Moeslund, Sergio Escalera

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB-Depth-Thermal dataset along with a multi-modal segmentation baseline. The several modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, defines a partitioning of the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extractions, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian Mixture Models for the probabilistic modeling and Random Forest for the stacked learning, is superior to other state-of-the-art methods, obtaining an overlap above 75% on the novel dataset when compared to the manually annotated ground-truth of human segmentations.
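
The abstract reports segmentation quality as an overlap with the manually annotated ground truth but does not define the measure here. As a rough illustration only, the sketch below assumes overlap means intersection-over-union (the Jaccard index) between a predicted binary mask and an annotated one. The function name `overlap` and the toy masks are hypothetical and are not taken from the paper.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels (1 = human body)


def overlap(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape.

    Assumption: "overlap" is read as the Jaccard index; the abstract
    itself does not state which overlap measure is used.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    inter = 0  # pixels labeled foreground in both masks
    union = 0  # pixels labeled foreground in at least one mask
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1

    # Two empty masks agree perfectly, so report full overlap.
    return inter / union if union else 1.0


# Toy 3x4 masks, invented for illustration.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
annotated = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
]

score = overlap(predicted, annotated)
print(f"overlap = {score:.3f}")
```

Under this reading, a reported overlap above 75% would mean that more than three quarters of the pixels marked as human in either mask are marked as human in both.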
Original language: English
Journal: International Journal of Computer Vision
Volume: 118
Issue number: 2
Pages (from-to): 217-239
ISSN: 0920-5691
DOIs: 10.1007/s11263-016-0901-x
Publication status: Published - 13 Apr 2016

Fingerprint

Supervised learning
Electric fuses
Learning algorithms
Feature extraction
Calibration
Statistical Models
Hot Temperature

Cite this

@article{f29922b927d44eebb54cc2d420b82a42,
title = "Multi-modal RGB–Depth–Thermal Human Body Segmentation",
abstract = "This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB-Depth-Thermal dataset along with a multi-modal segmentation baseline. The several modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, defines a partitioning of the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extractions, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian Mixture Models for the probabilistic modeling and Random Forest for the stacked learning, is superior to other state-of-the-art methods, obtaining an overlap above 75{\%} on the novel dataset when compared to the manually annotated ground-truth of human segmentations.",
author = "Cristina Palmero and Albert Clap{\'e}s and Chris Bahnsen and Andreas M{\o}gelmose and Moeslund, {Thomas B.} and Sergio Escalera",
year = "2016",
month = "4",
day = "13",
doi = "10.1007/s11263-016-0901-x",
language = "English",
volume = "118",
pages = "217--239",
journal = "International Journal of Computer Vision",
issn = "0920-5691",
publisher = "Springer",
number = "2",

}

Multi-modal RGB–Depth–Thermal Human Body Segmentation. / Palmero, Cristina; Clapés, Albert; Bahnsen, Chris; Møgelmose, Andreas; Moeslund, Thomas B.; Escalera, Sergio.

In: International Journal of Computer Vision, Vol. 118, No. 2, 13.04.2016, p. 217-239.

TY - JOUR

T1 - Multi-modal RGB–Depth–Thermal Human Body Segmentation

AU - Palmero, Cristina

AU - Clapés, Albert

AU - Bahnsen, Chris

AU - Møgelmose, Andreas

AU - Moeslund, Thomas B.

AU - Escalera, Sergio

PY - 2016/4/13

Y1 - 2016/4/13

N2 - This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB-Depth-Thermal dataset along with a multi-modal segmentation baseline. The several modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, defines a partitioning of the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extractions, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian Mixture Models for the probabilistic modeling and Random Forest for the stacked learning, is superior to other state-of-the-art methods, obtaining an overlap above 75% on the novel dataset when compared to the manually annotated ground-truth of human segmentations.

AB - This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB-Depth-Thermal dataset along with a multi-modal segmentation baseline. The several modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, defines a partitioning of the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extractions, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian Mixture Models for the probabilistic modeling and Random Forest for the stacked learning, is superior to other state-of-the-art methods, obtaining an overlap above 75% on the novel dataset when compared to the manually annotated ground-truth of human segmentations.

U2 - 10.1007/s11263-016-0901-x

DO - 10.1007/s11263-016-0901-x

M3 - Journal article

VL - 118

SP - 217

EP - 239

JO - International Journal of Computer Vision

JF - International Journal of Computer Vision

SN - 0920-5691

IS - 2

ER -