Abstract
Equipping robots with the ability to identify who is talking to them is an important step towards natural and effective verbal interaction. However, speaker identification for voice control remains largely unexplored compared to recent progress in natural language instruction and speech recognition. This motivates us to tackle text-independent speaker identification for human-robot interaction applications in industrial environments. By representing audio segments as time-frequency spectrograms, this can be formulated as an image classification task, allowing us to apply state-of-the-art convolutional neural network (CNN) architectures. To achieve robust prediction in unconstrained, challenging acoustic conditions, we take a data-driven approach and collect a custom dataset with a far-field microphone array, featuring over 3 hours of "in the wild" audio recordings from six speakers, which are then encoded into spectral images for CNN-based classification. We propose a shallow 3-layer CNN, which we compare with the widely used ResNet-18 architecture: in addition to benchmarking these models in terms of accuracy, we visualize the features used by these two models to discriminate between classes, and investigate their reliability in unseen acoustic scenes. Although ResNet-18 reaches the highest raw accuracy, we are able to achieve remarkable online speaker recognition performance with a much more lightweight model which learns lower-level vocal features and produces more reliable confidence scores. The proposed method is successfully integrated into a robotic dialogue system and showcased in a mock user localization and authentication scenario in a realistic industrial environment: https://youtu.be/IVtZ8LKJZ7A.
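The abstract's first processing step, turning an audio segment into a time-frequency image, can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the frame length, hop size, sample rate, Hann window, and the synthetic test tone are all assumptions chosen for illustration, since the abstract does not state the spectrogram settings used.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_spectrogram(samples, frame_len=64, hop=32, eps=1e-10):
    """Short-time Fourier transform magnitude in dB.

    Returns a list of frames; each frame holds frame_len // 2 + 1 values,
    one per non-negative frequency bin. frame_len and hop are illustrative
    defaults, not values taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precompute the DFT basis for the bins we keep.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(n_bins)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            coeff = sum(x * b for x, b in zip(frame, basis[k]))
            row.append(20 * math.log10(abs(coeff) + eps))
        frames.append(row)
    return frames


# Hypothetical input: a 1 kHz tone sampled at 8 kHz (assumed values).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(1024)]

spec = log_spectrogram(tone)
# Strongest frequency bin in the first frame; bin width is 8000 / 64 = 125 Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time step and each column one frequency bin, so the result can be treated as a single-channel image and passed to an image classifier such as a CNN. In practice a signal-processing library would replace the hand-written transform.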
Original language | English |
---|---|
Title | 2021 30th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2021 |
Number of pages | 7 |
Publisher | IEEE |
Publication date | 8 Aug 2021 |
Pages | 272-278 |
Applicant | Horizon Europe |
ISBN (Electronic) | 9781665404921 |
DOI | |
Status | Published - 8 Aug 2021 |
Event | 30th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2021 - Virtual, Vancouver, Canada Duration: 8 Aug 2021 → 12 Aug 2021 |
Conference
Conference | 30th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2021 |
---|---|
Country/Territory | Canada |
City | Virtual, Vancouver |
Period | 08/08/2021 → 12/08/2021 |
Name | IEEE RO-MAN proceedings |
---|---|
ISSN | 1944-9445 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Fingerprint
Dive into the research topics of 'Why talk to people when you can talk to robots? Far-field speaker identification in the wild'. Together they form a unique fingerprint.
Projects
- 2 Finished
- chARmER: Assistive Robotic Disassembly System for Recycling
  Hjorth, S., Chrysostomou, D., Bøgh, S., Madsen, O. & Arexolaleiba, N. A.
  01/02/2020 → 01/02/2023
  Projects: Project › Research
- R2P2: Networking for Research and Development of Human Interactive and Sensitive Robotics taking advantage of Additive Manufacturing
  Chrysostomou, D., LI, C., Arexolaleiba, N. A. & Madsen, O.
  01/01/2020 → 31/12/2022
  Projects: Project › Research
Activities
- 1 Conference presentation
- RO-MAN 2021 panel - HRI and Collaboration in Manufacturing Environments
  Galadrielle Humblot-Renaux (Speaker)
  9 Aug 2021
  Activity: Talks and presentations › Conference presentations