TY - CPAPER
T1 - No Need to Scream
T2 - International Conference on Social Robotics
AU - Tse, Tze Ho Elden
AU - De Martini, Daniele
AU - Marchegiani, Letizia
PY - 2019
Y1 - 2019
N2 - This paper is about speaker verification and horizontal localisation in the presence of conspicuous noise. Specifically, we are interested in enabling a mobile robot to robustly and accurately spot the presence of a target speaker and estimate his/her position in challenging acoustic scenarios. While several solutions to both tasks have been proposed in the literature, little attention has been devoted to the development of systems able to function in harsh noisy conditions. To address these shortcomings, in this work we follow a purely data-driven approach based on deep learning architectures which, by not requiring any knowledge of either the nature of the masking noise or the structure and acoustics of the operating environment, are able to reliably act in previously unexplored acoustic scenes. Our experimental evaluation, relying on data collected in real environments with a robotic platform, demonstrates that our framework is able to achieve high performance in both the verification and localisation tasks, despite the presence of copious noise.
KW - Speaker localisation
KW - Speaker verification
KW - Speech in noise
UR - http://www.scopus.com/inward/record.url?scp=85076579580&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-35888-4_17
DO - 10.1007/978-3-030-35888-4_17
M3 - Article in proceeding
SN - 978-3-030-35887-7
VL - 11876
T3 - Lecture Notes in Computer Science
SP - 176
EP - 185
BT - Social Robotics - 11th International Conference, ICSR 2019, Proceedings
A2 - Salichs, Miguel A.
A2 - Ge, Shuzhi Sam
A2 - Barakova, Emilia Ivanova
A2 - Cabibihan, John-John
A2 - Wagner, Alan R.
A2 - Castro-González, Álvaro
A2 - He, Hongsheng
PB - Springer
Y2 - 26 November 2019 through 29 November 2019
ER -