TY - CPAPER
T1 - No Need to Scream
T2 - International Conference on Social Robotics
AU - Tse, Tze Ho Elden
AU - De Martini, Daniele
AU - Marchegiani, Letizia
PY - 2019
Y1 - 2019
N2 - This paper is about speaker verification and horizontal localisation in the presence of conspicuous noise. Specifically, we are interested in enabling a mobile robot to robustly and accurately spot the presence of a target speaker and estimate his/her position in challenging acoustic scenarios. While several solutions to both tasks have been proposed in the literature, little attention has been devoted to the development of systems able to function in harsh noisy conditions. To address these shortcomings, in this work we follow a purely data-driven approach based on deep learning architectures which, by not requiring any knowledge of either the nature of the masking noise or the structure and acoustics of the operating environment, are able to reliably act in previously unexplored acoustic scenes. Our experimental evaluation, relying on data collected in real environments with a robotic platform, demonstrates that our framework is able to achieve high performance in both the verification and localisation tasks, despite the presence of copious noise.
KW - Speaker localisation
KW - Speaker verification
KW - Speech in noise
UR - http://www.scopus.com/inward/record.url?scp=85076579580&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-35888-4_17
DO - 10.1007/978-3-030-35888-4_17
M3 - Article in proceeding
SN - 978-3-030-35887-7
VL - 11876
T3 - Lecture Notes in Computer Science
SP - 176
EP - 185
BT - Social Robotics - 11th International Conference, ICSR 2019, Proceedings
A2 - Salichs, Miguel A.
A2 - Ge, Shuzhi Sam
A2 - Barakova, Emilia Ivanova
A2 - Cabibihan, John-John
A2 - Wagner, Alan R.
A2 - Castro-González, Álvaro
A2 - He, Hongsheng
PB - Springer
Y2 - 26 November 2019 through 29 November 2019
ER -