Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones

Md Sahidullah, Dennis Alexander Lehmann Thomsen, Rosa Gonzalez Hautamaki, Tomi Kinnunen, Zheng-Hua Tan, Robert Parts, Martti Pitkänen

Research output: Contribution to journal, Journal article, Research, peer-review


Abstract

While having a wide range of applications, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, in particular replay attacks, which are effective and easy to implement. Most prior work on detecting replay attacks uses audio from a single acoustic microphone only, leading to difficulties in detecting high-end replay attacks that are nearly indistinguishable from live human speech. In this paper, we study the use of a special body-conducted sensor, the throat microphone (TM), for combined voice liveness detection (VLD) and ASV in order to improve both the robustness and security of ASV against replay attacks. We first investigate the possibility and methods of attacking a TM-based ASV system, followed by a pilot data collection. Second, we study the use of spectral features for VLD using both single-channel and dual-channel ASV systems. We carry out speaker verification experiments using Gaussian mixture model with universal background model (GMM-UBM) and i-vector based systems on a dataset of 38 speakers collected by us. We achieve a considerable improvement in recognition accuracy with the use of a dual-microphone setup. In experiments with noisy test speech, the false acceptance rate (FAR) of the dual-microphone GMM-UBM based system for recorded speech reduces from 69.69% to 18.75%. The FAR for the replay condition further drops to 0% when this dual-channel ASV system is integrated with the new dual-channel voice liveness detector.

Original language: English
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 26
Issue number: 1
Pages (from-to): 44-56
Number of pages: 13
ISSN: 2329-9290
DOI: 10.1109/TASLP.2017.2760243
Publication status: Published - 2018

Keywords

  • Automatic speaker verification
  • anti-spoofing
  • replay attack
  • throat microphone
  • two-channel countermeasure
  • voice liveness detection

Cite this

Sahidullah, Md; Thomsen, Dennis Alexander Lehmann; Hautamaki, Rosa Gonzalez; Kinnunen, Tomi; Tan, Zheng-Hua; Parts, Robert; Pitkänen, Martti. Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2018; Vol. 26, No. 1, pp. 44-56.
@article{d43a6ca481614a7e924eb8bb291fed29,
title = "Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones",
abstract = "While having a wide range of applications, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, in particular replay attacks, which are effective and easy to implement. Most prior work on detecting replay attacks uses audio from a single acoustic microphone only, leading to difficulties in detecting high-end replay attacks that are nearly indistinguishable from live human speech. In this paper, we study the use of a special body-conducted sensor, the throat microphone (TM), for combined voice liveness detection (VLD) and ASV in order to improve both the robustness and security of ASV against replay attacks. We first investigate the possibility and methods of attacking a TM-based ASV system, followed by a pilot data collection. Second, we study the use of spectral features for VLD using both single-channel and dual-channel ASV systems. We carry out speaker verification experiments using Gaussian mixture model with universal background model (GMM-UBM) and i-vector based systems on a dataset of 38 speakers collected by us. We achieve a considerable improvement in recognition accuracy with the use of a dual-microphone setup. In experiments with noisy test speech, the false acceptance rate (FAR) of the dual-microphone GMM-UBM based system for recorded speech reduces from 69.69{\%} to 18.75{\%}. The FAR for the replay condition further drops to 0{\%} when this dual-channel ASV system is integrated with the new dual-channel voice liveness detector.",
keywords = "Automatic speaker verification, anti-spoofing, replay attack, throat microphone, two-channel countermeasure, voice liveness detection",
author = "Md Sahidullah and Thomsen, {Dennis Alexander Lehmann} and Hautamaki, {Rosa Gonzalez} and Tomi Kinnunen and Zheng-Hua Tan and Robert Parts and Martti Pitk{\"a}nen",
year = "2018",
doi = "10.1109/TASLP.2017.2760243",
language = "English",
volume = "26",
pages = "44--56",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE Signal Processing Society",
number = "1",
}

TY - JOUR
T1 - Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones
AU - Sahidullah, Md
AU - Thomsen, Dennis Alexander Lehmann
AU - Hautamaki, Rosa Gonzalez
AU - Kinnunen, Tomi
AU - Tan, Zheng-Hua
AU - Parts, Robert
AU - Pitkänen, Martti
PY - 2018
Y1 - 2018
AB - While having a wide range of applications, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, in particular replay attacks, which are effective and easy to implement. Most prior work on detecting replay attacks uses audio from a single acoustic microphone only, leading to difficulties in detecting high-end replay attacks that are nearly indistinguishable from live human speech. In this paper, we study the use of a special body-conducted sensor, the throat microphone (TM), for combined voice liveness detection (VLD) and ASV in order to improve both the robustness and security of ASV against replay attacks. We first investigate the possibility and methods of attacking a TM-based ASV system, followed by a pilot data collection. Second, we study the use of spectral features for VLD using both single-channel and dual-channel ASV systems. We carry out speaker verification experiments using Gaussian mixture model with universal background model (GMM-UBM) and i-vector based systems on a dataset of 38 speakers collected by us. We achieve a considerable improvement in recognition accuracy with the use of a dual-microphone setup. In experiments with noisy test speech, the false acceptance rate (FAR) of the dual-microphone GMM-UBM based system for recorded speech reduces from 69.69% to 18.75%. The FAR for the replay condition further drops to 0% when this dual-channel ASV system is integrated with the new dual-channel voice liveness detector.
KW - Automatic speaker verification
KW - anti-spoofing
KW - replay attack
KW - throat microphone
KW - two-channel countermeasure
KW - voice liveness detection
UR - http://www.scopus.com/inward/record.url?scp=85031788681&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2017.2760243
DO - 10.1109/TASLP.2017.2760243
M3 - Journal article
VL - 26
SP - 44
EP - 56
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
SN - 2329-9290
IS - 1
ER -