DNN Filter Bank Cepstral Coefficients for Spoofing Detection

Hong Yu, Zheng-Hua Tan, Yiming Zhang, Zhanyu Ma, Jun Guo

Research output: Contribution to journal > Journal article > Research > peer-review

Abstract

With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank-based cepstral feature, deep neural network (DNN) filter bank cepstral coefficients, to distinguish between natural and spoofed speech. The DNN filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood classifier trained by the new feature performs better than the state-of-the-art linear frequency triangle filter bank cepstral coefficients-based classifier, especially on detecting unknown attacks.
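The feature pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear spacing of band centres, the use of absolute values to keep filter weights non-negative, and the random stand-ins for trained weights and spectra are all assumptions made for the example. It shows a band-limiting mask that keeps each learned filter inside its own frequency band (ordered by frequency), followed by the usual log-energy and DCT steps that turn filter bank outputs into cepstral coefficients.

```python
import numpy as np


def band_limit_mask(n_filters: int, n_bins: int) -> np.ndarray:
    """Binary mask restricting filter k to a band of FFT bins.

    Assumption: band centres are spaced linearly and each band runs from the
    previous centre to the next one, as in a linear triangular filter bank.
    """
    centres = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    mask = np.zeros((n_filters, n_bins))
    for k in range(n_filters):
        lo, hi = centres[k], centres[k + 2]
        mask[k, (bins >= lo) & (bins <= hi)] = 1.0
    return mask


def dct_matrix(n_coeffs: int, n_filters: int) -> np.ndarray:
    """Unnormalized DCT-II basis of shape (n_coeffs, n_filters)."""
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_filters)


def filter_bank_cepstra(power_spec, weights, mask, n_coeffs):
    """Apply a masked filter bank, take log energies, then a DCT."""
    bank = np.abs(weights) * mask            # non-negative, band-limited filters
    energies = power_spec @ bank.T           # shape: (frames, n_filters)
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    return log_energies @ dct_matrix(n_coeffs, weights.shape[0]).T


# Hypothetical sizes and random data, used only to exercise the functions.
rng = np.random.default_rng(0)
n_filters, n_bins = 20, 257
mask = band_limit_mask(n_filters, n_bins)
weights = rng.normal(size=(n_filters, n_bins))  # stand-in for trained FBNN weights
power_spec = rng.random((100, n_bins))          # stand-in for real power spectra
feats = filter_bank_cepstra(power_spec, weights, mask, n_coeffs=13)
print(feats.shape)
```

In a trained network the mask would presumably be applied during every weight update so that the filters stay band limited; here it is applied once to fixed weights purely to show the data flow.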
Original language: English
Journal: IEEE Access
Volume: 5
Pages (from-to): 4779-4787
Number of pages: 9
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2017.2687041
Publication status: Published - 24 Mar 2017

Fingerprint

Filter banks
Classifiers
Neural networks
Deep neural networks
Speech synthesis
Maximum likelihood

Cite this

Yu, Hong ; Tan, Zheng-Hua ; Zhang, Yiming ; Ma, Zhanyu ; Guo, Jun. / DNN Filter Bank Cepstral Coefficients for Spoofing Detection. In: IEEE Access. 2017 ; Vol. 5. pp. 4779 - 4787.
@article{d213800589c24f5789e2af5a0c9c7247,
title = "DNN Filter Bank Cepstral Coefficients for Spoofing Detection",
abstract = "With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank-based cepstral feature, deep neural network (DNN) filter bank cepstral coefficients, to distinguish between natural and spoofed speech. The DNN filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood classifier trained by the new feature performs better than the state-of-the-art linear frequency triangle filter bank cepstral coefficients-based classifier, especially on detecting unknown attacks.",
author = "Hong Yu and Zheng-Hua Tan and Yiming Zhang and Zhanyu Ma and Jun Guo",
year = "2017",
month = "3",
day = "24",
doi = "10.1109/ACCESS.2017.2687041",
language = "English",
volume = "5",
pages = "4779--4787",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "IEEE",
}

DNN Filter Bank Cepstral Coefficients for Spoofing Detection. / Yu, Hong; Tan, Zheng-Hua; Zhang, Yiming; Ma, Zhanyu; Guo, Jun.

In: IEEE Access, Vol. 5, 24.03.2017, p. 4779 - 4787.

TY - JOUR
T1 - DNN Filter Bank Cepstral Coefficients for Spoofing Detection
AU - Yu, Hong
AU - Tan, Zheng-Hua
AU - Zhang, Yiming
AU - Ma, Zhanyu
AU - Guo, Jun
PY - 2017/3/24
Y1 - 2017/3/24
AB - With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank-based cepstral feature, deep neural network (DNN) filter bank cepstral coefficients, to distinguish between natural and spoofed speech. The DNN filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood classifier trained by the new feature performs better than the state-of-the-art linear frequency triangle filter bank cepstral coefficients-based classifier, especially on detecting unknown attacks.

U2 - 10.1109/ACCESS.2017.2687041
DO - 10.1109/ACCESS.2017.2687041
M3 - Journal article
VL - 5
SP - 4779
EP - 4787
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
ER -