Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification

Hong Yu; Tianrui Hu; Zhanyu Ma; Zheng-Hua Tan; Jun Guo

doi:10.1109/ICNIDC.2018.8525526

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

3 Citations (Scopus)

Abstract

Modern automatic speaker verification (ASV) systems need to be robust under various noisy conditions. Motivated by the success of generative adversarial networks (GANs), this paper proposes a multi-task adversarial network (MAN) for extracting noise-invariant bottleneck (BN) features. The MAN consists of three component networks: a feature encoding network (FEN), a speaker discriminative network (SDN), and a noise-domain adaptation network (NAN). The FEN aims to generate noise-robust BN features, the SDN makes the features from the FEN more speaker-discriminative, and the NAN guides the FEN to learn more noise-invariant feature representations. The MAN is trained using an adversarial method. When training the FEN and SDN, speaker identities and the label of being clean speech are used as target labels, which encourages BN features extracted from noisy and clean speech to be similar. When training the NAN, by contrast, noise types are used as training targets. We evaluate the newly proposed MAN-BN feature extraction method on a Gaussian mixture model-universal background model (GMM-UBM) based ASV system. The experimental results on the RSR2015 database show that the proposed MAN-BN feature can dramatically improve the accuracy of the ASV system under different noise-type and signal-to-noise-ratio conditions.
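As a rough illustration of the label scheme the abstract describes, the sketch below shows which targets an utterance would receive in each of the two alternating update steps. It is not the authors' implementation: the noise inventory, the `Utterance` type, and the `targets_for_step` helper are hypothetical names introduced here, and the abstract does not specify the noise types or any network details.

```python
from dataclasses import dataclass

# Hypothetical noise inventory; the abstract does not list the noise types used.
NOISE_TYPES = ["clean", "babble", "cafe", "white"]
CLEAN_INDEX = NOISE_TYPES.index("clean")


@dataclass(frozen=True)
class Utterance:
    speaker_id: int  # index of the speaker in the training set
    noise_type: str  # one of NOISE_TYPES


def targets_for_step(utt: Utterance, step: str) -> dict:
    """Return the training targets for one utterance in a given update step.

    step == "fen_sdn": update the feature encoder (FEN) and speaker network (SDN).
    step == "nan":     update the noise-domain adaptation network (NAN).
    """
    true_noise = NOISE_TYPES.index(utt.noise_type)
    if step == "fen_sdn":
        # Speaker identity plus the "clean" label, whatever the real noise is,
        # so that features from noisy and clean speech are pushed together.
        return {"speaker": utt.speaker_id, "noise": CLEAN_INDEX}
    if step == "nan":
        # The NAN is trained to recognise the true noise type.
        return {"noise": true_noise}
    raise ValueError(f"unknown step: {step!r}")
```

For a noisy utterance such as `Utterance(speaker_id=7, noise_type="babble")`, the encoder step is given the "clean" label while the NAN step is given the real noise type. That mismatch is what makes the training adversarial.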

Original language	English
Title of host publication	2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC)
Number of pages	5
Publisher	IEEE
Publication date	6 Nov 2018
Pages	165-169
Article number	8525526
ISBN (Print)	978-1-5386-6066-9
ISBN (Electronic)	978-1-5386-6067-6
DOI	https://doi.org/10.1109/ICNIDC.2018.8525526
Publication status	Published - 6 Nov 2018
Event	2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC) - Guiyang, China. Duration: 22 Aug 2018 → 24 Aug 2018

Conference

Conference	2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC)
Country/Territory	China
City	Guiyang
Period	22/08/2018 → 24/08/2018

Name	International Conference on Network Infrastructure and Digital Content (IC-NIDC)
ISSN	2575-4955

Access to document

10.1109/ICNIDC.2018.8525526

Other files and links

http://www.scopus.com/inward/record.url?scp=85058298314&partnerID=8YFLogxK

Citation formats

@inproceedings{9237ad475e944155bcd989fce0ca21df,

title = "Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification",

abstract = "Modern automatic speaker verification (ASV) systems need to be robust under various noisy conditions. Motivated by the success of generative adversarial networks (GANs), this paper proposes a multi-task adversarial network (MAN) for extracting noise-invariant bottleneck (BN) features. The MAN consists of three component networks, a feature encoding network (FEN), a speaker discriminative network (SDN) and a noise-domain adaptation network (NAN). The FEN aims to generate noise-robustness BN features, the SDN makes the features from the FEN more speaker-discriminative and the NAN guides the FEN to learn more noise-invariant feature representations. The MAN is trained using an adversarial method. When training FEN and SDN, speaker identities and the label of being clean speech are used as target labels, which can make BN features, extracted from noisy or clean speech, similar. When training NAN, on the contrary, noise types are used as training targets. We evaluate the newly proposed MAN-BN feature extraction method on a Gaussian mixture model-universal background model (GMM-UBM) based ASV system. The experimental results on the RSR2015 database show that the proposed MAN-BN feature can dramatically improve the accuracy of the ASV system under different noise-type and signal-to-noise-ratio conditions.",

keywords = "Bottleneck Features, Multi-task Adversarial Training, Speaker Verification",

author = "Hong Yu and Tianrui Hu and Zhanyu Ma and Zheng-Hua Tan and Jun Guo",

year = "2018",

month = nov,

day = "6",

doi = "10.1109/ICNIDC.2018.8525526",

language = "English",

isbn = "978-1-5386-6066-9",

series = "International Conference on Network Infrastructure and Digital Content (IC-NIDC)",

publisher = "IEEE",

pages = "165--169",

booktitle = "2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC)",

address = "United States",

note = "2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC) ; Conference date: 22-08-2018 Through 24-08-2018",

}

Yu, H, Hu, T, Ma, Z, Tan, Z-H & Guo, J 2018, Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification. in 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC), 8525526, IEEE, International Conference on Network Infrastructure and Digital Content (IC-NIDC), pp. 165-169, 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC), Guiyang, China, 22/08/2018. https://doi.org/10.1109/ICNIDC.2018.8525526

Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification. / Yu, Hong; Hu, Tianrui; Ma, Zhanyu et al.
2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC). IEEE, 2018. pp. 165-169, 8525526 (International Conference on Network Infrastructure and Digital Content (IC-NIDC)).

TY - GEN

T1 - Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification

AU - Yu, Hong

AU - Hu, Tianrui

AU - Ma, Zhanyu

AU - Tan, Zheng-Hua

AU - Guo, Jun

PY - 2018/11/6

Y1 - 2018/11/6

AB - Modern automatic speaker verification (ASV) systems need to be robust under various noisy conditions. Motivated by the success of generative adversarial networks (GANs), this paper proposes a multi-task adversarial network (MAN) for extracting noise-invariant bottleneck (BN) features. The MAN consists of three component networks, a feature encoding network (FEN), a speaker discriminative network (SDN) and a noise-domain adaptation network (NAN). The FEN aims to generate noise-robustness BN features, the SDN makes the features from the FEN more speaker-discriminative and the NAN guides the FEN to learn more noise-invariant feature representations. The MAN is trained using an adversarial method. When training FEN and SDN, speaker identities and the label of being clean speech are used as target labels, which can make BN features, extracted from noisy or clean speech, similar. When training NAN, on the contrary, noise types are used as training targets. We evaluate the newly proposed MAN-BN feature extraction method on a Gaussian mixture model-universal background model (GMM-UBM) based ASV system. The experimental results on the RSR2015 database show that the proposed MAN-BN feature can dramatically improve the accuracy of the ASV system under different noise-type and signal-to-noise-ratio conditions.

KW - Bottleneck Features

KW - Multi-task Adversarial Training

KW - Speaker Verification

UR - http://www.scopus.com/inward/record.url?scp=85058298314&partnerID=8YFLogxK

U2 - 10.1109/ICNIDC.2018.8525526

DO - 10.1109/ICNIDC.2018.8525526

M3 - Article in proceeding

SN - 978-1-5386-6066-9

T3 - International Conference on Network Infrastructure and Digital Content (IC-NIDC)

SP - 165

EP - 169

BT - 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC)

PB - IEEE

T2 - 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC)

Y2 - 22 August 2018 through 24 August 2018

ER -
