Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

Ivan Kraljevski; Zheng-Hua Tan; Maria Paola Bissiri

Department of Electronic Systems

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

6 Citations (Scopus)

Abstract

This paper asks whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between automatic/manual VAD transcriptions and the reference transcriptions produced by a human expert was investigated. Statistical analysis was then applied to the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.

Original language	English
Title	INTERSPEECH-2015
Number of pages	5
Publisher	ISCA
Publication date	2015
Pages	2937-2941
Status	Published - 2015
Event	INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association - Dresden, Germany Duration: 6 Sep 2015 → 10 Sep 2015

Conference

Conference	INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association
Country/Territory	Germany
City	Dresden
Period	06/09/2015 → 10/09/2015

Name	INTERSPEECH
ISSN	1990-9770

Access to document

http://www.isca-speech.org/archive/interspeech_2015/i15_2937.html

Citation formats

@inproceedings{31210b10deeb4de689510a67865b3c56,

title = "Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD",

abstract = "This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.",

author = "Ivan Kraljevski and Zheng-Hua Tan and {Paola Bissiri}, Maria",

year = "2015",

language = "English",

series = "INTERSPEECH",

publisher = "ISCA",

pages = "2937--2941",

booktitle = "INTERSPEECH-2015",

note = "INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association ; Conference date: 06-09-2015 Through 10-09-2015",

}

Kraljevski, I, Tan, Z-H & Paola Bissiri, M 2015, Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD. in INTERSPEECH-2015. ISCA, INTERSPEECH, pp. 2937-2941, INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, 06/09/2015. <http://www.isca-speech.org/archive/interspeech_2015/i15_2937.html>

TY - GEN

T1 - Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

AU - Kraljevski, Ivan

AU - Tan, Zheng-Hua

AU - Paola Bissiri, Maria

PY - 2015

Y1 - 2015

N2 - This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.

AB - This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.

M3 - Article in proceeding

T3 - INTERSPEECH

SP - 2937

EP - 2941

BT - INTERSPEECH-2015

PB - ISCA

T2 - INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association

Y2 - 6 September 2015 through 10 September 2015

ER -
