Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

Ivan Kraljevski; Zheng-Hua Tan; Maria Paola Bissiri

Department of Electronic Systems

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

6 Citations (Scopus)

Abstract

This paper asks whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between the automatic and manual VAD transcriptions and the reference transcriptions produced by a human expert was investigated. Statistical analysis was then performed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.

Original language	English
Title of host publication	INTERSPEECH-2015
Number of pages	5
Publisher	ISCA
Publication date	2015
Pages	2937-2941
Publication status	Published - 2015
Event	INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association - Dresden, Germany; Duration: 6 Sept 2015 → 10 Sept 2015

Conference

Conference	INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association
Country/Territory	Germany
City	Dresden
Period	06/09/2015 → 10/09/2015

Series	INTERSPEECH
ISSN	1990-9770

Access to Document

http://www.isca-speech.org/archive/interspeech_2015/i15_2937.html

Cite this

@inproceedings{31210b10deeb4de689510a67865b3c56,
  title = "Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD",
  abstract = "This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.",
  author = "Ivan Kraljevski and Zheng-Hua Tan and {Paola Bissiri}, Maria",
  year = "2015",
  language = "English",
  series = "INTERSPEECH",
  publisher = "ISCA",
  pages = "2937--2941",
  booktitle = "INTERSPEECH-2015",
  note = "INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association ; Conference date: 06-09-2015 Through 10-09-2015",
}

Kraljevski, I, Tan, Z-H & Paola Bissiri, M 2015, Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD. in INTERSPEECH-2015. ISCA, INTERSPEECH, pp. 2937-2941, INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, 06/09/2015. <http://www.isca-speech.org/archive/interspeech_2015/i15_2937.html>

TY - GEN
T1 - Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD
AU - Kraljevski, Ivan
AU - Tan, Zheng-Hua
AU - Paola Bissiri, Maria
PY - 2015
Y1 - 2015
N2 - This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.
AB - This present paper aims to answer the question whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. An investigation of the level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was carried out. Thereafter, statistical analysis was employed on the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.
M3 - Article in proceeding
T3 - INTERSPEECH
SP - 2937
EP - 2941
BT - INTERSPEECH-2015
PB - ISCA
T2 - INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association
Y2 - 6 September 2015 through 10 September 2015
ER -
