Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

Ivan Kraljevski, Zheng-Hua Tan, Maria Paola Bissiri

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

6 Citations (Scopus)

Abstract

This paper examines whether forced-alignment speech recognition can be used as an alternative to humans for generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between the automatic and manual VAD transcriptions and the reference transcriptions produced by a human expert was investigated. Statistical analysis was then applied to the automatically produced and the collected manual transcriptions. The experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.
Original language: English
Title of host publication: INTERSPEECH-2015
Number of pages: 5
Publisher: ISCA
Publication date: 2015
Pages: 2937-2941
Publication status: Published - 2015
Event: INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association - Dresden, Germany
Duration: 6 Sept 2015 to 10 Sept 2015

Conference

Conference: INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association
Country/Territory: Germany
City: Dresden
Period: 06/09/2015 to 10/09/2015
Series: INTERSPEECH
ISSN: 1990-9770
