Quality Control in Remote Speech Data Collection

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

There is a need for algorithms that can automatically control the quality of remotely collected speech databases by detecting potential outliers that deserve further investigation.
In this paper, a simple and effective approach for identifying outliers in a speech database is proposed.
The approach uses the deterministic minimum covariance determinant (DetMCD) algorithm to estimate the mean and covariance of the speech data in the mel-frequency cepstral domain, and then flags as potential outliers the observations whose statistical distance from the central location of the data in the feature space exceeds a predefined threshold.
DetMCD is a computationally efficient algorithm that provides highly robust estimates of the mean and covariance of multivariate data even when up to 50% of the data are outliers.
Experimental results on eight different speech databases with manually inserted outliers show the effectiveness of the proposed method for outlier detection in speech databases. Moreover, applying the proposed method to a remotely collected Parkinson's voice database shows that the outliers already present in the database are detected with 97.4% accuracy, significantly reducing the effort required to control the quality of the database manually.
Original language: English
Journal: IEEE Journal of Selected Topics in Signal Processing
Volume: 13
Issue number: 2
ISSN: 1932-4553
DOI: 10.1109/JSTSP.2019.2904212
Publication status: Published - Mar 2019


Keywords

  • Outlier Detection
  • Quality Control
  • Robust Estimation
  • Speech Databases
  • Remote Data Collection

Cite this

@article{1c1592c93cef497da0ab9dd43a10f6af,
title = "Quality Control in Remote Speech Data Collection",
abstract = "There is a need for algorithms that can automatically control the quality of remotely collected speech databases by detecting potential outliers that deserve further investigation. In this paper, a simple and effective approach for identifying outliers in a speech database is proposed. The approach uses the deterministic minimum covariance determinant (DetMCD) algorithm to estimate the mean and covariance of the speech data in the mel-frequency cepstral domain, and then flags as potential outliers the observations whose statistical distance from the central location of the data in the feature space exceeds a predefined threshold. DetMCD is a computationally efficient algorithm that provides highly robust estimates of the mean and covariance of multivariate data even when up to 50{\%} of the data are outliers. Experimental results on eight different speech databases with manually inserted outliers show the effectiveness of the proposed method for outlier detection in speech databases. Moreover, applying the proposed method to a remotely collected Parkinson's voice database shows that the outliers already present in the database are detected with 97.4{\%} accuracy, significantly reducing the effort required to control the quality of the database manually.",
keywords = "Outlier Detection, Quality Control, Robust Estimation, Speech Databases, Remote Data Collection",
author = "Alavijeh, {Amir Hossein Poorjam} and Little, {Max A} and Jensen, {Jesper Rindom} and Christensen, {Mads Gr{\ae}sb{\o}ll}",
year = "2019",
month = "3",
doi = "10.1109/JSTSP.2019.2904212",
language = "English",
volume = "13",
journal = "IEEE Journal of Selected Topics in Signal Processing",
issn = "1932-4553",
publisher = "IEEE",
number = "2",

}

Quality Control in Remote Speech Data Collection. / Alavijeh, Amir Hossein Poorjam; Little, Max A; Jensen, Jesper Rindom; Christensen, Mads Græsbøll.

In: IEEE Journal of Selected Topics in Signal Processing, Vol. 13, No. 2, 03.2019.


TY - JOUR

T1 - Quality Control in Remote Speech Data Collection

AU - Alavijeh, Amir Hossein Poorjam

AU - Little, Max A

AU - Jensen, Jesper Rindom

AU - Christensen, Mads Græsbøll

PY - 2019/3

Y1 - 2019/3

N2 - There is a need for algorithms that can automatically control the quality of remotely collected speech databases by detecting potential outliers that deserve further investigation. In this paper, a simple and effective approach for identifying outliers in a speech database is proposed. The approach uses the deterministic minimum covariance determinant (DetMCD) algorithm to estimate the mean and covariance of the speech data in the mel-frequency cepstral domain, and then flags as potential outliers the observations whose statistical distance from the central location of the data in the feature space exceeds a predefined threshold. DetMCD is a computationally efficient algorithm that provides highly robust estimates of the mean and covariance of multivariate data even when up to 50% of the data are outliers. Experimental results on eight different speech databases with manually inserted outliers show the effectiveness of the proposed method for outlier detection in speech databases. Moreover, applying the proposed method to a remotely collected Parkinson's voice database shows that the outliers already present in the database are detected with 97.4% accuracy, significantly reducing the effort required to control the quality of the database manually.

AB - There is a need for algorithms that can automatically control the quality of remotely collected speech databases by detecting potential outliers that deserve further investigation. In this paper, a simple and effective approach for identifying outliers in a speech database is proposed. The approach uses the deterministic minimum covariance determinant (DetMCD) algorithm to estimate the mean and covariance of the speech data in the mel-frequency cepstral domain, and then flags as potential outliers the observations whose statistical distance from the central location of the data in the feature space exceeds a predefined threshold. DetMCD is a computationally efficient algorithm that provides highly robust estimates of the mean and covariance of multivariate data even when up to 50% of the data are outliers. Experimental results on eight different speech databases with manually inserted outliers show the effectiveness of the proposed method for outlier detection in speech databases. Moreover, applying the proposed method to a remotely collected Parkinson's voice database shows that the outliers already present in the database are detected with 97.4% accuracy, significantly reducing the effort required to control the quality of the database manually.

KW - Outlier Detection

KW - Quality Control

KW - Robust Estimation

KW - Speech Databases

KW - Remote Data Collection

U2 - 10.1109/JSTSP.2019.2904212

DO - 10.1109/JSTSP.2019.2904212

M3 - Journal article

VL - 13

JO - IEEE Journal of Selected Topics in Signal Processing

JF - IEEE Journal of Selected Topics in Signal Processing

SN - 1932-4553

IS - 2

ER -