Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

Sam Karimian-Azari

doi:10.5278/vbn.phd.engsci.00126

Department of Architecture and Media Technology

Publication: PhD thesis

Abstract

Audio systems usually receive the speech signals of interest in the presence of noise. Noise degrades the quality and intelligibility of speech, so noisy signals must be cleaned up before they are played back, stored, or analyzed. We can estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (commonly referred to as pitch). We model periodic signals as a sum of harmonics. We can therefore enhance the signal by passing the noisy signals through bandpass filters centered at the harmonic frequencies. In addition, although the harmonic frequencies are the same across the channels of a microphone array, the multichannel periodic signals may have different phases due to the time differences of arrival (TDOAs), which are related to the direction of arrival (DOA) of the impinging sound waves. Hence, the array outputs can be steered toward the direction of the signal of interest to align their time differences, which may further reduce the effects of noise.
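
As an illustration of the last two points, the following is a minimal Python sketch and not code from the thesis. It assumes a far-field source, a uniform linear array, a speed of sound of 343 m/s, and known TDOAs; the function names and parameters (`harmonic_array_signal`, `delay_and_sum`, `spacing`, and so on) are hypothetical. It synthesizes a harmonic signal whose channels differ only in phase, then steers the array by undoing each channel's delay and averaging.

```python
import numpy as np

def harmonic_array_signal(f0, amps, doa_deg, n_mics, spacing, fs, n_samples, c=343.0):
    """Harmonic source observed on a uniform linear array (far-field assumption).

    Returns the (n_mics, n_samples) signal matrix and the per-microphone TDOAs.
    """
    n = np.arange(n_samples)
    # TDOA of microphone k relative to microphone 0 (assumed array geometry)
    tau = np.arange(n_mics) * spacing * np.sin(np.deg2rad(doa_deg)) / c
    x = np.zeros((n_mics, n_samples))
    for order, amp in enumerate(amps, start=1):
        # Harmonic `order` has frequency order*f0 on every channel;
        # only its phase changes with the channel delay.
        x += amp * np.cos(2 * np.pi * order * f0 * (n[None, :] / fs - tau[:, None]))
    return x, tau

def delay_and_sum(x, tau, fs):
    """Undo each channel's delay in the frequency domain, then average the channels."""
    n_samples = x.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(x, axis=1)
    # Advancing channel k by tau[k] multiplies its spectrum by exp(+j*2*pi*f*tau[k])
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)

# Example: three harmonics of 200 Hz arriving from 30 degrees on a 4-microphone array
x, tau = harmonic_array_signal(200.0, [1.0, 0.5, 0.25], 30.0, 4, 0.05, 8000, 800)
y = delay_and_sum(x, tau, 8000)
```

In this noise-free example the frame holds a whole number of periods, so the steered output matches the reference channel. With independent noise on each channel, the same averaging would be expected to attenuate the noise while leaving the aligned harmonics intact.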

This thesis introduces a number of principles and methods for estimating periodic signals in noisy environments, with application to multichannel speech enhancement. We propose model-based signal enhancement built on the model of periodic signals, which requires the parameters of the model to be estimated in advance. The signal of interest is often contaminated by different types of noise, which may render many estimation methods suboptimal because they rely on an incorrect white Gaussian noise assumption. We therefore propose estimators that are robust against the noise, and focus on statistical and filtering-based methods that impose distortionless constraints with explicit relations between the parameters of the harmonics. The estimated fundamental frequencies are expected to be continuous over time. We therefore account for the time-varying fundamental frequency in the statistical methods in order to reduce the estimation error. We also propose a maximum likelihood DOA estimator that takes into account the noise statistics and the linear relationship between the TDOAs of the harmonics. The estimators have benefits compared to state-of-the-art statistical methods in colored noise. Evaluations of the estimators against the minimum variance of the deterministic parameters and against the other methods confirm that the proposed estimators are statistically efficient in colored noise and computationally simple. Finally, we propose model-based beamformers for multichannel speech enhancement that exploit the estimated fundamental frequency and DOA of the signal of interest. This general framework is tailored to a number of beamformers that use the spectral and spatial information of the periodic signals, which are quasi-stationary over short intervals. Objective measures of speech quality and intelligibility confirm the advantage of the harmonic model-based beamformers over traditional, non-parametric beamformers, and reveal the importance of accurate estimates of the model parameters.
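
For context on the estimation problem, here is a minimal Python sketch of a classical harmonic-summation estimator of the fundamental frequency. It is not one of the estimators proposed in the thesis. It is a textbook baseline that implicitly relies on the white-noise assumption mentioned above, and it assumes the number of harmonics is known. The function name `harmonic_summation_f0` and its parameters are hypothetical.

```python
import numpy as np

def harmonic_summation_f0(x, fs, n_harmonics, f0_grid, zero_pad=8):
    """Pick the candidate f0 whose harmonics collect the most periodogram power."""
    x = np.asarray(x, dtype=float)
    f0_grid = np.asarray(f0_grid, dtype=float)
    if n_harmonics * f0_grid.max() >= fs / 2:
        raise ValueError("highest harmonic must stay below the Nyquist frequency")
    # Zero-padding gives a finer frequency grid for reading off harmonic power
    n_fft = zero_pad * len(x)
    power = np.abs(np.fft.rfft(x, n=n_fft)) ** 2
    orders = np.arange(1, n_harmonics + 1)
    # Nearest FFT bin for every (candidate f0, harmonic order) pair
    bins = np.rint(np.outer(f0_grid, orders) * n_fft / fs).astype(int)
    cost = power[bins].sum(axis=1)
    return float(f0_grid[np.argmax(cost)])

# Example: a 100 ms frame with three harmonics of 220 Hz
fs = 8000
t = np.arange(800) / fs
frame = sum(a * np.cos(2 * np.pi * k * 220.0 * t)
            for k, a in enumerate([1.0, 0.6, 0.3], start=1))
f0_hat = harmonic_summation_f0(frame, fs, n_harmonics=3, f0_grid=np.arange(80.0, 401.0, 1.0))
```

In colored noise, strong noise bands can bias a summation of this kind toward the wrong candidate, which is the sort of failure that motivates estimators that account for the noise statistics.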

Original language	English
Publisher	Aalborg Universitetsforlag
ISBNs, electronic	978-87-7112-750-8
DOI	https://doi.org/10.5278/vbn.phd.engsci.00126
Status	Published - 2016

Citation formats

@misc{3480cef51ba54786a33f4b6df38f77b3,

title = "Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement",

author = "Sam Karimian-Azari",

note = "PhD Supervisor: Prof. Mads Gr{\ae}sb{\o}ll Christensen, Aalborg University PhD Co-supervisor: Postdoc. Jesper Rindom Jensen, Aalborg University",

year = "2016",

doi = "10.5278/vbn.phd.engsci.00126",

language = "English",

series = "Ph.d.-serien for Det Teknisk-Naturvidenskabelige Fakultet, Aalborg Universitet",

publisher = "Aalborg Universitetsforlag",

}

TY - GEN

T1 - Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

AU - Karimian-Azari, Sam

N1 - PhD Supervisor: Prof. Mads Græsbøll Christensen, Aalborg University PhD Co-supervisor: Postdoc. Jesper Rindom Jensen, Aalborg University

PY - 2016

Y1 - 2016

U2 - 10.5278/vbn.phd.engsci.00126

DO - 10.5278/vbn.phd.engsci.00126

M3 - PhD thesis

T3 - Ph.d.-serien for Det Teknisk-Naturvidenskabelige Fakultet, Aalborg Universitet

PB - Aalborg Universitetsforlag

ER -
