TY - JOUR
T1 - Time-Frequency Bins Selection for Direction of Arrival Estimation Based on Speech Presence Probability Learning
AU - Zhang, Qinzheng
AU - Wang, Haiyan
AU - Rindom Jensen, Jesper
AU - Tao, Shuai
AU - Græsbøll Christensen, Mads
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024/5
Y1 - 2024/5
AB - Deep learning has brought significant progress to direction of arrival (DOA) estimation. However, the accuracy of end-to-end neural network (NN) DOA estimators relies heavily on the classification step of the network, which requires large and representative datasets. In addition, conventional speech presence probability (SPP) estimation methods based on the ideal ratio mask (IRM) may misclassify time-frequency (T-F) bins dominated by non-speech and noise, which hinders the accurate extraction of directional information. To improve the robustness of existing DOA estimation algorithms, this paper proposes a DOA estimation method with T-F bin selection. On the output side, the proposed approach uses the a posteriori SPP instead of the IRM-based SPP, a deliberate choice aimed at avoiding such confusion. On the input side, we construct features that jointly capture spatial, temporal, and directional information and feed them to a frequency bin-wise recurrent neural network (RNN) to obtain precise multi-channel SPP estimates. These SPP estimates are then used to extract local information for DOA estimation. Moreover, the cascaded structure allows the model to complete out-of-label tasks, which reduces the dataset requirements: training on only a subset of directions suffices to achieve omnidirectional DOA estimation. It also removes the algorithm’s reliance on the step size, which sets it apart from other end-to-end methods. Simulation results show that the proposed method achieves higher accuracy and lower error than both NN-based end-to-end approaches and traditional full-band approaches under various reverberation and signal-to-noise ratio conditions.
KW - A posteriori speech presence probability
KW - Broadband DOA estimation
KW - Deep learning
KW - Out-of-label task
UR - http://www.scopus.com/inward/record.url?scp=85182715509&partnerID=8YFLogxK
U2 - 10.1007/s00034-023-02586-x
DO - 10.1007/s00034-023-02586-x
M3 - Journal article
AN - SCOPUS:85182715509
SN - 0278-081X
VL - 43
SP - 2961
EP - 2981
JO - Circuits, Systems, and Signal Processing
JF - Circuits, Systems, and Signal Processing
IS - 5
ER -