Abstract
We compare the run-time complexity of recent deep neural network (DNN) based and non-DNN-based monaural speech enhancement algorithms. Specifically, we consider fully connected, convolutional, and genetic-algorithm-based DNNs and compare them with a non-DNN-based image analysis technique. We show that, for the same speech enhancement performance, a simple fully connected DNN has the lowest run-time computational complexity, measured both in floating-point operations and in execution time on a standard laptop. Speech enhancement performance is evaluated objectively with the perceptual evaluation of speech quality and short-time objective intelligibility measures. In addition, the subjective intelligibility measures used in the experiment are the modified rhyme test and the mean opinion score. Both stationary and non-stationary noise, as well as interfering speech, are considered.
Original language | English |
---|---|
Article number | 108627 |
Journal | Applied Acoustics |
Volume | 190 |
ISSN | 0003-682X |
DOI | |
Status | Published - 15 Mar 2022 |