## Abstract

Modern multimedia systems are increasingly shifting toward distributed and networked structures. This includes audio systems, where networks of wireless distributed microphones are replacing traditional microphone arrays. Such networks allow flexible placement and high spatial diversity. However, this comes at the price of several challenges, including the limited power and bandwidth available for wireless transmission of audio recordings. In this setup, we study source coding for compressing the audio recordings before transmission, so that lower transmission rates reduce the power consumption and/or the transmission bandwidth.

Source coding for wireless microphones in reverberant environments has several special characteristics that make it more challenging than regular audio coding. The signals acquired by the microphones and fed to the encoders are typically impaired by enclosure effects, noise, interference, etc. Moreover, each microphone can take into account that other microphones make their own observations of the sound field, and that these observations are correlated with its own recording. Ignoring this correlation wastes the scarce power and bandwidth resources.

In this thesis, we study both the information-theoretic and the audio coding aspects of the coding problem in this framework. We formulate rate-distortion problems that take into account the characteristics described above. The resulting problems have not been treated in the literature, and a significant part of this thesis is therefore dedicated to solving them. We solve the problems under assumptions such as Gaussianity of the sources, and thereby establish mathematical bounds on the performance of the audio coding system. We derive explicit formulas for the rate-distortion functions and design coding schemes that asymptotically achieve the performance bounds. We justify the Gaussianity assumption by showing that the results remain relevant for non-Gaussian sources, including audio signals.
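
As a point of reference for the kind of bound discussed above, the following minimal sketch evaluates the textbook rate-distortion function of a memoryless scalar Gaussian source under mean-squared error, R(D) = max(0, ½ log₂(σ²/D)). It is not one of the rate-distortion functions derived in the thesis; the function names `gaussian_rate` and `gaussian_distortion` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_rate(variance: float, distortion: float) -> float:
    """Bits per sample needed to describe a memoryless Gaussian source
    of the given variance within the given mean-squared error."""
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    # No bits are needed once the allowed distortion reaches the variance.
    return max(0.0, 0.5 * math.log2(variance / distortion))


def gaussian_distortion(variance: float, rate: float) -> float:
    """Inverse relation: smallest mean-squared error at `rate` bits per sample."""
    if variance <= 0 or rate < 0:
        raise ValueError("variance must be positive and rate non-negative")
    return variance * 2.0 ** (-2.0 * rate)


if __name__ == "__main__":
    # Each extra bit per sample lowers the distortion by a factor of four (about 6 dB).
    for rate in (0.0, 1.0, 2.0, 3.0):
        print(rate, gaussian_distortion(1.0, rate))
```

Under mean-squared error, a Gaussian source is the hardest to compress among all sources of the same variance, which is one standard reason why Gaussian bounds stay informative for non-Gaussian signals.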

We apply the information-theoretic results to several problems with different objectives. First, we consider a network of microphones in which each microphone sends its own measurements to a center, possibly using neighboring microphones as relays. Each relay microphone has its own recording of the sound source, which is correlated with the data received from the neighboring microphone. We study how to combine these two correlated signals optimally in order to minimize the transmission rate. Moreover, the recording at a receiving microphone can serve as side information for the transmitting microphone, which reduces the rate through distributed source coding. Within this framework, we apply our coding schemes to Gaussian signals as well as audio measurements, compare the rate-distortion performance of the distributed and non-distributed source coding scenarios, and compare both with the theoretical bounds. To optimally combine the audio recording at each microphone with the signals received from the neighboring microphones, we derive explicit formulas for sufficient statistics of the two signals. Taking the spatial correlation into account requires the joint statistics of the signals from different microphones, which are hard to obtain in practice. To cope with this problem, we estimate the statistics offline from a database of audio signals and use them for coding the audio measurements, assuming that the statistics are fixed. To implement the distributed source coding part, we use zero-error coding as a suboptimal implementation. We show that significant gains are achievable by using the sufficient statistics for joint coding, and that small gains can also be obtained with our suboptimal implementation of the distributed source coding scenario.
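
To illustrate why side information at the receiver can lower the rate, the sketch below uses the standard result for jointly Gaussian signals under mean-squared error: with side information Y at the decoder, the source variance in the rate formula is replaced by the conditional variance σ²(1 − ρ²), where ρ is the correlation coefficient between the two recordings. This is the idealized bound only, under a scalar, memoryless model assumed here for illustration; it does not model the zero-error coding implementation mentioned above, and all function names are hypothetical.

```python
import math


def conditional_variance(var_x: float, rho: float) -> float:
    """Variance of X given Y for jointly Gaussian X and Y with correlation rho."""
    if var_x <= 0:
        raise ValueError("var_x must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return var_x * (1.0 - rho ** 2)


def rate_without_side_info(var_x: float, distortion: float) -> float:
    """Bits per sample when the decoder has no correlated recording."""
    return max(0.0, 0.5 * math.log2(var_x / distortion))


def rate_with_side_info(var_x: float, rho: float, distortion: float) -> float:
    """Bits per sample when the decoder holds a recording with correlation rho."""
    return max(0.0, 0.5 * math.log2(conditional_variance(var_x, rho) / distortion))


def rate_saving(var_x: float, rho: float, distortion: float) -> float:
    """Rate reduction in bits per sample obtained from the side information."""
    return rate_without_side_info(var_x, distortion) - rate_with_side_info(
        var_x, rho, distortion
    )


if __name__ == "__main__":
    # Stronger correlation between the two recordings gives a larger saving.
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(rho, round(rate_saving(1.0, rho, 0.01), 3))
```

In the high-resolution regime the saving equals −½ log₂(1 − ρ²) bits per sample, regardless of the target distortion, so weakly correlated recordings yield only small gains.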

As the second problem, we consider a centralized network of wireless sensors (possibly microphones), where each sensor sends its data directly to a center at a certain rate. The center fuses and processes the received data to maximize the output SNR while incurring no linear distortion. The problem is how to allocate rates and distortions to the individual sensor nodes in order to maximize the output SNR at the center, given a weighted sum-rate constraint for the network. We use our rate-distortion functions and coding schemes to formulate this as an optimization problem over a set of matrices. The resulting problem is hard to solve analytically in general. However, we derive analytical results for special cases such as the high-rate regime and the scalar two-node setup.

Finally, we turn our attention to perceptual audio compression within our setup. We incorporate perceptual masking effects into the rate-distortion problem by imposing the distortion constraint on the spectrum of the reconstruction error. Perceptual transparency is then achieved by forcing the spectrum of the reconstruction error to fall below the perceptual masking curve. We solve this problem and interpret the results for the distributed and remote source coding scenarios. We show that, for those setups, bit allocation and quantization should not be based on the perceptual masking curve alone. Instead, one should use a modified masking curve that accounts for the remote or distributed nature of the problem in addition to the masking effects.
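
For orientation, the sketch below shows the conventional baseline that the paragraph above argues against for remote and distributed setups: bit allocation driven by the masking curve alone. It assumes independent Gaussian spectral components coded directly (no remote observation, no side information), in which case a bin with signal power S and masking level M needs max(0, ½ log₂(S/M)) bits per sample. The modified masking curve derived in the thesis is not reproduced here, and the function name and example values are hypothetical.

```python
import math
from typing import List, Sequence


def masking_bit_allocation(
    signal_psd: Sequence[float], masking_curve: Sequence[float]
) -> List[float]:
    """Bits per sample in each frequency bin so that the error spectrum
    stays at or below the masking curve (direct-coding baseline)."""
    if len(signal_psd) != len(masking_curve):
        raise ValueError("signal_psd and masking_curve must have equal length")
    rates = []
    for power, mask in zip(signal_psd, masking_curve):
        if power <= 0 or mask <= 0:
            raise ValueError("spectral values must be positive")
        # Bins whose power is already below the mask are inaudible and get no bits.
        rates.append(max(0.0, 0.5 * math.log2(power / mask)))
    return rates


if __name__ == "__main__":
    psd = [16.0, 4.0, 1.0]   # illustrative signal power per bin
    mask = [1.0, 1.0, 2.0]   # illustrative masking level per bin
    print(masking_bit_allocation(psd, mask))
```

In a remote or distributed setup, the same allocation routine could be fed a modified curve in place of `masking_curve`, which is the change in perspective the thesis proposes.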

| Field | Value |
|---|---|
| Original language | English |
| Number of pages | 122 |
| Status | Published - 2016 |
| Series name | Ph.d.-serien for Det Teknisk-Naturvidenskabelige Fakultet, Aalborg Universitet |
| ISSN | 2246-1248 |