Acoustic-Phonetic Modelling with Bayesian Networks

  • Lindberg, Børge (Project Participant)
  • Andersen, Johan Myhre (Project Participant)

Project Details

Description

This research project aims to improve the modelling of the relation between a phonetic symbol string and its acoustic speech signal. Such acoustic-phonetic modelling is typically done with hidden Markov models (HMMs), which allow efficient calculations but disregard many aspects of speech signals, especially the inherent time correlation. Switching state-space models (SSSMs) extend HMMs with a continuous hidden state space, which can be interpreted as a pseudo-articulatory representation of speech. This hidden level connects the phoneme sequence with the acoustic signal. Bayesian networks (BNs) can describe both HMMs and SSSMs as well as future extensions. A BN consists of a directed acyclic graph whose nodes are stochastic variables, together with a probability distribution on this graph. The visual representation is appealing when phoneticians and other speech experts discuss model construction with statisticians. Using BNs as the modelling framework also makes it easier to benefit from advances in BN technology, such as new methods for inference and learning. Exact calculation is not feasible with SSSMs, so Gibbs sampling has been chosen as an approximate method for inference and for learning the model parameters. A Gibbs sampler has been implemented and tested on simulated data, and will be applied to real speech data. These experiments will use the TIMIT database because it includes phonetic transcriptions made by phoneticians. The goal of this project is to improve speech recognition by using more accurate acoustic-phonetic models. Secondarily, experiments with the new models might provide new insight into speech production and the structure of speech signals. The SSSMs are expected to be a stepping stone towards even more detailed stochastic models. Such models should capture more of the large diversity in speech signals by including more aspects of speech production. (Johan M. Andersen, Børge Lindberg, and Steffen L. Lauritzen)
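The generative structure described above can be illustrated with a minimal sketch. It assumes a linear-Gaussian SSSM in which a discrete mode (standing in for the phoneme) follows a Markov chain and selects the dynamics of a continuous hidden state, which in turn produces the observation. The function name `simulate_sssm`, the parameter names, and all numeric values are hypothetical and are not taken from the project; the sketch only simulates data and does not implement the project's Gibbs sampler.

```python
import numpy as np

def simulate_sssm(T, trans, A, C, q_std, r_std, seed=0):
    """Draw one sequence from a toy switching linear-Gaussian state-space model.

    Assumed (hypothetical) parameterisation:
      trans : (K, K) mode transition matrix, rows sum to 1
      A     : (K, D, D) one state-dynamics matrix per mode
      C     : (M, D) observation matrix shared by all modes
      q_std, r_std : standard deviations of state and observation noise
    """
    rng = np.random.default_rng(seed)
    n_modes = trans.shape[0]
    dim_x = A.shape[1]
    dim_y = C.shape[0]

    s = np.zeros(T, dtype=int)   # discrete mode sequence (HMM-like layer)
    x = np.zeros((T, dim_x))     # continuous hidden state (pseudo-articulatory layer)
    y = np.zeros((T, dim_y))     # observed acoustic features

    # Initial time step: uniform mode, zero-mean Gaussian state.
    s[0] = rng.integers(n_modes)
    x[0] = rng.normal(0.0, q_std, size=dim_x)
    y[0] = C @ x[0] + rng.normal(0.0, r_std, size=dim_y)

    for t in range(1, T):
        # The mode depends only on the previous mode.
        s[t] = rng.choice(n_modes, p=trans[s[t - 1]])
        # The continuous state depends on its predecessor and the current mode,
        # which is what gives the model time correlation beyond an HMM.
        x[t] = A[s[t]] @ x[t - 1] + rng.normal(0.0, q_std, size=dim_x)
        # The observation depends only on the current continuous state.
        y[t] = C @ x[t] + rng.normal(0.0, r_std, size=dim_y)
    return s, x, y

# Example with two modes, a 2-D hidden state and a 1-D observation.
trans = np.array([[0.95, 0.05], [0.05, 0.95]])
A = np.stack([0.9 * np.eye(2), 0.5 * np.eye(2)])
C = np.array([[1.0, 0.0]])
s, x, y = simulate_sssm(200, trans, A, C, q_std=0.1, r_std=0.05)
```

Simulated sequences of this kind are the sort of data on which a sampler can be checked before it is applied to real speech, since the true modes and hidden states are known.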
Status: Finished
Effective start/end date: 31/12/2003 → 31/12/2003