Out-of-Vocabulary Detection and Confidence Measures

Project Details


With the rapid advances in speech recognition technology, deployed application systems perform reasonably well when the user is cooperative and keeps to the system's vocabulary. In many applications, however, not all the words to be recognised by a speech recognition system can be specified in advance, especially in relatively small embedded systems. Spoken words that are not part of the recogniser's lexicon are called out-of-vocabulary (OOV) words. Recognition accuracy generally degrades sharply when OOV words, or extraneous speech more generally, are present. It is therefore important to develop methods that can identify when a recogniser's hypothesis is correct and when it may be in error. A usable system should be able to reject extraneous speech and noise, and to determine how confident it is in the speech that has been recognised.

OOV detection falls into the category of decisions based on a quantitative score, also called a confidence measure (CM). A CM is the result of a hypothesis test that weighs the probability that the hypothesis (word or sentence) is correct against the probability that it is incorrect. Hence, to calculate a CM for a hypothesis, it is crucial that both of these probabilities can be estimated, explicitly or implicitly.

The goal of the OOV project is to study and develop new acoustic CMs that can be used for OOV detection. The focus will be on techniques that can be implemented with limited hardware resources, i.e. techniques that can be used with a speech recogniser running on a mobile terminal in the near future.

In the context of distributed speech recognition (DSR) over mobile networks, the influence of transmission errors on OOV detection has been investigated. In the presence of transmission errors, a fixed-threshold method fails to maintain the balance between the false rejection and false acceptance rates.
A frame-error-rate (FER) based OOV detection method is therefore proposed, in which the estimated FER is used to adjust the discrimination threshold. (Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg)
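The confidence-measure decision and the FER-based threshold adjustment described above can be illustrated with a minimal Python sketch. It rests on assumptions that the project description does not state: the "hypothesis is incorrect" probability is approximated implicitly by a competing filler (garbage) model, the CM is the frame-averaged log-likelihood ratio between the word model and that filler model, the FER is estimated as the fraction of received frames flagged as erroneous, and the threshold is lowered linearly as the FER rises. The function names, the linear mapping and its `slope` parameter are hypothetical illustrations, not the authors' method.

```python
from typing import Sequence


def llr_confidence(word_loglik: Sequence[float], filler_loglik: Sequence[float]) -> float:
    """Frame-averaged log-likelihood ratio used as the confidence measure (CM).

    word_loglik[t]   : log p(x_t | hypothesised word model)
    filler_loglik[t] : log p(x_t | filler/garbage model); an ASSUMED stand-in
                       for the probability that the hypothesis is incorrect.
    """
    if not word_loglik or len(word_loglik) != len(filler_loglik):
        raise ValueError("need equal-length, non-empty per-frame scores")
    total = sum(w - f for w, f in zip(word_loglik, filler_loglik))
    return total / len(word_loglik)


def estimate_fer(frame_error_flags: Sequence[bool]) -> float:
    """Estimated frame error rate: fraction of received frames flagged as erroneous.

    How frames get flagged (e.g. by a channel error check) is outside this sketch.
    """
    if not frame_error_flags:
        return 0.0
    return sum(1 for flagged in frame_error_flags if flagged) / len(frame_error_flags)


def fer_adjusted_threshold(fer: float, base_threshold: float, slope: float) -> float:
    """HYPOTHETICAL linear mapping from estimated FER to the decision threshold.

    Assumes transmission errors mainly depress the CM of correctly recognised
    words, so the threshold is lowered as FER grows to limit false rejections.
    """
    if not 0.0 <= fer <= 1.0:
        raise ValueError("FER must lie in [0, 1]")
    return base_threshold - slope * fer


def reject(cm: float, threshold: float) -> bool:
    """Reject the hypothesis (treat it as OOV/extraneous) when the CM is below the threshold."""
    return cm < threshold


if __name__ == "__main__":
    # Hypothetical per-frame log-likelihoods for one hypothesised word.
    cm = llr_confidence([-50.0, -48.0, -52.0], [-51.0, -50.0, -52.0])
    # Hypothetical channel: 2 of 10 received frames flagged as erroneous.
    fer = estimate_fer([False] * 8 + [True] * 2)
    fixed = 1.5
    adapted = fer_adjusted_threshold(fer, base_threshold=fixed, slope=4.0)
    print(f"CM={cm:.2f}, FER={fer:.2f}")
    print("fixed threshold rejects:", reject(cm, fixed))
    print("FER-adjusted threshold rejects:", reject(cm, adapted))
```

With these made-up numbers the CM is 1.0 and the estimated FER is 0.2: the fixed threshold of 1.5 rejects the word, while the FER-adjusted threshold (about 0.7) accepts it. A linear mapping is used only because it is the simplest monotone choice; in practice the mapping from FER to threshold would have to be tuned on data so that the false rejection and false acceptance rates stay balanced across channel conditions.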
Effective start/end date: 31/12/2003 to 31/12/2003