Danish Text-to-Speech Synthesis



This project is an ongoing initiative to ensure that Danish text-to-speech synthesis matches the quality of the best systems worldwide. The current technology is based on triphones and diphones that undergo prosodic modification using Residual Excited Linear Prediction (RELP). Two voices are currently available: a male voice (Carsten) and a female voice (Benedicte). Formal assessment has shown the male voice to be both highly intelligible and highly natural, whereas the female voice is very intelligible but lacks some naturalness.

Commercial deployment of the synthesizer has shown that an increasing number of English words occur in everyday Danish texts. To optimise the rendering of English words, we have started recording the most frequently occurring English diphones, which are estimated to add up to approximately 1,000 units.

Another technology being investigated is "phrase splicing". The idea is to record and store the most problematic phone sequences as whole words. This will improve the quality of isolated words, but it remains to be investigated how listeners will perceive abrupt jumps in the quality of the synthetic speech.
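The phrase-splicing idea can be illustrated with a minimal sketch, which is not the project's actual implementation. It assumes a hypothetical pronunciation lexicon and a hypothetical set of words stored as whole recordings; the names `to_diphones` and `select_units`, the `_` silence marker, and the toy phone labels are all illustrative and do not represent real Danish transcriptions. Triphones and the RELP prosody step are left out for brevity.

```python
from typing import Dict, List, Set, Tuple


def to_diphones(phones: List[str], boundary: str = "_") -> List[str]:
    """Turn a phone sequence into diphone labels, padded with silence."""
    padded = [boundary] + phones + [boundary]
    # Each diphone spans the transition between two neighbouring phones.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def select_units(
    words: List[str],
    lexicon: Dict[str, List[str]],
    whole_words: Set[str],
) -> List[Tuple[str, str]]:
    """Prefer a stored whole-word recording; otherwise fall back to diphones."""
    plan: List[Tuple[str, str]] = []
    for word in words:
        if word in whole_words:
            # Hypothetical case: this word was recorded as a single unit.
            plan.append(("word", word))
        else:
            # A word missing from the lexicon raises KeyError in this sketch.
            plan.extend(("diphone", d) for d in to_diphones(lexicon[word]))
    return plan


# Toy data: phone labels are placeholders, not a real transcription.
toy_lexicon = {"hej": ["h", "a", "j"]}
toy_whole_words = {"weekend"}
plan = select_units(["hej", "weekend"], toy_lexicon, toy_whole_words)
```

In the resulting plan, every switch between a `"diphone"` entry and a `"word"` entry marks a point where listeners might perceive a jump in quality, which is the open question raised above.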
Effective start/end date: 19/05/2010 to 31/12/2017