Usability Evaluation of Spoken and Multi-Modal Dialogues



This project aims at developing robust methods for the evaluation of Spoken Dialogue Systems. In particular, it addresses the issue of combining objective measures (e.g. task success rates) with subjective measures (e.g. user attitudes). For this, the PARADISE (PARAdigm for DIalogue System Evaluation) scheme proposed by AT&T is investigated and developed further. A dialogue corpus consisting of about 700 annotated dialogues from more than 300 users, collected in a real-world field trial, forms the basis for the analysis. All users expressed their attitudes towards the application (a home-banking service), and this information is analysed and combined with information logged during the dialogues. Results so far show that the PARADISE model is able to predict the users' attitudes from task success rates (normalised for task complexity) and speech recognition performance for a subset of problematic dialogues. These results are consistent with other current research.

Another major topic in the project is the set of problems related to scenario-based user testing. Because the users are placed in an artificial situation, they have no "real" use for the information they obtain from the system, and it may therefore be suspected that their answers do not fully represent those of "real users". For example, analyses of the corpus have shown that although 97% of the users reported that they completed the assigned tasks, only 74% actually did so. However, user attitudes might be more closely related to the "perceived" task success than to the true one, and this might in turn lead the model to make wrong or biased predictions. (Lars Bo Larsen)
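The core idea of the PARADISE scheme can be sketched as follows: task success is measured by the kappa coefficient (agreement between the attribute values achieved in a dialogue and those of the scenario key, corrected for chance, which normalises for task complexity), and user satisfaction is then modelled as a linear combination of normalised task success and dialogue cost measures. The sketch below is a minimal illustration of that regression step; all numbers are invented for illustration and do not come from the project's corpus, and word-error rate stands in for the recognition-performance cost measure.

```python
import numpy as np

def kappa(confusion: np.ndarray) -> float:
    """Kappa coefficient for task success: observed agreement P(A)
    between achieved and key attribute values, corrected for the
    chance agreement P(E) implied by the marginals."""
    total = confusion.sum()
    p_a = np.trace(confusion) / total                              # P(A)
    p_e = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2  # P(E)
    return (p_a - p_e) / (1.0 - p_e)

def zscore(x: np.ndarray) -> np.ndarray:
    """Z-score normalisation, so that coefficients of measures on
    different scales can be compared."""
    return (x - x.mean()) / x.std()

# Hypothetical per-dialogue data (illustrative only):
kap = np.array([0.90, 0.40, 0.80, 0.20, 0.70, 0.95])   # task success
wer = np.array([0.05, 0.40, 0.10, 0.55, 0.20, 0.02])   # word-error rate
sat = np.array([4.5, 2.0, 4.0, 1.5, 3.5, 5.0])         # questionnaire score

# Least-squares fit of the performance function:
#   satisfaction ~ a * N(kappa) + b * N(wer) + c
X = np.column_stack([zscore(kap), zscore(wer), np.ones_like(kap)])
coef, *_ = np.linalg.lstsq(X, sat, rcond=None)
print(coef)  # fitted weights for task success, recognition cost, intercept
```

With real data, the relative magnitudes of the fitted weights indicate how much task success and each cost measure contribute to predicted user satisfaction, which is the basis for the predictions discussed above.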
Effective start/end date: 31/12/2003 – 31/12/2003