Speech interaction system – how to increase its usability?

Fang Chen
Department of Computing Science, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
fanch@cs.chalmers.se

Abstract

This paper discusses different issues related to the usability of speech interaction systems. It covers the usability concept, different design approaches, the design process and evaluation questions for speech interaction systems. Usability is a very fuzzy concept, especially when it relates to speech interaction systems: it is hard to measure and it is very much context dependent. The traditional user-centered design approach may not be suitable for speech interaction system design, since the users might not have enough knowledge to see what the technology can do. Usage-centered design may be the better method, but there is no comprehensive theory and methodology for the design process and evaluation.

1. Introduction

Speech technology has made significant progress. Increased automatic speech recognition (ASR) rates (99 to 100% accuracy in laboratory conditions) with large vocabulary capacity, more natural and robust speech recognition, humanized synthetic voices, robust dialogue system design and language understanding, together with increased computer speed and memory capacity, make speech-based interaction systems feasible for real-time applications [1-3]. The results from Leduc’s performance benchmarking survey [4] clearly indicated that speech recognition is becoming a more mass-market option. When it comes to real-time applications, the usability of the speech interaction system becomes important for user acceptance. This poses new challenges for speech system interface design: the environment where the system is going to be used can be dynamic, the user’s vocal quality can be unstable, and speech can vary.
Many studies of human factors in speech-related interface design were published in the late 1980s and early 1990s [5-7]. At that time, ASR technology was very poor, and most of the studies focused on feedback design and error correction. Their results may not be valid for present system design applications. The constant interest in employing speech interaction design comes from the commonly agreed advantage of speech as “a natural and intuitive communication method” of human beings. But such advantages do not automatically appear in a speech interaction system unless the designers understand human cognitive behavior, human needs, and usability issues.

2. The usability concept

The usability requirements for speech interaction systems call for new ways of measuring effectiveness, efficiency and satisfaction, the three elements of usability evaluation. Among these three elements, effectiveness and efficiency are closely related to the functionality of the system and directly affect satisfaction; they are therefore essential for the design. Effectiveness can be measured in terms of the extent to which a goal or a task is achieved. Efficiency means the amount of effort required to accomplish a goal. Gibbon et al. [8] have described a large number of different effectiveness and efficiency measurements. Effectiveness can be understood as the acceptable performance that should be achieved by a defined proportion of the user population, over the specified range of tasks and environments that the system is designed for. Efficiency might be measured in terms of the time taken to complete a task, the errors that the user makes during the performance, and how much effort users have to invest in learning and understanding how the system works in order to be able to work with it. Acceptable performance should be achieved within acceptable human costs, in terms of fatigue, stress, frustration, and discomfort.
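As a minimal sketch of how the effectiveness and efficiency measures described above might be computed from logged test sessions, the following Python fragment uses the task completion rate for effectiveness, and the mean completion time and mean error count for efficiency. The `Session` record, its field names and the sample values are hypothetical and serve only as an illustration; they are not taken from any study cited here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One logged task attempt (hypothetical record layout)."""
    completed: bool   # did the user reach the task goal?
    seconds: float    # time spent on the task
    errors: int       # user or recognition errors observed


def effectiveness(sessions):
    """Share of task attempts that reached the goal (0.0 to 1.0)."""
    return sum(s.completed for s in sessions) / len(sessions)


def efficiency(sessions):
    """Mean time and mean error count over completed attempts only."""
    done = [s for s in sessions if s.completed]
    if not done:
        return None  # no completed attempt, efficiency is undefined
    return mean(s.seconds for s in done), mean(s.errors for s in done)


# Illustrative values only, not measured data.
log = [
    Session(True, 40.0, 1),
    Session(True, 60.0, 3),
    Session(False, 90.0, 5),
    Session(True, 50.0, 2),
]
print(effectiveness(log))  # 0.75
print(efficiency(log))
```

Restricting efficiency to completed attempts is one possible design choice; an evaluation could equally report time and errors over all attempts, as long as the choice is stated together with the result.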
Sometimes the usability evaluations obtained by measuring these three elements separately do not agree with each other. For example, two dialogue systems (MIMI and Tap&Talk) for train timetable information were compared in a usability evaluation [9]. Effectiveness was measured by the number of dialogues that were completed successfully, efficiency by task completion time, and user satisfaction subjectively. The results showed that MIMI was slightly better than Tap&Talk on the effectiveness measure, because resolving recognition errors was easier with the MIMI interface. Tap&Talk was significantly better than MIMI on the efficiency measure, because the MIMI interface used spoken prompts and Tap&Talk did not, so the latter worked faster. Satisfaction was measured by asking users many questions; most of the statements in this study were judged about equal for both interfaces, and sometimes the judgment was in favor of the Tap&Talk interface. Even though the overall user satisfaction was not significantly higher for the Tap&Talk interface, most people preferred to use this system anyway.

Hone and Graham [10] made a systematic study of user satisfaction with speech input/output systems. They identified six factors in user attitude: perceived system response accuracy, likeability, cognitive demand, annoyance, habitability and speed. Under each of these attitude labels, one can design a set of questions to ask.

Satisfaction can be understood at different levels. It is human nature never to be satisfied with what one has. The need to fulfill the functional requirements, to be able to solve the problems, is fundamental. The basic level is the feeling of comfort and confidence when using the interface. Learnability and flexibility of the system may affect how comfortable the users feel [11]. As soon as this need is fulfilled, people will look for higher levels of satisfaction, such as pleasure, excitement, fulfillment and happiness.
Figure 1 shows the pyramid of human needs at different levels.

[Figure 1. The hierarchy of user’s needs, as suggested by Coe [12]. Pyramid levels, from top to bottom: self-actualization needs (self-fulfilment); psychological needs (esteem, belonging and love); basic needs (comfort, confidence). The “How to do?” column lists: encouragement, freedom, suggestion; feeling of being part of a user community; a sense of mastery and competence; users feel comfortable in learning; logical structure of the information presentation; accurate, reliable, and predictable.]

Usability is indeed a fuzzy concept. It can only be meaningful within a specific context. A system placed in one context will probably display different usability characteristics when placed in a second context [13]. Usability is a property of the interaction among a product or system, a user, the task or set of tasks, and the organization, society and environment in which the system is used.

3. Usability in design process

To conceptualize usability in the design process, Don Norman [14] and Ravden and Johnson [15] have pointed out some design principles:

- Visibility: Information presented should be clear, well organized, unambiguous and easy to understand.
- Feedback: Users should be given clear, informative feedback on where they are in the system, what actions they have taken, whether these actions have been successful and what actions should be taken next.
- Consistency and compatibility: The way the system looks and works should be consistent at all times, and compatible with user expectations.
- Explicitness: The way the system works and is structured should be clear to the user, so that the user will easily know how to use it. It should show the relationship between actions and their effects.
- Flexibility and constraints: The structure and the information presentation should be sufficiently flexible in terms of what the user can do, to suit different user needs and allow users to feel in control of the system. At the same time, the system should also restrict the kinds of user interaction that can take place at a given moment.
- Error prevention and correction: The possibility of user error should be minimized, and errors that do occur should be automatically detected and easy to handle.
- User guidance and support: Easy-to-read, understandable and relevant guidance and support should be provided to help the user understand the system.

There are very few studies on usability issues in the design of speech interaction systems. Dybkjaer and Bernsen [16] discussed different criteria for spoken language dialogue system design. Their results match the above principles well:

- Learnability: The design should always be clear about the users’ experience and knowledge of the system and how quickly they can learn the interaction.
- Visibility: The system’s output language should be natural and should guide the users’ input language so that the input language becomes manageable for the system.
- Explicitness: The system should express its understanding of the user’s intention and provide information to the user in a clear, unambiguous, correct and accurate way, using language that is familiar to the user.
- Flexibility: Multimodal interaction is generally preferable, but care should be taken to select an appropriate modality for interaction in the specific task domain.
- Feedback: The user should be informed of what is going on in the system. The output voice should have natural intonation and prosody, with an appropriate speaking rate.
- Error prevention and correction: Error handling is always important for speech interaction systems, as errors may come from the system misrecognizing what the user said, or from the users themselves.
- User guidance and support: Interaction guidance is necessary for users to feel in control during the interaction. Giving a long and complicated “user manual” to a first-time user is not suitable.
Leduc, Dougherty et al. [4] have specifically pointed out the importance of consistency in the design. Consistency has two aspects: consistency with previous usage and internal consistency. The way the speech system handles a task should match the users’ previous experience of handling the task with other systems. Similar tasks should be carried out in a similar manner, using identical terms throughout the speech application.

It is necessary to specialize the usability design principles for different speech interactive systems. The details of these design principles for an in-vehicle information system will differ somewhat from those for a spoken dialogue system placed inside the house, because in a mobile environment the user’s attention must be kept on the road. The designer has to consider not only the information system itself, but also driving safety and the effects of stress.

4. Problems with user-centred approach to the design

The user-centered design (UCD) approach can enhance the usability of products. In this approach, the potential users are involved in the entire design process from the very beginning. The design process is more or less driven by the user and focuses on user experience. The typical design process is shown in Figure 2. In practice, very few organizations manage to implement the UCD process. The problems associated with UCD come from issues such as [13]: user issues, organizational commitment, developer skills, and resource constraints. User experience and knowledge, user expectations, user contribution and agreement, and user diversity are the factors that make user involvement difficult.
Users may expect the new system to be simply an improved version of the old one; users may not be able to step back from their daily practices to see how technology can change the way they work; and they might not be familiar with the design methods used or the technology, and may simply feel over-awed by the design process (which leads to them feeling unqualified to comment). It is hard to get users to contribute to the quality of the design solution. At the same time, it is not easy to collect all the information about the user [17].

[Figure 2. The interdependence of user-centred design activities (developed from ISO 13407). Activities shown: 1. Plan the human-centred process (design philosophy; identify design team and users; time-line; success criteria). 2. Specify the context of use (understand the characteristics of user, tasks, organization and environment; task analysis). 3. Specify the user and organizational requirements (allocation of tasks among users; functional requirements; performance criteria; usability criteria). 4. Produce design solutions (collect knowledge for design; concrete design solution; prototypes and user tests). 5. Evaluate designs against user requirements (getting feedback for design; assess the achievement of user and organizational objectives). 6. Context of evaluation (usability evaluation). The process is iterated until the design meets all the requirements.]

A speech interaction system is not simply a matter of replacing keyboard input with speech input, which has unfortunately been the case in the history of speech technology applications. Many studies have compared keyboard input with speech input in different contexts, and the results for speech as an input modality are sometimes positive and sometimes negative.
The speech interaction system should take advantage of the naturalness and intuitiveness of human speech communication, so the interaction between human and system will be totally different from that in traditional human-computer interactive systems, which use manual input and visual output. Even though speech communication is part of people’s daily life, this does not mean that users know what speech technology can do and what the interface should look like, as their past experience with human-computer interaction systems may not apply to speech interaction systems. The UCD process might therefore not be the best choice for speech interaction system design.

Instead of focusing on the user, Rakers [18] suggests focusing on the roles, goals and responsibilities people have. This idea leads to the usage-centred approach to design. The usage-centred approach focuses on the use of the system: the interaction between humans and the system, its environment and its social organization. Human behavior is goal oriented, and events that happen in the living and working environment have meaning to the user. The notion of meaning, the constraints from technology and environment, and the goal are closely related to the specific context of the task that the user performs [19]. In this approach, the potential users do not necessarily need to be involved in the design process. The possibilities of what the technology can do should be extensively explored by the technical experts; the usability of the system is essential for the design, while the user’s high-level fulfillment (as shown in Figure 1) should also be satisfied. The usage-centred approach to speech interaction system design is a new concept and lacks comprehensive design theories and methodologies for the design process, problem analysis and evaluation.

5. Problems with user evaluation

In the usage-centered approach, some user tests should be carried out in the design process.
The purpose of such tests is not to get users’ opinions of the design, but to better understand the users and the interaction between the users and the product, so as to get the most benefit from the developed technology and to increase usability. There is no applicable usability evaluation theory for speech interaction systems. In general, any evaluation theory should be able to handle at least the following problems [20]:

- What are the characteristics of overall system performance? How should they be measured? At what level should the measurements be taken?
- How should we choose the test persons? Should they be naïve people, or experts?
- How much training is required to arrive at stable performance, where we can be sure that we are evaluating the properties of the interface and not just the learning and adaptive behavior of the operators?
- How detailed and how realistic should the scenarios be? What should be their properties? Which properties of the interface should be used? How can context be specified?
- If general principles are being evaluated, what is the minimal set of applications needed in the evaluation?

The above questions are more or less related to the evaluation of the basic usability requirements. How can the high-level fulfillment of needs and satisfaction be measured? Any evaluation is context driven; is it possible to build a general theory to guide usability evaluation for different speech interaction systems? Extensive studies need to be carried out in this area.

6. Problem with design guidelines

Usability design requires the integration of multiple disciplines, such as speech technology, computer technology, knowledge of the application domain, human cognition, human factors and ergonomics, and even social and organizational knowledge. It is impossible for one designer to have all this knowledge. Therefore many design “guidelines” have been issued in user interface design books and journal papers [21-23].
It was believed that design “guidelines” would be helpful and useful for engineers in their design work. However, many (if not all) guidelines vary in the extent to which they are derived from specific research findings. Their scope is rarely made explicit, and it remains for the designer to judge the applicability of a guideline to a particular user interface and to apply it accordingly. The body of guidelines is incomplete: many design issues are simply not covered by guidelines [24]. Guidelines from different sources may differ in detail, may not state their application conditions clearly, and may even be contradictory. People try to phrase guidelines as abstract statements to show their “external validity”, or to give them the appearance of a high “concept level”, while any design is context dependent. How can one make sure that the engineer has used the “right” guideline? Take, for example, one of the guidelines for speech interface design provided by Baber [23]:

- Match the type of work the recognizer is intended to perform with the characteristics of the available technology.

There are many problems with such “guidelines”: how should the technology be matched to the type of work? What are the criteria? How should the match be measured? Life [24] has suggested that a guideline would be very handy if it could be presented as “IF (condition) THEN (system performance consequence) BECAUSE (interaction model constraint), HENCE (guideline, expressed as a system design prescription)”. This is almost an impossible dream, because nobody can cover all the possible conditions under which a speech interaction system may be applied. Many of the application conditions are unpredictable at present, and the application context may change with the development of the technology and of people’s needs. At the same time, there are almost unlimited consequences and constraints that one can identify depending on the application context.
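The guideline format proposed by Life [24] can be pictured as a simple four-part record. The sketch below is only an illustration in Python: the `Guideline` class, its field names and the example content are hypothetical, invented here to show the structure, and are not taken from [24]. The `count_combinations` helper, also an assumption of this sketch, gives an upper bound on how many condition, consequence and constraint triples would have to be considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guideline:
    """Four-part guideline record (hypothetical field names)."""
    condition: str      # IF: application condition
    consequence: str    # THEN: system performance consequence
    constraint: str     # BECAUSE: interaction model constraint
    prescription: str   # HENCE: system design prescription

    def render(self):
        """Format the record in the IF/THEN/BECAUSE/HENCE pattern."""
        return (f"IF {self.condition} THEN {self.consequence} "
                f"BECAUSE {self.constraint}, HENCE {self.prescription}")


def count_combinations(n_conditions, n_consequences, n_constraints):
    """Upper bound on distinct (condition, consequence, constraint) triples."""
    return n_conditions * n_consequences * n_constraints


# Invented example content, for illustration only.
g = Guideline(
    condition="the system is used in a noisy environment",
    consequence="recognition errors increase",
    constraint="the user must repeat or correct input",
    prescription="confirm critical commands before executing them",
)
print(g.render())
print(count_combinations(200, 100, 50))  # 1000000
```

Even with these modest, made-up counts for each entity, the number of candidate triples reaches one million.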
The matrix of these three entities, each with an enormous amount of variability, can produce millions of detailed guidelines, which makes it difficult for designers to find the proper ones, while there is still a danger of not covering the situation the designer is working on.

7. Discussion and conclusion

Usability is very much context dependent. The traditional user-centered design approach may not be suitable for speech interaction system design, because users have limited knowledge and understanding of what the technology can do. Usage-centered design focuses on the roles, goals and responsibilities people have. Fulfilling the high-level needs is the final goal of the design, while the usability of the system is the fundamental requirement. Usage-centered design is a new concept, and there is no comprehensive theory and methodology for the design process and evaluation. Design “guidelines” are not a good solution for helping the designer to increase the usability of the system. The integration of knowledge from multiple disciplines is important for the design, and it is hard to find shortcuts.

8. References

[1] Steeneken, H. J. M., "Potentials of speech and language technology systems for military use: an application and technology oriented survey," NATO, Defence Research Group AC/243(Panel 3)TR/21, 1996.
[2] Weinstein, C. J., "Opportunities for advanced speech processing in military computer-based systems," Proceedings of the IEEE, vol. 79, 1991, pp. 1626-1641.
[3] Weinstein, C. J., "Military and government applications of human-machine communication by voice," Proc. Natl. Acad. Sci. USA, vol. 92, 1995, pp. 10011-10016.
[4] Leduc, N., Dougherty, M., Ankaitis, V., "Measuring the performance of speech applications: a user-centered approach," in Universal Access in HCI: Towards an information society for all, C. Stephanidis, Ed., Lawrence Erlbaum Associates, 2001, pp. 372-376.
[5] Jones, D., Hapeshi, K., Frankish, C., "Human factors and the problems of evaluation in the design of speech systems interfaces," People and Computers III: Proceedings of the Third Conference of the British Computer Society Human-Computer Interaction Specialist Group, 1987, pp. 41-49.
[6] Jones, D. M., "Automatic speech recognition in practice," Behav. & Inf. Tech., vol. 11, 1992, pp. 109-122.
[7] Baber, C., Hone, K. S., "Modelling error recovery and repair in automatic speech recognition," International Journal of Man-Machine Studies, vol. 39, 1993, pp. 495-515.
[8] Gibbon, D., "Handbook of multimodal and spoken dialogue systems: resources, terminology and product evaluation," in The Kluwer international series in engineering and computer science, Kluwer Academic, 2000.
[9] Sturm, J., Bakx, I., Cranen, B., Terken, J., "Comparing the usability of a user driven and a mixed initiative multimodal dialogue system for train timetable information," EuroSpeech 2003, 2003, pp. 2245.
[10] Hone, K. S., Graham, R., "Subjective assessment of speech-system interface usability," Eurospeech 2001, 2001.
[11] Stanton, N., Human Factors in Consumer Products, Taylor & Francis, 1998.
[12] Coe, M., Human Factors for Technical Communicators, John Wiley & Sons, 1996.
[13] Smith, A., Human-Computer Factors: A study of Users and Information Systems, The McGraw-Hill Companies, 1997.
[14] Norman, D., The Design of Everyday Things. New York, Basic Books, 1988.
[15] Ravden, S. J., Johnson, G. I., Evaluating Usability of Human-Computer Interfaces: A practical method. Chichester, Ellis Horwood, 1989.
[16] Dybkjaer, L., Bernsen, N. O., "Usability evaluation in spoken language dialogue system," Proceedings of the Workshop on Evaluation for Language and Dialogue Systems, Association for Computational Linguistics 39th Annual Meeting and 10th Conference of the European Chapter (ACL/EACL) 2001, 2001, pp. 9-18.
[17] Eason, K. D., "User-centred design: for users or by users?," Ergonomics, vol. 38, 1995, pp. 1667-1673.
[18] Rakers, G., "Interaction design process," in User Interface Design for Electronic Appliances, B. T. K. Baumann, Ed. London and New York, Taylor & Francis, 2001, pp. 7-47.
[19] Flach, J. M., Tanabe, F., Monta, K., Vicente, K. J., Rasmussen, J., "An ecological approach to interface design," Proceedings of Human Factors and Ergonomics Society 42nd Annual Meeting, 1998, pp. 295-299.
[20] Moray, N., "Advanced displays can be hazardous: the problem of evaluation," pp. 59-62.
[21] Jones, D., Hapeshi, K., Frankish, C., "Design guidelines for speech recognition interfaces," Applied Ergonomics, vol. 20, 1989, pp. 47-52.
[22] Baber, C., "Automatic speech recognition in adverse environments," Human Factors, vol. 38, 1996, pp. 142-155.
[23] Baber, C., Noyes, J., "Speech control," in User Interface Design for Electronic Appliances, B. T. K. Baumann, Ed. London and New York, Taylor & Francis, 2001, pp. 190-208.
[24] Life, M. A., Long, J. B., "Providing human factors knowledge to non-specialists: a structured method for the evaluation of future speech interfaces," Ergonomics, vol. 37, 1994, pp. 1801-1842.