Development of Web-based Conversation Materials Using Automatic Speech Recognition: How to Compliment

Tsuo-Lin Chiu, Yachi Chuang, Yuli Yeh, Hsien-Chin Liou, and Jiang-Cun Chen
Department of Foreign Languages and Literature, National Tsing Hua University
g925256@alumni.nthu.edu.tw (T. L. Chiu)

Abstract

Instruction in speech acts and intelligible pronunciation is essential for developing EFL learners' oral competence. For most learners, it often takes considerable time before they are able to produce spontaneous speech in the target language. Recently, speech recognition technology has become mature enough to be applied to facilitating student learning of oral communication. CALL design principles and the instructional design of speech acts for oral communication have been incorporated into a project developing web-based speech-act materials on complimenting with the support of speech recognition technology. Specifically, pertinent feedback is automatically generated by the computer program to improve learners' pronunciation when deviations are noted. Based on an analysis of complimenting patterns, two activities were developed to demonstrate the proper use of compliments. A speech recognizer analyzes learners' input and gives pertinent feedback to contextualize and reinforce the learning. The feedback comes mainly in two formats: typical pronunciation errors of Taiwanese learners and the wave form of learners' production in comparison with a native speaker's model. To assess learning outcomes, online speaking tasks requiring learners to react to daily situations were designed. A scale for evaluating the comprehensibility and appropriateness of the speech act is proposed for teacher raters.

Introduction

Instruction in speech acts and intelligible pronunciation is essential for developing EFL learners' oral competence. For most learners, however, it often takes a long time before they are able to produce spontaneous speech in the target language. For the development of oral competence, researchers (e.g. Cohen & Olshtain, 1993) have claimed that explicit teaching of linguistic forms such as speech acts (for achieving language functions) can be beneficial. Another element contributing to fluent and accurate oral performance for successful communication is pronunciation. Achieving mutually intelligible pronunciation has been recognized as the goal of pronunciation instruction (Derwing & Munro, 2005; Levis, 2005). Recently, the technology of Automatic Speech Recognition (ASR) has become mature enough for applications of Computer Assisted Language Learning (CALL) that facilitate student learning of both oral communication and pronunciation (Chiu, 2006; Tsai, 2003). CALL design principles and the instructional design of speech acts for oral communication have thus been incorporated into our development project of web-based speech-act materials on complimenting with the support of speech recognition technology. In a previous study, Chiu (2006) reported that learners in an ASR-supported conversation environment would like to receive feedback on their language production from the speech recognition technology. Thus, corrective and pertinent feedback, automatically generated by the speech recognizer to improve learners' pronunciation, is integrated into the current design.
The feedback comes mainly in two formats: recognizing typical pronunciation errors of Taiwanese EFL learners and presenting the wave form of learners' production in comparison with a native speaker's model. Based on the analysis of complimenting patterns by Manes and Wolfson (1981), two activities were developed to demonstrate the proper use of compliments. First, sentences demonstrating patterns of proper usage for giving and receiving compliments are presented for practice. Second, two exercises are provided for appropriate use of the patterns in daily contexts. For both activities, a speech recognizer analyzes learners' input and gives pertinent feedback to contextualize and reinforce the learning. To assess learning outcomes, online speaking tasks that require learners to react to daily situations were designed. A scale for evaluating the comprehensibility and appropriateness of the speech act is proposed for teacher raters. In this article, the "compliment" unit and its speech assessment environment are presented, together with the design principles and implications for how English teachers may use the materials to enhance EFL learners' oral production and to assess oral performance.

The Speech Act of Compliment

Speech acts are strings of words performed to achieve communicative functions. Successful performance of a speech act depends on both the linguistic units chosen and their appropriateness in the context of communication. In other words, a good speech act should be both sociolinguistically and socioculturally acceptable (Cohen & Olshtain, 1994). Non-native speakers often have difficulty grasping the norms of proper speech act performance in the target language, and their performances tend to be verbose, repetitive, and lengthy (Blum-Kulka et al., 1989). Thus, scholars such as Cohen and Olshtain (1993) suggested that explicit teaching of speech act use is needed.

Complimenting is a speech act that is performed differently across cultures. For example, a compliment can be used for opening a dialogue, greeting, or thanking. However, people from different cultures might interpret the same complimenting behavior differently. Manes and Wolfson (1981) examined the patterns of compliments in American English. From 686 compliments gathered in a variety of naturally occurring situations, they found three major patterns, six sub-major patterns, and a number of the most frequently used adjectives and verbs for performing compliments (for details, see Manes and Wolfson, 1981). They suggested that if learners are given these most frequently found patterns, adjectives, and verbs, they may have less difficulty producing compliments that conform to the patterns used by native speakers of English. In other words, explicit teaching of the structures and linguistic units could help English learners grasp the norms of complimenting behavior in English.

Pronunciation Instruction and Design of Instructional Material with Speech Recognition Technology

Achieving native-like pronunciation and achieving mutually intelligible pronunciation are two competing principles guiding pronunciation instruction (Levis, 2005). When addressing pronunciation instruction, language teachers should set realistic goals for their learners. For example, some researchers (Jenkins, 2002; Derwing & Munro, 2005) suggested that language teachers should consider the context of their instruction before setting their goals.
If learners are in a context where native speakers are the majority, the instructional goal can be oriented more toward the native norm. If learners are in a context where English is not the primary language, achieving native-like pronunciation is unrealistic and extremely difficult. Therefore, given the status of English today and the EFL (English as a foreign language) context in Taiwan, pronunciation instruction aiming at mutual intelligibility seems more appropriate. To date, a number of studies have attempted to identify the factors contributing to intelligibility by investigating speaker-listener background (Munro, Derwing, & Morton, 2006) and suprasegmental factors (Field, 2005; Hahn, 2004); nevertheless, as Derwing and Munro (2005) claimed, more research investigating the elements of intelligibility is still needed.

Recently, researchers have documented the advantages brought by Automatic Speech Recognition (ASR) to the teaching of pronunciation and speaking (Bernstein, Najmi, & Ehsani, 1999; Egan, 1999; Ehsani & Knodt, 1998; Eskenazi, 1999). Studies (Chiu, 2006; Tsai, 2003) have also shown its strengths in pronunciation instruction and speech act training for EFL learners. On the other hand, previous studies have also reported drawbacks of applying ASR to language learning, the most frequently mentioned being recognition errors. However, given the advantages shown and its potential for facilitating speaking-related skills, which are often hard for most EFL learners to develop, ASR is still worth exploring for language learning. All in all, as Wachowicz and Scott (1999) suggested, the question is not whether ASR technology should be used in language learning, but how it should be used. That is, the effectiveness of applying ASR technology to language learning is determined less by the speech recognizers than by the design of activities and repair strategies to cope with current limitations.

Design factors contributing to successful pronunciation instruction in ASR-supported environments have also been discussed in previous studies. Neri et al. (2002), for example, recommended that four factors (input, output, feedback, and reduction of stress) be considered when developing a pronunciation unit. Learners need to be exposed to an adequate amount of contextually meaningful input in order to construct their own language model. Ample opportunities for speech production allow learners to compare their output with the input model so as to form a correct L2 model. Further, pertinent feedback given by the system, here the recognizer, can help learners notice the discrepancies between their speech production and the L2. Finally, a stress-reduced environment is needed to encourage more speech production.

Speech Assessment

In speech assessment, reacting-to-situations tasks (Luoma, 2004) require test-takers to give appropriate responses to the situations given. Test-takers are often given some planning time for reading the situations and formulating appropriate responses. Such tasks make it easier for testers to gain a versatile impression of how test-takers' speaking skills fit different language situations. For fair speech assessment, the effect of planning time on test-takers' speech production must be considered.
A study by Li and Chen (2005) examined the influence of planning on the speaking performance of high- and low-proficiency groups of adult EFL learners. The learners completed two tests of similar difficulty; the first was done with planning time and the second without. The major result indicated that planning was more effective for the low-proficiency group, which produced more complex and accurate utterances in the test. The study implied that different lengths of planning time should be given to learners of different proficiency levels during speaking assessments.

Design and Development of the "Compliment" Unit

Aiming at training English learners in appropriate complimenting behaviors and intelligible pronunciation with the support of speech recognition technology, the instructional unit "How to compliment" was designed based on CALL design principles and the analysis of complimenting patterns (Manes & Wolfson, 1981). In addition, during the learning process, pertinent feedback and a phoneme-segmentation technique from current speech recognition technology assist learners in pinpointing their pronunciation errors and comparing their pronunciation with a native model. Further, online speaking tasks that require learners to react to daily situations were designed for evaluating the learning outcomes of the instructional unit. A scale for evaluating the comprehensibility and appropriateness of the speech act is proposed for teacher raters.

There are five steps for learners to complete the whole learning process, as illustrated in Figure 1. Learners need to log on to the website (http://candle.cs.nthu.edu.tw, under speaking), download the instructional unit, practice the patterns for a period of time, then go back to the website and undertake the speech assessment. In the following sections, the instructional unit and the accompanying speech assessment environment are detailed.

Figure 1 Flowchart of Entire Procedure

The Instructional Unit

There are two sub-sections in the unit: a pattern-practice section and an exercise section. In the pattern-practice section, nine complimenting patterns based on the analysis by Manes and Wolfson (1981) are presented for training; the formula corresponding to the pattern learners have just practiced is presented at the bottom of the screen. Figure 2 shows an example of this pattern-practice interface. In the exercise section, two complimenting situations are presented for learners to practice applying the patterns in real-life contexts. After each section is completed, a summary page recording learners' performance and their pronunciation errors in each sentence, as illustrated in Figure 3, is given for later self-reflection.
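To make the pattern-practice design concrete, the complimenting formulas can be thought of as fill-in templates with slots for a noun phrase and for the high-frequency adjectives and verbs identified by Manes and Wolfson (1981). The short sketch below illustrates this idea only; the template strings, word lists, and function names are approximations introduced here for illustration and are not the unit's actual internal representation.

# A minimal sketch of complimenting patterns as fill-in templates.
# The templates and word lists approximate three of the major patterns
# reported by Manes and Wolfson (1981); they are illustrative only.

import random

PATTERNS = [
    "{np} looks really {adj}.",          # NP is/looks (really) ADJ
    "I really {verb} {np}.",             # I (really) like/love NP
    "That is a really {adj} {np_head}.", # PRO is (really) (a) ADJ NP
]

FREQUENT_ADJECTIVES = ["nice", "good", "beautiful", "pretty", "great"]
FREQUENT_VERBS = ["like", "love"]

def practice_sentence(np: str, np_head: str = "idea") -> str:
    """Fill one randomly chosen template to produce a practice sentence."""
    template = random.choice(PATTERNS)
    return (template
            .replace("{np}", np)
            .replace("{np_head}", np_head)
            .replace("{adj}", random.choice(FREQUENT_ADJECTIVES))
            .replace("{verb}", random.choice(FREQUENT_VERBS)))

if __name__ == "__main__":
    print(practice_sentence("your new haircut"))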
We have also integrated the four factors for successful pronunciation units (Neri et al., 2002), namely input, output, feedback, and reduction of stress, into our design. Since input is essential for an instructional unit aimed at developing oral competence, the unit provides both native and non-native models as learner input. Throughout the unit, each sentence is recorded by a native English speaker as a model voice for imitation, and learners can listen to the native model before uttering their own sentences. Moreover, in the exercise section, in addition to the native model, appropriate examples of complimenting behaviors performed by non-native speakers of English are also given. Figure 4 shows the first situation in the exercise section and the selections for non-native speaker examples. Exposed to diversified input from both native and non-native speakers, learners can better construct their own language models. Moreover, for some learners, exposure to quality non-native speaker models might help them set realistic goals in developing their oral language skills and become less frustrated.

Ample chances to produce output are another crucial element in developing oral competence. In the unit, each sentence corresponding to the complimenting patterns, together with the sentences in the exercise section, provides chances for learner practice, wherever and whenever learners would like to practice.

The feedback given should be beneficial and constructive for learners during oral production. In the unit, besides self-examination, learners' language production can also be examined and responded to by the speech recognizer with pertinent feedback. Figure 5 shows how feedback is presented in the unit. The feedback generated by the speech recognizer is mainly in two formats: (1) detecting and reminding learners of typical pronunciation errors of Taiwanese English speakers and (2) providing wave forms of learners' pronunciation in comparison with a native model. Twenty-two typical pronunciation errors of Taiwanese English learners (for details, see http://ccms.ntu.edu.tw/~karchung/intro%20page%2029.htm), including twelve consonant errors and ten vowel errors (see Appendix A), serve as a reference list for evaluating learners' utterances before feedback is provided. The system also presents learners' utterances in waveform format. That is, by employing a phoneme-segmentation technique, it allows learners to make word-by-word comparisons of their performance with the native model presented directly above. With a simple click on each word in a sentence, learners can hear the difference between their pronunciation and the native model, making it easier to examine each sentence at the word level. Take Figures 2 and 4 for example: each button containing a word right above the wave form is clickable for listening to the comparison between the learner's own voice and the native model. In addition to these two formats of feedback, three levels of judgment (very good, not bad, and try again) concerning learners' overall pronunciation are also provided. The judgment is made by comparison with a speech corpus containing data from 640 non-native speakers of various proficiency levels (see the TIMIT Acoustic-Phonetic Continuous Speech Corpus, http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1). As mentioned, the judgment on each sentence, together with the errors detected, is recorded in the summary page for later self-reflection.

A stress-reduced environment is beneficial for facilitating oral production. Some learners might be too shy to speak in the target language during face-to-face encounters, but these inhibiting factors are removed when they engage in CALL instructional practice. In this unit, learners can feel at ease practicing their pronunciation. Further, they may be less frustrated or intimidated because quality non-native speaker models are also accessible in the unit. Our unit is therefore different from materials that offer only native models, which are usually difficult for learners to set as learning goals.
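As an illustration of how the two feedback formats and the three-level judgment can fit together, the following sketch outlines one possible feedback routine: it compares a phoneme-segmented recognition of the learner's utterance against the native reference word by word, looks up deviations in a typical-error inventory such as the one in Appendix A, and maps the proportion of correctly pronounced words to the three judgment levels. The function names, data structures, error entries, and thresholds shown are hypothetical; this is a sketch of the idea, not the system's actual implementation, and the waveform display itself is omitted.

# A hypothetical sketch of a word-level feedback routine. Thresholds,
# names, and data structures are illustrative assumptions only.

from dataclasses import dataclass

# A few entries paraphrased from the typical-error inventory in Appendix A:
# (target phoneme, learner substitution) -> reminder text.
TYPICAL_ERRORS = {
    ("θ", "s"): "/θ/ (as in 'thank') is often replaced by [s].",
    ("ð", "d"): "Initial /ð/ (as in 'they') is often replaced by [d] or [l].",
    ("ɪ", "i"): "/ɪ/ and /i/ (as in 'him' vs. 'seat') are often confused.",
}

@dataclass
class WordResult:
    word: str
    reference_phones: list   # phonemes of the native model for this word
    recognized_phones: list  # phonemes recognized in the learner's utterance

def feedback_for_word(result: WordResult) -> list:
    """Return reminders for deviations that match known typical errors."""
    messages = []
    for ref, rec in zip(result.reference_phones, result.recognized_phones):
        if ref != rec and (ref, rec) in TYPICAL_ERRORS:
            messages.append(f"{result.word}: {TYPICAL_ERRORS[(ref, rec)]}")
    return messages

def overall_judgment(results: list) -> str:
    """Map the proportion of correctly pronounced words to three levels."""
    correct = sum(r.reference_phones == r.recognized_phones for r in results)
    ratio = correct / len(results) if results else 0.0
    if ratio >= 0.8:   # thresholds are illustrative, not the project's values
        return "very good"
    if ratio >= 0.5:
        return "not bad"
    return "try again"

if __name__ == "__main__":
    sentence = [
        WordResult("thank", ["θ", "æ", "ŋ", "k"], ["s", "æ", "ŋ", "k"]),
        WordResult("you", ["j", "u"], ["j", "u"]),
    ]
    for word_result in sentence:
        for message in feedback_for_word(word_result):
            print(message)
    print("Judgment:", overall_judgment(sentence))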
Figure 2 Pattern Practice

Figure 3 Summary Page

Figure 4 Exercise Section

Figure 5 An Example of Feedback

The Online Speech Assessment Task

An online speech assessment task, using the reacting-to-situations format (Luoma, 2004) with planning time given, was developed for evaluating learners' learning outcomes from the instructional unit. A total of four complimenting situations were devised. An analytic rating scale adapted from Chiu (2006), consisting of two criteria, appropriateness of speech act use and comprehensibility of speech, is proposed. Upon entering the test, learners are first given a warm-up question for testing the microphone volume and clearing their throats. The warm-up is followed by four formal questions. For each question, a maximum planning time of 30 seconds is given. Learners then have a maximum of 60 seconds to answer each question, and the sound files of their answers are uploaded to the server. Figure 6 shows an example of the test design. An interface for teachers, illustrated in Figure 7, was developed so that teachers can easily monitor students' progress and listen to and score students' test results. A blank column is provided for teachers to key in comments on individual performance, and a number of ready-made comments are also provided for reference.

Figure 6 Online Speech Assessment: Student Interface

Figure 7 Online Speech Assessment: Teacher Interface

The Rating Scale

An analytic scale with two criteria, comprehensibility and use of speech act, was adopted in Chiu (2006) for evaluating 49 college freshmen's speech-act productions (http://candle.cs.nthu.edu.tw, Candletalk under speaking) with two raters and a high inter-rater reliability of .87. Based on that experience, it seems that an expanded scale for the comprehensibility criterion may better distinguish learners of different oral proficiency levels. The descriptors of the concise scale for the Test of Spoken English (ETS, 2001) were thus adopted. The new analytic scale for the current project consists of the same two criteria with a wider range: (1) comprehensibility, coded on a 5-point scale ranging from 0 to 4, and (2) use of speech act, coded on a 4-point scale from 0 to 3. The scale is proposed for teacher raters evaluating learners' complimenting behaviors. Tables 1 and 2 below present the scoring scales for the two criteria.

Table 1 Scoring scale for comprehensibility (ETS, 2001, cited in Luoma, 2004, p. 69)
Score  Description
0      No effective communication: no evidence of ability to perform the task
1      Communication generally not effective: task generally performed poorly
2      Communication somewhat effective: task performed somewhat competently
3      Communication generally effective: task performed competently
4      Communication almost always effective: task performed very competently

Table 2 Scoring scale for the use of speech act (Chiu, 2006)
Score  Description
0      No answer or misuse of a speech act
1      Correct use of a speech act based on the question, but fragmented, inappropriate, or incomplete
2      Correct and appropriate use of a speech act, but containing some non-native features, such as verbose, lengthy, or repetitive utterances
3      Correct, appropriate, and effective use of a speech act
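For teachers who want to tabulate results from the teacher interface, the two ratings can be recorded and combined mechanically; the short sketch below shows one way this might be done. The descriptor strings condense Tables 1 and 2, while the data structure, the simple summing of the two criteria, and all names are assumptions for illustration rather than part of the actual assessment environment.

# A sketch of recording and combining scores on the two-criterion analytic
# scale of Tables 1 and 2. Names and the scoring arithmetic are assumptions.

COMPREHENSIBILITY = {  # ETS (2001), cited in Luoma (2004); descriptors condensed
    0: "No effective communication",
    1: "Communication generally not effective",
    2: "Communication somewhat effective",
    3: "Communication generally effective",
    4: "Communication almost always effective",
}

SPEECH_ACT_USE = {  # Chiu (2006); descriptors condensed
    0: "No answer or misuse of a speech act",
    1: "Correct use, but fragmented, inappropriate, or incomplete",
    2: "Correct and appropriate, but with non-native features",
    3: "Correct, appropriate, and effective",
}

def score_response(comprehensibility: int, speech_act: int) -> dict:
    """Validate the two ratings and return them with a simple combined total."""
    if comprehensibility not in COMPREHENSIBILITY or speech_act not in SPEECH_ACT_USE:
        raise ValueError("Scores must fall within the 0-4 and 0-3 ranges.")
    return {
        "comprehensibility": COMPREHENSIBILITY[comprehensibility],
        "speech_act_use": SPEECH_ACT_USE[speech_act],
        "total": comprehensibility + speech_act,  # out of a maximum of 7
    }

if __name__ == "__main__":
    # Example: a rater assigns 3 for comprehensibility and 2 for use of speech act.
    print(score_response(3, 2))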
Implications and Conclusion

Insufficient training in oral production in English is a common problem for most English learners in Taiwan. The problem is attributed to either the lack of opportunities for output production (e.g. talking to native speakers) or learners' inhibition in producing the target language. Not every learner can afford the high cost of finding a speaking tutor or studying in an English-speaking country. Even when learners do have opportunities to speak in English, they might be frustrated or embarrassed by the mistakes they make and thus refrain from further practice. In addition, learners who have attained comprehensible pronunciation but still have room for improvement might need pertinent feedback on their persistent pronunciation errors; however, explicitly pinpointing an interlocutor's pronunciation errors might be considered rude in face-to-face interaction. Given these common problems, the instructional unit supported by speech recognition technology designed in the current project could be helpful for self-instruction in oral production. In addition to a stress-reduced learning environment combining native and non-native model input, the unit provides ample chances for meaningful output production of complimenting patterns as well as pertinent and individualized feedback on pronunciation.

After implementing our design, two development implications can be drawn. First, feedback on suprasegmentals might be needed to further support pronunciation training. Second, tools presenting 3D illustrations of how to pronounce certain difficult vowels and consonants would better equip learners for self-learning once they have received feedback on their own errors and want to practice them further.

There are also three pedagogical implications for its use in teaching. First, with speech recognition technology, learners' attention can be directed to the persistent pronunciation errors indicated by the system throughout the learning process. Teachers can also collect students' error records and focus on the errors learners have the most difficulty with, or those they are unaware of and thus cannot correct by themselves. Second, since feedback on suprasegmentals is not yet available in the current project, teachers might provide more instruction on this aspect, which is also essential for comprehension. Last, by accessing each student's test data and giving individualized feedback on performance, teachers can make the best use of the online speech assessment environment accompanying this unit.

To conclude, to facilitate more effective teaching and learning of oral production, the current project "How to compliment" employs speech recognition technology to help English learners produce correct compliment patterns in English with intelligible pronunciation and to relieve teachers' loads by saving precious class hours. Four design principles contributing to a successful pronunciation environment are integrated into the design of the unit, and an online speech assessment task for evaluating learning outcomes from the unit has also been developed. Future development might include feedback on suprasegmentals (e.g. intonation, lexical stress), accompanied by content for other speaking skills, so as to better explore the potential of speech recognition technology for oral production. More research evaluating instruction with speech recognition technology, to validate its efficacy, is also needed.
ACKNOWLEDGEMENTS: We would like to thank two programmers, Hsi-Hsien and Hsin-ping Yu; the help of Professor Roger Chang of Computer Science, Tsing Hua University; and the financial support provided by the National Science Council under the National Science and Technology Program for E-Learning (NSC 94-2524-S007-001).

References

Bernstein, J., Najmi, A., & Ehsani, F. (1999). Subarashii: Encounters in Japanese spoken language education. CALICO Journal, 16(3), 361-384.

Blum-Kulka, S., House, J., & Kasper, G. (1989). Cross-cultural pragmatics: Requests and apologies. Norwood, NJ: Ablex.

Chiu, T. L. (2006). Effects of online conversation materials with the support of speech recognition technology on college EFL learners. MA thesis, Tsing Hua University, Taiwan.

Cohen, A. D., & Olshtain, E. (1993). The production of speech acts by EFL learners. TESOL Quarterly, 27(1), 33-56.

Cohen, A. D., & Olshtain, E. (1994). Researching the production of second-language speech acts. In E. E. Tarone, S. M. Gass, & A. D. Cohen (Eds.), Research methodology in second-language acquisition (pp. 143-156). USA: Lawrence Erlbaum Associates.

Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39, 379-397.

Egan, K. B. (1999). Speaking: A critical skill and a challenge. CALICO Journal, 16(3), 277-293.

Ehsani, F., & Knodt, E. (1998). Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm. Language Learning & Technology, 2(1), 45-60.

Eskenazi, M. (1999). Using automatic speech processing for foreign language pronunciation tutoring: Some issues and a prototype. Language Learning & Technology, 2(2), 62-76.

ETS (2001). TSE and SPEAK score user guide, 2001-2002 edition. Princeton, NJ: Educational Testing Service. Online version of a current score user guide available from http://www.toefl.org/tse/tseindex.html.

Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39, 399-423.

Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38, 201-223.

Jenkins, J. (2002). A sociolinguistically-based, empirically-researched pronunciation syllabus for English as an international language. Applied Linguistics, 23, 83-103.

Levis, J. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39, 369-377.

Li, M. C., & Chen, C. Y. (2005). The effect of planning on speaking assessment scores. English Teaching & Learning, 30(1), 41-67.

Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University Press.

Manes, J., & Wolfson, N. (1981). The compliment formula. In F. Coulmas (Ed.), Conversational routine (pp. 115-132). The Hague: Mouton.

Munro, M. J., Derwing, T. M., & Morton, S. L. (2006). The mutual intelligibility of L2 speech. TESOL Quarterly, 40, 111-131.

Tsai, P. H. (2003). A duet of pedagogy and technology: An evaluation of MyET, a computer-assisted pronunciation training system made in Taiwan. Proceedings of the 21st International Conference on English Teaching and Learning in the R.O.C. (pp. 439-452). Taichung: Chaoyang University of Technology.

Wachowicz, K. A., & Scott, B. (1999). Software that listens: It's not a question of whether; it's a question of how. CALICO Journal, 16(3), 253-275.
Appendix A
Typical Taiwan English allophonic errors, retrieved from http://ccms.ntu.edu.tw/~karchung/intro%20page%2029.htm

Consonants:
(1) /ks/, when represented by x in the orthography, is often simplified to [s]. Example: excuse.
(2) /θ/ is often replaced by [s]. Example: thank.
(3) /h/ is often realized as [x]. Examples: him, husband, how.
(4) Syllable-final /n/ is often deleted, leaving only a nasalized vowel before it. Examples: mine, the one in the stupid green sweater.
(5) Word-final /əm/ is often realized as [ən]. Examples (these do not appear in the sample): system, wisdom.
(6) Final voiced stops are usually completely devoiced, if they are pronounced at all. Examples: made, jog.
(7) Final stops are often deleted, even when the word ends with a grammatical /s/ ending. Examples: stupid, good, at, bought, like, pants, it's.
(8) Dark /l/ ([ɫ]), together with the preceding vowel if there is one, is often realized as [oʊ]. Examples: also, cold.
(9) Postvocalic /r/ is often dropped. Examples: are, warm, person, learn, first, for.
(10) An epenthetic [ə] is sometimes added before the approximant in consonant clusters. Examples: black, England.
(11) Initial /ð/ is often replaced by [l] or [d]. Examples: they, them, the.
(12) Nasals are often determined allophonically by the backness of the preceding vowel: a nasal after the back vowels [oʊ], [ɔ], [ʌ], or /ɑ/ tends to be realized as a velar nasal [ŋ]; a nasal after the non-high front vowels /ɛ/ or /æ/ tends to be realized as alveolar [n]; a nasal after the high or mid-high front vowels /i/ or /ɪ/ is usually [ŋ]; and a nasal after the high and mid-high back vowels /u/ or /ʊ/ is usually [n]. Examples: want, months, in, been, run, (sometimes) him.

Vowels:
(1) /ɪ/ and /i/ are often confused. Examples: is, him, if, seat, need, teacher.
(2) /eɪ/ in pre-consonantal position is often pronounced [æ]. Examples: taken, made.
(3) /oʊ/ is often realized as [ɔ]. Examples: no, so.
(4) The diphthong /aɪ/ is often simplified to the monophthong [a]. Examples: nice, I.
(5) /ɛ/ is often replaced by [eɪ] or [æ]. Examples: weather, next.
(6) /ʌ/ is often realized as [ɑ]. Examples: husband, months, funny.
(7) /ʊ/ is often replaced by [u]. Examples: look, should, good.
(8) /ɔ/ is often replaced by [o]. Examples: talk, long.
(9) /ɑ/ is often replaced by [ɔ] when orthographically written as o. Examples: not, John, Tom.
(10) /z/ is often realized as [s] when written as s in the orthography. Examples: is, days, shoes, those, husband.
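For developers who wish to operationalize this inventory, the entries above can be encoded as context-sensitive substitution or deletion rules that a feedback routine (such as the sketch earlier in this article) can look up. The partial encoding below is an illustration only; the rule format and field names are assumptions, not the representation used in the project.

# A partial, illustrative encoding of the Appendix A error inventory as
# context-sensitive substitution/deletion rules. Field names and the rule
# format are assumptions, not the project's actual representation.

APPENDIX_A_RULES = [
    # target phoneme, typical realization (None = deletion), context, example words
    {"target": "θ", "realized_as": "s",   "context": "any",            "examples": ["thank"]},
    {"target": "ð", "realized_as": "d/l", "context": "word-initial",   "examples": ["they", "them", "the"]},
    {"target": "n", "realized_as": None,  "context": "syllable-final", "examples": ["mine"]},
    {"target": "r", "realized_as": None,  "context": "postvocalic",    "examples": ["warm", "first"]},
    {"target": "ɪ", "realized_as": "i",   "context": "any",            "examples": ["him", "if"]},
    {"target": "ʊ", "realized_as": "u",   "context": "any",            "examples": ["look", "good"]},
]

def rules_for(target_phoneme: str, context: str = "any") -> list:
    """Return inventory rules that could explain a deviation on this phoneme."""
    return [rule for rule in APPENDIX_A_RULES
            if rule["target"] == target_phoneme and rule["context"] in (context, "any")]

if __name__ == "__main__":
    for rule in rules_for("r", "postvocalic"):
        print("Postvocalic /r/ is often dropped, as in " + ", ".join(rule["examples"]) + ".")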