Project LISTEN

Summary
Project LISTEN Publications

[Note: Links to full text are included when possible, e.g., after publication or conference presentation. * marks publications by others.]

[Interspeech 2009 predictable] Aist, G., & Mostow, J. (2009, September 6-10). Designing Spoken Tutorial Dialogue with Children to Elicit Predictable but Educationally Valuable Responses. 10th Annual Conference of the International Speech Communication Association (Interspeech), Brighton, UK. Abstract: How can we construct spoken dialogue interactions with children that are both educationally effective and technically feasible? To address this challenge, we propose a design principle that constructs short dialogues in which (a) the user's utterances are the external evidence of task performance or learning in the domain, and (b) the target utterances can be expressed as a well-defined set, in some cases even as a finite language (up to a small set of variables that may change from exercise to exercise). The key approach is to teach the human learner a parameterized process that maps input to response. We describe how the discovery of this design principle came out of analyzing the processes of automated tutoring for reading and pronunciation and of designing dialogues to address vocabulary and comprehension, show how it also accurately describes the design of several other language tutoring interactions, and discuss how it could extend to non-language tutoring tasks.

[SLaTE 2009 predictable] Aist, G., & Mostow, J. (2009, September 3-5). Predictable and Educational Spoken Dialogues: Pilot Results. Second ISCA Workshop on Speech and Language Technology in Education (SLaTE), Wroxall Abbey Estate, Warwickshire, England. Abstract: This paper addresses the challenge of designing spoken dialogues that are of educational benefit within the context of an intelligent tutoring system, yet predictable enough to facilitate automatic speech recognition and subsequent processing. We introduce a design principle to meet this goal: construct short dialogues in which the desired student utterances are external evidence of performance or learning in the domain, and in which those target utterances can be expressed as a well-defined set. The key to this principle is to teach the human learner a process that maps inputs to responses. Pilot results in two domains (self-generated questions and morphology exercises) indicate that the approach is promising in terms of its habitability and the predictability of the utterances elicited. We describe the results and sketch a brief taxonomy classifying the elicited utterances according to whether they evidence student performance or learning, whether they are amenable to automatic processing, and whether they support or call into question the hypothesis that such dialogues can elicit spoken utterances that are both educational and predictable.

[SLaTE 2009 prosody] Duong, M., & Mostow, J. (2009, September 3-5). Detecting Prosody Improvement in Oral Rereading. Second ISCA Workshop on Speech and Language Technology in Education (SLaTE), Wroxall Abbey Estate, Warwickshire, England. Abstract: A reading tutor that listens to children read aloud should be able to detect fluency growth, not only in oral reading rate but also in prosody. How sensitive can such detection be? We present an approach to detecting improved oral reading prosody in rereading a given text. We evaluate our method on data from 133 students ages 7-10 who used Project LISTEN's Reading Tutor. We compare the sensitivity of our extracted features in detecting improvements, and use them to compare the magnitude of recency and learning effects. We find that features computed by correlating the student's prosodic contours with those of an adult narration of the same text are generally not as sensitive to gains as features based solely on the student's speech.
We also find that rereadings on the same day show greater improvement than those on later days: statistically reliable recency effects are almost twice as strong as learning effects for the same features.

[SLaTE 2009 contexts] Liu, L., Mostow, J., & Aist, G. (2009, September 3-5). Automated Generation of Example Contexts for Helping Children Learn Vocabulary. Second ISCA Workshop on Speech and Language Technology in Education (SLaTE), Wroxall Abbey Estate, Warwickshire, England. Abstract: This paper addresses the problem of generating good example contexts to help children learn vocabulary. We construct candidate contexts from the Google N-gram corpus. We propose a set of constraints on good contexts and use them to filter candidate example contexts. We evaluate the automatically generated contexts by comparing them with example contexts from children’s dictionaries and from children’s stories.

* [IDEC 2009] Reeder, K., Shapiro, J., & Wakefield, J. (2009, July 19-22). A computer based reading tutor for young English language learners: recent research on proficiency gains and affective response. 16th European Conference on Reading and 1st Ibero-American Forum on Literacies, University of Minho, Campus de Gualtar, Braga, Portugal.

[AIED 2009 prosody] Mostow, J., & Duong, M. (2009, July 6-10). Automated Assessment of Oral Reading Prosody. Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED2009), Brighton, UK, 189-196. Abstract: We describe an automated method to assess the expressiveness of children's oral reading by measuring how well its prosodic contours correlate in pitch, intensity, pauses, and word reading times with adult narrations of the same sentences. We evaluate the method directly against a common rubric used to assess fluency by hand. We also compare it against manual and automated baselines in terms of its ability to predict the fluency and comprehension test scores and gains of 55 children ages 7-10 who used Project LISTEN's Reading Tutor. It outperforms the human-scored rubric, predicts gains, and could help teachers identify which students are making adequate progress.

[AIED 2009 questioning] Mostow, J., & Chen, W. (2009, July 6-10). Generating Instruction Automatically for the Reading Strategy of Self-Questioning. Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED2009), Brighton, UK, 465-472. Abstract: Self-questioning is an important reading comprehension strategy, so it would be useful for an intelligent tutor to help students apply it to any given text. Our goal is to help children generate questions that make them think about the text in ways that improve their comprehension and retention. However, teaching and scaffolding self-questioning involve analyzing both the text and the students’ responses. This requirement poses a tricky challenge to generating such instruction automatically, especially for children too young to respond by typing. This paper describes how to generate self-questioning instruction for an automated reading tutor. Following expert pedagogy, we decompose strategy instruction into describing, modeling, scaffolding, and prompting the strategy. We present a working example to illustrate how we generate each of these four phases of instruction for a given text. We identify some relevant criteria and use them to evaluate the generated instruction on a corpus of 513 children’s stories.

[QG 2009 informational] Chen, W., Aist, G., & Mostow, J. (2009, July 6). Generating Questions Automatically from Informational Text. Proceedings of AIED 2009 Workshop on Question Generation, Brighton, UK, 17-24. Abstract: Good readers ask themselves questions during reading.
Our goal is to scaffold this self-questioning strategy automatically to help children in grades 1-3 understand informational text. In previous work, we showed that instruction for self-questioning can be generated for narrative text. This paper tests the generality of that approach by applying it to informational text. We describe the modifications required and evaluate the approach on informational texts from Project LISTEN's Reading Tutor.

[EDM 2009 logging] Mostow, J., & Beck, J. E. (2009, July 1-3). Why, What, and How to Log? Lessons from LISTEN. Proceedings of the Second International Conference on Educational Data Mining, Córdoba, Spain, 269-278. Abstract: The ability to log tutorial interactions in comprehensive, longitudinal, fine-grained detail offers great potential for educational data mining, but what data is logged, and how, can facilitate or impede the realization of that potential. We propose guidelines gleaned over 15 years of logging, exploring, and analyzing millions of events from Project LISTEN’s Reading Tutor and its predecessors.

[SSSR 2009 prefixes] Mostow, J., Gates, D., McKeown, M., & Aist, G.
(2009). How often are prefixes useful cues to word meaning? Less than you might think! Sixteenth Annual Meeting of the Society for the Scientific Study of Reading, Boston. Abstract: We report the frequency and cue validity, in WordNet and some large text corpora, of several common prefixes often advocated as worth teaching in early grades. To estimate the cue validity of a prefix (e.g., “un-”) for the meaning of over 10,000 distinct words (e.g., “undo” and “uncle”), we computed what percentage of their WordNet definitions contain keywords for the meaning of the prefix (e.g., "cancel," "lack," "no," “not," “opposite," “reverse”). We analyze the cue validity of each prefix, both overall and in how it varies by corpus and by lexical properties such as word frequency, length, part of speech, and whether the remainder of the word is also a word. This analysis revealed that the utility of these prefixes in deciphering word meaning varies considerably and is surprisingly poor for some prefixes. We discuss the implications of these findings for vocabulary instruction in different grades, and for readers at varying levels of sophistication with respect to word structure and word meaning.

[ICTD 2009 Ghana] Mills-Tettey, A., Mostow, J., Dias, M. B., Sweet, T. M., Belousov, S. M., Dias, M. F., & Gong, H. (2009, April 17-19). Improving Child Literacy in Africa: Experiments with an Automated Reading Tutor. 3rd IEEE/ACM International Conference on Information and Communication Technologies and Development (ICTD2009), 129-138. Carnegie Mellon, Doha, Qatar. Honorable Mention Student Paper Award. Abstract: This paper describes a research endeavor aimed at exploring the role that technology can play in improving child literacy in developing communities. An initial pilot study and a subsequent four-month controlled field study in Ghana investigated the viability and effectiveness of an automated reading tutor in helping urban children enhance their reading skills in English. In addition to quantitative data suggesting that automated tutoring can be useful for some children in this setting, these studies and an additional preliminary pilot study in Zambia yielded useful qualitative observations regarding the feasibility of applying technology solutions to the challenge of enhancing child literacy in developing communities. This paper presents the findings, observations, and lessons learned from the field studies.

[IWCS 2009 mental] Chen, W. (2009). Understanding Mental
States in Natural Language. Proceedings of the 8th International Workshop on Computational Semantics, Tilburg, Netherlands, 61-72. Abstract: Understanding mental states in narratives is an important aspect of human language comprehension. By “mental states” we refer to beliefs, states of knowledge, points of view, and suppositions, all of which may change over time. In this paper, we propose an approach for automatically extracting and understanding multiple mental states in stories. Our model consists of two parts: (1) a parser that takes an English sentence and translates it into semantic operations; and (2) a mental-state inference engine that reads in the semantic operations and produces a situation model that represents the meaning of the sentence. We present the performance of the system on a corpus of children's stories containing both fictional and non-fictional texts.

[ITS 2008 help] Beck, J. E., Chang, K.-m., Mostow, J., & Corbett, A. (2008, June 23-27). Does help help? Introducing the Bayesian Evaluation and Assessment methodology. 9th International Conference on Intelligent Tutoring Systems, Montreal, 383-394. ITS2008 Best Paper Award. Abstract: Most ITSs have a means of providing assistance to the student, either on student request or when the tutor determines it would be effective. Presumably, ITS designers include such assistance because they believe it benefits students. However, whether, and how, help helps students has not been well studied in the ITS community. In this paper we present three approaches for evaluating the efficacy of the Reading Tutor's help: creating experimental trials from data, learning decomposition, and Bayesian Evaluation and Assessment, an approach that uses dynamic Bayesian networks. We have found that experimental trials and learning decomposition both find a negative benefit for help; that is, help hurts! However, the Bayesian Evaluation and Assessment framework finds that help both promotes students' long-term learning and provides additional scaffolding on the current problem. We discuss why these approaches give divergent results, and suggest that the Bayesian Evaluation and Assessment framework is the strongest of the three. In addition to introducing Bayesian Evaluation and Assessment, a method for simultaneously assessing students and evaluating tutorial interventions, this paper describes how help can both scaffold the current problem attempt and teach the student knowledge that will transfer to later problems.

[ITS 2008 LD] Beck, J. E., & Mostow, J. (2008, June 23-27). How
who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students. 9th International Conference on Intelligent Tutoring Systems, Montreal, 353-362. Nominated for Best Paper. Abstract: A basic question of instruction is how much students will actually learn from it. This paper presents an approach called learning decomposition, which determines the relative efficacy of different types of learning opportunities. This approach is a generalization of learning curve analysis, and uses non-linear regression to determine how to weight different types of practice opportunities relative to each other. We analyze 346 students reading 6.9 million words and show that different types of practice differ reliably in how efficiently students acquire the skill of reading words quickly and accurately. Specifically, massed practice is generally not effective for helping students learn words, and rereading the same stories is not as effective as reading a variety of stories. However, we were able to analyze data for individual students' learning and use bottom-up processing to detect small subgroups of students who did benefit from rereading (11 students) and from massed practice (5 students). The existence of these subgroups has two implications: (1) one-size-fits-all instruction is adequate for perhaps 95% of the student population using computer tutors, but as a community we can do better; and (2) the ITS community is well poised to study what type of instruction is optimal for the individual.

[ITS 2008 compare] Zhang, X., Mostow, J., & Beck, J. E. (2008). A Case Study Empirical Comparison of Three Methods to Evaluate Tutorial Behaviors. 9th International Conference on Intelligent Tutoring Systems, Montreal, 122-131. Abstract: Researchers have used various methods to evaluate the fine-grained interactions of intelligent tutors with their students. We present a case study comparing three such methods on the same data set, logged by Project LISTEN's Reading Tutor from usage by 174 children in grades 2-4 (typically ages 7-10) over the course of the 2005-2006 school year. The Reading Tutor chooses randomly between two different types of reading practice. In assisted oral reading, the child reads aloud and the tutor helps. In "Word Swap," the tutor reads aloud and the child identifies misread words. One method we use here to evaluate reading practice is conventional analysis of randomized controlled trials (RCTs), where the outcome is performance on the same words when encountered again later. The second method is learning decomposition, which estimates the impact of each practice type as a parameter in an exponential learning curve. The third method is knowledge tracing, which estimates the impact of practice as a probability in a dynamic Bayes net. The comparison shows qualitative agreement among the three methods, which is evidence for their validity.

[EDM 2008 freeform] Zhang, X., Mostow, J., Duke, N. K., Trotochaud, C., Valeri, J., & Corbett, A. (2008, June 20-21). Mining Free-form Spoken Responses to Tutor Prompts. Proceedings of the First International Conference on Educational Data Mining, Montreal, 234-241. Abstract: How can an automated tutor assess children's spoken responses despite imperfect speech recognition? We address this challenge in the context of tutoring children in explicit strategies for reading comprehension. We report initial progress on collecting, annotating, and mining their spoken responses. Collection and annotation yield authentic but sparse data, which we use to synthesize additional realistic data. We train and evaluate a classifier to estimate the probability that a response mentions a given target.

[EDM 2008 analytic] Mostow, J., & Zhang, X. (2008, June 20-21). Analytic Comparison of Three Methods to Evaluate Tutorial Behaviors. Proceedings of the First International Conference on Educational Data Mining, Montreal, 28-37.
Abstract: We compare the purposes, inputs, representations, and assumptions of three methods to evaluate the fine-grained interactions of intelligent tutors with their students. One method is conventional analysis of randomized controlled trials (RCTs). The second method is learning decomposition, which estimates the impact of each practice type as a parameter in an exponential learning curve. The third method is knowledge tracing, which estimates the impact of practice as a probability in a dynamic Bayes net. The comparison leads to a generalization of learning decomposition to account for slips and guesses.

[IES 2008] Mostow, J., Corbett, A., Valeri, J., Bey, J., Duke, N. K., & Trotochaud, C. (2008, June 10-12). Explicit Comprehension Instruction in an Automated Reading Tutor that Listens: Year 1 [poster and handout]. IES Third Annual Research Conference, Washington, DC.

[FLET 2008] Mostow, J. (2008). Experience from a Reading Tutor that listens: Evaluation purposes, excuses, and methods. In C. K. Kinzer & L. Verhoeven (Eds.), Interactive Literacy Education: Facilitating Literacy Environments Through Technology, pp. 117-148. New York: Lawrence Erlbaum Associates, Taylor & Francis Group. Abstract: This chapter gives three good reasons to evaluate reading software, identifies three methods for doing so, and refutes three excuses for not evaluating, namely that evaluation is premature, unnecessary, or will be done by others. (1) Wizard of Oz experiments help test whether (and clarify how) a proposed approach might work, and refute the excuse that evaluation is premature because the approach has not yet been implemented in a proposed system that may take years to develop. (2) Conventional controlled studies help determine whether an implemented system helps children gain more in reading than they would otherwise. This criterion is necessary to improve on the status quo, but the difficulty of meeting it refutes the excuse that evaluation is unnecessary due to the supposedly innate superiority of learning on computers, or of a proposed way to use them. (3) Experiments embedded in an automated tutor help analyze which tutorial actions help which students and words, thereby guiding improvement of the tutor in ways that third-party evaluation cannot, thus refuting the excuse that evaluation can be left to others. The chapter details some practical lessons learned from designing, performing, and analyzing experiments embedded in Project LISTEN’s school-deployed Reading Tutor, which uses speech recognition to listen to children read aloud and is helping hundreds of children learn to read.

[STLL 2008 SC] Aist, G., & Mostow, J. (2008). Faster, better task choice in a reading tutor that listens. In V. M. Holland & F. P. Fisher (Eds.), The Path of Speech Technologies in Computer Assisted Language Learning: From Research Toward Practice (pp. 220-240). New York: Routledge. Abstract: We analyze the efficiency and effectiveness of task choice in the context of a reading tutor that listens to children read aloud. We define efficiency as the time to pick a story, and effectiveness in terms of exposing students to new material. We describe design features we added to improve the Reading Tutor’s efficiency and effectiveness, and evaluate the resulting systems quantitatively, as follows. First, we made the story menu child-friendlier by incorporating two improvements: (a) to support use by nonreaders, the new menu spoke all items on the list; (b) to speed up choice, the new menu required just one click to select an item. Second, we instituted a mixed-initiative story choice policy in which the Reading Tutor and the student took turns choosing stories. These improvements made story choice measurably more efficient and effective.

[STLL 2008 S98] Mostow, J., Aist, G., Huang, C., Junker, B., Kennedy, R., Lan, H., Latimer, D., O'Connor, R., Tassone, R., Tobin, B., & Wierman, A. (2008). 4-Month evaluation of a learner-controlled Reading Tutor that listens. In V. M. Holland & F. P. Fisher (Eds.), The Path of Speech Technologies in Computer Assisted Language Learning: From Research Toward Practice (pp. 201-219). New York: Routledge. Abstract: We evaluated an automated Reading Tutor that let children pick stories to read and listened to them read aloud. All 72 children in three classrooms (grades 2, 4, 5) were independently tested on the nationally normed Word Attack, Word Identification, and Passage Comprehension subtests of the Woodcock Reading Mastery Test (where they averaged nearly 2 standard deviations below national norms), and on oral reading fluency. We split each class into 3 matched treatment groups: Reading Tutor, commercial reading software, or other activities. In 4 months, the Reading Tutor group gained significantly more in Passage Comprehension than the control group (effect size = 1.2, p = .002), even though actual usage was a fraction of the planned daily 20-25 minutes. To help explain these results, we analyzed relationships among gains in Word Attack, Word Identification, Passage Comprehension, and fluency for 108 additional children who used the Reading Tutor in 7 other classrooms (grades 1-4). Gains in Word Identification predicted Passage Comprehension gains only for Reading Tutor users, both in the controlled study (n = 21, p = .042, regression coefficient B = .495 ± s.e. .227) and in the other classrooms (n = 108, p = .005, B = .331 ± .115), where grade was also a significant predictor (p = .024, B = 2.575 ± 1.127).

* [IDEC 2007] Reeder, K., Shapiro, J., & Wakefield, J. (2007, August 5-8). The effectiveness of speech recognition technology in promoting reading proficiency and attitudes for Canadian immigrant children. 15th European Conference on Reading, Humboldt University, Berlin. Abstract: This paper reports on recently completed Canadian trials of the Reading Tutor, a prototype program that uses advanced speech recognition technology to listen to children read aloud in English. When the program hears the reader experiencing difficulty, it offers help with the goal of enhancing reading fluency and, in turn, comprehension. We followed 62 Canadian immigrant children in grades 2-7, ages 8-13, in three multicultural western Canadian urban elementary schools for 4 to 7 months of daily 20-minute sessions on the Reading Tutor. Our first goal was to determine the role of English language (L2) proficiency in any reading gains achieved, while controlling for participants’ differing amounts of practice with the software. Our second goal was to describe participants’ attitudes toward, and perceptions of, the experience of using the Reading Tutor software. Participants were pre-tested for English language proficiency level and for reading proficiency. At the end of each school’s trial, children were post-tested for reading proficiency, including word recognition, word attack, and word and passage comprehension. The lowest of the three English language proficiency groups showed the strongest reading gains, and did so in ways that reflected specific features of their language development. To assess the attitudinal dimension, we administered a clinical interview to all participants at the conclusion of the trial. We describe children’s perceptions of how the program assisted them in their literacy development.

* [JECR 2007] Poulsen, R., Wiemer-Hastings, P., & Allbritton,
D. (2007). Tutoring Bilingual Students with an Automated Abstract: Children from non-English-speaking homes are doubly disadvantaged when learning English in school. They enter school with less prior knowledge of English sounds, word meanings, and sentence structure, and they get little or no reinforcement of their learning outside of the classroom. This article compares the classroom standard practice of sustained silent reading with the Project LISTEN Reading Tutor which uses automated speech recognition to "listen" to children read aloud, providing both spoken and graphical feedback. Previous research with the Reading Tutor has focused primarily on native speaking populations. In this study 34 Hispanic students spent one month in the classroom and one month using the Reading Tutor for 25 minutes per day. The Reading Tutor condition produced significant learning gains in several measures of fluency. Effect sizes ranged from 0.55 to 1.27. These dramatic results from a one-month treatment indicate this technology may have much to offer English language learners. [SLaTE 2007 ASL] Xu, L.,
Varadharajan, V., Maravich, J., Tongia, R., & Mostow, J. (2007, October
1-3). DeSIGN: An Intelligent Tutor to Teach American Sign Language.
SLaTE workshop on Speech and Language Technology for Education, ISCA Tutorial
and Research Workshop, The Summit Inn, Abstract: This paper presents the development of
DeSIGN, an educational software application for those deaf students who are
taught to communicate using American Sign Language (ASL). The software
reinforces English vocabulary and ASL signs by providing two essential
components of a tutor, lessons and tests. The current version was designed
for 5th and 6th graders, whose literacy skills lag by a grade or more on
average. In addition, a game that allows the students to be creative has been
integrated into the tests. Another
feature of DeSIGN is its ability to intelligently adapt its tests to the
changing knowledge of the student as determined by a knowledge tracing
algorithm. A separate interface for the teacher enables additions and
modifications to the content of the tutor and provides progress monitoring.
These dynamic aspects help motivate the students to use the software
repeatedly. This software prototype aims at a feasible and sustainable
approach to increase the participation of deaf people in society. DeSIGN has
undergone an iteration of testing and is currently in use at a school for the
deaf in [AIED
2007 motivation] Beck, J. E. (2007, July 9-13). Does learner control
affect learning? Proceedings of the 13th International Conference on Artificial
Intelligence in Education, Abstract: Many intelligent tutoring systems permit some degree of learner control. A natural question is whether the increased student engagement and motivation such control provides results in additional student learning. This paper uses a novel approach, learning decomposition, to investigate whether students do in fact learn more from a story they select to read than from a story the tutor selects for them. By analyzing 346 students reading approximately 6.9 million words, we have found that students learn approximately 25% more in stories they choose to read, even though from a purely pedagogical standpoint such stories may not be as appropriate as those chosen by the computer. Furthermore, we found that (for our instantiation of learner control) younger students may derive less benefit from learner control than older students, and girls derive less benefit than boys. [AIED 2007 comprehension] Zhang,
X., Mostow, J., & Beck, J. E. (2007, July 9-13). Can a Computer Listen
for Fluctuations in Abstract: The ability to detect fluctuation in students' comprehension of text would be very useful for many intelligent tutoring systems. The obvious solution of inserting comprehension questions is limited in its application because it interrupts the flow of reading. To investigate whether we can detect comprehension fluctuations simply by observing the reading process itself, we developed a statistical model of 7805 responses by 289 children in grades 1-4 to multiple-choice comprehension questions in Project LISTEN's Reading Tutor, which listens to children read aloud and helps them learn to read. Machine-observable features of students' reading behavior turned out to be statistically significant predictors of their performance on individual questions. [EDM 2007 LFA transfer] Leszczenski, J. M., & Beck, J. E. (2007, July 9). What’s in a word? Extending learning factors analysis to modeling reading transfer. Proceedings of the AIED2007 Workshop on Educational Data Mining, Marina del Rey, CA, 31-39. Click here for .pdf file. Abstract: Learning Factors Analysis (LFA) has been proposed as a generic solution to evaluate and compare cognitive models of learning [1]. By performing a heuristic search over a space of statistical models, the researcher may evaluate different cognitive representations of a set of skills. We introduce a scalable application of this framework in the context of transfer in reading and demonstrate it upon Reading Tutor data. Using an assumption of a word-level model of learning as a baseline, we apply LFA to determine whether a representation with fewer word independencies will produce a better fit for student learning data. Specifically, we show that representing some groups of words as their common root leads to a better fitting model of student knowledge, indicating that this representation offers more information than merely viewing words as independent, atomic skills. 
In addition, we demonstrate an approximation to LFA which allows it to scale tractably to large datasets. We find that using a word root-based model of learning leads to an improved model fit, suggesting students make use of this information in their representation of words. Additionally, we present evidence based on both model fit and learning rate relationships that low proficiency students tend to exhibit a lesser degree of transfer through the word root representation than higher proficiency students. [EDM 2007 LD transfer] Zhang, X.,
Mostow, J., & Beck, J. E. (2007, July 9). All in the (word)
family: Using learning decomposition
to estimate transfer between skills in a Abstract: In this paper, we use the method of learning decomposition to study students’ mental representations of English words. Specifically, we investigate whether practice on a word transfers to similar words. We focus on the case where similar words share the same root (e.g., “dog” and “dogs”). Our data comes from Project LISTEN’s Reading Tutor during the 2003-2004 school year, and includes 6,213,289 words read by 650 students. We analyze the distribution of transfer effects across students, and identify factors that predict the amount of transfer. The results support some of our hypotheses about learning, e.g., the transfer effect from practice on similar words is greater for proficient readers than for poor readers. More significant than these empirical findings, however, is the novel analytic approach to measure transfer effects. [EDM 2007 Dirichlet] Beck, J. E. (2007, July 9). Difficulties in inferring student knowledge from observations (and why you should care). Proceedings of the AIED2007 Workshop on Educational Data Mining, Marina del Rey, CA, 21-30. Click here for .pdf file. Abstract: Student modeling has a long history in the field of intelligent educational software and is the basis for many tutorial decisions. Furthermore, the task of assessing a student’s level of knowledge is a basic building block in the educational data mining process. If we cannot estimate what students know, it is difficult to perform fine-grained analyses to see if a system’s teaching actions are having a positive effect. In this paper, we demonstrate that there are several unaddressed problems with student model construction that negatively affect the inferences we can make. We present two partial solutions to these problems, using Expectation Maximization to estimate parameters and using Dirichlet priors to bias the model fit procedure. 
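The abstract names Dirichlet priors but does not give their values. For a single binary knowledge-tracing parameter such as the guess rate, a Dirichlet prior over two outcomes reduces to a Beta prior. In the M-step of Expectation Maximization, that prior acts like pseudo-counts added to the expected counts. The sketch below is minimal and rests on those assumptions. The Beta(2, 8) prior and the counts are hypothetical.

```python
def ml_estimate(successes, trials):
    """Maximum-likelihood estimate of a Bernoulli parameter from (expected) counts."""
    return successes / trials


def map_estimate(successes, trials, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior, i.e. a two-outcome Dirichlet.

    The prior behaves like (alpha - 1) extra successes and (beta - 1) extra
    failures added to the counts. Sparse data are pulled toward the prior mode,
    while large samples are barely affected. Assumes alpha, beta > 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# Hypothetical guess-rate counts: 3 correct answers in 4 opportunities where
# the skill was (in expectation) unknown.
print(ml_estimate(3, 4))              # 0.75
print(map_estimate(3, 4, 2, 8))       # (3 + 1) / (4 + 8)
print(map_estimate(750, 1000, 2, 8))  # 751 / 1008, close to the ML value
```

With four observations, the prior pulls an implausible guess rate of 0.75 down toward its mode of 0.125. With a thousand observations, the estimate barely moves.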
Aside from reliably improving model fit in predictive accuracy, these approaches might result in model parameters that are more plausible. Although parameter plausibility is difficult to quantify, we discuss some guidelines and propose a derived measure of predicted number of trials until mastery as a method for evaluating model parameters. [UM 2007] Beck, J. E., & Chang,
K.-m. (2007, June 25-29). Identifiability: A Fundamental Problem of
Student Modeling. Proceedings of the
11th International Conference on User Modeling (UM 2007), Abstract: In this paper we show how model identifiability is an issue for student modeling: observed student performance corresponds to an infinite family of possible model parameter estimates, all of which make identical predictions about student performance. However, these parameter estimates make different claims, some of which are clearly incorrect, about the student’s unobservable internal knowledge. We propose methods for evaluating these models to find ones that are more plausible. Specifically, we present an approach using Dirichlet priors to bias model search that results in a statistically reliable improvement in predictive accuracy (AUC of 0.620 ± 0.002 vs. 0.614 ± 0.002). Furthermore, the parameters associated with this model provide more plausible estimates of student learning, and better track with known properties of students’ background knowledge. The main conclusion is that prior beliefs are necessary to bias the student modeling search, and even large quantities of performance data alone are insufficient to properly estimate the model. [ICASSP 2007] Anumanchipalli, G. K.,
Ravishankar, M., & Reddy, R. (2007, April 15-20). Improving
Pronunciation Inference Using N-Best List, Acoustics and Orthography.
Proc. 32nd IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP), Abstract: In this paper, we tackle the problem of pronunciation inference and Out-of-Vocabulary (OOV) enrollment in Automatic Speech Recognition (ASR) applications. We combine linguistic and acoustic information of the OOV word using its spelling and a single instance of its utterance to derive an appropriate phonetic baseform. The novelty of the approach is in its employment of an orthography-driven n-best hypothesis and rescoring strategy of the pronunciation alternatives. We make use of decision trees and heuristic tree search to construct and score the n-best hypotheses space. We use acoustic alignment likelihood and phone transition cost to leverage the empirical evidence and phonotactic priors to rescore the hypotheses and refine the baseforms. [IERI 2007] Mostow, J., & Beck, J. (2007).
When the Rubber Meets the Road: Lessons
from the In-School Adventures of an Automated Abstract:
Project LISTEN's Reading Tutor (www.cs.cmu.edu/~listen) uses automatic
speech recognition to listen to children read aloud, and helps them learn to read.
Its experimental deployment in schools has expanded from a single computer
used by eight third graders in one school in 1996 to two hundred computers
used by children in grades 1-3 in nine schools in 2003. This project
illustrates how technology can not just scale up an intervention, but
instrument its implementation. For example, analysis of 2002-2003 usage
showed that session frequency and duration averaged significantly higher in
lab settings than in classrooms. [ICSLP2006] Mostow, J. (2006, September 17-21). Is ASR accurate enough for automated reading tutors, and how can we tell? Ninth International Conference on Spoken Language Processing (Interspeech 2006 — ICSLP), Pittsburgh, PA, 837-840. Click here for .pdf file. Abstract: We discuss pros and cons of several ways to evaluate ASR accuracy in automated tutors that listen to students read aloud. Whether ASR is accurate enough for a particular reading tutor function depends on what ASR-based judgment it requires, the visibility of that judgment to students and teachers, and the amount of input speech on which it is based. How to tell depends on the purpose, criterion, and space of the evaluation. [AAAI2006 help] Chang, K., Beck, J. E., Mostow, J., &
Corbett, A. (2006, July 17). Does Help Help? A Bayes Net Approach to Modeling Tutor
Interventions. AAAI2006 Workshop on Educational Data Mining, Abstract: This paper describes an effort to measure the effectiveness of tutor help in an intelligent tutoring system. Conventional pre- and post-test experimental methods can determine whether help is effective but are expensive to conduct. Furthermore, a pre- and post-test methodology ignores a source of information: students request help about words they do not know. Therefore, we propose a dynamic Bayes net (which we call the help model) that models tutor help and student knowledge in one coherent framework. The help model distinguishes two different effects of help: scaffolding immediate performance vs. teaching persistent knowledge that improves long-term performance. We train the help model to fit the student performance data gathered from usage of the Reading Tutor. The parameters of the trained model suggest that students benefit from both the scaffolding and teaching effects of help. Thus, our framework is able to distinguish two types of influence that help has on the student, and can determine whether help helps learning without an explicit controlled study. [SSSR2006
cloze] Hensler, B. S., & Beck, J. (2006, July 6-8). Are all
questions created equal? Factors that
influence cloze question difficulty. Thirteenth Annual Meeting of the
Society for the Scientific Study of Abstract: The multiple choice cloze (MCC)
assessment methodology is widely used in assessing reading comprehension;
therefore an improved scoring methodology would have broad impact within the
reading research community. We have
constructed an MCC question model that simultaneously estimates the student's
comprehension proficiency and the impact of various terms on MCC difficulty.
To build the model, we analyzed 16,161 MCC question responses that were
administered by a computer reading tutor over the course of a school
year. Participants were 373 students
in grades 1 through 6 (ages 5-12) in urban and suburban public schools in To develop our model of MCC difficulty, we used multinomial logistic regression to calculate the relative impact of a number of factors. Our model includes the location of the deleted target word within the sentence and question length as covariates. As factors, we used student identity, reaction time (rounded to the nearest second) and level of difficulty of the target word. We hypothesized that more proficient readers would use syntactic cues while less proficient readers would not. To add syntax to the model, we used the TreeTagger part of speech tagger to annotate the part of speech of the correct answer for each cloze question. We then computed how many of the distractors could have the same part of speech as the answer. Presumably questions with many distractors able to take on the same part of speech as the answer would be harder.
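The distractor count described above can be sketched as follows. This is an illustration, not the study's code. The `POSSIBLE_TAGS` lexicon and its tag names are hypothetical stand-ins for TreeTagger output. The answer's tag is assumed to come from tagging the cloze sentence with the correct word filled in.

```python
# Hypothetical lexicon: the parts of speech each word can take. The study
# used the TreeTagger tagger; this lookup table only stands in for it.
POSSIBLE_TAGS = {
    "run": {"VERB", "NOUN"},
    "table": {"NOUN", "VERB"},
    "quickly": {"ADV"},
    "happy": {"ADJ"},
}


def same_pos_distractors(answer_tag, distractors, lexicon):
    """Count distractors that could take the same part of speech as the answer."""
    return sum(1 for word in distractors if answer_tag in lexicon.get(word, set()))


# Cloze item whose correct answer is tagged VERB in context (hypothetical).
print(same_pos_distractors("VERB", ["run", "quickly", "happy"], POSSIBLE_TAGS))  # 1
print(same_pos_distractors("VERB", ["run", "table", "happy"], POSSIBLE_TAGS))    # 2
```

Under the hypothesis stated above, the second item should be harder, because two of its three distractors fit the syntactic slot.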
After training the model on our 16,161 MCC questions, there were two main findings. First, our model found that students who had a second-grade reading proficiency (as measured by the Woodcock Reading Comprehension Cluster) or higher were sensitive to how many of the possible responses could take on the same part of speech as the correct answer (p = 0.002) for the cloze sentence, while students below second-grade proficiency were insensitive to this term (p = 0.467). This result suggests that students' syntactic awareness, at least within the context of MCC questions, begins at around the second grade. The second main finding was the degree of correlation of each student's Beta parameter, the model's estimate of her ability to answer MCC questions, with her associated Woodcock test score. The mean within-grade correlation between Beta and the Reading Comprehension Cluster score was 0.69, a very strong fit. [SSSR2006
fluency] Mostow, J. and J. Beck (2006, July 6-8). Refined micro-analysis of fluency gains in a Abstract: Our SSSR2005 talk presented a linear model of speedup in word reading between successive encounters in connected text, based on a quarter of a million such encounters. The model indicated that reading a word in a new context contributed more to speedup than re-encountering it in an old context, implying that wide reading builds fluency more than rereading. Our new, improved model uses a growth curve to model word reading time as a function of the number and types of encounters of the word. This approach lets us estimate -- both overall and at different reading levels -- the relative value of encountering a word in a new context versus an old one, and for the first time on a given day versus subsequently. [ITS2006
gaming] Baker, R. S. J. d., Corbett, A. T., Koedinger, K. R., Evenson,
S., Roll, I., Wagner, A. Z., Naim, M., Raspat, J., Baker, D. J., & Beck,
J. E. (2006, June 26-30). Adapting to When Students Game an Intelligent
Tutoring System. Proceedings of the 8th International Conference on
Intelligent Tutoring Systems, Abstract: It has been found in recent years that many students who use intelligent tutoring systems game the system, attempting to succeed in the educational environment by exploiting properties of the system rather than by learning the material and trying to use that knowledge to answer correctly. In this paper, we introduce a system which gives a gaming student supplementary exercises focused on exactly the material the student bypassed by gaming, and which also expresses negative emotion to gaming students through an animated agent. Students using this system engage in less gaming, and students who receive many supplemental exercises have considerably better learning than is associated with gaming in the control condition or prior studies. [ITS2006 BNT-SM] Chang, K., Beck, J., Mostow, J., & Corbett, A. (2006, June 26-30). A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring Systems. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan, 104-113. Click here for .pdf file. Abstract: This paper describes an effort to model a student’s changing knowledge state during skill acquisition. Dynamic Bayes Nets (DBNs) provide a powerful way to represent and reason about uncertainty in time series data, and are therefore well-suited to model student knowledge. Many general-purpose Bayes net packages have been implemented and distributed; however, constructing DBNs often involves complicated coding effort. To address this problem, we introduce a tool called BNT-SM. BNT-SM inputs a data set and a compact XML specification of a Bayes net model hypothesized by a researcher to describe causal relationships among student knowledge and observed behavior. BNT-SM generates and executes the code to train and test the model using the Bayes Net Toolbox [1]. Compared to the BNT code it outputs, BNT-SM reduces the number of lines of code required to use a DBN by a factor of 5. 
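BNT-SM generates MATLAB code for the Bayes Net Toolbox, which is not reproduced here. As a rough illustration of the kind of dynamic Bayes net it is used to train, the sketch below implements the standard knowledge tracing filter in Python. It assumes the usual four parameters (initial knowledge, learning, guess, slip), uses hypothetical values for them, and omits parameter estimation entirely. The names `KTParams`, `predict_correct`, `update`, and `trace` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KTParams:
    p_init: float   # P(L0): skill already known before the first opportunity
    p_learn: float  # P(T): chance of learning the skill at each opportunity
    p_guess: float  # P(G): correct response although the skill is unknown
    p_slip: float   # P(S): incorrect response although the skill is known


def predict_correct(p_known, params):
    """Probability of a correct response given the current P(known)."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known, correct, params):
    """Condition P(known) on one observation (Bayes rule), then apply learning."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def trace(observations, params):
    """Filter a sequence of correct/incorrect observations.

    Returns the predicted P(correct) before each observation and the final P(known).
    """
    p_known = params.p_init
    predictions = []
    for correct in observations:
        predictions.append(predict_correct(p_known, params))
        p_known = update(p_known, correct, params)
    return predictions, p_known


params = KTParams(p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1)  # hypothetical values
predictions, final = trace([True, False, True, True], params)
print(predictions, final)
```

Comparing models by Area Under Curve, as reported for KT, requires per-observation predictions like these, paired with the observed outcomes.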
In addition to supporting more flexible models, we illustrate how to use BNT-SM to simulate Knowledge Tracing (KT) [2], an established technique for student modeling. The trained DBN does a better job of modeling and predicting student performance than the original KT code (Area Under Curve = 0.610 > 0.568), due to differences in how it estimates parameters. [ITS2006
cloze] Hensler, B. S., & Beck, J. (2006, June 26-30). Better
student assessing by finding difficulty factors in a fully automated comprehension
measure. Proceedings of the 8th International Conference on Intelligent
Tutoring Systems, Jhongli, Taiwan, 21-30.
Nominated for Best Paper. Click here for .pdf file. Abstract: The multiple choice cloze (MCC) question format is commonly used to assess students' comprehension. It is an especially useful format for ITS because it is fully automatable and can be used on any text. Unfortunately, very little is known about the factors that influence MCC question difficulty and student performance on such questions. In order to better understand student performance on MCC questions, we developed a model of MCC questions. Our model shows that the difficulty of the answer and the student’s response time are the most important predictors of student performance. In addition to showing the relative impact of the terms in our model, our model provides evidence of a developmental trend in syntactic awareness beginning around the 2nd grade. Our model also accounts for 10% more variance in students’ external test scores compared to the standard scoring method for MCC questions. [ITS2006
vocabulary] Heiner, C., Beck, J., & Mostow, J. (2006, June 26-30). Automated
Vocabulary Instruction in a Abstract: This paper presents a within-subject, randomized experiment to compare automated interventions for teaching vocabulary to young readers using Project LISTEN's Reading Tutor. The experiment compared three conditions: no explicit instruction, a quick definition, and a quick definition plus a post-story battery of extended instruction based on a published instructional sequence for human teachers. A month-long study with elementary school children indicates that the quick instruction, which lasted about seven seconds, had immediate effects on learning gains that did not persist. Extended instruction, which lasted about thirty seconds longer than the quick instruction, had a persistent effect and produced gains on a posttest one week later. [ITS2006
decomposition] Beck, J. (2006, June 26). Using learning decomposition
to analyze student fluency development. ITS2006 Educational Data Mining
Workshop, Abstract: This paper introduces an approach called learning decomposition to analyze what types of practice are most effective for helping students learn a skill. The approach is a generalization of learning curve analysis, and uses non-linear regression to determine how to weight different types of practice opportunities relative to each other. We are able to show that different types of practice differ reliably in how quickly students acquire the skill of reading words quickly and accurately. Specifically, massed practice is generally not effective for helping students learn words, but may be acceptable for less proficient readers. Rereading the same story is generally not as effective as reading a variety of stories, but might be beneficial for more proficient readers. [JNLE2006] Mostow, J. and J. Beck (2006). Some useful tactics to modify, map, and mine data from intelligent tutors. Natural Language Engineering (Special Issue on Educational Applications) 12(2), 195-208. © 2006 Cambridge University Press. Click here for .pdf file. Abstract: Mining data logged by intelligent tutoring systems has the potential to discover information of value to students, teachers, authors, developers, researchers, and the tutors themselves -- information that could make education dramatically more efficient, effective, and responsive to individual needs. We factor this discovery process into tactics to modify tutors, map heterogeneous event streams into tabular data sets, and mine them. This model and the tactics identified mark out a roadmap for the emerging area of tutorial data mining, and may provide a useful vocabulary and framework for characterizing past, current, and future work in this area. We illustrate this framework using experiments that tested interventions by an automated reading tutor to help children decode words and comprehend stories. [IJAIED2006] Beck, J. E., & Sison, J. (2006). 
Using knowledge tracing in a noisy environment to measure student reading proficiencies. International Journal of Artificial Intelligence in Education, 16, 129-143. (In Special “Best of ITS 2004” Issue.) Click here for .pdf file. Abstract: Constructing a student model for language tutors is a challenging task. This paper describes using knowledge tracing to construct a student model of reading proficiency and validates the model. We use speech recognition to assess a student’s reading proficiency at a subword level, even though the speech recognizer output is at the level of words and is statistically noisy. Specifically, we estimate the student’s knowledge of 80 letter to sound mappings, such as ch making the sound /K/ in “chemistry.” At a coarse level, the student model did a better job at estimating reading proficiency for 47.2% of the students than did a standardized test designed for the task. Although not quite as strong as the standardized test, our assessment method can provide a report on the student at any time during the year and requires no break from reading to administer. Our model’s estimate of the student’s knowledge on individual letter to sound mappings is a significant predictor of whether he will ask for help on a particular word. Thus, our student model is able to describe student performance both at a coarse- and at a fine-grain size. [AIED2005 event] Mostow,
J., Beck, J., Cen, H., Gouvea, E., & Heiner, C. (2005, July). Interactive
Demonstration of a Generic Tool to Browse Tutor-Student Interactions.
Interactive Events Proceedings of the 12th International Conference on
Artificial Intelligence in Education (AIED 2005), Abstract: Project LISTEN's Session Browser is a generic tool to browse a database of students' interactions with an automated tutor. Using databases logged by Project LISTEN's Reading Tutor, we illustrate how to specify phenomena to investigate, explore events and the context where they occurred, dynamically drill down and adjust which details to display, and summarize events in human-understandable form. The tool should apply to MySQL databases from other tutors as well. [AIED2005
browser] Mostow, J., Beck, J., Abstract: A basic question in mining data from an intelligent tutoring system is, "What happened when…?" A generic tool to answer such questions should let the user specify which phenomenon to explore; explore selected events and the context in which they occurred; and require minimal effort to adapt the tool to new versions, to new users, or to other tutors. We describe an implemented tool and how it meets these requirements. The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time. It infers the implicit hierarchical structure of tutorial interaction so humans can browse it. A companion paper [1] illustrates the use of this tool to explore data from Project LISTEN's automated Reading Tutor. [AIED2005
interruption] Heiner, C., Beck, J., & Mostow, J. (2005, July 18-22). When do students interrupt help? Effects of individual differences. Proceedings
of the 12th International Conference on Artificial Intelligence in Education
(AIED 2005),
Abstract: When do students interrupt help to request different help? To study this question, we analyze a within-subject experiment in the 2003-2004 version of Project LISTEN's Reading Tutor. From 168,983 trials of this experiment, we report patterns in when students choose to interrupt help. To improve model fit for individual data, we adjust our model to account for individual differences. We report small but significant correlations between a student parameter in our model and gender as well as external measures of motivation and academic performance. [AIED2005
engagement] Beck, J. (2005, July 18-22). Engagement tracing: using
response times to model student disengagement. Proceedings of the 12th International Conference on Artificial
Intelligence in Education (AIED 2005), Abstract: Time on task is an important predictor for how much students learn. However, students must be focused on the learning for the time invested to be productive. Unfortunately, students do not always try their hardest to solve problems presented by computer tutors. This paper explores student disengagement and proposes an approach, engagement tracing, for detecting whether a student is engaged in answering questions. This model is based on item response theory, and uses as input the difficulty of the question, how long the student took to respond, and whether the response was correct. From these data, the model determines the probability a student was actively engaged in trying to answer the question. The model has a reliability of 0.95, and its estimate of student engagement correlates at 0.25 with student gains on external tests. Finally, the model is sensitive enough to detect variations in student engagement within a single tutoring session. The novel aspect of this work is that it requires only data normally collected by a computer tutor, and the affective model is validated against student performance on an external measure. [AIED2005 ASR] Beck, J. E., Chang, K., Mostow, J., & Corbett,
A. (2005, July 19). Using a student
model to improve a computer tutor's speech recognition. Proceedings of the AIED 05 Workshop on
Student Modeling for Language Tutors, 12th International Conference on
Artificial Intelligence in Education, Abstract: Intelligent computer tutors can derive much of their power from having a student model that describes the learner’s competencies. However, constructing a student model is challenging for computer tutors that use automated speech recognition (ASR) as input. This paper reports using ASR output from a computer tutor for reading to compare two models of how students learn to read words: a model that assumes students learn words as whole-unit chunks, and a model that assumes students learn the individual letter-to-sound mappings that make up words. We use the data collected by the ASR to show that a model of letter-to-sound mappings better describes student performance. We then compare using the student model and the ASR, both alone and in combination, to predict which words the student will read correctly, as scored by a human transcriber. Surprisingly, the majority-class baseline has a higher classification accuracy than the ASR. However, we demonstrate that the ASR output still has useful information, that classification accuracy is not a good metric for this task, and that the Area Under Curve (AUC) of ROC curves is a superior scoring method. The AUC of the student model is statistically reliably better (0.670 vs. 0.550) than that of the ASR, which in turn is reliably better than the majority-class baseline. These results show that ASR can be used to compare theories of how students learn to read words, and that modeling individual learners’ proficiencies may enable improved speech recognition. [AIED 2005
model] Chang, K., Beck, J. E., Mostow, J., & Corbett, A. (2005, July
19). Using speech recognition to evaluate two student models for a reading
tutor. Proceedings of the AIED 05
Workshop on Student Modeling for Language Tutors, 12th International
Conference on Artificial Intelligence in Education,
Abstract: Intelligent
Tutoring Systems derive much of their power from having a student model that
describes the learner's competencies. However, constructing a student model
is challenging for computer tutors that use automated speech recognition
(ASR) as input, due to inherent inaccuracies in ASR. We describe two
extremely simplified models of developing word decoding skills and explore
whether there is sufficient information in ASR output to determine which
model fits student performance better, and under what circumstances one model
is preferable to another. The two models
that we describe are a lexical model that assumes students learn words as
whole-unit chunks, and a grapheme-to-phoneme (G-to-P) model that assumes
students learn the individual letter-to-sound mappings that compose the
words. We use the data collected by the ASR to show that the G-to-P model
better describes student performance than the lexical model. We then
determine which model performs better under what conditions. On one hand, the
G-to-P model better correlates with student performance data when the student
is older or when the word is more difficult to read or spell. On the other
hand, the lexical model better correlates with student performance data when the
student has seen the word more times. [AAAI 2005 workshop] Beck, J. (Ed.). (2005, July 10). Proceedings of the AAAI2005 Workshop on Educational Data Mining. Pittsburgh, PA. [AAAI2005
browser] Mostow, J., Beck, J., Cen, H., Abstract: A basic question in mining data from an intelligent tutoring system is, "What happened when…?" We identify requirements for a tool to help answer such questions by finding occurrences of specified phenomena and browsing them in human-understandable form. We describe an implemented tool and how it meets the requirements. The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time. It automatically computes and displays the temporal hierarchy implicit in this representation. We illustrate the use of this tool to mine data from Project LISTEN's automated Reading Tutor. [AAAI2005
usage] Abstract: Students in two classes in the fall of 2004 making extensive use of online courseware were logged as they visited over 500 different “learning pages” which varied in length and in difficulty. We computed the time spent on each page by each student during each session they were logged in. We then modeled the time spent for a particular visit as a function of the page itself, the session, and the student. Surprisingly, the average time a student spent on learning pages (over their whole course experience) was of almost no value in predicting how long they would spend on a given page, even controlling for the session and page difficulty. The page itself was highly predictive, but so was the average time spent on learning pages in a given session. This indicates that local considerations, e.g., mood, deadline proximity, etc., play a much greater role in determining student pace and attention than do intrinsic student traits. We also consider the average time spent on learning pages as a function of the time of semester. Students spent less time on pages later in the semester, even for more demanding material. [SSSR 2005] Mostow, J., & Beck,
J. (2005). Micro-analysis of fluency gains in a Reading Tutor that
listens: Wide vs. repeated guided oral
reading. Talk at Twelfth Annual
Meeting of the Society for the Scientific Study of Abstract: Fluency growth is essential but imperfectly understood. By using automatic speech recognition to listen to children read aloud, Project LISTEN's Reading Tutor provides a novel instrument to study fluency development. During the 2002-2003 school year, hundreds of children in grades 1-4 used the Reading Tutor, which recorded them reading millions of words of text. The latency preceding each word reflects the reader’s cognitive effort to identify the word. Using automatic speech recognition to analyze latency changes between successive encounters of words in the same or different contexts provides new data about how fluency grows. * [Toronto 2005] Cunningham, T., & Geva, E. (2005, June 24).
The effects of reading technologies on literacy development of ESL students
[poster presentation]. Twelfth Annual
Meeting of the Society for the Scientific Study of *
[UBC 2005] Reeder, K., Early, M., Kendrick, M., Shapiro, J., & [AERA 2005]
Beck, J. E., & Mostow, J. (2005). Mining Data from Randomized
Within-Subject Experiments in an Automated Reading Tutor (poster in session
34.080, "Logging Students' Learning in Complex Domains: Empirical Considerations and Technological
Solutions"). American Educational Research Association 2005 Annual
Meeting: Demography and Democracy in
the Era of Accountability, Abstract: Experiments embedded in the Reading Tutor help evaluate its decisions in tutoring decoding, vocabulary, and comprehension.
Abstract:
This study looked at factors influencing teachers’ perception and usage of
Project LISTEN’s Reading Tutor, a computerized tutor used with elementary
students in 9 classroom-based, 10 computer lab-based, and 3 specialist-room
school settings. Thirteen interviews and 22 survey responses (of a
possible 28 teachers) examined teachers’ perception of the Reading Tutor and
suggested that teachers’ belief in the Tutor influenced their usage of it (r
= .46, p < .03). Three factors seemed to influence teacher belief:
1) perceived ease of use (r = .52, p < .01), 2) teachers’ reported
experience with computers (r = .41, p < .04) and instructional technology (r
= .48, p < .03), and 3) perceived technical problems such as frequency of
technical problems (r = -.44, p < .04) and speed with which problems were
fixed (r = .49, p < .02). Analysis of these factors suggested four
themes that cut across factors and seem to influence the way teachers
evaluate and use the Reading Tutor – the technology’s degree of convenience,
competition from other educational priorities and practices, teacher
experience and/or interest with technology, and data available to teachers and
the way teachers prioritize that data. These results suggest that
improving convenience of the Reading Tutor, instituting specialized training
programs, and improving feedback mechanisms for teachers by providing
relevant, situated data may influence teacher belief in the Reading Tutor and
thereby increase teacher usage. This study contributes to current
literature on educational technology usage by supporting previous literature
suggesting that teacher belief in the importance of a technology influences
their use of it. One unique feature of this study is that it uses both
quantitative and qualitative methods to look at the research questions from
two different research perspectives.
Abstract:
A two-month pilot study of 34 second- through fourth-grade Hispanic
students from four bilingual education classrooms was conducted to compare
the efficacy of the 2004 version of the Project LISTEN Reading Tutor against
the standard practice of sustained silent reading (SSR). The Reading
Tutor uses automated speech recognition to listen to children read
aloud. It provides both spoken and graphical feedback in order to
assist the children with the oral reading task. Prior research with
this software has demonstrated its efficacy within populations of native
English speakers. This study was undertaken to obtain some initial
indication as to whether the tutor would also be effective within a
population of English language learners. The study
employed a crossover design where each participant spent one month in each of
the treatment conditions. The experimental treatment consisted of 25
minutes per day using the Reading Tutor within a small pullout lab
setting. The control treatment consisted of remaining in
the classroom and participating in established reading instruction
activities. Dependent variables were the school district's
curriculum-based measures for fluency, sight word recognition, and
comprehension. The Reading Tutor
group outgained the control group on every measure during both halves of the
crossover experiment. Within-subject results from a paired t-test
indicate these gains were significant for one sight word measure (p = .056)
and both fluency measures (p < .001). Effect sizes were 0.55 for
timed sight words, a robust 1.16 for total fluency and an even larger 1.27
for fluency controlled for word accuracy. These dramatic results
observed during a one-month treatment indicate this technology may have much
to offer English language learners.
Abstract:
We describe the automated generation and use of 69,326 comprehension cloze
questions and 5,668 vocabulary matching questions in the 2001-2002 version of
Project LISTEN's Reading Tutor used by 364 students in grades 1-9 at seven
schools. To validate our methods, we used students' performance on
these multiple-choice questions to predict their scores on the Woodcock
Reading Mastery Test. A model based on students' cloze performance
predicted their Passage Comprehension scores with correlation R=.85.
The percentage of vocabulary words that students matched correctly to their
definitions predicted their Word Comprehension scores with correlation R=.61.
We used both
types of questions in a within-subject automated experiment to compare four
ways to preview new vocabulary before a story - defining the word, giving a
synonym, asking about the word, and doing nothing. Outcomes included
comprehension as measured by performance on multiple-choice cloze questions
during the story, and vocabulary as measured by matching words to their
definitions in a posttest after the story. A synonym or short
definition significantly improved posttest performance compared to just
encountering the word in the story, but only for words students didn't
already know, and only if they had a grade 4 or better vocabulary. Such
a preview significantly improved performance during the story on cloze
questions involving the previewed word, but only for students with a grade
1-3 vocabulary.
[TICL fluency] Beck, J. E., Jia, P., & Mostow, J. (2004). Automatically assessing oral reading fluency in a computer tutor that listens. Technology, Instruction, Cognition and Learning, 2, 61-81. Abstract:
Much of the power of a computer tutor comes from its ability to assess
students. In some domains, including oral reading, assessing the
proficiency of a student is a challenging task for a computer. Our
approach for assessing student reading proficiency is to use data that a
computer tutor collects through its interactions with a student to estimate
his performance on a human-administered test of oral reading fluency.
A model with data collected from the tutor's speech recognizer output
correlated, within-grade, at 0.78 on average with student performance on the
fluency test. For assessing students, data from the speech recognizer
were more useful than student help-seeking behavior. However, adding
help-seeking behavior increased the average within-grade correlation to
0.83. These results show that speech recognition is a powerful source
of data about student performance, particularly for reading.
[ITS 2004 tracing] Beck, J. E., & Sison, J. (2004, September 1-3). Using knowledge tracing to measure student reading proficiencies. Proceedings of the 7th International Conference on Intelligent Tutoring Systems, 624-634. Maceio, Brazil. (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html. Abstract:
Constructing a student model for language tutors is a challenging task.
This paper describes using knowledge tracing to construct a student model of
reading proficiency and validates the model. We use speech recognition
to assess a student’s reading proficiency at a subword level, even though the
speech recognizer output is at the level of words. Specifically, we
estimate the student’s knowledge of 80 letter-to-sound mappings, such as ch
making the sound /K/ in “chemistry.” At a coarse level, the student
model did a better job of estimating reading proficiency for 47.2% of the
students than did a standardized test designed for the task. Our model’s
estimate of the student’s knowledge of individual letter-to-sound mappings is
a significant predictor of whether he will ask for help on a particular
word. Thus, our student model is able to describe student performance
at both coarse and fine grain sizes.
[ITS 2004 questions] Beck, J. E., Mostow, J., & Bey, J. (2004, September 1-3). Can automated questions scaffold children's reading comprehension? Proceedings of the 7th International Conference on Intelligent Tutoring Systems, 478-490. Maceio, Brazil. (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html. Abstract:
Can automatically generated questions scaffold reading comprehension?
We automated three kinds of multiple-choice questions in children’s assisted
reading: A within-subject
experiment in the spring 2003 version of Project LISTEN’s Reading Tutor
randomly inserted all three kinds of questions during stories as it helped
children read them. To compare their effects on story-specific
comprehension, we analyzed 15,196 subsequent cloze test responses by 404
children in grades 1-4.
[ITS 2004 disengagement] Beck, J. E. (2004, August 31). Using response times to model student disengagement. Proceedings of
the ITS2004 Workshop on Social and Emotional Intelligence in Learning
Environments, Abstract:
Time on task is an important variable for learning a skill. However,
learners must be focused on the learning for the time invested to be
productive. Unfortunately, students do not always try their hardest to
solve problems presented by computer tutors. This paper explores
student disengagement and proposes a model for detecting whether a student is
engaged in answering questions. This model is based on item response
theory, and uses as input the difficulty of the question, how long the
student took to respond, and whether the response was correct. From
these data, the model determines the probability that a student was actively
engaged in trying to answer the question. To validate our model, we
analyze 231 students’ interactions with the 2002-2003 version of the Reading
Tutor. We show that disengagement is better modeled by simultaneously
estimating student proficiency and disengagement than by estimating
disengagement alone. Our best model of disengagement has a correlation
of -0.25 with student learning gains. The novel aspect of this work is
that it requires only data normally collected by a computer tutor, and the
affective model is validated against student performance on an external
measure.
[ITS 2004 mining] Mostow, J. (2004, August 30).
Some useful design tactics for mining ITS data. Proceedings of the ITS2004
Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational
Outcomes, Maceió, Abstract:
Mining data logged by intelligent tutoring systems has the potential to
reveal valuable discoveries. What characteristics make such data
conducive to mining? What variables are informative to compute?
Based on our experience in mining data from Project LISTEN’s Reading Tutor,
we discuss how to collect machine-analyzable data and formulate it into
experimental trials. The resulting concepts and tactics mark out a
roadmap for the emerging area of tutorial data mining, and may provide a
useful vocabulary and framework for characterizing past, current, and future
work in this area.
[ITS 2004 lessons] Heiner, C., Beck, J., &
Mostow, J. (2004, August 30). Lessons on using ITS data to answer educational
research questions. Proceedings of the ITS2004 Workshop on Analyzing
Student-Tutor Interaction Logs to Improve Educational Outcomes, Abstract:
Some tutoring system projects have completed empirical studies of
student-tutor interaction by manually collecting data while observing fewer
than a hundred students. Analyzing larger, automatically collected data
sets requires new methods to address new problems. We share lessons on
design, analysis, presentation, and iteration. Our lessons are based on
our experience analyzing data from Project LISTEN’s Reading Tutor, which
automatically collected tutorial data from hundreds of students. We
hope that these lessons will help guide analysis of similar datasets from
other intelligent tutoring systems.
[ACL 2004 keynote] Mostow, J. (2004, July 22). If I
Have a Hammer: Computational Linguistics in a Abstract:
Project LISTEN’s Reading Tutor uses speech recognition to listen to children
read aloud, and helps them learn to read, as evidenced by rigorous
evaluations of pre- to posttest gains compared to various controls. In
the 2003-2004 school year, children ages 5-14 used the Reading Tutor daily at
school on over 200 computers, logging over 50,000 sessions, 1.5 million
tutorial responses, and 10 million words. This talk uses
the Reading Tutor to illustrate the diverse roles that computational
linguistics can play in an intelligent tutor: A recurring theme
is the use of “big data” to train such models automatically.
[SSSR 2004 help] Mostow, J., Beck, J. E., &
Heiner, C. (2004). Which Help Helps? Effects of Various Types of Help
on Word Learning in an Automated Abstract:
When a tutor gives help on a word during assisted oral reading, how does the
type of help matter? We report an automated, within-subject,
randomized-trial experiment embedded in Project LISTEN's Reading Tutor.
Hundreds of children (mostly in grades 1-3) used the Reading Tutor in 2002-2003,
reading millions of words and getting help on hundreds of thousands of
them. The experimental variable was the type of help, selected
randomly by the Reading Tutor whenever it gave help on a word. The
outcome variable was student performance on the next encounter of the
word. We compare effects of several types of help.
[SSSR 2004 interventions] Beck, J. E., Sison, J.,
& Mostow, J. (2004, June 27-30). Using automated speech recognition to
measure scaffolding and learning effects of word identification interventions
in a computer tutor that listens. Eleventh Annual Meeting of the Society for
the Scientific Study of Abstract:
Does brief word identification assistance help students on words
they encounter soon afterwards? Does brief
assistance lead to long-term learning gains? Which types of assistance
are best? We have explored these questions using automated experiments
in a computer tutor for reading that listens. We examine data
from 300 students, mostly in grades 1 through 3. The major result was
a definite scaffolding effect in student performance on the same day
they were given assistance. Although there was a slight improvement
in longer-term performance, the difference was not statistically significant.
[ICALL 2004] Heiner, C., Beck, J. E., & Mostow,
J. (2004, June 17-19). Improving the Help Selection Policy in a Abstract:
What type of oral reading assistance is most effective for a given student on
a given word? We analyze 189,039 randomized trials of a within-subject
experiment to compare the effects of several types of help in the 2002-2003
version of Project LISTEN’s Reading Tutor. The independent variable is
the type of help given on a word. The outcome variable is the student’s
performance at the next encounter of that word, as measured by automatic
speech recognition. Training a help selection policy sensitive to student or
word level improves this outcome by a projected 4% – a substantial effect for
picking a single better intervention.
[CALICO 2004] Beck, J. E., & Sison, J. (2004,
June 8-12). Automated student assessment in language tutors. CALICO, Abstract:
The Reading Tutor is a computer tutor that uses Automated Speech Recognition
(ASR) technology to listen to children read aloud and helps them learn how to
read. The research reported here uses ASR output to predict students'
GORT fluency posttest scores. Using a linear regression model, we achieved
correlations of over .80 for predicting first through fourth graders'
performance. Our model's predictive ability is on par with standard public
school reading assessment measures. This work contributes to a better
understanding of automated student assessment in language tutors and
introduces methods for accounting for noisy ASR output.
[IJAIE 2004] Murray, R. C., VanLehn, K., & Mostow, J. (2004). Looking Ahead to Select Tutorial Actions: A Decision-Theoretic Approach. International Journal of Artificial Intelligence in Education, 14, 235-278. Abstract: We propose and evaluate a decision-theoretic approach for selecting tutorial actions by looking ahead to anticipate their effects on the student and other aspects of the tutorial state. The approach uses a dynamic decision network to consider the tutor’s uncertain beliefs and objectives in adapting to and managing the changing tutorial state. Prototype action selection engines for diverse domains (calculus and elementary reading) illustrate the approach. These applications employ a rich model of the tutorial state, including attributes such as the student’s knowledge, focus of attention, affective state, and next action(s), along with task progress and the discourse state. Our action selection engines have not yet been integrated into complete ITSs (this is the focus of future work), so we use simulated students to evaluate their capability to select rational tutorial actions that emulate the behaviors of human tutors. We also evaluate their capability to select tutorial actions quickly enough for real-world tutoring applications.
[ICAAI 2003] Banerjee, S., Mostow, J., Beck, J.,
& Tam, W. (2003, December 15-16). Improving Language Models by Learning
from Speech Recognition Errors in a Abstract: Lowering the perplexity of a language model does not always translate into higher speech recognition accuracy. Our goal is to improve language models by learning from speech recognition errors. In this paper we present an algorithm that first learns to predict which n-grams are likely to increase recognition errors, and then uses that prediction to improve language models so that the errors are reduced. We show that our algorithm reduces a measure of tracking error by more than 24% on unseen test data from a Reading Tutor that listens to children read aloud.
[CSMP 2003] Mostow, J., & Beck, J. (2003,
November 3-4). When the Rubber Meets the Road: Lessons from the
In-School Adventures of an Automated Abstract: Project LISTEN's Reading Tutor (www.cs.cmu.edu/~listen) uses automatic speech recognition to listen to children read aloud, and helps them learn to read. Its experimental deployment in schools has expanded from a single computer used by eight third graders in one school in 1996 to two hundred computers used by children in grades 1-3 in nine schools in 2003. This project illustrates how technology can not just scale up an intervention, but instrument its implementation. For example, analysis of 2002-2003 usage showed that session frequency and duration averaged significantly higher in lab settings than in classrooms.
Abstract: This paper extends and evaluates previously published methods for predicting likely miscues in children's oral reading in a Reading Tutor that listens. The goal is to improve the speech recognizer's ability to detect miscues but limit the number of "false alarms" (correctly read words misclassified as incorrect). The "rote" method listens for specific miscues from a training corpus. The "extrapolative" method generalizes to predict other miscues on other words. We construct and evaluate a scheme that combines our rote and extrapolative models. This combined approach reduced false alarms by 0.52% absolute (12% relative) while simultaneously improving miscue detection by 1.04% absolute (4.2% relative) over our existing miscue prediction scheme.
Abstract: One issue in a Reading Tutor that listens is to determine which words the student read correctly. We describe a confidence measure that uses a variety of features to estimate the probability that a word was read correctly. We trained two decision tree classifiers. The first classifier tries to fix insertion and substitution errors made by the speech decoder, while the second classifier tries to fix deletion errors. By applying the two classifiers together, we achieved a 25.89% relative reduction in false alarm rate while holding the miscue detection rate constant.
Abstract: We present an automated method to ask children questions during assisted reading, and experimentally evaluate its effects on their comprehension. In 2002, after a randomly inserted generic multiple-choice What/Where/When question, children were likelier to correctly answer an automatically generated comprehension question on a later sentence. The positive effects of such questions vanished during the second half of the study in 2003. We propose hypotheses to explain why.
Abstract: This interactive event demonstrates various aspects of Project LISTEN’s Reading Tutor, which listens to children read aloud, and helps them learn to read.
Abstract: A 2002 Wizard of Oz study showed that emotional scaffolding provided by a human significantly increased children’s persistence in an automated Reading Tutor, as measured by the number of tasks they chose to undertake. We report a 5,965-trial experiment to test a simple automated form of such scaffolding, compared to a control condition without it. 348 children in grades K-4 spent significantly longer per task in the experimental condition due to a design flaw, yet still averaged equal numbers of tasks in both conditions. We theorize that they subjectively gauged effort in terms of number of tasks rather than number or duration of solution attempts.
Abstract: This paper describes our efforts
at constructing a fine-grained student model in Project LISTEN’s intelligent
tutor for reading.
Abstract: This paper reports results on using data mining to extract useful variables from a database that contains interactions between the student and Project LISTEN’s Reading Tutor. Our approach is to find variables we believe to be useful in the information logged by the tutor, and then to derive models that relate those variables to students’ scores on external, paper-based tests of reading proficiency. Once the relationship between the recorded variables and the paper tests is discovered, it is possible to use information recorded by the tutor to assess the student’s current level of proficiency. The major results of this work were the discovery of useful features available to the Reading Tutor that describe students, and a strong predictive model of external tests that correlates with actual test scores at 0.88.
Abstract: A year-long study of 131 second and third graders in 12 classrooms compared three daily 20-minute treatments. (a) 58 students in 6 classrooms used the 1999-2000 version of Project LISTEN’s Reading Tutor, a computer program that uses automated speech recognition to listen to a child read aloud, and gives spoken and graphical assistance. Students took daily turns using one shared Reading Tutor in their classroom while the rest of their class received regular instruction. (b) 34 students in the other 6 classrooms were pulled out daily for one-on-one tutoring by certified teachers. To control for materials, the human tutors used the same set of stories as the Reading Tutor. (c) 39 students served as in-classroom controls, receiving regular instruction without tutoring. We compared students’ pre- to post-test gains on the Word Identification, Word Attack, Word Comprehension, and Passage Comprehension subtests of the Woodcock Reading Mastery Test, and in oral reading fluency. Surprisingly, the human-tutored group significantly outgained the Reading Tutor group only in Word Attack (main effects p<.02, effect size .55). Third graders in both the computer- and human-tutored conditions outgained the control group significantly in Word Comprehension (p<.02, respective effect sizes .56 and .72) and suggestively in Passage Comprehension (p=.14, respective effect sizes .48 and .34). No differences between groups on gains in Word Identification or fluency were significant. These results are consistent with an earlier study in which students who used the 1998 version of the Reading Tutor outgained their matched classmates in Passage Comprehension (p=.11, effect size .60), but not in Word Attack, Word Identification, or fluency. To shed light on outcome differences between tutoring conditions and between individual human tutors, we compared process variables. 
Analysis of logs from all 6,080 human and computer tutoring sessions showed that human tutors included less rereading and more frequent writing than the Reading Tutor. Micro-analysis of 40 videotaped sessions showed that students who used the Reading Tutor spent considerable time waiting for it to respond, requested help more frequently, and picked easier stories when it was their turn. Human tutors corrected more errors, focused more on individual letters, and provided assistance more interactively, for example getting students to sound out words rather than sounding out words themselves as the Reading Tutor did.
Abstract: When does taking time to preview a new word before reading a story improve vocabulary and comprehension more than encountering the word in context? To address this question, the 2001-2002 version of Project LISTEN's Reading Tutor embedded an automated experiment to compare three types of vocabulary preview (defining the word, giving a synonym, or just asking about the word) and a control condition. Outcomes included within-story comprehension as measured by performance on multiple-choice cloze questions, and post-story vocabulary as measured by matching words to their definitions. We analyze results based on thousands of randomized trials.
[ICMI 2002 emotional] Aist, G., Kort, B., Reilly, R., Mostow, J., & Picard, R. (2002, October 14-16). Experimentally Augmenting an Intelligent Tutoring System with Human-Supplied Capabilities: Adding Human-Provided Emotional Scaffolding to an Automated Reading Tutor that Listens. Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI 2002), Pittsburgh, PA, 483-490. Revised version of paper first presented at ITS 2002 Workshop on Empirical Methods for Tutorial Dialogue Systems, San Sebastian, Spain. Abstract: This paper presents the first statistically reliable empirical evidence from a controlled study for the effect of human-provided emotional scaffolding on student persistence in an intelligent tutoring system. We describe an experiment that added human-provided emotional scaffolding to an automated Reading Tutor that listens, and discuss the methodology we developed to conduct this experiment. Each student participated in one (experimental) session with emotional scaffolding, and in one (control) session without emotional scaffolding, counterbalanced by order of session. Each session was divided into several portions. After each portion of the session was completed, the Reading Tutor gave the student a choice: continue, or quit.
We measured persistence as the number of portions the student completed. Human-provided emotional scaffolding added to the automated Reading Tutor resulted in increased student persistence, compared to the Reading Tutor alone. Increased persistence means increased time on task, which ought to lead to improved learning. If these results for reading turn out to hold for other domains too, the implication for intelligent tutoring systems is that they should respond with not just cognitive support – but emotional scaffolding as well. Furthermore, the general technique of adding human-supplied capabilities to an existing intelligent tutoring system should prove useful for studying other ITSs too.
[ICMI 2002] Mostow, J., Beck, J., Chalasani, R., Abstract: It is easier to record logs of
multimodal human-computer tutorial dialogue than to make sense of them.
In the 2000-2001 school year, we logged the interactions of approximately 400
students who used Project LISTEN’s Reading Tutor and who read aloud over 2.4
million words. This paper discusses some difficulties we encountered
in converting the logs into a more easily understandable database. It is
faster to write SQL queries to answer research questions than to analyze
complex log files each time. The database also permits us to construct
a viewer to examine individual
Abstract: This paper explores the problem of
predicting specific reading mistakes, called miscues, on a given word.
Characterizing likely miscues tells an automated reading tutor what to
anticipate, detect, and remediate. As training and test data, we use a
database of over 100,000 miscues transcribed by
Abstract: This paper addresses an indispensable skill with a unique method for teaching a critical component of it: using computer-assisted oral reading to help children learn vocabulary. We build on Project LISTEN’s Reading Tutor, a computer program that adapts automatic speech recognition to listen to children read aloud, and helps them learn to read (http://www.cs.cmu.edu/~listen). To learn a word from reading with the Reading Tutor, students must encounter the word and learn the meaning of the word in context. We modified the Reading Tutor first to help students encounter new words and then to help them learn the meanings of new words. We then compared the Reading Tutor to classroom instruction and to human-assisted oral reading as part of a yearlong study with 144 second and third graders. The result: Second graders did about the same on word comprehension in all three conditions. However, third graders who read with the 1999 Reading Tutor, modified as described in this paper, performed statistically significantly better than other third graders in a classroom control on word comprehension gains – and even comparably with other third graders who read one-on-one with human tutors.
Abstract: A 7-month study of 178 students in grades 1-4 at two schools compared two daily 20-minute treatments. 88 students did Sustained Silent Reading (SSR) in their classrooms. 90 students in 10-computer labs used the 2000-2001 version of Project LISTEN’s Reading Tutor (RT), which uses speech recognition to listen to a child read aloud, and responds with spoken and graphical assistance (www.cs.cmu.edu/~listen). The RT group significantly outgained their statistically matched SSR classmates in phonemic awareness, rapid letter naming, word identification, word comprehension, passage comprehension, fluency, and spelling – especially in grade 1, where effect sizes for these skills ranged from .20 to .72.
Abstract: Analyzing the time allocation of students’ activities in a school-deployed mixed initiative tutor can be illuminating but surprisingly tricky. We discuss some complementary methods that we have used to understand how tutoring time is spent, such as analyzing sample videotaped sessions by hand, and querying a database generated from session logs. We identify issues, methods, and lessons that may be relevant to other tutors. One theme is that iterative design of “non-tutoring” components can enhance a tutor’s effectiveness, not by improved teaching, but by reducing the time wasted on non-learning activities. Another is that it is possible to relate students’ time allocation to improvements in various outcome measures.
Abstract: Our goal is to find a methodology
for directing development effort in an intelligent tutoring system (ITS).
Given that an ITS has several AI reasoning components, as well as content to
present, evaluating it is a challenging task. Due to these difficulties,
few evaluation studies to measure the impact of individual components have
been performed. Our architecture evaluates the efficacy of each component of
an ITS and considers the impact of a particular teaching goal when
determining whether a particular component needs improving. For our
AnimalWatch tutor, we found that for certain goals the tutor itself, rather
than its reasoning components, needed improvement. We have found that it is
necessary to know what the system’s teaching goals are before deciding which
component is the limiting factor on performance. [Based on Dr. Beck's
research at
Abstract: This paper presents the first statistically reliable empirical evidence from a controlled study for the effect of human-provided emotional scaffolding on student persistence in an intelligent tutoring system. We describe an experiment that added human-provided emotional scaffolding to an automated Reading Tutor that listens, and discuss the methodology we developed to conduct this experiment. Each student participated in one (experimental) session with emotional scaffolding, and in one (control) session without emotional scaffolding, counterbalanced by order of session. Each session was divided into several portions. After each portion of the session was completed, the Reading Tutor gave the student a choice: continue, or quit. We measured persistence as the number of portions the student completed. Human-provided emotional scaffolding added to the automated Reading Tutor resulted in increased student persistence, compared to the Reading Tutor alone. Increased persistence means increased time on task, which ought to lead to improved learning. If these results for reading turn out to hold for other domains too, the implication for intelligent tutoring systems is that they should respond with not just cognitive support – but emotional scaffolding as well. Furthermore, the general technique of adding human-supplied capabilities to an existing intelligent tutoring system should prove useful for studying other ITSs too.
Abstract: It is easier to record logs of
multimodal human-computer tutorial dialogue than to make sense of them.
This paper discusses some of the problems in extracting useful information
from such logs and the difficulties we encountered in converting the logs
into a more easily understandable database. Once log files are parsed
into a database, it is possible to write SQL queries to answer research
questions faster than analyzing complex log files each time. The
database permits us to construct a viewer to examine individual
Abstract: Can vocabulary and comprehension assessments be generated automatically for a given text? We describe the automated method used to generate, administer, and score multiple-choice vocabulary and comprehension questions in the 2001-2002 version of Project LISTEN’s Reading Tutor. To validate the method against the Woodcock Reading Mastery Test, we analyzed 69,326 multiple-choice cloze items generated in the course of regular Reading Tutor use by 364 students in grades 1-9 at seven schools. Correlation between predicted and actual scores reached R=.85 for Word and Passage Comprehension.
[CVDA 2002 latency] Jia, P., Beck, J. E., & Mostow, J. (2002, June 3). Can a Reading Tutor that Listens use Inter-word Latency to Assess a Student's Reading Ability? ITS 2002 Workshop on Creating Valid Diagnostic Assessments, San Sebastian, Spain, pp. 23-32. Abstract: This paper describes our use of inter-word latency, the delay before a student speaks a word in the course of reading a sentence aloud, to assess oral reading automatically. The context of our study is a Reading Tutor that uses automated speech recognition to listen to children read aloud. Using data from 58 students in grades 1 through 4, we used inter-word latency to predict scores on external, individually administered, paper-based tests. Correlation between predicted and actual test scores exceeded .7 for fluency, word attack, word identification, word comprehension, and passage comprehension. Compared with paper-based tests, this evaluation method is much cheaper, based on computer-guided oral reading recorded in the course of regular tutor use, and invisible to students. It has the potential to provide continuous assessment of student progress, both to report to teachers and to guide its own tutoring.
[IRA 2002 award] Aist, G. (2002, April
29). Helping Children Learn Vocabulary during Computer-Assisted Oral
Reading: A Dissertation Summary [Poster presented as a Distinguished
Finalist for the Outstanding Dissertation of the Year Award]. 47th Annual
Convention of the International Reading Association, [IJAIED 2001] Aist, G. Towards automatic glossarization: automatically constructing and administering vocabulary assistance factoids and multiple-choice assessment. International Journal of Artificial Intelligence in Education (2001) 12, 212-231. Download from IJAIE website. Abstract: We address an important problem with a novel approach: helping children learn words during computer-assisted oral reading. We build on Project LISTEN's Reading Tutor, which is a computer program that adapts automatic speech recognition to listen to children read aloud, and helps them learn to read (http://www.cs.cmu.edu/~listen). In this paper, we focus on the problem of vocabulary acquisition. To learn a word from reading with the Reading Tutor, students must first encounter the word and then learn the meaning of the word from context. This paper describes how we modified the Reading Tutor to help students learn the meanings of new words by augmenting stories with WordNet-derived comparisons to other words – "factoids". Furthermore, we report results from an embedded experiment designed to evaluate the effectiveness of including factoids in stories that children read with the Reading Tutor. Factoids helped – not for all students and all words, but for third graders seeing rare words, and for single sense rare words tested one or two days later. We also discuss further steps towards automatic construction of explanations of words. [FF 2001] Mostow, J., and Aist, G. Evaluating tutors that listen: An overview of Project LISTEN. In (K. Forbus and P. Feltovich, Eds.) Smart Machines in Education, pp. 169-234. MIT/AAAI Press, 2001. Order book from AAAI Press. [DYD 2001] Aist, G. Towards Worldwide Literacy:
Technological Affordances, Economic Challenges, Affordable Technology. Development by Design: Workshop on Collaborative Open Source Design of Appropriate Technologies. MIT Media Lab,

[NAACL 2001] Jack Mostow, Greg Aist, Juliet Bey, Paul Burkhead, Abstract: Project LISTEN’s Reading Tutor helps children learn to read. It uses speech recognition to listen to them read aloud, and responds with spoken and graphical feedback. The demonstration lets attendees try out this interaction themselves. Besides the spoken tutorial dialog, features shown include an automated tutorial for new users, interactive activities that combine assisted reading with other types of steps, and automated field studies to evaluate the efficacy of alternative tutorial interventions by embedding experiments within the Reading Tutor.

[WTDS 2001 DT] Murray, R. Charles, Van Lehn,
Kurt, and Mostow, Jack. A Decision-Theoretic Approach for Selecting Tutorial Discourse Actions. In Proceedings of the NAACL 2001 Workshop on Adaptation in Dialogue Systems,

[WTDS 2001 DTa] Murray, R. Charles, Van Lehn, Kurt, and Mostow, Jack. A Decision-Theoretic Architecture for Selecting Tutorial Discourse Actions. In Proceedings of the AIED-2001 Workshop on Tutorial Dialog Systems, San Antonio, Texas, May 2001, pp. 35-46. Download paper in pdf format. Abstract: We propose a decision-theoretic architecture for selecting tutorial discourse actions. DT Tutor, an action selection engine which embodies our approach, uses a dynamic decision network to consider the tutor’s objectives and uncertain beliefs in adapting to the changing tutorial state. It predicts the effects of the tutor’s discourse actions on the tutorial state, including the student’s internal state, and then selects the action with maximum expected utility. We illustrate our approach with prototype applications for diverse domains: calculus problem-solving and elementary reading. Formative off-line evaluations assess DT Tutor’s ability to select optimal actions quickly enough to keep a student engaged.

[AIED 2001 poster] Mostow, J., Aist, G. S., Burkhead, P., Corbett, A., Abstract: A year-long study of 144 second and third graders compared outcomes (gains in test scores) and process variables (e.g. words read) for Project LISTEN’s Reading Tutor, human tutors, and a classroom control. Human tutors beat the Reading Tutor only in word attack. Both beat the control in grade 3 word comprehension.

[AIED 2001 pause video] Jack
Mostow, Cathy Huang, and Brian Tobin. Pause the Video: Quick but quantitative expert evaluation of tutorial choices in a Reading Tutor that listens. In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education: AI-ED in the Wired and Wireless Future, pp. 343-353. Abstract: To critique Project LISTEN’s automated Reading Tutor, we adapted a panel-of-judges methodology for evaluating expert systems. Three professional elementary educators watched 15 video clips of the Reading Tutor listening to second and third graders read aloud. Each expert chose which of 10 interventions to make in each situation. To keep the Reading Tutor’s choice from influencing the expert, we paused each video clip just before the Reading Tutor intervened. After the expert responded, we played back what the Reading Tutor had actually done. The expert then rated its intervention compared to hers. Although the experts seldom agreed, they rated the Reading Tutor’s choices as better than their own in 5% of the cases, equally good in 36%, worse but OK in 41%, and inappropriate in only 19%. The lack of agreement and the surprisingly favorable ratings together suggest that either the Reading Tutor’s choices were better than we thought, the experts knew less than we hoped, or the clips showed less than they should.

[AIED 2001 miscue mining] James
Fogarty, Laura Dabbish, David Steck, and Jack Mostow. Mining a database of reading mistakes: For what should an automated Reading Tutor listen? In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education: AI-ED in the Wired and Wireless Future, pp. 422-433. Abstract: Using a machine learning approach to mine a database of over 70,000 oral reading mistakes transcribed by

[AIED 2001 vocabulary gains]
Aist, G. S., Mostow, J., Tobin, B., Burkhead, P., Corbett, A., Abstract: We describe results on helping children learn vocabulary during computer-assisted oral reading. This paper focuses on one aspect – vocabulary learning – of a larger study comparing computerized oral reading tutoring to classroom instruction and one-on-one human tutoring. 144 students in second and third grade were assigned to one of three conditions: (a) classroom instruction, (b) classroom instruction with one-on-one tutoring replacing part of the school day, and (c) computer instruction replacing part of the school day. For second graders, there were no significant differences between treatments in word comprehension gains. For third graders, however, the computer tutor showed an advantage over classroom instruction for gains in word comprehension (p = 0.042, effect size = 0.56) as measured by the Woodcock Reading Mastery Test. One-on-one human tutoring also showed an advantage over classroom instruction alone (p = 0.039, effect size = 0.72). Computer tutoring and one-on-one human tutoring were not significantly different in terms of word comprehension gains.

[AIED 2001 factoids] Gregory S. Aist. Factoids: Automatically constructing and administering vocabulary assistance and assessment. In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education: AI-ED in the Wired and Wireless Future, pp. 234-245. Abstract: We address an important problem with a novel approach: helping children learn words during computer-assisted oral reading. We build on Project LISTEN's Reading Tutor, which is a computer program that adapts automatic speech recognition to listen to children read aloud, and helps them learn to read (http://www.cs.cmu.edu/~listen). In this paper, we focus on the problem of vocabulary acquisition. To learn a word from reading with the Reading Tutor, students must first encounter the word and then learn the meaning of the word from context. This paper describes how we modified the Reading Tutor to help students learn the meanings of new words by augmenting stories with WordNet-derived comparisons to other words – “factoids”. Furthermore, we report results from an embedded experiment designed to evaluate the effectiveness of including factoids in stories that children read with the Reading Tutor. Factoids helped – not for all students and all words, but for third graders seeing rare words, and for single-sense rare words tested one or two days later.

[2001 PhD] Aist, G. 2001. Helping Children
Learn Vocabulary during Computer-Assisted Oral

[AAAI 2000 SA] Aist, G. Identifying words to explain to a reader: A preliminary study. Student Abstract and Poster, Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), p. 1061.

[AAAI 2000 DC] Aist, G. Helping children learn vocabulary during computer assisted oral reading. SIGART/AAAI Doctoral Consortium, Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), pp. 1100-1101.

[HMC 2000] Aist, G. Taking Turns Talking
About Text in a Abstract: In this paper we report on ongoing work on turn-taking in Project LISTEN's Reading Tutor (Mostow & Aist CALICO 1999). Project LISTEN’s Reading Tutor listens to children read aloud and helps them learn to read. The Reading Tutor’s repertoire of turn-taking behaviors includes not only alternating turns, but also backchanneling, interrupting, and prompting.

[ITS 2000 YR] Aist, G. An informal model of vocabulary acquisition during assisted oral reading and some implications for computerized instruction. In R. Nkambou (Ed.), ITS'2000 Young Researchers Track Proceedings, pp. 22-24. Fifth International Conference on Intelligent Tutoring Systems.

[ITS 2000 PA] Aist, G. and Mostow, J.
Improving story choice in a reading tutor that listens. Proceedings of the Fifth International Conference on Intelligent Tutoring Systems (ITS’2000), p. 645.

[ITS 2000 HT] Aist, G. Human Tutor and Computer Tutor Story Choice in Listening to Children Read Aloud. In B. du Boulay (Ed.), Proceedings of the ITS'2000 Workshop on Modeling Human Teaching Tactics and Strategies, pp. 8-10. Fifth International Conference on Intelligent Tutoring Systems. Abstract: A preliminary report on a comparison of human tutor story choice and mixed-initiative computer tutor story choice in Project LISTEN's Reading Tutor.

[ITS 2000 ML] Aist, G. and Mostow, J.
Using Automated Within-Subject Invisible Experiments to Test the Effectiveness of Automated Vocabulary Assistance. In Joseph Beck (Ed.), Proceedings of the ITS'2000 Workshop on Applying Machine Learning to ITS Design/Construction, pp. 4-8. Fifth International Conference on Intelligent Tutoring Systems. Abstract: Machine learning offers the potential to allow an intelligent tutoring system to learn effective tutoring strategies. A necessary prerequisite to learning an effective strategy is being able to automatically test a strategy's effectiveness. We conducted an automated, within-subject “invisible experiment” to test the effectiveness of a particular form of vocabulary instruction in a Reading Tutor that listens. Both conditions were in the context of assisted oral reading with the computer. The control condition was encountering a word in a story. The experimental condition was first reading a short automatically generated "factoid" about the word, such as "cheetah can be a kind of cat. Is it here?" and then reading the sentence from the story containing the target word. The initial analysis revealed no significant difference between the conditions. Further inspection revealed that students sometimes benefited from receiving help on "hard" or infrequent words. Designing, implementing, and analyzing this experiment shed light not only on the particular vocabulary help tested, but also on the machine-learning-inspired methodology we used to test the effectiveness of this tutorial action.

[ESCA 99] Aist, G. and Mostow,
J. Measuring the Effects of Backchanneling in Computerized Oral Abstract: What is the effect of backchanneling on human-computer dialog, and how should such effects be measured? We present experiments designed to evaluate the immediate effects of backchanneling on computer-assisted oral reading tutoring. These experiments are implemented in a reading tutor that listens to children read aloud, and helps them learn to read. As a byproduct of designing, conducting, and evaluating these experiments, we are able to describe some unique methodological challenges in evaluating the effects of low-level turn-taking dialog behavior.

[USPTO 99] Mostow, J. and Aist, G. Reading and Pronunciation Tutor. United States Patent No. 5,920,838. Filed June 2, 1997; issued July 6, 1999. US Patent and Trademark Office. Abstract: A computer implemented reading tutor comprises a player for outputting a response. An input block implementing a plurality of functions such as silence detection, speech recognition, etc. captures the read material. A tutoring function compares the output of the speech recognizer to the text which was supposed to have been read and generates a response, as needed, based on information in a knowledge base and an optional student model. The response is output to the user through the player. A quality control function evaluates the captured read material and stores the captured material in the knowledge base under certain conditions. An auto enhancement function uses information available to the tutor to create additional resources such as identifying rhyming words, words with common roots, etc., which can be used as responses.

[AAAI99] Mostow, J. and Aist, G. Authoring New Material in a Abstract: Project LISTEN’s Reading Tutor helps children learn to read by providing assisted practice in reading connected text. A key goal is to provide assistance for reading any English text entered by students or adults. This live demonstration shows how the Reading Tutor helps users enter and narrate stories, and then helps children read them.

[CALICO99] Mostow, J. and Aist, G. Giving Help and Praise in a Abstract: Human tutors make use of a wide range of input and output modalities, such as speech, vision, gaze, and gesture. Computer tutors are typically limited to keyboard and mouse input. Project LISTEN’s Reading Tutor uses speech recognition technology to listen to children read aloud and help them. Why should a computer tutor listen? A computer tutor that listens can give help and praise naturally and unobtrusively. We address the following questions: When and how should a computer tutor that listens help students? When and how should it praise students? We examine how the advantages and disadvantages of speech recognition technology helped shape the design and implementation of the Reading Tutor. Despite its limitations, this technology enables the Reading Tutor to provide patient, unobtrusive, and natural assistance for reading aloud.

[SRinCALL] G. Aist. Speech recognition in computer assisted language learning. In K. C. Cameron (ed.), Computer Assisted Language Learning (CALL): Media, Design, and Applications. Lisse: Swets & Zeitlinger, 1999.

[CHI99] G. Aist. Skill-specific spoken dialogs in
a reading tutor that listens. Doctoral Consortium paper. In Proceedings of the Conference on Human Factors in Computing Systems: CHI 99 Extended Abstracts, pp. 55-56.

[LIS99] Mostow, J. (ed.), McClelland, J., Fiez, J., McCandliss, B., Plaut, D., and Schneider, W. Poster and short presentation at the NSF Learning & Intelligent Systems Principal Investigators' meeting, Washington, DC, May, 1999. At http://www.cnbc.cmu.edu/collaborative/lisweb/ppt/index.htm. In J. McClelland (PI), Intervention Strategies that Promote Learning: Their Basis and Use in Enhancing Literacy, at http://www.cnbc.cmu.edu/collaborative/lisweb

[HCIGW99 CRLT] Mostow, J. Collaborative Research on Learning Technologies: An Automated

[HCIGW99 IS] Mostow, J. Guiding Spoken Dialogue with Computers by Responding to Prosodic Cues. Proceedings of the NSF Human Computer Interaction Grantees Workshop (HCIGW99),

[ICSLP98 acoustic]
Aist, G., Chan, P., Huang, X. D., Jiang, L., Kennedy, R., Latimer, D., Mostow, J., and Yeung, C. How effective is unsupervised data collection for children's speech recognition? International Conference on Spoken Language Processing (ICSLP98). Abstract: Children present a unique challenge to automatic speech recognition. Today’s state-of-the-art speech recognition systems still have problems handling children’s speech because acoustic models are trained on data collected from adult speech. In this paper we describe an inexpensive way to remedy this problem. We collected children’s speech as they interacted with an automated reading tutor. These data are subsequently transcribed by a speech recognition system and automatically filtered. We studied how to use these automatically collected data to improve the performance of a children’s speech recognition system. Experiments indicate that automatically collected data can reduce the error rate significantly on children’s speech.

[ICSLP98 architecture] Aist, G. Expanding A Time-Sensitive Conversational Architecture For Turn-Taking To Handle Content-Driven Interruption. International Conference on Spoken Language Processing (ICSLP98). Abstract: Turn-taking in spoken language systems has generally been push-to-talk or strict alternation (user speaks, system speaks, user speaks, …), with some systems such as telephone-based systems handling barge-in (interruption by the user). In this paper we describe our time-sensitive conversational architecture for turn-taking that allows not only alternating turns and barge-in but also other conversational behaviors. This architecture allows backchanneling, prompting the user by taking more than one turn if necessary, and overlapping speech. The architecture is implemented in a Reading Tutor that listens to children read aloud, and helps them. We extended this architecture to allow the Reading Tutor to interrupt the student based on a non-self-corrected mistake – “content-driven interruption”. To the best of our knowledge, the Reading Tutor is thus the first spoken language system to intentionally interrupt the user based on the content of the utterance.

[AAAI AMLDP 98] G. Aist and J. Mostow.
Estimating the Effectiveness of Conversational Behaviors in a Abstract: Project LISTEN's Reading Tutor listens to children read aloud, and helps them learn to read. Besides user satisfaction, a primary criterion for tutorial spoken dialogue agents should be educational effectiveness. In order to learn to be more effective, a spoken dialogue agent must be able to evaluate the effect of its own actions. When evaluating the effectiveness of individual actions, rather than comparing a conversational action to "nothing," an agent must compare it to reasonable alternative actions. We describe a methodology for analyzing the immediate effect of a conversational action, and some of the difficulties in doing so. We also describe some preliminary results on evaluating the effectiveness of conversational behaviors in a reading tutor that listens.

[AAAI IE 98] J. Kominek, G. Aist, and J. Mostow. When Listening Is Not Enough: Potential Uses of Vision for a Abstract: Speech offers a powerful avenue between user and computer. However, if the user is not speaking, or is speaking to someone else, what is the computer to make of it? Project LISTEN's Reading Tutor is speech-aware software that strives to teach children to read. Because it is useful to know what the child is doing when reading, we are investigating some potential uses of computer vision. By recording and analyzing video of the Tutor in use, we measured the frequency of events that cannot be detected by speech alone. These include how often the child is visually distracted, and how often the teacher or another student provides assistance. This information helps us assess how vision might enhance the effectiveness of the Reading Tutor.

[AAAI CAHM 97] G. S. Aist
and J. Mostow. A time to be silent and a time to speak: Time-sensitive communicative actions in a reading tutor that listens. AAAI Fall Symposium on Communicative Actions in Humans and Machines. Abstract: Timing is important in discourse, and key in tutoring. Communicative actions that are too late or too early may be infelicitous. How can an agent engage in temporally appropriate behavior? We present a domain-independent architecture that models elapsed time as a critical factor in understanding the discourse. Our architecture also allows for "invisible experiments" where the agent varies its behavior and studies the effects of its behavior on the discourse. This architecture has been instantiated and is in use in an oral reading tutor that listens to children read aloud and helps them.

[PUI 97] G. S. Aist and J. Mostow. When Speech Input is Not an Afterthought: A Reading Tutor that Listens. Proceedings of the Workshop on Perceptual User Interfaces, Abstract: Project LISTEN's Reading Tutor listens to children read aloud, and helps them. The first extended in-school use of the Reading Tutor suggests that for this task speech input can be natural, compelling, and effective.

[CALL 97] G. S. Aist and J. Mostow. Adapting
Human Tutorial Interventions for a Abstract: Human tutors make use of a wide range of input and output modalities, such as speech, vision, gaze, and gesture. Computer tutors are typically limited to keyboard and mouse input. Project LISTEN's Reading Tutor listens to children read aloud, and helps them. Why should a computer tutor listen? A computer tutor that listens can give help and give praise naturally and unobtrusively. In this paper, we address the following questions: When and how should a computer tutor that listens help students? When and how should a computer tutor that listens praise students? We examine how the advantages and disadvantages of speech recognition helped shape the design and implementation of the Reading Tutor. Despite its limitations, speech recognition enables the Reading Tutor to provide patient, unobtrusive, and natural assistance for reading out loud.

[ISGW97 CRLT] J. Mostow. Collaborative Research on Learning Technologies: An Automated Reading Assistant That Listens. Proceedings of the NSF Interactive Systems Grantees Workshop (ISGW97),

[ISGW97 IS] J. Mostow. Guiding Spoken Dialogue with Computers by Responding to Prosodic Cues. Proceedings of the NSF Interactive Systems Grantees Workshop (ISGW97),

[ISGW97 KIDS] J. Mostow and M. Eskenazi. A Database of Children's Speech. Proceedings of the NSF Interactive Systems Grantees Workshop (ISGW97),

[LDC KIDS] M. Eskenazi and J. Mostow. The CMU KIDS Speech Corpus. Corpus of children's read speech digitized and transcribed on two CD-ROMs, with assistance from Multicom Research and David Graff. Published by the Linguistic Data Consortium,

[IAAI97] J. Mostow. Artificial Intelligence and
Education. Invited talk at the Ninth National Conference on Innovative Applications of Artificial Intelligence (IAAI-97).

[AAAI97] J. Mostow and G. Aist. The Sounds of Silence: Towards Automated Evaluation of Student Learning in a Abstract: We propose a paradigm for ecologically valid, authentic, unobtrusive, automatic, data-rich, fast, robust, and sensitive evaluation of computer-assisted student performance. We instantiate this paradigm in the context of a Reading Tutor that listens to children read aloud, and helps them. We introduce inter-word latency as a simple prosodic measure of assisted reading performance. Finally, to validate the measure and analyze performance improvement, we report initial experimental results from the first extended in-school deployment of the Reading Tutor.

[1997 video] J. Mostow. Pilot Evaluation of Project LISTEN's Reading Tutor (5-minute video). July, 1997. Presented at the Fourteenth National Conference on Artificial Intelligence (AAAI-97) and the Ninth National Conference on Innovative Applications of Artificial Intelligence (IAAI-97).

[EDMEDIA 97] J. Mostow and G. Aist. Project LISTEN: A

[MS 97] G. S. Aist. A General Architecture for a Real-Time Discourse Agent and a Case Study in Oral

[AAAI CMMII 97] G. S. Aist. Challenges for a mixed initiative spoken dialog system for oral reading tutoring. In Computational Models for Mixed Initiative Interaction: Working Notes of the AAAI 1997 Spring Symposium. March, 1997. Download paper in pdf format. Abstract: Deciding when a task is complete and deciding when to intervene and provide assistance are two basic challenges for an intelligent tutoring system. This paper describes these decisions in the context of Project LISTEN, an oral reading tutor that listens to children read aloud and helps them. We present theoretical analysis and experimental results demonstrating that supporting mixed-initiative interaction produces better task-completeness decisions than either system-only or user-only initiative. We describe some desired characteristics of a solution to the intervention decision, and specify possible evaluation criteria for such a solution.

[CAETI 96 video] J.
Mostow. A

[JASA 96] M. Eskenazi. KIDS: A database of children's speech. Journal of the Acoustical Society of America. Abstract: We have collected a database of children reading age- and reading-level-appropriate text aloud. This (labelled) data, to be distributed in the near future, was primarily intended to be used in CMU's LISTEN tutor, which employs speech recognition to monitor children's reading and then help correct errors. The speaker population was therefore chosen to represent good and poor readers and to incorporate dialects of the speakers for whom the reading coach is intended. Phonemic balance could not be achieved (although it has been calculated) since the primary concern in recording children reading is to present sentences that can effectively be read by first through third graders. The text is a series of sentences we adapted from text in the Weekly Reader series; most of the adaptation addressed the absence of the accompanying images. The text was chosen for its intrinsic interest and widespread use. Several trial recording sessions allowed us to develop a protocol that kept extraneous noises produced by the children to a minimum. We will discuss this and other problems inherent in recording children reading. Novel techniques developed for labelling this kind of speech will also be presented. This work was funded by NSF Grant No. IRI-9528984.

[UIST 95] J. Mostow, A. Hauptmann, and S. Roth. Demonstration of a Reading Coach that Listens. In Proceedings of the Eighth Annual Symposium on User Interface Software and Technology, pp. 77-78. Sponsored by ACM SIGGRAPH and SIGCHI in cooperation with SIGSOFT, Abstract: Project LISTEN stands for "Literacy Innovation that Speech Technology ENables." We will demonstrate a prototype automated reading coach that displays text on a screen, listens to a child read it aloud, and helps where needed. We have tested successive prototypes of the coach on several dozen second graders. Mostow et al. [AAAI94] report implementation details and evaluation results. Here we summarize its functionality, the issues it raises in human-computer interaction, and how it addresses them. We are redesigning the coach based on our experience, and will demonstrate its successor at UIST '95.

[NSF ISGW 95] J. Mostow & M. Eskenazi,
summary of NSF project, November 1995,

[AAAI 94] J. Mostow, S. Roth, A. G. Hauptmann, and M. Kane, "A Prototype Reading Coach that Listens", Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), American Association for Artificial Intelligence, Seattle, WA, August 1994, pp. 785-792. Recipient of the AAAI-94 Outstanding Paper Award. Download paper in pdf format. Abstract: We report progress on a new approach to combating illiteracy: getting computers to listen to children read aloud. We describe a fully automated prototype coach for oral reading. It displays a story on the screen, listens as a child reads it, and decides whether and how to intervene. We report on pilot experiments with low-reading second graders to test whether these interventions are technically feasible to automate and pedagogically effective to perform. By adapting a continuous speech recognizer, we detected 49% of the misread words, with a false alarm rate under 4%. By incorporating the interventions in a simulated coach, we enabled the children to read and comprehend material at a reading level 0.6 years higher than what they could read on their own. We show how the prototype uses the recognizer to trigger these interventions automatically.

[AAAI 94 video] J. Mostow, S. Roth, A. Hauptmann, M. Kane, A. Swift, L. Chase, and B. Weide, "A Reading Coach that Listens (6-minute video)", Video Track of the Twelfth National Conference on Artificial Intelligence (AAAI94), American Association for Artificial Intelligence, Seattle, WA, August 1994. Download paper in pdf format.

[ARPA HLT 94] A. G. Hauptmann, J. Mostow, S. F. Roth, M. Kane, and A. Swift, "A Prototype Reading Coach that Listens: Summary of Project LISTEN." In C. Weinstein (ed.), Proceedings ARPA Workshop on Human Language Technology, March 1994,

[Eurospeech 93] A. G. Hauptmann, L. L. Chase, and J. Mostow, "Speech Recognition Applied to Reading Assistance for Children: A Baseline Language Model", Proceedings of the 3rd European Conference on Speech Communication and Technology (EUROSPEECH93), Abstract: We describe an approach to using speech recognition in assisting children's reading. A state-of-the-art speaker-independent continuous speech recognizer designed for large-vocabulary dictation is adapted to the task of identifying substitutions and omissions in a known text. A baseline language model for this new task is detailed and evaluated against a corpus of children reading graded passages. We are able to identify words missed by a reader with an average false positive rate of 39% and a corresponding false negative rate of 37%. These preliminary results are encouraging for our long-term goal of providing automated coaching for children learning to read.

[Video 93] J. Mostow, S. Roth, A. Hauptmann, M. Kane, A. Swift, L. Chase, and B. Weide, "Getting Computers to Listen to Children Read:

[AAAI 93] J. Mostow, A. G. Hauptmann, L. L. Chase, and S. Roth, "Towards a Reading Coach that Listens: Automated Detection of Oral Reading Errors", Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI93), American Association for Artificial Intelligence, Washington, DC, July 1993, pp. 392-397. Download paper in pdf format. Abstract: What skill is more important to teach than reading? Unfortunately, millions of Americans cannot read. Although a large body of educational software exists to help teach reading, its inability to hear the student limits what it can do. This paper reports a significant step toward using automatic speech recognition to help children learn to read: an implemented system that displays a text, follows as a student reads it aloud, and automatically identifies which words he or she missed. We describe how the system works, and evaluate its performance on a corpus of second graders' oral reading that we have recorded and transcribed.