Project LISTEN
A Reading Tutor that Listens
Last updated: October 15, 2009

 


Project LISTEN Publications

[Note:  Links to full text are included when possible, e.g. after publication or conference presentation.

* marks publications by others.
See In the News for articles by others in newspapers, magazines, etc.
See Research Basis for a brief summary of published intervention studies and research underlying the Reading Tutor.
Most of these conferences and workshops involve two or more stringent peer reviews of the full paper (not just the abstract), including suggested revisions.  Publication in these proceedings is considered archival:  ITS2008 accepted only 30% (63 of 207) of submissions as full papers; UM2003 accepted 26 of 105; ICMI2002, 87 of 165; ITS2002, 93 of 167; AIED2001, 45 of 112; and AAAI2000, 143 of 432.]

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

1997

1996

1995

1994

1993

 


[Interspeech 2009 predictable] Aist, G., & Mostow, J. (2009, September 6-10). Designing Spoken Tutorial Dialogue with Children to Elicit Predictable but Educationally Valuable Responses. 10th Annual Conference of the International Speech Communication Association (Interspeech), Brighton, UK.  Click here for .pdf file.

 

Abstract:  How can we construct spoken dialogue interactions with children that are educationally effective and technically feasible? To address this challenge, we propose a design principle that constructs short dialogues in which (a) the user’s utterances are the external evidence of task performance or learning in the domain, and (b) the target utterances can be expressed as a well-defined set, in some cases even as a finite language (up to a small set of variables that may change from exercise to exercise). The key approach is to teach the human learner a parameterized process that maps input to response. We describe how this design principle emerged from analyzing the processes of automated tutoring for reading and pronunciation and from designing dialogues to address vocabulary and comprehension, show that it also accurately describes the design of several other language tutoring interactions, and discuss how it could extend to non-language tutoring tasks.

 


[SLaTE 2009 predictable] Aist, G., & Mostow, J. (2009, September 3-5). Predictable and Educational Spoken Dialogues: Pilot Results. Second ISCA Workshop on Speech and Language Technology in Education (SLaTE), Wroxall Abbey Estate, Warwickshire, England.  Click here for .pdf file.

 

Abstract:  This paper addresses the challenge of designing spoken dialogues that are of educational benefit within the context of an intelligent tutoring system, yet predictable enough to facilitate automatic speech recognition and subsequent processing. We introduce a design principle to meet this goal: construct short dialogues in which the desired student utterances are external evidence of performance or learning in the domain, and in which those target utterances can be expressed as a well-defined set. The key to this principle is to teach the human learner a process that maps inputs to responses. Pilot results in two domains - self-generated questions and morphology exercises - indicate that the approach is promising in terms of its habitability and the predictability of the utterances elicited. We describe the results and sketch a brief taxonomy classifying the elicited utterances according to whether they evidence student performance or learning, whether they are amenable to automatic processing, and whether they support or call into question the hypothesis that such dialogues can elicit spoken utterances that are both educational and predictable.

 


[SLaTE 2009 prosody] Duong, M., & Mostow, J. (2009, September 3-5). Detecting Prosody Improvement in Oral Rereading. Second ISCA Workshop on Speech and Language Technology in Education (SLaTE), Wroxall Abbey Estate, Warwickshire, England.  Click here for .ppt file.  Click here for .pdf file.

 

Abstract:  A reading tutor that listens to children read aloud should be able to detect fluency growth - not only in oral reading rate, but also in prosody.  How sensitive can such detection be?  We present an approach to detecting improved oral reading prosody in rereading a given text.  We evaluate our method on data from 133 students ages 7-10 who used Project LISTEN's Reading Tutor.  We compare the sensitivity of our extracted features in detecting improvements.  We use them to compare the magnitude of recency and learning effects.  We find that features computed by correlating the student's prosodic contours with those of an adult narration of the same text are generally not as sensitive to gains as features based solely on the student's speech.  We also find that rereadings on the same day show greater improvement than those on later days: statistically reliable recency effects are almost twice as strong as learning effects for the same features.

 


[SLaTE 2009 contexts] Liu, L., Mostow, J., & Aist, G. (2009, September 3-5). Automated Generation of Example Contexts for Helping Children Learn Vocabulary. Second ISCA Workshop on Speech and Language Technology in Education (SLaTE), Wroxall Abbey Estate, Warwickshire, England.  Click here for .pdf file.

 

Abstract:   This paper addresses the problem of generating good example contexts to help children learn vocabulary. We construct candidate contexts from the Google N-gram corpus. We propose a set of constraints on good contexts, and use them to filter candidate example contexts. We evaluate the automatically generated contexts by comparison to example contexts from children’s dictionaries and from children’s stories.

 


* [IDEC 2009] Reeder, K., Shapiro, J., & Wakefield, J. (2009, July 19-22). A computer based reading tutor for young English language learners: recent research on proficiency gains and affective response. 16th European Conference on Reading and 1st Ibero-American Forum on Literacies, University of Minho, Campus de Gualtar, Braga, Portugal.


[AIED 2009 prosody] Mostow, J., & Duong, M. (2009, July 6-10). Automated Assessment of Oral Reading Prosody. Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED2009), Brighton, UK, 189-196.   Click here for .pdf file.

 

Abstract:  We describe an automated method to assess the expressiveness of children's oral reading by measuring how well its prosodic contours correlate in pitch, intensity, pauses, and word reading times with adult narrations of the same sentences.  We evaluate the method directly against a common rubric used to assess fluency by hand.  We also compare it against manual and automated baselines by its ability to predict fluency and comprehension test scores and gains of 55 children ages 7-10 who used Project LISTEN's Reading Tutor.  It outperforms the human-scored rubric, predicts gains, and could help teachers identify which students are making adequate progress.

 


[AIED 2009 questioning] Mostow, J., & Chen, W. (2009, July 6-10). Generating Instruction Automatically for the Reading Strategy of Self-Questioning. Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED2009), Brighton, UK, 465-472.  Click here for .pdf file.

 

Abstract:  Self-questioning is an important reading comprehension strategy, so it would be useful for an intelligent tutor to help students apply it to any given text. Our goal is to help children generate questions that make them think about the text in ways that improve their comprehension and retention. However, teaching and scaffolding self-questioning involve analyzing both the text and the students’ responses. This requirement poses a tricky challenge to generating such instruction automatically, especially for children too young to respond by typing. This paper describes how to generate self-questioning instruction for an automated reading tutor. Following expert pedagogy, we decompose strategy instruction into describing, modeling, scaffolding, and prompting the strategy. We present a working example to illustrate how we generate each of these four phases of instruction for a given text. We identify some relevant criteria and use them to evaluate the generated instruction on a corpus of 513 children’s stories.

 


[QG 2009 informational] Chen, W., Aist, G., & Mostow, J. (2009, July 6). Generating Questions Automatically from Informational Text. Proceedings of AIED 2009 Workshop on Question Generation, Brighton, UK, 17-24.  Click here for .pdf file.

 

Abstract:  Good readers ask themselves questions during reading.  Our goal is to scaffold this self-questioning strategy automatically to help children in grades 1-3 understand informational text.  In previous work, we showed that instruction for self-questioning can be generated for narrative text.  This paper tests the generality of that approach by applying it to informational text.  We describe the modifications required, and evaluate the approach on informational texts from Project LISTEN's Reading Tutor.

 


[EDM 2009 logging] Mostow, J., & Beck, J. E. (2009, July 1-3). Why, What, and How to Log?  Lessons from LISTEN. Proceedings of the Second International Conference on Educational Data Mining, Córdoba, Spain, 269-278.  Click here for paper as .pdf file.  Click here for poster as .pptx file.

 

Abstract:  The ability to log tutorial interactions in comprehensive, longitudinal, fine-grained detail offers great potential for educational data mining – but what data is logged, and how, can facilitate or impede the realization of that potential.  We propose guidelines gleaned over 15 years of logging, exploring, and analyzing millions of events from Project LISTEN’s Reading Tutor and its predecessors.


 [SSSR 2009 prefixes] Mostow, J., Gates, D., McKeown, M., & Aist, G. (2009). How often are prefixes useful cues to word meaning?  Less than you might think! Sixteenth Annual Meeting of the Society for the Scientific Study of Reading, Boston.  Click here for .ppt file.

Abstract:  We report the frequency and cue validity, in WordNet and some large text corpora, of several common prefixes often advocated as worth teaching in early grades.  To estimate the cue validity of a prefix such as “un-” for the meanings of over 10,000 distinct words such as “undo” and “uncle,” we computed what percentage of their WordNet definitions contain keywords for the meaning of the prefix, e.g. “cancel,” “lack,” “no,” “not,” “opposite,” “reverse,” etc.  We analyze the cue validity of each prefix, both overall and how it varies by corpus and by lexical properties such as word frequency, length, part of speech, and whether the remainder of the word is also a word.  This analysis revealed that the prefixes’ utility in deciphering word meaning varies considerably, and is surprisingly poor for some prefixes.  We discuss the implications of these findings for vocabulary instruction in different grades, and for readers at varying levels of sophistication with respect to word structure and word meaning.


[ICTD 2009 Ghana] Mills-Tettey, A., Mostow, J., Dias, M. B., Sweet, T. M., Belousov, S. M., Dias, M. F., & Gong, H. (2009, April 17-19). Improving Child Literacy in Africa: Experiments with an Automated Reading Tutor. 3rd IEEE/ACM International Conference on Information and Communication Technologies and Development (ICTD2009), 129-138. Carnegie Mellon, Doha, Qatar.  Honorable Mention Student Paper Award.  Click here for .pdf file.

 

Abstract:  This paper describes a research endeavor aimed at exploring the role that technology can play in improving child literacy in developing communities. An initial pilot study and subsequent four-month-long controlled field study in Ghana investigated the viability and effectiveness of an automated reading tutor in helping urban children enhance their reading skills in English. In addition to quantitative data suggesting that automated tutoring can be useful for some children in this setting, these studies and an additional preliminary pilot study in Zambia yielded useful qualitative observations regarding the feasibility of applying technology solutions to the challenge of enhancing child literacy in developing communities. This paper presents the findings, observations and lessons learned from the field studies.


 

[IWCS 2009 mental] Chen, W. (2009). Understanding Mental States in Natural Language. Proceedings of the 8th International Workshop on Computational Semantics, Tilburg, Netherlands, 61-72.  Click here for .pdf file.

 

Abstract:  Understanding mental states in narratives is an important aspect of human language comprehension. By “mental states” we refer to beliefs, states of knowledge, points of view, and suppositions, all of which may change over time. In this paper, we propose an approach for automatically extracting and understanding multiple mental states in stories. Our model consists of two parts: (1) a parser that takes an English sentence and translates it into semantic operations; (2) a mental-state inference engine that reads in the semantic operations and produces a situation model that represents the meaning of the sentence. We present the performance of the system on a corpus of children’s stories containing both fictional and non-fictional texts.

 


 

[ITS 2008 help] Beck, J. E., Chang, K.-m., Mostow, J., & Corbett, A. (2008, June 23-27). Does help help?  Introducing the Bayesian Evaluation and Assessment methodology. 9th International Conference on Intelligent Tutoring Systems, Montreal, 383-394.  ITS2008 Best Paper Award.  Click here for .pdf file.

 

Abstract:  Most ITS have a means of providing assistance to the student, either on student request or when the tutor determines it would be effective.  Presumably, such assistance is included by the ITS designers since they feel it benefits the students.  However, whether, and how, help helps students has not been a well-studied problem in the ITS community.  In this paper we present three approaches for evaluating the efficacy of the Reading Tutor's help:  creating experimental trials from data, learning decomposition, and Bayesian Evaluation and Assessment, an approach that uses dynamic Bayesian networks.  We have found that experimental trials and learning decomposition both find a negative benefit for help; that is, help hurts!  However, the Bayesian Evaluation and Assessment framework finds that help both promotes student long-term learning and provides additional scaffolding on the current problem.  We discuss why these approaches give divergent results, and suggest that the Bayesian Evaluation and Assessment framework is the strongest of the three.  In addition to introducing Bayesian Evaluation and Assessment, a method for simultaneously assessing students and evaluating tutorial interventions, this paper describes how help can both scaffold the current problem attempt and teach the student knowledge that will transfer to later problems.

 


 

[ITS 2008 LD] Beck, J. E., & Mostow, J. (2008, June 23-27). How who should practice:  Using learning decomposition to evaluate the efficacy of different types of practice for different types of students. 9th International Conference on Intelligent Tutoring Systems, Montreal, 353-362.  Nominated for Best Paper.  Click here for .pdf file.

 

Abstract:  A basic question of instruction is how much students will actually learn from it.  This paper presents an approach called learning decomposition, which determines the relative efficacy of different types of learning opportunities.  This approach is a generalization of learning curve analysis, and uses non-linear regression to determine how to weight different types of practice opportunities relative to each other.  We analyze 346 students reading 6.9 million words and show that different types of practice differ reliably in how efficiently students acquire the skill of reading words quickly and accurately.  Specifically, massed practice is generally not effective for helping students learn words, and rereading the same stories is not as effective as reading a variety of stories.  However, we were able to analyze data for individual students' learning and use bottom-up processing to detect small subgroups of students who did benefit from rereading (11 students) and from massed practice (5 students).  The existence of these subgroups has two implications:  (1) one-size-fits-all instruction is adequate for perhaps 95% of the student population using computer tutors, but as a community we can do better; and (2) the ITS community is well poised to study what type of instruction is optimal for the individual.

 


 

[ITS 2008 compare] Zhang, X., Mostow, J., & Beck, J. E. (2008). A Case Study Empirical Comparison of Three Methods to Evaluate Tutorial Behaviors. 9th International Conference on Intelligent Tutoring Systems, Montreal, 122-131.  Click here for .pdf file.

 

Abstract:  Researchers have used various methods to evaluate the fine-grained interactions of intelligent tutors with their students.  We present a case study comparing three such methods on the same data set, logged by Project LISTEN's Reading Tutor from usage by 174 children in grades 2-4 (typically 7-10 years) over the course of the 2005-2006 school year.  The Reading Tutor chooses randomly between two different types of reading practice.  In assisted oral reading, the child reads aloud and the tutor helps.  In "Word Swap," the tutor reads aloud and the child identifies misread words.  One method we use here to evaluate reading practice is conventional analysis of randomized controlled trials (RCTs), where the outcome is performance on the same words when encountered again later.  The second method is learning decomposition, which estimates the impact of each practice type as a parameter in an exponential learning curve.  The third method is knowledge tracing, which estimates the impact of practice as a probability in a dynamic Bayes net.  The comparison shows qualitative agreement among the three methods, which is evidence for their validity.

 


 

[EDM 2008 freeform] Zhang, X., Mostow, J., Duke, N. K., Trotochaud, C., Valeri, J., & Corbett, A. (2008, June 20-21). Mining Free-form Spoken Responses to Tutor Prompts. Proceedings of the First International Conference on Educational Data Mining, Montreal, 234-241.  Click here for .pdf file.

 

Abstract:  How can an automated tutor assess children's spoken responses despite imperfect speech recognition?  We address this challenge in the context of tutoring children in explicit strategies for reading comprehension.  We report initial progress on collecting, annotating, and mining their spoken responses. Collection and annotation yield authentic but sparse data, which we use to synthesize additional realistic data.  We train and evaluate a classifier to estimate the probability that a response mentions a given target.

 


 

[EDM 2008 analytic] Mostow, J., & Zhang, X. (2008, June 20-21). Analytic Comparison of Three Methods to Evaluate Tutorial Behaviors. Proceedings of the First International Conference on Educational Data Mining, Montreal, 28-37.  Click here for .pdf file.

 

Abstract:  We compare the purposes, inputs, representations, and assumptions of three methods to evaluate the fine-grained interactions of intelligent tutors with their students.  One method is conventional analysis of randomized controlled trials (RCTs).  The second method is learning decomposition, which estimates the impact of each practice type as a parameter in an exponential learning curve.  The third method is knowledge tracing, which estimates the impact of practice as a probability in a dynamic Bayes net.  The comparison leads to a generalization of learning decomposition to account for slips and guesses.

 


 

[IES 2008] Mostow, J., Corbett, A., Valeri, J., Bey, J., Duke, N. K., & Trotochaud, C. (2008, June 10-12). Explicit Comprehension Instruction in an Automated Reading Tutor that Listens:  Year 1 [poster and handout]. IES Third Annual Research Conference, Washington, DC.

 


 

[FLET 2008] Mostow, J. (2008). Experience from a Reading Tutor that listens:  Evaluation purposes, excuses, and methods. In C. K. Kinzer & L. Verhoeven (Eds.), Interactive Literacy Education:  Facilitating Literacy Environments Through Technology, pp. 117-148. New York: Lawrence Erlbaum Associates, Taylor & Francis Group.  Click here to order book from Amazon.com.

Abstract:  This chapter gives three good reasons to evaluate reading software, identifies three methods for doing so, and refutes three excuses for not evaluating – namely, that evaluation is premature, unnecessary, or will be done by others:

(1) Wizard of Oz experiments help test whether (and clarify how) a proposed approach might work, and refute the excuse that evaluation is premature because the approach has not yet been implemented in a proposed system that may take years to develop.

(2) Conventional controlled studies help determine whether an implemented system helps children gain more in reading than they would otherwise.  This criterion is necessary to improve on the status quo, but the difficulty of meeting it refutes the excuse that evaluation is unnecessary due to the supposedly innate superiority of learning on computers, or of a proposed way to use them.

(3) Experiments embedded in an automated tutor help analyze which tutorial actions help which students and words, thereby guiding improvement of the tutor in ways that third-party evaluation cannot, and thus refuting the excuse that evaluation can be left to others.

The chapter details some practical lessons learned from designing, performing, and analyzing experiments embedded in Project LISTEN’s school-deployed Reading Tutor, which uses speech recognition to listen to children read aloud, and is helping hundreds of children learn to read. 


[STLL 2008 SC]  Aist, G., & Mostow, J. (2008). Faster, better task choice in a reading tutor that listens. In V. M. Holland & F. P. Fisher (Eds.), The Path of Speech Technologies in Computer Assisted Language Learning:  From Research Toward Practice (pp. 220-240). New York: Routledge.

Abstract:  We analyze the efficiency and effectiveness of task choice in the context of a reading tutor that listens to children read aloud.  We define efficiency as the time to pick a story, and effectiveness in terms of exposing students to new material.  We describe design features we added to improve the Reading Tutor’s efficiency and effectiveness, and evaluate the resulting systems quantitatively, as follows. First, we made the story menu child-friendlier by incorporating two improvements: (a) to support use by nonreaders, the new menu spoke all items on the list; (b) to speed up choice, the new menu required just one click to select an item. Second, we instituted a mixed-initiative story choice policy where the Reading Tutor and the student took turns choosing stories. These improvements made story choice measurably more efficient and effective. 


[STLL 2008 S98]  Mostow, J., Aist, G., Huang, C., Junker, B., Kennedy, R., Lan, H., Latimer, D., O'Connor, R., Tassone, R., Tobin, B., & Wierman, A. (2008). 4-Month evaluation of a learner-controlled Reading Tutor that listens. In V. M. Holland & F. P. Fisher (Eds.), The Path of Speech Technologies in Computer Assisted Language Learning:  From Research Toward Practice (pp. 201-219). New York: Routledge.

 

Abstract:  We evaluated an automated Reading Tutor that let children pick stories to read, and listened to them read aloud. All 72 children in three classrooms (grades 2, 4, 5) were independently tested on the nationally normed Word Attack, Word Identification, and Passage Comprehension subtests of the Woodcock Reading Mastery Test (where they averaged nearly 2 standard deviations below national norms), and on oral reading fluency.  We split each class into 3 matched treatment groups:  Reading Tutor, commercial reading software, or other activities.  In 4 months, the Reading Tutor group gained significantly more in Passage Comprehension than the control group (effect size = 1.2, p=.002) - even though actual usage was a fraction of the planned daily 20-25 minutes.  To help explain these results, we analyzed relationships among gains in Word Attack, Word Identification, Passage Comprehension, and fluency by 108 additional children who used the Reading Tutor in 7 other classrooms (grades 1-4). Gains in Word Identification predicted Passage Comprehension gains only for Reading Tutor users, both in the controlled study (n=21, p=.042, regression coefficient B=.495± s.e. .227) and in the other classrooms (n=108, p=.005, B=.331±.115), where grade was also a significant predictor (p=.024, B=2.575±1.127). 


* [IDEC 2007] Reeder, K., Shapiro, J., & Wakefield, J. (2007, August 5-8). The effectiveness of speech recognition technology in promoting reading proficiency and attitudes for Canadian immigrant children. 15th European Conference on Reading, Humboldt University, Berlin.  Click here for .ppsx PowerPoint presentation.

 

Abstract:  This paper reports on recently completed Canadian trials of the Reading Tutor, a prototype program that uses advanced speech recognition technology to listen to children read aloud in English. When the program hears the reader experiencing difficulty, it offers help with the goal of enhancing reading fluency, and in turn, comprehension.  We followed 62 Canadian immigrant children in grades 2-7, ages 8-13, in three multicultural western Canadian urban elementary schools for 4 to 7 months of daily 20-minute sessions on the Reading Tutor. Our first goal was to determine the role of English language (L2) proficiency in any reading gains achieved, while controlling for participants’ differing amounts of practice with the software. Our second goal was to describe participants’ attitudes toward, and perceptions of, the experience of using the Reading Tutor software.

 

Participants were pre-tested for English language proficiency level and for reading proficiency. At the end of each school’s trial, children were post-tested for reading proficiency, including word recognition, word attack, and word and passage comprehension. The lowest of the three English language proficiency groups showed the strongest reading gains, and did so in ways that reflected specific features of their language development. To assess the attitudinal dimension, we administered a clinical interview to all participants at the conclusion of the trial. We describe children’s perceptions of how the program assisted them in their literate development.

 


* [JECR 2007] Poulsen, R., Wiemer-Hastings, P., & Allbritton, D. (2007). Tutoring Bilingual Students with an Automated Reading Tutor That Listens. Journal of Educational Computing Research, 36(2), 191-221.  Click here for .pdf file.

 

Abstract:  Children from non-English-speaking homes are doubly disadvantaged when learning English in school. They enter school with less prior knowledge of English sounds, word meanings, and sentence structure, and they get little or no reinforcement of their learning outside of the classroom. This article compares the classroom standard practice of sustained silent reading with the Project LISTEN Reading Tutor, which uses automated speech recognition to "listen" to children read aloud, providing both spoken and graphical feedback. Previous research with the Reading Tutor has focused primarily on native-speaking populations. In this study 34 Hispanic students spent one month in the classroom and one month using the Reading Tutor for 25 minutes per day. The Reading Tutor condition produced significant learning gains in several measures of fluency. Effect sizes ranged from 0.55 to 1.27. These dramatic results from a one-month treatment indicate this technology may have much to offer English language learners.

 


[SLaTE 2007 ASL] Xu, L., Varadharajan, V., Maravich, J., Tongia, R., & Mostow, J. (2007, October 1-3). DeSIGN: An Intelligent Tutor to Teach American Sign Language. SLaTE workshop on Speech and Language Technology for Education, ISCA Tutorial and Research Workshop, The Summit Inn, Farmington, Pennsylvania.  Click here for .pdf file.

 

Abstract:  This paper presents the development of DeSIGN, an educational software application for those deaf students who are taught to communicate using American Sign Language (ASL). The software reinforces English vocabulary and ASL signs by providing two essential components of a tutor, lessons and tests. The current version was designed for 5th and 6th graders, whose literacy skills lag by a grade or more on average. In addition, a game that allows the students to be creative has been integrated into the tests.  Another feature of DeSIGN is its ability to intelligently adapt its tests to the changing knowledge of the student as determined by a knowledge tracing algorithm. A separate interface for the teacher enables additions and modifications to the content of the tutor and provides progress monitoring. These dynamic aspects help motivate the students to use the software repeatedly. This software prototype aims at a feasible and sustainable approach to increase the participation of deaf people in society. DeSIGN has undergone an iteration of testing and is currently in use at a school for the deaf in Pittsburgh.

 


[AIED 2007 motivation] Beck, J. E. (2007, July 9-13). Does learner control affect learning? Proceedings of the 13th International Conference on Artificial Intelligence in Education, Los Angeles, CA, 135-142.  Click here for .pdf file.

 

Abstract:  Many intelligent tutoring systems permit some degree of learner control. A natural question is whether the increased student engagement and motivation such control provides results in additional student learning. This paper uses a novel approach, learning decomposition, to investigate whether students do in fact learn more from a story they select to read than from a story the tutor selects for them. By analyzing 346 students reading approximately 6.9 million words, we have found that students learn approximately 25% more in stories they choose to read, even though from a purely pedagogical standpoint such stories may not be as appropriate as those chosen by the computer. Furthermore, we found that (for our instantiation of learner control) younger students may derive less benefit from learner control than older students, and girls derive less benefit than boys.

 


[AIED 2007 comprehension] Zhang, X., Mostow, J., & Beck, J. E. (2007, July 9-13). Can a Computer Listen for Fluctuations in Reading Comprehension? Proceedings of the 13th International Conference on Artificial Intelligence in Education, Los Angeles, CA, 495-502.  Click here for .pdf file.

 

Abstract:  The ability to detect fluctuation in students' comprehension of text would be very useful for many intelligent tutoring systems. The obvious solution of inserting comprehension questions is limited in its application because it interrupts the flow of reading. To investigate whether we can detect comprehension fluctuations simply by observing the reading process itself, we developed a statistical model of 7805 responses by 289 children in grades 1-4 to multiple-choice comprehension questions in Project LISTEN's Reading Tutor, which listens to children read aloud and helps them learn to read.  Machine-observable features of students' reading behavior turned out to be statistically significant predictors of their performance on individual questions.

 


[EDM 2007 LFA transfer] Leszczenski, J. M., & Beck, J. E. (2007, July 9). What’s in a word? Extending learning factors analysis to modeling reading transfer. Proceedings of the AIED2007 Workshop on Educational Data Mining, Marina del Rey, CA, 31-39.  Click here for .pdf file.

 

Abstract:  Learning Factors Analysis (LFA) has been proposed as a generic solution to evaluate and compare cognitive models of learning [1]. By performing a heuristic search over a space of statistical models, the researcher may evaluate different cognitive representations of a set of skills. We introduce a scalable application of this framework in the context of transfer in reading and demonstrate it upon Reading Tutor data. Using an assumption of a word-level model of learning as a baseline, we apply LFA to determine whether a representation with fewer word independencies will produce a better fit for student learning data. Specifically, we show that representing some groups of words as their common root leads to a better fitting model of student knowledge, indicating that this representation offers more information than merely viewing words as independent, atomic skills. In addition, we demonstrate an approximation to LFA which allows it to scale tractably to large datasets. We find that using a word root-based model of learning leads to an improved model fit, suggesting students make use of this information in their representation of words. Additionally, we present evidence based on both model fit and learning rate relationships that low proficiency students tend to exhibit a lesser degree of transfer through the word root representation than higher proficiency students.

 


[EDM 2007 LD transfer] Zhang, X., Mostow, J., & Beck, J. E. (2007, July 9). All in the (word) family:  Using learning decomposition to estimate transfer between skills in a Reading Tutor that listens. AIED2007 Educational Data Mining Workshop, Marina del Rey, CA.  Click here for .pdf file.

 

Abstract:  In this paper, we use the method of learning decomposition to study students’ mental representations of English words. Specifically, we investigate whether practice on a word transfers to similar words. We focus on the case where similar words share the same root (e.g., “dog” and “dogs”). Our data comes from Project LISTEN’s Reading Tutor during the 2003-2004 school year, and includes 6,213,289 words read by 650 students. We analyze the distribution of transfer effects across students, and identify factors that predict the amount of transfer. The results support some of our hypotheses about learning, e.g., the transfer effect from practice on similar words is greater for proficient readers than for poor readers. More significant than these empirical findings, however, is the novel analytic approach for measuring transfer effects.

 


[EDM 2007 Dirichlet] Beck, J. E. (2007, July 9). Difficulties in inferring student knowledge from observations (and why you should care). Proceedings of the AIED2007 Workshop on Educational Data Mining, Marina del Rey, CA, 21-30.  Click here for .pdf file.

 

Abstract:  Student modeling has a long history in the field of intelligent educational software and is the basis for many tutorial decisions. Furthermore, the task of assessing a student’s level of knowledge is a basic building block in the educational data mining process. If we cannot estimate what students know, it is difficult to perform fine-grained analyses to see if a system’s teaching actions are having a positive effect. In this paper, we demonstrate that there are several unaddressed problems with student model construction that negatively affect the inferences we can make. We present two partial solutions to these problems, using Expectation Maximization to estimate parameters and using Dirichlet priors to bias the model fit procedure. Aside from reliably improving model fit in predictive accuracy, these approaches might result in model parameters that are more plausible. Although parameter plausibility is difficult to quantify, we discuss some guidelines and propose a derived measure of predicted number of trials until mastery as a method for evaluating model parameters.
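One way to read the proposed plausibility check: given knowledge-tracing parameters, count how many practice opportunities it takes before the estimated probability of knowing the skill crosses a mastery threshold. The Python sketch below assumes the classic knowledge-tracing transition (no forgetting), ignores the evidence from responses, and uses a 0.95 threshold; the function name, the threshold, and this exact formulation are assumptions for illustration, not necessarily the paper's definition.

```python
def trials_until_mastery(p_init: float, p_learn: float,
                         threshold: float = 0.95, max_trials: int = 10_000) -> int:
    """Opportunities needed before P(known) reaches `threshold`.

    Assumes classic knowledge tracing (no forgetting) and ignores the evidence
    from responses, so after n opportunities
    P(known) = 1 - (1 - p_init) * (1 - p_learn) ** n.
    """
    if not (0.0 <= p_init <= 1.0 and 0.0 <= p_learn <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    p_known = p_init
    for n in range(max_trials + 1):
        if p_known >= threshold:
            return n
        # Learning step: an unknown skill becomes known with probability p_learn.
        p_known += (1.0 - p_known) * p_learn
    raise ValueError("mastery not reached within max_trials")
```

Under this sketch, p_init = 0.5 with p_learn = 0.3 predicts mastery after 7 opportunities, while a learning rate of 0.001 predicts more than a thousand, the kind of implausible estimate such a measure could flag.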

 


[UM 2007] Beck, J. E., & Chang, K.-m. (2007, June 25-29). Identifiability: A Fundamental Problem of Student Modeling.  Proceedings of the 11th International Conference on User Modeling (UM 2007), Corfu, Greece.  Click here for .pdf file.

 

Abstract:  In this paper we show how model identifiability is an issue for student modeling: observed student performance corresponds to an infinite family of possible model parameter estimates, all of which make identical predictions about student performance. However, these parameter estimates make different claims, some of which are clearly incorrect, about the student’s unobservable internal knowledge. We propose methods for evaluating these models to find ones that are more plausible. Specifically, we present an approach using Dirichlet priors to bias model search that results in a statistically reliable improvement in predictive accuracy (AUC of 0.620 ± 0.002 vs. 0.614 ± 0.002). Furthermore, the parameters associated with this model provide more plausible estimates of student learning, and better track with known properties of students’ background knowledge. The main conclusion is that prior beliefs are necessary to bias the student modeling search, and even large quantities of performance data alone are insufficient to properly estimate the model.
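The paper places Dirichlet priors on knowledge-tracing parameters during model search. As a much smaller toy illustration of how a prior biases an estimate, the Python sketch below computes the maximum a posteriori (MAP) value of a single probability, such as a guess rate, under a Beta prior (the two-outcome special case of a Dirichlet). The function names, counts, and prior values are invented, and this is not the paper's EM procedure.

```python
def bernoulli_mle(successes: int, trials: int) -> float:
    """Maximum-likelihood estimate of a single probability."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return successes / trials


def bernoulli_map(successes: int, trials: int, alpha: float, beta: float) -> float:
    """MAP estimate under a Beta(alpha, beta) prior (a two-outcome Dirichlet).

    The prior behaves like alpha - 1 pseudo-successes and beta - 1
    pseudo-failures, pulling the estimate toward the prior's mode when data
    are scarce.
    """
    if alpha < 1 or beta < 1:
        raise ValueError("this closed form assumes alpha >= 1 and beta >= 1")
    denom = trials + alpha + beta - 2
    if denom <= 0:
        raise ValueError("mode undefined for a uniform prior with no data")
    return (successes + alpha - 1) / denom
```

With 3 of 4 observations counted as guesses, the unconstrained estimate is 0.75, an implausibly high guess rate; a Beta(2, 8) prior pulls it to 1/3. With 750 of 1000, the prior's influence nearly vanishes (751/1008, about 0.745).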

 


[ICASSP 2007] Anumanchipalli, G. K., Ravishankar, M., & Reddy, R. (2007, April 15-20). Improving Pronunciation Inference Using N-Best List, Acoustics and Orthography. Proc.  32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii, Paper 4151.  Click here for .pdf file.

 

Abstract:  In this paper, we tackle the problem of pronunciation inference and Out-of-Vocabulary (OOV) enrollment in Automatic Speech Recognition (ASR) applications. We combine linguistic and acoustic information of the OOV word using its spelling and a single instance of its utterance to derive an appropriate phonetic baseform. The novelty of the approach is in its employment of an orthography-driven n-best hypothesis and rescoring strategy of the pronunciation alternatives. We make use of decision trees and heuristic tree search to construct and score the n-best hypotheses space. We use acoustic alignment likelihood and phone transition cost to leverage the empirical evidence and phonotactic priors to rescore the hypotheses and refine the baseforms.

 


[IERI 2007] Mostow, J., & Beck, J. (2007). When the Rubber Meets the Road:  Lessons from the In-School Adventures of an Automated Reading Tutor that Listens. In B. Schneider & S.-K. McDonald (Eds.), Scale-Up in Education (Vol. 2, pp. 183-200).  © Rowman & Littlefield Publishers, Lanham, MD.  Click here for .pdf file.

 

Abstract:  Project LISTEN's Reading Tutor (www.cs.cmu.edu/~listen) uses automatic speech recognition to listen to children read aloud, and helps them learn to read.  Its experimental deployment in schools has expanded from a single computer used by eight third graders in one school in 1996 to two hundred computers used by children in grades 1-3 in nine schools in 2003.  This project illustrates how technology can not just scale up an intervention, but instrument its implementation.  For example, analysis of 2002-2003 usage showed that session frequency and duration averaged significantly higher in lab settings than in classrooms.

 


[ICSLP2006] Mostow, J. (2006, September 17-21). Is ASR accurate enough for automated reading tutors, and how can we tell? Ninth International Conference on Spoken Language Processing (Interspeech 2006 - ICSLP), Pittsburgh, PA, 837-840.  Click here for .pdf file.

 

Abstract:  We discuss pros and cons of several ways to evaluate ASR accuracy in automated tutors that listen to students read aloud.  Whether ASR is accurate enough for a particular reading tutor function depends on what ASR-based judgment it requires, the visibility of that judgment to students and teachers, and the amount of input speech on which it is based.  How to tell depends on the purpose, criterion, and space of the evaluation.

 


[AAAI2006 help] Chang, K., Beck, J. E., Mostow, J., & Corbett, A. (2006, July 17). Does Help Help?  A Bayes Net Approach to Modeling Tutor Interventions. AAAI2006 Workshop on Educational Data Mining, Boston, MA.  Click here for .pdf file.

 

Abstract:  This paper describes an effort to measure the effectiveness of tutor help in an intelligent tutoring system. Conventional pre- and post-test experimental methods can determine whether help is effective but are expensive to conduct.  Furthermore, a pre- and post-test methodology ignores a source of information: students request help about words they do not know. Therefore, we propose a dynamic Bayes net (which we call the help model) that models tutor help and student knowledge in one coherent framework. The help model distinguishes two different effects of help:  scaffolding immediate performance vs. teaching persistent knowledge that improves long-term performance. We train the help model to fit the student performance data gathered from usage of the Reading Tutor. The parameters of the trained model suggest that students benefit from both the scaffolding and teaching effects of help. Thus, our framework is able to distinguish two types of influence that help has on the student, and can determine whether help helps learning without an explicit controlled study.

 


[SSSR2006 cloze] Hensler, B. S., & Beck, J. (2006, July 6-8). Are all questions created equal?  Factors that influence cloze question difficulty. Thirteenth Annual Meeting of the Society for the Scientific Study of Reading, Vancouver, BC, Canada.  Click here for .ppt file.

 

Abstract:  The multiple choice cloze (MCC) assessment methodology is widely used in assessing reading comprehension; therefore an improved scoring methodology would have broad impact within the reading research community.  We have constructed an MCC question model that simultaneously estimates the student's comprehension proficiency and the impact of various terms on MCC difficulty. To build the model, we analyzed 16,161 MCC question responses that were administered by a computer reading tutor over the course of a school year.  Participants were 373 students in grades 1 through 6 (ages 5-12) in urban and suburban public schools in Pennsylvania.  Students reading stories on the Reading Tutor were presented with cloze questions with the goal of assessing reading comprehension.  MCC questions were generated randomly by the computer without using a fixed deletion ratio.  A maximum of one word was deleted per sentence, and the distractors were selected from the story being read and were of similar frequency as the deleted target word.  MCC questions and the response choices were read aloud by the computer to the students. 

 

To develop our model of MCC difficulty, we used multinomial logistic regression to calculate the relative impact of a number of factors.  Our model includes the location of the deleted target word within the sentence and question length as covariates.  As factors, we used student identity, reaction time (rounded to the nearest second) and level of difficulty of the target word.  We hypothesized that more proficient readers would use syntactic cues while less proficient readers would not.  To add syntax to the model, we used the TreeTagger part of speech tagger to annotate the part of speech of the correct answer for each cloze question.  We then computed how many of the distractors could have the same part of speech as the answer.  Presumably questions with many distractors able to take on the same part of speech as the answer would be harder.

 

After training the model on our 16,161 MCC questions, there were two main findings.  First, our model found that students who had a second-grade reading proficiency (as measured by the Woodcock Reading Comprehension Cluster) or higher were sensitive to how many of the possible responses could take on the same part of speech as the correct answer for the cloze sentence (p = 0.002), while students below second-grade proficiency were insensitive to this term (p = 0.467).  This result suggests that students' syntactic awareness, at least within the context of MCC questions, begins at around the second grade.  The second main finding was the degree of correlation of each student's Beta parameter, the model's estimate of her ability to answer MCC questions, with her associated Woodcock test score.  The mean within-grade correlation between Beta and the Reading Comprehension Cluster score was 0.69, a very strong fit.

 


[SSSR2006 fluency] Mostow, J., & Beck, J. (2006, July 6-8). Refined micro-analysis of fluency gains in a Reading Tutor that listens. Thirteenth Annual Meeting of the Society for the Scientific Study of Reading, Vancouver, BC, Canada.  Click here for .ppt file.

 

Abstract:  Our SSSR2005 talk presented a linear model of speedup in word reading between successive encounters in connected text, based on a quarter of a million such encounters.  The model indicated that reading a word in a new context contributed more to speedup than re-encountering it in an old context, implying that wide reading builds fluency more than rereading.  Our new, improved model uses a growth curve to model word reading time as a function of the number and types of encounters of the word.  This approach lets us estimate, both overall and at different reading levels, the relative value of encountering a word in a new context versus an old one, and for the first time on a given day versus subsequently.

 


[ITS2006 gaming] Baker, R. S. J. d., Corbett, A. T., Koedinger, K. R., Evenson, S., Roll, I., Wagner, A. Z., Naim, M., Raspat, J., Baker, D. J., & Beck, J. E. (2006, June 26-30). Adapting to When Students Game an Intelligent Tutoring System. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan, 392-401.  Best Paper.  Click here for .pdf file.

 

Abstract:  It has been found in recent years that many students who use intelligent tutoring systems game the system, attempting to succeed in the educational environment by exploiting properties of the system rather than by learning the material and trying to use that knowledge to answer correctly. In this paper, we introduce a system which gives a gaming student supplementary exercises focused on exactly the material the student bypassed by gaming, and which also expresses negative emotion to gaming students through an animated agent. Students using this system engage in less gaming, and students who receive many supplemental exercises have considerably better learning than is associated with gaming in the control condition or prior studies.

 


[ITS2006 BNT-SM] Chang, K., Beck, J., Mostow, J., & Corbett, A. (2006, June 26-30). A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring Systems. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan, 104-113.  Click here for .pdf file.

 

Abstract:  This paper describes an effort to model a student’s changing knowledge state during skill acquisition. Dynamic Bayes Nets (DBNs) provide a powerful way to represent and reason about uncertainty in time series data, and are therefore well-suited to model student knowledge.  Many general-purpose Bayes net packages have been implemented and distributed; however, constructing DBNs often involves complicated coding effort. To address this problem, we introduce a tool called BNT-SM.  BNT-SM inputs a data set and a compact XML specification of a Bayes net model hypothesized by a researcher to describe causal relationships among student knowledge and observed behavior. BNT-SM generates and executes the code to train and test the model using the Bayes Net Toolbox [1]. Compared to the BNT code it outputs, BNT-SM reduces the number of lines of code required to use a DBN by a factor of 5. In addition to supporting more flexible models, we illustrate how to use BNT-SM to simulate Knowledge Tracing (KT) [2], an established technique for student modeling. The trained DBN does a better job of modeling and predicting student performance than the original KT code (Area Under Curve = 0.610 > 0.568), due to differences in how it estimates parameters.

 


[ITS2006 cloze] Hensler, B. S., & Beck, J. (2006, June 26-30). Better student assessing by finding difficulty factors in a fully automated comprehension measure. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan, 21-30. Nominated for Best Paper.  Click here for .pdf file.

 

Abstract:  The multiple choice cloze (MCC) question format is commonly used to assess students' comprehension. It is an especially useful format for ITS because it is fully automatable and can be used on any text.  Unfortunately, very little is known about the factors that influence MCC question difficulty and student performance on such questions. In order to better understand student performance on MCC questions, we developed a model of MCC questions. Our model shows that the difficulty of the answer and the student’s response time are the most important predictors of student performance. In addition to showing the relative impact of the terms in our model, our model provides evidence of a developmental trend in syntactic awareness beginning around the 2nd grade. Our model also accounts for 10% more variance in students’ external test scores compared to the standard scoring method for MCC questions.

 


[ITS2006 vocabulary] Heiner, C., Beck, J., & Mostow, J. (2006, June 26-30). Automated Vocabulary Instruction in a Reading Tutor. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan.  Click here for .pdf file.

 

Abstract:  This paper presents a within-subject, randomized experiment to compare automated interventions for teaching vocabulary to young readers using Project LISTEN's Reading Tutor. The experiment compared three conditions: no explicit instruction, a quick definition, and a quick definition plus a post-story battery of extended instruction based on a published instructional sequence for human teachers. A month-long study with elementary school children indicates that the quick instruction, which lasted about seven seconds, had immediate effects on learning gains that did not persist. Extended instruction, which lasted about thirty seconds longer than the quick instruction, had a persistent effect and produced gains on a posttest one week later.

 


[ITS2006 decomposition] Beck, J. (2006, June 26). Using learning decomposition to analyze student fluency development. ITS2006 Educational Data Mining Workshop, Jhongli, Taiwan.  Click here for .pdf file.

 

Abstract:  This paper introduces an approach called learning decomposition to analyze what types of practice are most effective for helping students learn a skill. The approach is a generalization of learning curve analysis, and uses non-linear regression to determine how to weight different types of practice opportunities relative to each other. We are able to show that different types of practice differ reliably in how quickly students acquire the skill of reading words quickly and accurately. Specifically, massed practice is generally not effective for helping students learn words, but may be acceptable for less proficient readers. Rereading the same story is generally not as effective as reading a variety of stories, but might be beneficial for more proficient readers.
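As a rough sketch of the idea, assume reading time for a word decays exponentially with practice, and that two kinds of practice (say, rereading the same story versus reading a new one) are counted separately, with a weight beta converting type-1 opportunities into type-2 equivalents. The abstract describes non-linear regression; to stay dependency-free, the Python sketch below instead takes logs and solves the resulting linear least-squares problem, which is a swapped-in simplification. All names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def predicted_time(a: float, b: float, beta: float, n1: int, n2: int) -> float:
    """Assumed curve: time = a * exp(-b * (beta * n1 + n2)).

    n1 and n2 count prior practice opportunities of two kinds; beta is the
    value of one type-1 opportunity in units of type-2 opportunities.
    """
    return a * math.exp(-b * (beta * n1 + n2))


def _solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the square system m x = v by Gauss-Jordan elimination."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: both practice counts must vary")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_decomposition(obs: Sequence[Tuple[int, int, float]]) -> Tuple[float, float, float]:
    """Estimate (a, b, beta) from (n1, n2, time) observations.

    log(time) = log(a) - b*beta*n1 - b*n2 is linear in n1 and n2, so ordinary
    least squares on log(time) gives c0 = log(a), c1 = -b*beta, c2 = -b.
    """
    xs = [(1.0, float(n1), float(n2)) for n1, n2, _ in obs]
    ys = [math.log(t) for _, _, t in obs]
    # Normal equations (X^T X) c = X^T y.
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    c0, c1, c2 = _solve(xtx, xty)
    return math.exp(c0), -c2, c1 / c2
```

On noise-free synthetic data the fit recovers the generating values. On real reading times, the log transform changes how errors are weighted, which is one reason to prefer direct non-linear regression as the abstract describes.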

 


[JNLE2006] Mostow, J., & Beck, J. (2006). Some useful tactics to modify, map, and mine data from intelligent tutors. Natural Language Engineering (Special Issue on Educational Applications), 12(2), 195-208.  © 2006 Cambridge University Press.  Click here for .pdf file.

 

Abstract:  Mining data logged by intelligent tutoring systems has the potential to discover information of value to students, teachers, authors, developers, researchers, and the tutors themselves: information that could make education dramatically more efficient, effective, and responsive to individual needs. We factor this discovery process into tactics to modify tutors, map heterogeneous event streams into tabular data sets, and mine them. This model and the tactics identified mark out a roadmap for the emerging area of tutorial data mining, and may provide a useful vocabulary and framework for characterizing past, current, and future work in this area. We illustrate this framework using experiments that tested interventions by an automated reading tutor to help children decode words and comprehend stories.


[IJAIED2006] Beck, J. E., & Sison, J. (2006). Using knowledge tracing in a noisy environment to measure student reading proficiencies. International Journal of Artificial Intelligence in Education, 16, 129-143.  (In Special “Best of ITS 2004” Issue.)  Click here for .pdf file.

Abstract:  Constructing a student model for language tutors is a challenging task. This paper describes using knowledge tracing to construct a student model of reading proficiency and validates the model. We use speech recognition to assess a student’s reading proficiency at a subword level, even though the speech recognizer output is at the level of words and is statistically noisy. Specifically, we estimate the student’s knowledge of 80 letter-to-sound mappings, such as ch making the sound /K/ in “chemistry.” At a coarse level, the student model did a better job at estimating reading proficiency for 47.2% of the students than did a standardized test designed for the task. Although not quite as strong as the standardized test, our assessment method can provide a report on the student at any time during the year and requires no break from reading to administer. Our model’s estimate of the student’s knowledge on individual letter-to-sound mappings is a significant predictor of whether he will ask for help on a particular word. Thus, our student model is able to describe student performance at both a coarse and a fine grain size.
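For readers unfamiliar with knowledge tracing, the Python sketch below shows the standard update for one skill, here imagined as one letter-to-sound mapping. It assumes each attempt yields a single correct/incorrect judgment (for example, the recognizer's noisy verdict on a word containing the mapping) and lets the guess and slip parameters absorb that noise. The class and function names and the parameter values in the test are hypothetical, and how the paper credits several mappings within one word is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class KTParams:
    p_init: float   # P(L0): mapping already known before the first attempt
    p_learn: float  # P(T): unknown -> known after one practice opportunity
    p_guess: float  # P(G): scored correct although the mapping is unknown
    p_slip: float   # P(S): scored incorrect although the mapping is known


def p_correct(p_known: float, params: KTParams) -> float:
    """Predicted probability that the next attempt is scored correct."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def kt_update(p_known: float, correct: bool, params: KTParams) -> float:
    """One knowledge-tracing step: Bayes update on the observation, then learning."""
    if correct:
        evidence = p_known * (1 - params.p_slip)
        total = evidence + (1 - p_known) * params.p_guess
    else:
        evidence = p_known * params.p_slip
        total = evidence + (1 - p_known) * (1 - params.p_guess)
    posterior = evidence / total
    # Chance to learn from this opportunity (no forgetting in classic KT).
    return posterior + (1 - posterior) * params.p_learn


def trace(observations: Iterable[bool], params: KTParams) -> Tuple[List[float], float]:
    """Return P(known) before each attempt, plus the final estimate."""
    p = params.p_init
    before: List[float] = []
    for correct in observations:
        before.append(p)
        p = kt_update(p, correct, params)
    return before, p
```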


[AIED2005 event] Mostow, J., Beck, J., Cen, H., Gouvea, E., & Heiner, C. (2005, July). Interactive Demonstration of a Generic Tool to Browse Tutor-Student Interactions. Interactive Events Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED 2005), Amsterdam, 29-32.  Click here for .pdf file.

 

Abstract:  Project LISTEN's Session Browser is a generic tool to browse a database of students' interactions with an automated tutor.  Using databases logged by Project LISTEN's Reading Tutor, we illustrate how to specify phenomena to investigate, explore events and the context where they occurred, dynamically drill down and adjust which details to display, and summarize events in human-understandable form.   The tool should apply to MySQL databases from other tutors as well.


[AIED2005 browser] Mostow, J., Beck, J., Cuneo, A., Gouvea, E., & Heiner, C. (2005, July 18-22). A Generic Tool to Browse Tutor-Student Interactions:  Time Will Tell! Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED 2005), Amsterdam, 884-886.  Click here for .pdf file.

 

Abstract:  A basic question in mining data from an intelligent tutoring system is, "What happened when…?"  A generic tool to answer such questions should let the user specify which phenomenon to explore; explore selected events and the context in which they occurred; and require minimal effort to adapt the tool to new versions, to new users, or to other tutors.  We describe an implemented tool and how it meets these requirements. The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time.  It infers the implicit hierarchical structure of tutorial interaction so humans can browse it. A companion paper [1] illustrates the use of this tool to explore data from Project LISTEN's automated Reading Tutor.


[AIED2005 interruption] Heiner, C., Beck, J., & Mostow, J. (2005, July 18-22). When do students interrupt help?  Effects of individual differences.  Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED 2005), Amsterdam, 819-826.  Note:  This paper was accepted as a poster, but due to a publishing error, the printed proceedings include the original submitted version instead of the 3-page revised version.  Click here for 3-page accepted version.  Click here for 8-page published version.

 

Abstract:  When do students interrupt help to request different help? To study this question, we analyze a within-subject experiment in the 2003-2004 version of Project LISTEN's Reading Tutor. From 168,983 trials of this experiment, we report patterns in when students choose to interrupt help. To improve model fit for individual data, we adjust our model to account for individual differences. We report small but significant correlations between a student parameter in our model and gender as well as external measures of motivation and academic performance.

 


[AIED2005 engagement] Beck, J. (2005, July 18-22). Engagement tracing:  using response times to model student disengagement. Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED 2005), Amsterdam, 88-95.  Click here for .pdf file.

 

Abstract:  Time on task is an important predictor for how much students learn.  However, students must be focused on the learning for the time invested to be productive.  Unfortunately, students do not always try their hardest to solve problems presented by computer tutors.  This paper explores student disengagement and proposes an approach, engagement tracing, for detecting whether a student is engaged in answering questions.  This model is based on item response theory, and uses as input the difficulty of the question, how long the student took to respond, and whether the response was correct.  From these data, the model determines the probability a student was actively engaged in trying to answer the question.  The model has a reliability of 0.95, and its estimate of student engagement correlates at 0.25 with student gains on external tests.  Finally, the model is sensitive enough to detect variations in student engagement within a single tutoring session.  The novel aspect of this work is that it requires only data normally collected by a computer tutor, and the affective model is validated against student performance on an external measure.  


[AIED2005 ASR] Beck, J. E., Chang, K., Mostow, J., & Corbett, A. (2005, July 19). Using a student model to improve a computer tutor's speech recognition. Proceedings of the AIED 05 Workshop on Student Modeling for Language Tutors, 12th International Conference on Artificial Intelligence in Education, Amsterdam, 2-11.  Click here for .pdf file.

 

Abstract:  Intelligent computer tutors can derive much of their power from having a student model that describes the learner’s competencies.  However, constructing a student model is challenging for computer tutors that use automated speech recognition (ASR) as input.  This paper reports using ASR output from a computer tutor for reading to compare two models of how students learn to read words:  a model that assumes students learn words as whole-unit chunks, and a model that assumes students learn the individual letter-to-sound mappings that make up words.  We use the data collected by the ASR to show that a model of letter-to-sound mappings better describes student performance.  We then compare using the student model and the ASR, both alone and in combination, to predict which words the student will read correctly, as scored by a human transcriber.  Surprisingly, always predicting the majority class yields higher classification accuracy than the ASR.  However, we demonstrate that the ASR output still has useful information, that classification accuracy is not a good metric for this task, and that the area under the ROC curve (AUC) is a superior scoring method.  The AUC of the student model is statistically reliably better (0.670 vs. 0.550) than that of the ASR, which in turn is reliably better than the majority-class baseline.  These results show that ASR can be used to compare theories of how students learn to read words, and that modeling individual learners’ proficiencies may enable improved speech recognition.
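The point that majority-class accuracy can beat a genuinely informative scorer, while AUC ranks them correctly, shows up even on a toy example. The Python sketch below computes AUC as the probability that a randomly chosen positive example outranks a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as one half). The data and the 0.5 threshold are invented for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney U)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    # Invented data: 8 words read correctly (True), 2 misread (False).
    labels = [True] * 8 + [False] * 2
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.45, 0.4, 0.35, 0.5, 0.3]
    majority = [True] * len(labels)
    thresholded = [s > 0.5 for s in scores]
    print(accuracy(majority, labels))        # 0.8
    print(accuracy(thresholded, labels))     # 0.7
    print(auc([1.0] * len(labels), labels))  # 0.5: a constant score cannot rank
    print(auc(scores, labels))               # 0.8125
```

The thresholded scorer loses to the majority class on accuracy (0.7 vs. 0.8) yet clearly ranks correct readings above misreadings (AUC 0.8125 vs. 0.5).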


[AIED 2005 model] Chang, K., Beck, J. E., Mostow, J., & Corbett, A. (2005, July 19). Using speech recognition to evaluate two student models for a reading tutor. Proceedings of the AIED 05 Workshop on Student Modeling for Language Tutors, 12th International Conference on Artificial Intelligence in Education, Amsterdam, 12-21.  Click here for .pdf file.

 

Abstract:  Intelligent Tutoring Systems derive much of their power from having a student model that describes the learner's competencies. However, constructing a student model is challenging for computer tutors that use automated speech recognition (ASR) as input, due to inherent inaccuracies in ASR. We describe two extremely simplified models of developing word decoding skills and explore whether there is sufficient information in ASR output to determine which model fits student performance better, and under what circumstances one model is preferable to another.

 

The two models that we describe are a lexical model that assumes students learn words as whole-unit chunks, and a grapheme-to-phoneme (G-to-P) model that assumes students learn the individual letter-to-sound mappings that compose the words. We use the data collected by the ASR to show that the G-to-P model better describes student performance than the lexical model. We then determine which model performs better under what conditions. On one hand, the G-to-P model better correlates with student performance data when the student is older or when the word is more difficult to read or spell. On the other hand, the lexical model better correlates with student performance data when the student has seen the word more times.


[AAAI 2005 workshop] Beck, J. (Ed.). (2005, July 10). Proceedings of the AAAI2005 Workshop on Educational Data Mining. Pittsburgh, PA.


[AAAI2005 browser] Mostow, J., Beck, J., Cen, H., Cuneo, A., Gouvea, E., & Heiner, C. (2005, July 10). An Educational Data Mining Tool to Browse Tutor-Student Interactions:  Time Will Tell! Proceedings of the Workshop on Educational Data Mining, National Conference on Artificial Intelligence, Pittsburgh, 15-22.  Click here for .pdf file.

Abstract:  A basic question in mining data from an intelligent tutoring system is, "What happened when…?"  We identify requirements for a tool to help answer such questions by finding occurrences of specified phenomena and browsing them in human-understandable form.  We describe an implemented tool and how it meets the requirements.  The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time.  It automatically computes and displays the temporal hierarchy implicit in this representation.  We illustrate the use of this tool to mine data from Project LISTEN's automated Reading Tutor.
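The abstract does not say how the temporal hierarchy is computed. One plausible approach, assumed here, sorts one student's events on one computer by start time (longer events first on ties) and nests each event under the innermost open event whose interval contains it. The `Event` class and `build_hierarchy` function below are hypothetical, and partially overlapping events are simply attached to the nearest enclosing ancestor or treated as new top-level events.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A logged tutorial event for one student on one computer (hypothetical)."""
    name: str
    start: float
    end: float
    children: List["Event"] = field(default_factory=list)


def build_hierarchy(events: List[Event]) -> List[Event]:
    """Nest events by interval containment and return the top-level events.

    Sorting by (start, -end) visits an enclosing event before anything it
    contains. The stack holds the chain of open ancestors, innermost last.
    Note: children lists are filled in place on the given Event objects.
    """
    roots: List[Event] = []
    stack: List[Event] = []
    for ev in sorted(events, key=lambda e: (e.start, -e.end)):
        # Drop ancestors whose interval does not fully contain this event.
        while stack and not (stack[-1].start <= ev.start and ev.end <= stack[-1].end):
            stack.pop()
        if stack:
            stack[-1].children.append(ev)
        else:
            roots.append(ev)
        stack.append(ev)
    return roots
```

For example, a session spanning 0-100 containing stories at 10-60 and 61-99 yields one root with two children, and sentences and words inside the first story nest beneath it in turn.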


[AAAI2005 usage]  Arnold, A., Scheines, R., Beck, J. E., & Jerome, B. (2005, July 10). Time and attention:  students, sessions, and tasks. Proceedings of the AAAI2005 Workshop on Educational Data Mining, Pittsburgh, PA, 62-66.  Click here for .pdf file.

Abstract:  Students in two classes in the fall of 2004 making extensive use of online courseware were logged as they visited over 500 different “learning pages” which varied in length and in difficulty.  We computed the time spent on each page by each student during each session they were logged in.  We then modeled the time spent for a particular visit as a function of the page itself, the session, and the student. Surprisingly, the average time a student spent on learning pages (over their whole course experience) was of almost no value in predicting how long they would spend on a given page, even controlling for the session and page difficulty.  The page itself was highly predictive, but so was the average time spent on learning pages in a given session.  This indicates that local considerations, e.g., mood, deadline proximity, etc., play a much greater role in determining student pace and attention than do intrinsic student traits.  We also consider the average time spent on learning pages as a function of the time of semester.  Students spent less time on pages later in the semester, even for more demanding material.


[SSSR 2005] Mostow, J., & Beck, J. (2005). Micro-analysis of fluency gains in a Reading Tutor that listens:  Wide vs. repeated guided oral reading.  Talk at the Twelfth Annual Meeting of the Society for the Scientific Study of Reading, Toronto.  Click here to download PowerPoint presentation.

Abstract:  Fluency growth is essential but imperfectly understood.  By using automatic speech recognition to listen to children read aloud, Project LISTEN's Reading Tutor provides a novel instrument to study fluency development.  During the 2002-2003 school year, hundreds of children in grades 1-4 used the Reading Tutor, which recorded them reading millions of words of text.  The latency preceding each word reflects the reader’s cognitive effort to identify the word.  Using automatic speech recognition to analyze latency changes between successive encounters of words in the same or different contexts provides new data about how fluency grows.


* [Toronto 2005] Cunningham, T., & Geva, E. (2005, June 24). The effects of reading technologies on literacy development of ESL students [poster presentation]. Twelfth Annual Meeting of the Society for the Scientific Study of Reading, Toronto.

 


* [UBC 2005] Reeder, K., Early, M., Kendrick, M., Shapiro, J., & Wakefield, J. (2005, April). The Role of L1 in Young Multilingual Readers' Success With a Computer-Based Reading Tutor. Fifth International Symposium on Bilingualism, Barcelona, Spain.

 


[AERA 2005] Beck, J. E., & Mostow, J. (2005). Mining Data from Randomized Within-Subject Experiments in an Automated Reading Tutor (poster in session 34.080, "Logging Students' Learning in Complex Domains:  Empirical Considerations and Technological Solutions"). American Educational Research Association 2005 Annual Meeting:  Demography and Democracy in the Era of Accountability, Montreal, Canada.  Click here to download PowerPoint poster.

Abstract:  Experiments embedded in the Reading Tutor help evaluate its decisions in tutoring decoding, vocabulary, and comprehension.



[Kant masters thesis] Kant, P. M. (2004). The Influence of Teachers' Perceptions on Usage of an Educational Technology:  A study of Project LISTEN's Reading Tutor. Unpublished Master's Thesis, University of Pittsburgh, Pittsburgh, PA.  Click here to download .pdf file.

Abstract: This study looked at factors influencing teachers’ perception and usage of Project LISTEN’s Reading Tutor, a computerized tutor used with elementary students in 9 classroom-based, 10 computer lab-based, and 3 specialist-room school settings.  Thirteen interviews and 22 survey responses (of a possible 28 teachers) examined teachers’ perception of the Reading Tutor and suggested that teachers’ belief in the Tutor influenced their usage of it (r = .46, p < .03).  Three factors seemed to influence teacher belief: 1) perceived ease of use (r = .52, p < .01), 2) teachers’ reported experience with computers (r = .41, p < .04) and instructional technology (r = .48, p < .03), and 3) perceived technical problems, such as frequency of technical problems (r = -.44, p < .04) and speed with which problems were fixed (r = .49, p < .02).  Analysis of these factors suggested four themes that cut across factors and seem to influence the way teachers evaluate and use the Reading Tutor: the technology’s degree of convenience, competition from other educational priorities and practices, teacher experience and/or interest with technology, and data available to teachers and the way teachers prioritize that data.  These results suggest that improving the convenience of the Reading Tutor, instituting specialized training programs, and improving feedback mechanisms for teachers by providing relevant, situated data may strengthen teacher belief in the Reading Tutor and thereby increase teacher usage.  This study contributes to current literature on educational technology usage by supporting previous findings that teacher belief in the importance of a technology influences their use of it.  One unique feature of this study is that it uses both quantitative and qualitative methods to look at the research questions from two different research perspectives.



* [ESL 2004] Poulsen, R. (2004). Tutoring Bilingual Students With an Automated Reading Tutor That Listens:  Results of a Two-Month Pilot Study. Unpublished Master's Thesis, DePaul University, Chicago, IL.  Click here to download .pdf file.

Abstract:  A two-month pilot study comprising 34 second- through fourth-grade Hispanic students from four bilingual education classrooms was conducted to compare the efficacy of the 2004 version of the Project LISTEN Reading Tutor against the standard practice of sustained silent reading (SSR).  The Reading Tutor uses automated speech recognition to listen to children read aloud.  It provides both spoken and graphical feedback in order to assist the children with the oral reading task.  Prior research with this software has demonstrated its efficacy within populations of native English speakers.  This study was undertaken to obtain an initial indication of whether the tutor would also be effective within a population of English language learners. 

The study employed a crossover design in which each participant spent one month in each of the treatment conditions.  The experimental treatment consisted of 25 minutes per day using the Reading Tutor in a small pullout lab setting.  In the control treatment, students remained in the classroom, where they participated in established reading instruction activities.  Dependent variables consisted of the school district's curriculum-based measures for fluency, sight word recognition, and comprehension.

The Reading Tutor group outgained the control group on every measure during both halves of the crossover experiment.  Within-subject results from a paired t-test indicate these gains were significant for both fluency measures (p < .001) and marginally significant for one sight word measure (p = .056).  Effect sizes were 0.55 for timed sight words, a robust 1.16 for total fluency, and an even larger 1.27 for fluency controlled for word accuracy.  These dramatic results observed during a one-month treatment indicate this technology may have much to offer English language learners.



[TICL questions] Mostow, J., Beck, J., Bey, J., Cuneo, A., Sison, J., Tobin, B., & Valeri, J. (2004). Using automated questions to assess reading comprehension, vocabulary, and effects of tutorial interventions. Technology, Instruction, Cognition and Learning, 2, 97-134.  Click here to download .pdf file.

Abstract:  We describe the automated generation and use of 69,326 comprehension cloze questions and 5,668 vocabulary matching questions in the 2001-2002 version of Project LISTEN's Reading Tutor used by 364 students in grades 1-9 at seven schools.  To validate our methods, we used students' performance on these multiple-choice questions to predict their scores on the Woodcock Reading Mastery Test.  A model based on students' cloze performance predicted their Passage Comprehension scores with correlation R=.85.  The percentage of vocabulary words that students matched correctly to their definitions predicted their Word Comprehension scores with correlation R=.61.

We used both types of questions in a within-subject automated experiment to compare four ways to preview new vocabulary before a story: defining the word, giving a synonym, asking about the word, and doing nothing.  Outcomes included comprehension as measured by performance on multiple-choice cloze questions during the story, and vocabulary as measured by matching words to their definitions in a posttest after the story.  A synonym or short definition significantly improved posttest performance compared to just encountering the word in the story, but only for words students didn't already know, and only if they had a grade 4 or better vocabulary.  Such a preview significantly improved performance during the story on cloze questions involving the previewed word, but only for students with a grade 1-3 vocabulary.


 

[TICL fluency] Beck, J. E., Jia, P., & Mostow, J. (2004). Automatically assessing oral reading fluency in a computer tutor that listens. Technology, Instruction, Cognition and Learning, 2, 61-81.  Click here to download .pdf file.

Abstract:  Much of the power of a computer tutor comes from its ability to assess students.  In some domains, including oral reading, assessing the proficiency of a student is a challenging task for a computer.  Our approach for assessing student reading proficiency is to use data that a computer tutor collects through its interactions with a student to estimate his performance on a human-administered test of oral reading fluency.   A model with data collected from the tutor's speech recognizer output correlated, within-grade, at 0.78 on average with student performance on the fluency test.  For assessing students, data from the speech recognizer were more useful than student help-seeking behavior.  However, adding help-seeking behavior increased the average within-grade correlation to 0.83.  These results show that speech recognition is a powerful source of data about student performance, particularly for reading.


 

[ITS 2004 tracing] Beck, J. E., & Sison, J. (2004, September 1-3). Using knowledge tracing to measure student reading proficiencies. Proceedings of the 7th International Conference on Intelligent Tutoring Systems, 624-634.  Maceio, Brazil.  (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html. Click here to download .pdf file.

Abstract:  Constructing a student model for language tutors is a challenging task.  This paper describes using knowledge tracing to construct a student model of reading proficiency and validates the model.  We use speech recognition to assess a student’s reading proficiency at a subword level, even though the speech recognizer output is at the level of words.  Specifically, we estimate the student’s knowledge of 80 letter-to-sound mappings, such as ch making the sound /K/ in “chemistry.”  At a coarse level, the student model did a better job at estimating reading proficiency for 47.2% of the students than did a standardized test designed for the task.  Our model’s estimate of the student’s knowledge of individual letter-to-sound mappings is a significant predictor of whether he will ask for help on a particular word.  Thus, our student model is able to describe student performance at both coarse and fine grain sizes. 
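Knowledge tracing is commonly formulated as a two-state hidden Markov model per skill (Corbett and Anderson's Bayesian knowledge tracing). The sketch below shows one update step for a single letter-to-sound mapping. The parameter values are illustrative placeholders, not estimates from the paper, and scoring each opportunity as correct or incorrect (for instance, from whether the recognizer accepted a word containing the mapping) is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BKTParams:
    # Illustrative placeholder values, not estimates from the paper.
    p_init: float = 0.3   # P(L0): mapping already known before practice
    p_learn: float = 0.1  # P(T): chance of learning at each opportunity
    p_guess: float = 0.2  # P(G): read correctly without knowing the mapping
    p_slip: float = 0.1   # P(S): misread despite knowing the mapping


def bkt_update(p_known: float, correct: bool, prm: BKTParams) -> float:
    """One knowledge-tracing step: Bayes update on the observation, then learning."""
    if correct:
        num = p_known * (1 - prm.p_slip)
        den = num + (1 - p_known) * prm.p_guess
    else:
        num = p_known * prm.p_slip
        den = num + (1 - p_known) * (1 - prm.p_guess)
    posterior = num / den  # P(known | this observation)
    # The student may also learn the mapping from this opportunity.
    return posterior + (1 - posterior) * prm.p_learn


def trace(observations: Iterable[bool], prm: BKTParams) -> float:
    """Estimated P(known) after a sequence of scored opportunities."""
    p = prm.p_init
    for correct in observations:
        p = bkt_update(p, correct, prm)
    return p
```

Running `trace` over each mapping's sequence of scored opportunities gives one probability per mapping; a per-mapping estimate of this kind is what could then serve as a predictor of help requests on words containing that mapping.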


 

[ITS 2004 questions] Beck, J. E., Mostow, J., & Bey, J. (2004, September 1-3). Can automated questions scaffold children's reading comprehension? Proceedings of the 7th International Conference on Intelligent Tutoring Systems, 478-490.  Maceio, Brazil.  (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html. Click here to download .pdf file.

Abstract: Can automatically generated questions scaffold reading comprehension?  We automated three kinds of multiple-choice questions in children’s assisted reading:
1.    Wh- questions:  ask a generically worded What/Where/When question.
2.    Sentence prediction:  ask which of three sentences belongs next.
3.    Cloze:  ask which of four words best fills in a blank in the next sentence. 

A within-subject experiment in the spring 2003 version of Project LISTEN’s Reading Tutor randomly inserted all three kinds of questions during stories as it helped children read them.  To compare their effects on story-specific comprehension, we analyzed 15,196 subsequent cloze test responses by 404 children in grades 1-4.
·    Wh- questions significantly raised children’s subsequent cloze performance.
·    This effect was cumulative over the story rather than a recency effect.
·    Sentence prediction questions probably helped (p = .07).
·    Cloze questions did not improve performance on later questions.
·    The rate of hasty responses rose over the year.
·    Asking a question less than 10 seconds after the previous question increased the likelihood of the student skipping the question or giving a hasty response.
The results show that a computer can scaffold a child’s comprehension of a given text without understanding the text itself, provided it avoids irritating the student.


 

[ITS 2004 disengagement] Beck, J. E. (2004, August 31). Using response times to model student disengagement. Proceedings of the ITS2004 Workshop on Social and Emotional Intelligence in Learning Environments, Maceio, Brazil, 13-20.  Click here to download .pdf file.

Abstract:  Time on task is an important variable for learning a skill.  However, learners must be focused on the learning for the time invested to be productive.  Unfortunately, students do not always try their hardest to solve problems presented by computer tutors.  This paper explores student disengagement and proposes a model for detecting whether a student is engaged in answering questions.  This model is based on item response theory, and uses as input the difficulty of the question, how long the student took to respond, and whether the response was correct.  From these data, the model determines the probability a student was actively engaged in trying to answer the question.  To validate our model, we analyze 231 students’ interactions with the 2002-2003 version of the Reading Tutor.  We show that disengagement is better modeled by simultaneously estimating student proficiency and disengagement than just estimating disengagement alone.  Our best model of disengagement has a correlation of -0.25 with student learning gains.  The novel aspect of this work is that it requires only data normally collected by a computer tutor, and the affective model is validated against student performance on an external measure.


 

[ITS 2004 mining] Mostow, J. (2004, August 30). Some useful design tactics for mining ITS data. Proceedings of the ITS2004 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, Maceió, Alagoas, Brazil, 20-28. Click here to download .pdf file.   Click here to download Powerpoint presentation.

Abstract:  Mining data logged by intelligent tutoring systems has the potential to reveal valuable discoveries.  What characteristics make such data conducive to mining?  What variables are informative to compute?  Based on our experience in mining data from Project LISTEN’s Reading Tutor, we discuss how to collect machine-analyzable data and formulate it into experimental trials.  The resulting concepts and tactics mark out a roadmap for the emerging area of tutorial data mining, and may provide a useful vocabulary and framework for characterizing past, current, and future work in this area.


 

[ITS 2004 lessons] Heiner, C., Beck, J., & Mostow, J. (2004, August 30). Lessons on using ITS data to answer educational research questions. Proceedings of the ITS2004 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, Maceio, Brazil, 1-9.  Click here to download .pdf file.

Abstract:  Some tutoring system projects have completed empirical studies of student-tutor interaction by manually collecting data while observing fewer than a hundred students.  Analyzing larger, automatically collected data sets requires new methods to address new problems.  We share lessons on design, analysis, presentation, and iteration.  Our lessons are based on our experience analyzing data from Project LISTEN’s Reading Tutor, which automatically collected tutorial data from hundreds of students.  We hope that these lessons will help guide analysis of similar datasets from other intelligent tutoring systems.


 

[ACL 2004 keynote] Mostow, J. (2004, July 22). If I Have a Hammer:  Computational Linguistics in a Reading Tutor that Listens [Invited keynote address]. 42nd Annual Meeting of the Association for Computational Linguistics (ACL-EACL 2004), Barcelona, Spain.  Click here to download PowerPoint presentation.

Abstract:  Project LISTEN’s Reading Tutor uses speech recognition to listen to children read aloud, and helps them learn to read, as evidenced by rigorous evaluations of pre- to posttest gains compared to various controls.  In the 2003-2004 school year, children ages 5-14 used the Reading Tutor daily at school on over 200 computers, logging over 50,000 sessions, 1.5 million tutorial responses, and 10 million words.

This talk uses the Reading Tutor to illustrate the diverse roles that computational linguistics can play in an intelligent tutor:
·    A domain model describes a skill to learn, such as mapping from spelling to pronunciation.
·    A production model predicts student behavior, such as likely oral reading mistakes.
·    A language model predicts likely word sequences for a given task, such as oral reading.
·    A student model estimates a student’s skills, such as mastery of grapheme-to-phoneme mappings.
·    A pedagogical model guides tutorial decisions, such as choosing words a student is ready to try.

A recurring theme is the use of “big data” to train such models automatically.


 

[SSSR 2004 help] Mostow, J., Beck, J. E., & Heiner, C. (2004). Which Help Helps?  Effects of Various Types of Help on Word Learning in an Automated Reading Tutor that Listens. Eleventh Annual Meeting of the Society for the Scientific Study of Reading, Amsterdam, The Netherlands.  Click here to download PowerPoint presentation.

Abstract:  When a tutor gives help on a word during assisted oral reading, how does the type of help matter?  We report an automated, within-subject, randomized-trial experiment embedded in Project LISTEN's Reading Tutor.  Hundreds of children (mostly in grades 1-3) used the Reading Tutor in 2002-2003, reading millions of words and getting help on hundreds of thousands of them.   The experimental variable was the type of help, selected randomly by the Reading Tutor whenever it gave help on a word.  The outcome variable was student performance on the next encounter of the word.  We compare effects of several types of help.


 

[SSSR 2004 interventions] Beck, J. E., Sison, J., & Mostow, J. (2004, June 27-30). Using automated speech recognition to measure scaffolding and learning effects of word identification interventions in a computer tutor that listens. Eleventh Annual Meeting of the Society for the Scientific Study of Reading, Amsterdam, The Netherlands. Click here to download Powerpoint poster.

Abstract:  Does it help to provide brief word identification assistance to students?  On words they encounter soon afterwards?  Does brief assistance lead to long-term learning gains?  Which types of assistance are best?  We have explored these questions using automated experiments in a computer tutor for reading that listens.   We examine data from 300 students, mostly in grades 1 through 3.  The major result was a definite scaffolding effect on student performance on the same day they were given assistance.  However, although there was a slight improvement in longer-term performance, the difference was not statistically significant.


 

[ICALL 2004] Heiner, C., Beck, J. E., & Mostow, J. (2004, June 17-19). Improving the Help Selection Policy in a Reading Tutor that Listens. Proceedings of the InSTIL/ICALL Symposium on NLP and Speech Technologies in Advanced Language Learning Systems, Venice, Italy, 195-198.  Click here to download .pdf file.

Abstract:  What type of oral reading assistance is most effective for a given student on a given word? We analyze 189,039 randomized trials of a within-subject experiment to compare the effects of several types of help in the 2002-2003 version of Project LISTEN’s Reading Tutor.  The independent variable is the type of help given on a word.  The outcome variable is the student’s performance at the next encounter of that word, as measured by automatic speech recognition. Training a help selection policy sensitive to student or word level improves this outcome by a projected 4% – a substantial effect for picking a single better intervention.
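The abstract does not specify how the help selection policy is trained. One simple reading, offered only as a sketch, is to bucket the randomized trials by student or word level and pick, within each bucket, the help type with the highest observed rate of success at the next encounter. Because the help type was chosen at random, comparing raw success rates within a bucket is not confounded by which students happened to receive which help. The bucket names, help-type labels, and the `min_trials` cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One randomized trial: (level bucket, help type given, read correctly at next encounter?)
Trial = Tuple[str, str, bool]


def train_policy(trials: Iterable[Trial], min_trials: int = 1) -> Dict[str, str]:
    """Map each level bucket to the help type with the highest observed success rate.

    Help types with fewer than min_trials trials in a bucket are ignored.
    Ties go to whichever help type was seen first in that bucket.
    """
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for level, help_type, success in trials:
        tally = counts[(level, help_type)]
        tally[0] += int(success)  # successes
        tally[1] += 1             # trials
    best: Dict[str, Tuple[float, str]] = {}
    for (level, help_type), (successes, n) in counts.items():
        if n < min_trials:
            continue
        rate = successes / n
        if level not in best or rate > best[level][0]:
            best[level] = (rate, help_type)
    return {level: help_type for level, (_, help_type) in best.items()}
```

Small buckets give noisy rates, so a realistic version would raise `min_trials` or test whether the winning help type's advantage is statistically reliable before adopting it.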


 

[CALICO 2004] Beck, J. E., & Sison, J. (2004, June 8-12). Automated student assessment in language tutors. CALICO, Pittsburgh, PA.

Abstract:  The Reading Tutor is a computer tutor that uses Automated Speech Recognition (ASR) technology to listen to children read aloud and helps them learn how to read.  The research reported here uses ASR output to predict students' GORT fluency posttest scores. Using a linear regression model, we achieved correlations of over .80 for predicting first through fourth graders' performance.  Our model's predictive ability is on par with standard public school reading assessment measures. This work contributes to a better understanding of automated student assessment in language tutors and introduces methods for accounting for noisy ASR output.


 

[IJAIE 2004] Murray, R. C., VanLehn, K., & Mostow, J. (2004). Looking Ahead to Select Tutorial Actions: A Decision-Theoretic Approach. International Journal of Artificial Intelligence in Education, 14, 235-278.  Download paper as .pdf file.

Abstract:  We propose and evaluate a decision-theoretic approach for selecting tutorial actions by looking ahead to anticipate their effects on the student and other aspects of the tutorial state. The approach uses a dynamic decision network to consider the tutor’s uncertain beliefs and objectives in adapting to and managing the changing tutorial state. Prototype action selection engines for diverse domains – calculus and elementary reading – illustrate the approach. These applications employ a rich model of the tutorial state, including attributes such as the student’s knowledge, focus of attention, affective state, and next action(s), along with task progress and the discourse state. Our action selection engines have not yet been integrated into complete ITSs (this is the focus of future work), so we use simulated students to evaluate their capability to select rational tutorial actions that emulate the behaviors of human tutors. We also evaluate their capability to select tutorial actions quickly enough for real-world tutoring applications.


 

[ICAAI 2003] Banerjee, S., Mostow, J., Beck, J., & Tam, W. (2003, December 15-16). Improving Language Models by Learning from Speech Recognition Errors in a Reading Tutor that Listens. Proceedings of the Second International Conference on Applied Artificial Intelligence, Fort Panhala, Kolhapur, India.  Download paper as .pdf file.

Abstract:  Lowering the perplexity of a language model does not always translate into higher speech recognition accuracy. Our goal is to improve language models by learning from speech recognition errors. In this paper we present an algorithm that first learns to predict which n-grams are likely to increase recognition errors, and then uses that prediction to improve language models so that the errors are reduced. We show that our algorithm reduces a measure of tracking error by more than 24% on unseen test data from a Reading Tutor that listens to children read aloud. 


 

[CSMP 2003] Mostow, J., & Beck, J. (2003, November 3-4). When the Rubber Meets the Road:  Lessons from the In-School Adventures of an Automated Reading Tutor that Listens. Conceptualizing Scale-Up: Multidisciplinary Perspectives, Park Hyatt Hotel, Washington, D.C. 

Abstract:  Project LISTEN's Reading Tutor (www.cs.cmu.edu/~listen) uses automatic speech recognition to listen to children read aloud, and helps them learn to read.  Its experimental deployment in schools has expanded from a single computer used by eight third graders in one school in 1996 to two hundred computers used by children in grades 1-3 in nine schools in 2003.  This project illustrates how technology can not just scale up an intervention, but instrument its implementation.  For example, analysis of 2002-2003 usage showed that session frequency and duration averaged significantly higher in lab settings than in classrooms.



[Eurospeech 2003 miscues]  Banerjee, S., Beck, J., & Mostow, J. (2003, September 1-4). Evaluating the Effect of Predicting Oral Reading Miscues. Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, 3165-3168.  Download paper in .pdf format.

Abstract:  This paper extends and evaluates previously published methods for predicting likely miscues in children's oral reading in a Reading Tutor that listens. The goal is to improve the speech recognizer's ability to detect miscues but limit the number of "false alarms" (correctly read words misclassified as incorrect). The "rote" method listens for specific miscues from a training corpus. The "extrapolative" method generalizes to predict other miscues on other words. We construct and evaluate a scheme that combines our rote and extrapolative models. This combined approach reduced false alarms by 0.52% absolute (12% relative) while simultaneously improving miscue detection by 1.04% absolute (4.2% relative) over our existing miscue prediction scheme.



[Eurospeech 2003 confidence] Tam, Y.-C., Mostow, J., Beck, J., & Banerjee, S. (2003, September 1-4). Training a Confidence Measure for a Reading Tutor that Listens. Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, 3161-3164.  Download paper in Postscript format.

Abstract:  One issue for a Reading Tutor that listens is determining which words the student read correctly.  We describe a confidence measure that uses a variety of features to estimate the probability that a word was read correctly.  We trained two decision tree classifiers.  The first classifier tries to fix insertion and substitution errors made by the speech decoder, while the second classifier tries to fix deletion errors.  By applying the two classifiers together, we achieved a 25.89% relative reduction in false alarm rate while holding the miscue detection rate constant.



[AIED 2003] Beck, J. E., Mostow, J., Cuneo, A., & Bey, J. (2003, July 20-24). Can automated questioning help children's reading comprehension? Proceedings of the Tenth International Conference on Artificial Intelligence in Education (AIED2003), Sydney, Australia. Download paper in pdf format.

Abstract:  We present an automated method to ask children questions during assisted reading, and experimentally evaluate its effects on their comprehension.  In 2002, after a randomly inserted generic multiple-choice What/Where/When question, children were likelier to correctly answer an automatically generated comprehension question on a later sentence.  The positive effects of such questions vanished during the second half of the study in 2003.  We hypothesize why.



[AIED 2003 event]  Mostow, J., & Beck, J. E. (2003, July 20-24). Project LISTEN's Reading Tutor:  Interactive Event Description. Supplemental Proceedings of the Tenth International Conference on Artificial Intelligence in Education (AIED2003), Sydney, Australia.  Download paper in pdf format.

Abstract:  This interactive event demonstrates various aspects of Project LISTEN’s Reading Tutor, which listens to children read aloud, and helps them learn to read.



[UM 2003 persistence] Mostow, J., Beck, J. E., & Valeri, J. (2003, June 22). Can Automated Emotional Scaffolding Affect Student Persistence?  A Baseline Experiment. Workshop on "Assessing and Adapting to User Attitudes and Affect: Why, When and How?" at the 9th International Conference on User Modeling (UM'03), Johnstown, PA. Download paper in pdf format.

Abstract:  A 2002 Wizard of Oz study showed that emotional scaffolding provided by a human significantly increased children’s persistence in an automated Reading Tutor, as measured by the number of tasks they chose to undertake. We report a 5,965-trial experiment to test a simple automated form of such scaffolding, compared to a control condition without it.  348 children in grades K-4 spent significantly longer per task in the experimental condition due to a design flaw, yet still averaged equal numbers of tasks in both conditions.  We theorize that they subjectively gauged effort in terms of number of tasks rather than number or duration of solution attempts.



[UM 2003 predict] Beck, J. E., Jia, P., Sison, J., & Mostow, J. (2003, June 22-26). Predicting student help-request behavior in an intelligent tutor for reading. Proceedings of the 9th International Conference on User Modeling, Johnstown, PA.  Download paper in pdf format.

Abstract:  This paper describes our efforts at constructing a fine-grained student model in Project LISTEN’s intelligent tutor for reading.  Reading is different from most domains that have been studied in the intelligent tutoring community, and presents unique challenges.  Constructing a model of the user from voice input and mouse clicks is difficult, as is constructing a model when there is not a well-defined domain model.  We use a database describing student interactions with our tutor to train a classifier that predicts whether students will click on a particular word for help with 83.2% accuracy.  We have augmented the classifier with features describing properties of the word’s individual graphemes, and discuss how such knowledge can be used to assess student skills that cannot be directly measured.



[UM 2003 assess] Beck, J. E., Jia, P., & Mostow, J. (2003, June 22-26). Assessing student proficiency in a Reading Tutor that listens. Proceedings of the 9th International Conference on User Modeling, Johnstown, PA.  Download paper in pdf format.

Abstract:   This paper reports results on using data mining to extract useful variables from a database that contains interactions between students and Project LISTEN’s Reading Tutor.  Our approach is to find variables we believe to be useful in the information logged by the tutor, and then to derive models that relate those variables to students’ scores on external, paper-based tests of reading proficiency.  Once the relationship between the recorded variables and the paper tests is discovered, it is possible to use information recorded by the tutor to assess the student’s current level of proficiency.  The major results of this work were the discovery of useful features available to the Reading Tutor that describe students, and a strong predictive model of external tests whose predictions correlate with actual test scores at 0.88.



[JECR 2003] Mostow, J., Aist, G., Burkhead, P., Corbett, A., Cuneo, A., Eitelman, S., Huang, C., Junker, B., Sklar, M. B., & Tobin, B. (2003). Evaluation of an automated Reading Tutor that listens:  Comparison to human tutoring and classroom instruction. Journal of Educational Computing Research, 29(1), 61-117. Download paper in MS Word .doc format.

Abstract:  A year-long study of 131 second and third graders in 12 classrooms compared three daily 20-minute treatments. (a) 58 students in 6 classrooms used the 1999-2000 version of Project LISTEN’s Reading Tutor, a computer program that uses automated speech recognition to listen to a child read aloud, and gives spoken and graphical assistance.  Students took daily turns using one shared Reading Tutor in their classroom while the rest of their class received regular instruction.  (b) 34 students in the other 6 classrooms were pulled out daily for one-on-one tutoring by certified teachers.  To control for materials, the human tutors used the same set of stories as the Reading Tutor.  (c) 39 students served as in-classroom controls, receiving regular instruction without tutoring.  We compared students’ pre- to post-test gains on the Word Identification, Word Attack, Word Comprehension, and Passage Comprehension subtests of the Woodcock Reading Mastery Test, and in oral reading fluency. 

Surprisingly, the human-tutored group significantly outgained the Reading Tutor group only in Word Attack (main effects p<.02, effect size .55).  Third graders in both the computer- and human-tutored conditions outgained the control group significantly in Word Comprehension (p<.02, respective effect sizes .56 and .72) and suggestively in Passage Comprehension (p=.14, respective effect sizes .48 and .34).  No differences between groups on gains in Word Identification or fluency were significant.  These results are consistent with an earlier study in which students who used the 1998 version of the Reading Tutor outgained their matched classmates in Passage Comprehension (p=.11, effect size .60), but not in Word Attack, Word Identification, or fluency. 

To shed light on outcome differences between tutoring conditions and between individual human tutors, we compared process variables.  Analysis of logs from all 6,080 human and computer tutoring sessions showed that human tutors included less rereading and more frequent writing than the Reading Tutor. Micro-analysis of 40 videotaped sessions showed that students who used the Reading Tutor spent considerable time waiting for it to respond, requested help more frequently, and picked easier stories when it was their turn.  Human tutors corrected more errors, focused more on individual letters, and provided assistance more interactively, for example getting students to sound out words rather than sounding out words themselves as the Reading Tutor did. 



[SSSR 2003] Mostow, J., Beck, J., Bey, J., Cuneo, A., Sison, J., & Tobin, B. (2003, June 12-15). An Embedded Experiment to Evaluate the Effectiveness of Vocabulary Previews in an Automated Reading Tutor. Talk presented at the Tenth Annual Meeting of the Society for the Scientific Study of Reading, Boulder, CO.  Download PowerPoint presentation.

Abstract:  When does taking time to preview a new word before reading a story improve vocabulary and comprehension more than encountering the word in context?  To address this question, the 2001-2002 version of Project LISTEN's Reading Tutor embedded an automated experiment to compare three types of vocabulary preview -- defining the word, giving a synonym, or just asking about the word -- and a control condition.  Outcomes included within-story comprehension as measured by performance on multiple-choice cloze questions, and post-story vocabulary as measured by matching words to their definitions.  We analyze results based on thousands of randomized trials.


[ICMI 2002 emotional]  Aist, G., Kort, B., Reilly, R., Mostow, J., & Picard, R. (2002, October 14-16). Experimentally Augmenting an Intelligent Tutoring System with Human-Supplied Capabilities:  Adding Human-Provided Emotional Scaffolding to an Automated Reading Tutor that Listens. Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI 2002), Pittsburgh, PA, 483-490.  Revised version of paper first presented at ITS 2002 Workshop on Empirical Methods for Tutorial Dialogue Systems, San Sebastian, Spain.  Download paper in pdf format.


Abstract:  This paper presents the first statistically reliable empirical evidence from a controlled study for the effect of human-provided emotional scaffolding on student persistence in an intelligent tutoring system.  We describe an experiment that added human-provided emotional scaffolding to an automated Reading Tutor that listens, and discuss the methodology we developed to conduct this experiment. Each student participated in one (experimental) session with emotional scaffolding, and in one (control) session without emotional scaffolding, counterbalanced by order of session. Each session was divided into several portions. After each portion of the session was completed, the Reading Tutor gave the student a choice: continue, or quit. We measured persistence as the number of portions the student completed. Human-provided emotional scaffolding added to the automated Reading Tutor resulted in increased student persistence, compared to the Reading Tutor alone. Increased persistence means increased time on task, which ought to lead to improved learning. If these results for reading turn out to hold for other domains too, the implication for intelligent tutoring systems is that they should respond with not just cognitive support – but emotional scaffolding as well. Furthermore, the general technique of adding human-supplied capabilities to an existing intelligent tutoring system should prove useful for studying other ITSs too.


[ICMI 2002] Mostow, J., Beck, J., Chalasani, R., Cuneo, A., & Jia, P. (2002, October 14-16). Viewing and Analyzing Multimodal Human-computer Tutorial Dialogue: A Database Approach. Fourth IEEE International Conference on Multimodal Interfaces (ICMI 2002), Pittsburgh, PA.  Revised version of paper first presented at ITS 2002 Workshop on Empirical Methods for Tutorial Dialogue Systems, San Sebastian, Spain. Download paper in pdf format.

Abstract:  It is easier to record logs of multimodal human-computer tutorial dialogue than to make sense of them.  In the 2000-2001 school year, we logged the interactions of approximately 400 students who used Project LISTEN’s Reading Tutor and who read aloud over 2.4 million words.  This paper discusses some difficulties we encountered converting the logs into a more easily understandable database.  It is faster to write SQL queries to answer research questions than to analyze complex log files each time.  The database also permits us to construct a viewer to examine individual Reading Tutor-student interactions.  This combination of queries and viewable data has turned out to be very powerful, and we discuss how we have combined them to answer research questions. 
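As a rough illustration of the log-to-database idea, the following Python sketch parses a few log lines once into an in-memory SQLite table and then answers a simple question with a single SQL query. The tab-separated log format, the `word_attempt` table, and its columns are hypothetical assumptions for illustration; they are not the Reading Tutor's actual log format or schema.

```python
import sqlite3

# Hypothetical tab-separated log lines: student_id, story_id, word, accepted (1/0).
# The real Reading Tutor log format is not described here; this is an assumption.
LOG_LINES = [
    "s01\tstory7\tcat\t1",
    "s01\tstory7\tsat\t0",
    "s02\tstory7\tcat\t1",
    "s02\tstory9\tmat\t1",
]


def build_database(lines):
    """Parse the log lines once and load them into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word_attempt ("
        "student_id TEXT, story_id TEXT, word TEXT, accepted INTEGER)"
    )
    rows = []
    for line in lines:
        student_id, story_id, word, accepted = line.split("\t")
        rows.append((student_id, story_id, word, int(accepted)))
    conn.executemany("INSERT INTO word_attempt VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def words_per_student(conn):
    """One query replaces a custom pass over the raw logs: words attempted
    and words marked accepted, per student."""
    cur = conn.execute(
        "SELECT student_id, COUNT(*), SUM(accepted) "
        "FROM word_attempt GROUP BY student_id ORDER BY student_id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_database(LOG_LINES)
    print(words_per_student(conn))  # [('s01', 2, 1), ('s02', 2, 2)]
```

Once the parsing cost is paid a single time, each new research question becomes another query against the same table rather than another log-parsing script.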



[ICSLP 2002]  Mostow, J., Beck, J., Winter, S. V., Wang, S., & Tobin, B. (2002, September 16-20). Predicting oral reading miscues. Seventh International Conference on Spoken Language Processing (ICSLP-02), Denver, CO. Download paper in pdf format.

Abstract:  This paper explores the problem of predicting specific reading mistakes, called miscues, on a given word.  Characterizing likely miscues tells an automated reading tutor what to anticipate, detect, and remediate.  As training and test data, we use a database of over 100,000 miscues transcribed by University of Colorado researchers.  We explore approaches that exploit different sources of predictive power:  the uneven distribution of words in text, and the fact that most miscues are real words.  We compare the approaches’ ability to predict miscues of other readers on other text.  A simple rote method does best on the most frequent 100 words of English, while an extrapolative method for predicting real-word miscues performs well on less frequent words, including words not in the training data. 
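A minimal sketch of the two kinds of predictor contrasted above, assuming a toy training set of (target word, spoken miscue) pairs and a tiny lexicon of real words. The rote part simply returns the most frequent miscues observed on a word. For unseen words it substitutes `difflib.get_close_matches` over the lexicon as a stand-in extrapolation toward similar-looking real words; that fallback is an assumption, not the method described in the paper.

```python
import difflib
from collections import Counter, defaultdict

# Hypothetical training pairs: (target word in the text, what the reader said).
TRAINING_MISCUES = [
    ("the", "a"), ("the", "a"), ("the", "they"),
    ("was", "saw"), ("was", "saw"), ("was", "is"),
]

# Hypothetical lexicon of real words; most miscues are real words.
LEXICON = ["horse", "house", "hose", "mouse", "shore", "cat"]


def train_rote(pairs):
    """Count how often each miscue was observed on each target word."""
    table = defaultdict(Counter)
    for target, said in pairs:
        table[target][said] += 1
    return table


def predict_miscues(word, table, lexicon, k=2):
    """Return up to k likely miscues on `word`.

    Words seen in training: the k most frequent observed miscues (rote).
    Unseen words: similar-looking real words from the lexicon, a stand-in
    for an extrapolative method (assumption, not the paper's method).
    """
    if word in table:
        return [said for said, _ in table[word].most_common(k)]
    candidates = [w for w in lexicon if w != word]
    return difflib.get_close_matches(word, candidates, n=k, cutoff=0.6)


if __name__ == "__main__":
    table = train_rote(TRAINING_MISCUES)
    print(predict_miscues("the", table, LEXICON))    # ['a', 'they']
    print(predict_miscues("horse", table, LEXICON))  # closest real words
```

The split mirrors the finding reported above: memorized counts work when a word is frequent enough to have many observed miscues, while some form of generalization is needed for words absent from the training data.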



[ETS 2002] Aist, G. (2002). Helping Children Learn Vocabulary During Computer-Assisted Oral Reading. Educational Technology and Society, 5(2).  On-line at http://ifets.ieee.org/periodical/vol_2_2002/aist.html

Abstract:  This paper addresses an indispensable skill using a unique method to teach a critical component:  helping children learn to read by using computer-assisted oral reading to help children learn vocabulary. We build on Project LISTEN’s Reading Tutor, a computer program that adapts automatic speech recognition to listen to children read aloud, and helps them learn to read (http://www.cs.cmu.edu/~listen). To learn a word from reading with the Reading Tutor, students must encounter the word and learn the meaning of the word in context. We modified the Reading Tutor first to help students encounter new words and then to help them learn the meanings of new words.  We then compared the Reading Tutor to classroom instruction and to human-assisted oral reading as part of a yearlong study with 144 second and third graders. The result: Second graders did about the same on word comprehension in all three conditions. However, third graders who read with the 1999 Reading Tutor, modified as described in this paper, performed statistically significantly better than other third graders in a classroom control on word comprehension gains – and even comparably with other third graders who read one-on-one with human tutors. 



[SSSR 2002]  Mostow, J., Aist, G., Bey, J., Burkhead, P., Cuneo, A., Junker, B., Rossbach, S., Tobin, B., Valeri, J., & Wilson, S. (2002, June 27-30). Independent practice versus computer-guided oral reading: Equal-time comparison of sustained silent reading to an automated reading tutor that listens. Ninth Annual Meeting of the Society for the Scientific Study of Reading, Chicago, Illinois. Download paper in pdf format.  Download PowerPoint presentation.

Abstract:  A 7-month study of 178 students in grades 1-4 at two schools compared two daily 20-minute treatments. 88 students did Sustained Silent Reading (SSR) in their classrooms.  90 students in 10-computer labs used the 2000-2001 version of Project LISTEN’s Reading Tutor (RT), which uses speech recognition to listen to a child read aloud, and responds with spoken and graphical assistance (www.cs.cmu.edu/~listen).  The RT group significantly outgained their statistically matched SSR classmates in phonemic awareness, rapid letter naming, word identification, word comprehension, passage comprehension, fluency, and spelling – especially in grade 1, where effect sizes for these skills ranged from .20 to .72. 



[ITS 2002 time]  Mostow, J., Aist, G., Beck, J., Chalasani, R., Cuneo, A., Jia, P., & Kadaru, K. (2002, June 5-7). A La Recherche du Temps Perdu, or As Time Goes By: Where does the time go in a Reading Tutor that listens? Sixth International Conference on Intelligent Tutoring Systems (ITS'2002), Biarritz, France. Download paper in pdf format. (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html

Abstract: Analyzing the time allocation of students’ activities in a school-deployed mixed initiative tutor can be illuminating but surprisingly tricky.  We discuss some complementary methods that we have used to understand how tutoring time is spent, such as analyzing sample videotaped sessions by hand, and querying a database generated from session logs.  We identify issues, methods, and lessons that may be relevant to other tutors.  One theme is that iterative design of “non-tutoring” components can enhance a tutor’s effectiveness, not by improved teaching, but by reducing the time wasted on non-learning activities.  Another is that it is possible to relate students’ time allocation to improvements in various outcome measures. 



[ITS 2002 simulated] Beck, J. E. (2002). Directing Development Effort with Simulated Students. Sixth International Conference on Intelligent Tutoring Systems (ITS'2002), Biarritz, France. Download paper in pdf format.  (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html

Abstract:  Our goal is to find a methodology for directing development effort in an intelligent tutoring system (ITS). Given that ITS have several AI reasoning components, as well as content to present, evaluating them is a challenging task. Due to these difficulties, few evaluation studies to measure the impact of individual components have been performed. Our architecture evaluates the efficacy of each component of an ITS and considers the impact of a particular teaching goal when determining whether a particular component needs improving. For our AnimalWatch tutor, we found that for certain goals the tutor itself, rather than its reasoning components, needed improvement. We have found that it is necessary to know what the system’s teaching goals are before deciding which component is the limiting factor on performance.  [Based on Dr. Beck's research at University of Massachusetts Amherst prior to joining Project LISTEN.]



[ITS 2002 poster] Aist, G., Kort, B., Reilly, R., Mostow, J., & Picard, R. (2002, June 5-7). Adding Human-Provided Emotional Scaffolding to an Automated Reading Tutor that Listens Increases Student Persistence [Poster]. Sixth International Conference on Intelligent Tutoring Systems (ITS'2002), Biarritz, France, 992. Download paper in pdf format.  (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html



[EMTDS 2002 scaffolding] Aist, G., Kort, B., Reilly, R., Mostow, J., & Picard, R. (2002, June 4). Experimentally Augmenting an Intelligent Tutoring System with Human-Supplied Capabilities:  Adding Human-Provided Emotional Scaffolding to an Automated Reading Tutor that Listens. ITS 2002 Workshop on Empirical Methods for Tutorial Dialogue Systems, San Sebastian, Spain. Download paper in pdf format.

Abstract: This paper presents the first statistically reliable empirical evidence from a controlled study for the effect of human-provided emotional scaffolding on student persistence in an intelligent tutoring system.  We describe an experiment that added human-provided emotional scaffolding to an automated Reading Tutor that listens, and discuss the methodology we developed to conduct this experiment. Each student participated in one (experimental) session with emotional scaffolding, and in one (control) session without emotional scaffolding, counterbalanced by order of session. Each session was divided into several portions. After each portion of the session was completed, the Reading Tutor gave the student a choice: continue, or quit. We measured persistence as the number of portions the student completed. Human-provided emotional scaffolding added to the automated Reading Tutor resulted in increased student persistence, compared to the Reading Tutor alone. Increased persistence means increased time on task, which ought to lead to improved learning. If these results for reading turn out to hold for other domains too, the implication for intelligent tutoring systems is that they should respond with not just cognitive support – but emotional scaffolding as well. Furthermore, the general technique of adding human-supplied capabilities to an existing intelligent tutoring system should prove useful for studying other ITSs too.



[EMTDS 2002 viewer] Mostow, J., Beck, J., Chalasani, R., Cuneo, A., & Jia, P. (2002, June 4). Viewing and Analyzing Multimodal Human-computer Tutorial Dialogue: A Database Approach. ITS 2002 Workshop on Empirical Methods for Tutorial Dialogue Systems, San Sebastian, Spain. Download paper in .doc format.  Download paper in pdf format.

Abstract:  It is easier to record logs of multimodal human-computer tutorial dialogue than to make sense of them.  This paper discusses some of the problems in extracting useful information from such logs and the difficulties we encountered in converting the logs into a more easily understandable database.  Once log files are parsed into a database, it is possible to write SQL queries to answer research questions faster than analyzing complex log files each time.  The database permits us to construct a viewer to examine individual Reading Tutor-student interactions.  This combination of queries and viewable data has turned out to be very powerful.  We provide examples of questions that can be answered by each technique as well as how to use them together. 



[CVDA 2002 comprehension] Mostow, J., Tobin, B., & Cuneo, A. (2002, June 3). Automated Comprehension Assessment in a Reading Tutor. ITS 2002 Workshop on Creating Valid Diagnostic Assessments, San Sebastian, Spain, pp. 52-63. Download paper in pdf format.

Abstract:  Can vocabulary and comprehension assessments be generated automatically for a given text? We describe the automated method used to generate, administer, and score multiple-choice vocabulary and comprehension questions in the 2001-2002 version of Project LISTEN’s Reading Tutor. To validate the method against the Woodcock Reading Mastery Test, we analyzed 69,326 multiple-choice cloze items generated in the course of regular Reading Tutor use by 364 students in grades 1-9 at seven schools.  Correlation between predicted and actual scores reached R=.85 for Word and Passage Comprehension. 


[CVDA 2002 latency] Jia, P., Beck, J. E., & Mostow, J. (2002, June 3). Can a Reading Tutor that Listens use Inter-word Latency to Assess a Student's Reading Ability? ITS 2002 Workshop on Creating Valid Diagnostic Assessments, San Sebastian, Spain, pp. 23-32. Download paper in pdf format.

Abstract:  This paper describes our use of inter-word latency, the delay before a student speaks a word in the course of reading a sentence aloud, to assess oral reading automatically. The context of our study is a Reading Tutor that uses automated speech recognition to listen to children read aloud. Using data from 58 students in grades 1 through 4, we used inter-word latency to predict scores on external, individually administered, paper-based tests.  Correlation between predicted and actual test scores exceeded .7 for fluency, word attack, word identification, word comprehension, and passage comprehension.  Compared with paper-based tests, this evaluation method is much cheaper, based on computer-guided oral reading recorded in the course of regular tutor use, and invisible to students.  It has the potential to provide continuous assessment of student progress, both to report to teachers and to guide its own tutoring. 
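The latency measure itself is simple to compute once word timings are available. This sketch assumes hypothetical (start, end) times in seconds for each word, such as a time alignment of the recognizer output might provide, and also includes a plain Pearson correlation of the kind reported above. It is not the paper's prediction model, which related latency to test scores in ways not detailed in this abstract.

```python
from math import sqrt


def inter_word_latencies(word_times):
    """Latency before each word after the first in one sentence.

    word_times: list of (start_sec, end_sec) per word, in reading order
    (hypothetical alignment output). Latency before word i is the gap between
    the end of word i-1 and the start of word i; overlaps are clipped to 0.
    """
    return [
        max(0.0, word_times[i][0] - word_times[i - 1][1])
        for i in range(1, len(word_times))
    ]


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation between paired lists, e.g. per-student mean
    latency and a test score."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    sentence = [(0.0, 0.4), (0.5, 0.9), (1.5, 2.0)]
    print(inter_word_latencies(sentence))  # approximately [0.1, 0.6]
```

Because the timings come from ordinary tutor use, a measure like this can be recomputed after every session at no extra cost, which is what makes continuous, invisible assessment plausible.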


[IRA 2002 award] Aist, G. (2002, April 29). Helping Children Learn Vocabulary during Computer-Assisted Oral Reading:  A Dissertation Summary [Poster presented as a Distinguished Finalist for the Outstanding Dissertation of the Year Award]. 47th Annual Convention of the International Reading Association, San Francisco, CA.


[IJAIED 2001] Aist, G.  Towards automatic glossarization: automatically constructing and administering vocabulary assistance factoids and multiple-choice assessment. International Journal of Artificial Intelligence in Education (2001) 12, 212-231. Download from IJAIED website.

Abstract: We address an important problem with a novel approach: helping children learn words during computer-assisted oral reading. We build on Project LISTEN's Reading Tutor, which is a computer program that adapts automatic speech recognition to listen to children read aloud, and helps them learn to read (http://www.cs.cmu.edu/~listen). In this paper, we focus on the problem of vocabulary acquisition. To learn a word from reading with the Reading Tutor, students must first encounter the word and then learn the meaning of the word from context. This paper describes how we modified the Reading Tutor to help students learn the meanings of new words by augmenting stories with WordNet-derived comparisons to other words – "factoids". Furthermore, we report results from an embedded experiment designed to evaluate the effectiveness of including factoids in stories that children read with the Reading Tutor. Factoids helped – not for all students and all words, but for third graders seeing rare words, and for single sense rare words tested one or two days later. We also discuss further steps towards automatic construction of explanations of words. 


[FF 2001]  Mostow, J., and Aist, G.  Evaluating tutors that listen: An overview of Project LISTEN.  In (K. Forbus and P. Feltovich, Eds.) Smart Machines in Education, pp. 169-234.  MIT/AAAI Press, 2001. Order book from AAAI Press.


[DYD 2001] Aist, G. Towards Worldwide Literacy: Technological Affordances, Economic Challenges, Affordable Technology. Development by Design: Workshop on Collaborative Open Source Design of Appropriate Technologies.  MIT Media Lab, Cambridge, Massachusetts, July 22, 2001. Download paper in pdf format.


[NAACL 2001] Jack Mostow, Greg Aist, Juliet Bey, Paul Burkhead, Andrew Cuneo, Susan Rossbach, Brian Tobin, Joe Valeri, and Sara Wilson.  A hands-on demonstration of Project LISTEN’s Reading Tutor and its embedded experiments.  Refereed demo presented at Language Technologies 2001:  The Second Meeting of the North American Chapter of the Association for Computational Linguistics, Pittsburgh, PA, June 2001. Download paper in pdf format.

Abstract:  Project LISTEN’s Reading Tutor helps children learn to read.  It uses speech recognition to listen to them read aloud, and responds with spoken and graphical feedback. The demonstration lets attendees try out this interaction themselves.  Besides the spoken tutorial dialog, features shown include an automated tutorial for new users, interactive activities that combine assisted reading with other types of steps, and automated field studies to evaluate the efficacy of alternative tutorial interventions by embedding experiments within the Reading Tutor. 


[WTDS 2001 DT] Murray, R. Charles, VanLehn, Kurt, and Mostow, Jack.  A Decision-Theoretic Approach for Selecting Tutorial Discourse Actions. In Proceedings of the NAACL 2001 Workshop on Adaptation in Dialogue Systems, Pittsburgh, PA, June 2001. Download paper in pdf format.


[WTDS 2001 DTa] Murray, R. Charles, VanLehn, Kurt, and Mostow, Jack.  A Decision-Theoretic Architecture for Selecting Tutorial Discourse Actions. In Proceedings of the AIED-2001 Workshop on Tutorial Dialog Systems, San Antonio, Texas, May 2001, pp. 35-46. Download paper in pdf format.

Abstract:  We propose a decision-theoretic architecture for selecting tutorial discourse actions. DT Tutor, an action selection engine which embodies our approach, uses a dynamic decision network to consider the tutor’s objectives and uncertain beliefs in adapting to the changing tutorial state. It predicts the effects of the tutor’s discourse actions on the tutorial state, including the student’s internal state, and then selects the action with maximum expected utility. We illustrate our approach with prototype applications for diverse domains: calculus problem-solving and elementary reading. Formative off-line evaluations assess DT Tutor’s ability to select optimal actions quickly enough to keep a student engaged. 
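To make "selects the action with maximum expected utility" concrete, here is a one-step sketch with made-up outcome probabilities, utilities, and action costs. DT Tutor itself uses a dynamic decision network that updates its beliefs about the student over time; this static table is a deliberate simplification, and every action name and number in it is hypothetical.

```python
# Hypothetical one-step model: P(outcome | action). Numbers are made up.
OUTCOME_PROBS = {
    "prompt":    {"reads_correctly": 0.5, "stalls": 0.4, "gives_up": 0.1},
    "give_hint": {"reads_correctly": 0.7, "stalls": 0.2, "gives_up": 0.1},
    "read_word": {"reads_correctly": 0.9, "stalls": 0.1, "gives_up": 0.0},
}

# Hypothetical utility of each outcome for the tutor's objectives.
UTILITY = {"reads_correctly": 10.0, "stalls": -2.0, "gives_up": -10.0}

# Hypothetical cost of each action, e.g. a lost chance for the student to
# decode the word independently.
ACTION_COST = {"prompt": 0.0, "give_hint": 1.0, "read_word": 6.0}


def expected_utility(action):
    """Sum of P(outcome | action) * utility(outcome), minus the action's cost."""
    outcomes = OUTCOME_PROBS[action]
    return sum(p * UTILITY[o] for o, p in outcomes.items()) - ACTION_COST[action]


def select_action():
    """Pick the action with maximum expected utility."""
    return max(OUTCOME_PROBS, key=expected_utility)


if __name__ == "__main__":
    for action in OUTCOME_PROBS:
        print(action, round(expected_utility(action), 2))
    print("chosen:", select_action())
```

With these numbers, reading the word aloud for the student maximizes immediate success, but its assumed cost makes the hint the best choice (expected utilities of roughly 3.2 for the prompt, 4.6 for the hint, and 2.8 for reading the word).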


[AIED 2001 poster] Mostow, J., Aist, G. S., Burkhead, P., Corbett, A., Cuneo, A., Eitelman, S., Huang, C., Junker, B., Platz, C., Sklar, M. B., and Tobin, B.  A controlled evaluation of computer- versus human-assisted oral reading.  In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education:  AI-ED in the Wired and Wireless Future, pp. 586-588.  Amsterdam:  IOS Press.  Presented at the Tenth Artificial Intelligence in Education (AI-ED) Conference, San Antonio, Texas, May 2001. Download paper in pdf format.

Abstract:  A year-long study of 144 second and third graders compared outcomes (gains in test scores) and process variables (e.g. words read) for Project LISTEN’s Reading Tutor, human tutors, and a classroom control. Human tutors beat the Reading Tutor only in word attack.  Both beat the control in grade 3 word comprehension. 


[AIED 2001 pause video]  Jack Mostow, Cathy Huang, and Brian Tobin.  Pause the Video: Quick but quantitative expert evaluation of tutorial choices in a Reading Tutor that listens.  In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education:  AI-ED in the Wired and Wireless Future, pp. 343-353.  Amsterdam:  IOS Press.  Presented at the Tenth Artificial Intelligence in Education (AI-ED) Conference, San Antonio, Texas, May 2001. Download paper in pdf format.

Abstract:  To critique Project LISTEN’s automated Reading Tutor, we adapted a panel-of-judges methodology for evaluating expert systems. Three professional elementary educators watched 15 video clips of the Reading Tutor listening to second and third graders read aloud. Each expert chose which of 10 interventions to make in each situation. To keep the Reading Tutor’s choice from influencing the expert, we paused each video clip just before the Reading Tutor intervened.  After the expert responded, we played back what the Reading Tutor had actually done.  The expert then rated its intervention compared to hers. 

Although the experts seldom agreed, they rated the Reading Tutor’s choices as better than their own in 5% of the cases, equally good in 36%, worse but OK in 41%, and inappropriate in only 19%.  The lack of agreement and the surprisingly favorable ratings together suggest that either the Reading Tutor’s choices were better than we thought, the experts knew less than we hoped, or the clips showed less than they should. 


[AIED 2001 miscue mining] James Fogarty, Laura Dabbish, David Steck, and Jack Mostow.  Mining a database of reading mistakes: For what should an automated Reading Tutor listen?  In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education:  AI-ED in the Wired and Wireless Future, pp. 422-433.  Amsterdam:  IOS Press.  Presented at the Tenth Artificial Intelligence in Education (AI-ED) Conference, San Antonio, Texas, May 2001. Download paper in pdf format.

Abstract:  Using a machine learning approach to mine a database of over 70,000 oral reading mistakes transcribed by University of Colorado researchers, we generated 225 rules based on graphophonemic context to predict the frequency of the 71 most common decoding errors in mapping graphemes to phonemes. To evaluate their generality, we tested how well they predicted the frequency of the same decoding errors for different readers on different text.  We achieved .473 correlation between predicted and actual frequencies, compared to .350 correlation for context-independent versions of the same rules.  These rules may help an automated reading tutor listen better to children reading aloud. 


[AIED 2001 vocabulary gains] Aist, G. S., Mostow, J., Tobin, B., Burkhead, P., Corbett, A., Cuneo, A., Junker, B., and Sklar, M. B. Computer-assisted oral reading helps third graders learn vocabulary better than a classroom control – about as well as one-on-one human-assisted oral reading.  In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education:  AI-ED in the Wired and Wireless Future, pp. 267-277.  Amsterdam:  IOS Press.  Presented at the Tenth Artificial Intelligence in Education (AI-ED) Conference, San Antonio, Texas, May 2001. Download paper in pdf format.

Abstract:  We describe results on helping children learn vocabulary during computer-assisted oral reading. This paper focuses on one aspect – vocabulary learning – of a larger study comparing computerized oral reading tutoring to classroom instruction and one-on-one human tutoring. 144 students in second and third grade were assigned to one of three conditions: (a) classroom instruction, (b) classroom instruction with one-on-one tutoring replacing part of the school day, and (c) computer instruction replacing part of the school day. For second graders, there were no significant differences between treatments in word comprehension gains. For third graders, however, the computer tutor showed an advantage over classroom instruction for gains in word comprehension (p = 0.042, effect size = 0.56) as measured by the Woodcock Reading Mastery Test. One-on-one human tutoring also showed an advantage over classroom instruction alone (p = 0.039, effect size = 0.72). Computer tutoring and one-on-one human tutoring were not significantly different in terms of word comprehension gains. 


[AIED 2001 factoids]  Gregory S. Aist.  Factoids: Automatically constructing and administering vocabulary assistance and assessment.  In J. D. Moore, C. L. Redfield, and W. L. Johnson (Eds.), Artificial Intelligence in Education:  AI-ED in the Wired and Wireless Future, pp. 234-245.  Amsterdam:  IOS Press.  Presented at the Tenth Artificial Intelligence in Education (AI-ED) Conference, San Antonio, Texas, May 2001. Download paper in pdf format.

Abstract: We address an important problem with a novel approach:  helping children learn words during computer-assisted oral reading. We build on Project LISTEN's Reading Tutor, which is a computer program that adapts automatic speech recognition to listen to children read aloud, and helps them learn to read (http://www.cs.cmu.edu/~listen). In this paper, we focus on the problem of vocabulary acquisition. To learn a word from reading with the Reading Tutor, students must first encounter the word and then learn the meaning of the word from context. This paper describes how we modified the Reading Tutor to help students learn the meanings of new words by augmenting stories with WordNet-derived comparisons to other words – “factoids”. Furthermore, we report results from an embedded experiment designed to evaluate the effectiveness of including factoids in stories that children read with the Reading Tutor. Factoids helped – not for all students and all words, but for third graders seeing rare words, and for single-sense rare words tested one or two days later. 


[2001 PhD] Aist, G. 2001. Helping Children Learn Vocabulary during Computer-Assisted Oral Reading. Ph.D. dissertation, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Download thesis in pdf format.


[AAAI 2000 SA] Aist, G. Identifying words to explain to a reader: A preliminary study.  Student Abstract and Poster, Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), p. 1061.  Austin, TX, August 2000. Download paper in pdf format.


[AAAI 2000 DC] Aist, G.  Helping children learn vocabulary during computer assisted oral reading.  SIGART/AAAI Doctoral Consortium, Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), pp. 1100-1101.  Austin, TX, July 2000. Download .ps file.


[HMC 2000] Aist, G.  Taking Turns Talking About Text in a Reading Tutor that Listens.  Proceedings of the Third Workshop On Human-Machine Conversation. Grand Hotel Villa Serbelloni, Bellagio, Italy, July 12-14, 2000. Download paper in pdf format.

Abstract:  In this paper we report on ongoing work on turn-taking in Project LISTEN's Reading Tutor (Mostow & Aist CALICO 1999). Project LISTEN’s Reading Tutor listens to children read aloud and helps them learn to read. The Reading Tutor’s repertoire of turn-taking behaviors includes not only alternating turns, but also backchanneling, interrupting, and prompting. 


[ITS 2000 YR]  Aist, G.  An informal model of vocabulary acquisition during assisted oral reading and some implications for computerized instruction.  In R. Nkambou (Ed.), ITS'2000 Young Researchers Track Proceedings, pp. 22-24.  Fifth International Conference on Intelligent Tutoring Systems.  Montreal, Canada, June 2000.  Download paper in pdf format.  (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html


[ITS 2000 PA] Aist, G. and Mostow, J.  Improving story choice in a reading tutor that listens.  Proceedings of the Fifth International Conference on Intelligent Tutoring Systems (ITS’2000), p. 645.  Montreal, Canada, June 2000.  Poster Abstract. Download paper in pdf format.  (c) Springer-Verlag at http://www.springer.de/comp/lncs/index.html.


[ITS 2000 HT] Aist, G.  Human Tutor and Computer Tutor Story Choice in Listening to Children Read Aloud.  In B. du Boulay (Ed.), Proceedings of the ITS'2000 Workshop on Modeling Human Teaching Tactics and Strategies, pp. 8-10.  Fifth International Conference on Intelligent Tutoring Systems.  Montreal, Canada, June 2000. Download paper in pdf format.

Abstract: A preliminary report on a comparison of human tutor story choice and mixed-initiative computer tutor story choice in Project LISTEN's Reading Tutor. 


[ITS 2000 ML] Aist, G. and Mostow, J.  Using Automated Within-Subject Invisible Experiments to Test the Effectiveness of Automated Vocabulary Assistance. In Joseph Beck (Ed.), Proceedings of ITS'2000 Workshop on Applying Machine Learning to ITS Design/Construction, pp. 4-8.  Fifth International Conference on Intelligent Tutoring Systems.  Montreal, Canada, June 2000. Download paper in pdf format.

Abstract: Machine learning offers the potential to allow an intelligent tutoring system to learn effective tutoring strategies. A necessary prerequisite to learning an effective strategy is being able to automatically test a strategy's effectiveness. We conducted an automated, within-subject “invisible experiment” to test the effectiveness of a particular form of vocabulary instruction in a Reading Tutor that listens. Both conditions were in the context of assisted oral reading with the computer. The control condition was encountering a word in a story. The experimental condition was first reading a short automatically generated "factoid" about the word, such as "cheetah can be a kind of cat. Is it here?" and then reading the sentence from the story containing the target word. The initial analysis revealed no significant difference between the conditions. Further inspection revealed that sometimes students benefited from receiving help on "hard" or infrequent words. Designing, implementing, and analyzing this experiment shed light not only on the particular vocabulary help tested, but also on the machine-learning-inspired methodology we used to test the effectiveness of this tutorial action. 
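The factoid condition can be sketched as a template plus a random within-subject assignment. The one-entry `HYPERNYMS` table below is a hypothetical stand-in for a WordNet lookup, and the function names are illustrative; only the factoid wording follows the example given in the abstract.

```python
import random

# Hypothetical stand-in for a WordNet hypernym lookup (illustration only).
HYPERNYMS = {
    "cheetah": "cat",
}


def make_factoid(word):
    """Comparison factoid in the style of the abstract's example, or None
    when no comparison word is known."""
    hypernym = HYPERNYMS.get(word)
    if hypernym is None:
        return None
    return f"{word} can be a kind of {hypernym}. Is it here?"


def assign_condition(rng):
    """Randomly assign one encounter with a target word to a condition."""
    return rng.choice(["factoid", "control"])


def sentences_to_read(story_sentence, target, condition):
    """Control: just the story sentence. Factoid: the factoid (if any) first,
    then the story sentence containing the target word."""
    factoid = make_factoid(target) if condition == "factoid" else None
    return ([factoid] if factoid else []) + [story_sentence]


if __name__ == "__main__":
    rng = random.Random(2000)
    condition = assign_condition(rng)
    print(condition, sentences_to_read("The cheetah ran fast.", "cheetah", condition))
```

Randomizing per encounter, rather than per student, is what lets each student serve as his or her own control, which is the within-subject design the abstract describes.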


[ESCA 99] Aist, G. and Mostow, J.  Measuring the Effects of Backchanneling in Computerized Oral Reading Tutoring. Proceedings of the ESCA Workshop on Prosody and Dialog.  Eindhoven, Netherlands, September 1999. Download paper in pdf format.

Abstract:  What is the effect of back channeling on human-computer dialog, and how should such effects be measured?  We present experiments designed to evaluate the immediate effects of back channeling on computer-assisted oral reading tutoring.  These experiments are implemented in a reading tutor that listens to children read aloud, and helps them learn to read.  As a byproduct of designing, conducting, and evaluating these experiments, we are able to describe some unique methodological challenges in evaluating the effects of low-level turn-taking dialog behavior. 


[USPTO 99] Mostow, J. and Aist, G. Reading and Pronunciation Tutor.  United States Patent No. 5,920,838.  Filed June 2, 1997; issued July 6, 1999.   US Patent and Trademark Office.

Abstract:  A computer-implemented reading tutor comprises a player for outputting a response. An input block implementing a plurality of functions such as silence detection, speech recognition, etc. captures the read material. A tutoring function compares the output of the speech recognizer to the text which was supposed to have been read and generates a response, as needed, based on information in a knowledge base and an optional student model. The response is output to the user through the player. A quality control function evaluates the captured read material and stores the captured material in the knowledge base under certain conditions. An auto enhancement function uses information available to the tutor to create additional resources such as identifying rhyming words, words with common roots, etc., which can be used as responses. 


[AAAI99] Mostow, J. and Aist, G.  Authoring New Material in a Reading Tutor that Listens.  Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, July 1999, pp. 918-919.  In the refereed Intelligent Systems Demonstration track.  Also presented at 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), College Park, MD, June, 1999. Download paper in pdf format.

Abstract:  Project LISTEN’s Reading Tutor helps children learn to read by providing assisted practice in reading connected text.  A key goal is to provide assistance for reading any English text entered by students or adults.  This live demonstration shows how the Reading Tutor helps users enter and narrate stories, and then helps children read them. 


[CALICO99] Mostow, J. and Aist, G.  Giving Help and Praise in a Reading Tutor with Imperfect Listening – Because Automated Speech Recognition Means Never Being Able to Say You’re Certain. CALICO Journal 16:3, 407-424.  Special issue (M. Holland, Ed.), Tutors that Listen:  Speech recognition for Language Learning, 1999. Download paper in pdf format.

Abstract:  Human tutors make use of a wide range of input and output modalities, such as speech, vision, gaze, and gesture. Computer tutors are typically limited to keyboard and mouse input. Project LISTEN’s Reading Tutor uses speech recognition technology to listen to children read aloud and help them. Why should a computer tutor listen? A computer tutor that listens can give help and praise naturally and unobtrusively. We address the following questions: When and how should a computer tutor that listens help students? When and how should it praise students? We examine how the advantages and disadvantages of speech recognition technology helped shape the design and implementation of the Reading Tutor.  Despite its limitations, this technology enables the Reading Tutor to provide patient, unobtrusive, and natural assistance for reading aloud. 


[SRinCALL]  G. Aist. Speech recognition in computer assisted language learning. In K. C. Cameron (ed.), Computer Assisted Language Learning (CALL): Media, Design, and Applications. Lisse: Swets & Zeitlinger, 1999. 


[CHI99] G. Aist. Skill-specific spoken dialogs in a reading tutor that listens. Doctoral Consortium paper.  In Proceedings of the Conference on Human Factors in Computing Systems:  CHI 99 Extended Abstracts, pp. 55-56.  Pittsburgh, PA, May 15 - 20, 1999. Download paper in pdf format.


[LIS99] Mostow, J. (ed.), McClelland, J., Fiez, J., McCandliss, B., Plaut, D., and Schneider, W.  Poster and short presentation at the NSF Learning & Intelligent Systems Principal Investigators' meeting, Washington, DC, May, 1999.  At http://www.cnbc.cmu.edu/collaborative/lisweb/ppt/index.htm.  In J. McClelland (PI), Intervention Strategies that Promote Learning: Their Basis and Use in Enhancing Literacy, at http://www.cnbc.cmu.edu/collaborative/lisweb


[HCIGW99 CRLT] Mostow, J.  Collaborative Research on Learning Technologies: An Automated Reading Assistant That Listens.  Proceedings of the NSF Human-Computer Interaction Grantees Workshop (HCIGW99), Orlando, FL, February, 1999.  At http://nsf-workshop.engr.ucf.edu/reports.asp under "2. Speech and Natural Language Understanding". 


[HCIGW99 IS] Mostow, J.  Guiding Spoken Dialogue with Computers by Responding to Prosodic Cues.  Proceedings of the NSF Human-Computer Interaction Grantees Workshop (HCIGW99), Orlando, FL, February, 1999.  At http://nsf-workshop.engr.ucf.edu/reports.asp under "2. Speech and Natural Language Understanding". 


[ICSLP98 acoustic] Aist, G., Chan, P., Huang, X. D., Jiang, L., Kennedy, R., Latimer, D., Mostow, J., and Yeung, C. How effective is unsupervised data collection for children's speech recognition?  International Conference on Spoken Language Processing (ICSLP98).  Sydney, Australia, December, 1998. Click here for .PDF.

Abstract: Children present a unique challenge to automatic speech recognition. Today’s state-of-the-art speech recognition systems still have problems handling children’s speech because their acoustic models are trained on data collected from adult speech. In this paper we describe an inexpensive way to address this problem. We collected children’s speech as they interacted with an automated reading tutor. These data were subsequently transcribed by a speech recognition system and automatically filtered. We studied how to use these automatically collected data to improve the performance of a speech recognition system on children’s speech. Experiments indicate that automatically collected data can significantly reduce the error rate on children’s speech. 


[ICSLP98 architecture] Aist, G.  Expanding A Time-Sensitive Conversational Architecture For Turn-Taking To Handle Content-Driven Interruption. International Conference on Spoken Language Processing (ICSLP98).  Sydney, Australia, December, 1998. Download paper in pdf format.

Abstract: Turn taking in spoken language systems has generally been push-to-talk or strict alternation (user speaks, system speaks, user speaks, …), with some systems, such as telephone-based systems, handling barge-in (interruption by the user).  In this paper we describe our time-sensitive conversational architecture for turn taking, which allows not only alternating turns and barge-in but also other conversational behaviors.  This architecture allows back channeling, prompting the user by taking more than one turn if necessary, and overlapping speech.  The architecture is implemented in a Reading Tutor that listens to children read aloud, and helps them. We extended this architecture to allow the Reading Tutor to interrupt the student based on a non-self-corrected mistake: “content-driven interruption”. To the best of our knowledge, the Reading Tutor is thus the first spoken language system to intentionally interrupt the user based on the content of the utterance. 


[AAAI AMLDP 98] G. Aist and J. Mostow. Estimating the Effectiveness of Conversational Behaviors in a Reading Tutor that Listens. AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, Stanford, CA, March 1998.  Reprinted in Proceedings of the Conference on Automated Learning and Discovery (CONALD98), June 11-13, 1998, Carnegie Mellon University, Pittsburgh, PA. Download paper in pdf format.

Abstract: Project LISTEN's Reading Tutor listens to children read aloud, and helps them learn to read. Besides user satisfaction, a primary criterion for tutorial spoken dialogue agents should be educational effectiveness. In order to learn to be more effective, a spoken dialogue agent must be able to evaluate the effect of its own actions. When evaluating the effectiveness of individual actions, rather than comparing a conversational action to "nothing," an agent must compare it to reasonable alternative actions. We describe a methodology for analyzing the immediate effect of a conversational action, and some of the difficulties in doing so. We also describe some preliminary results on evaluating the effectiveness of conversational behaviors in a reading tutor that listens. 


[AAAI IE 98] J. Kominek, G. Aist, and J. Mostow. When Listening Is Not Enough: Potential Uses of Vision for a Reading Tutor that Listens. AAAI Spring Symposium on Intelligent Environments, Stanford, CA, March 1998, pp. 161-167.  Reprinted in Proceedings of the Conference on Automated Learning and Discovery (CONALD98), June 11-13, 1998, Carnegie Mellon University, Pittsburgh, PA. Download paper in pdf format.

Abstract: Speech offers a powerful avenue between user and computer. However, if the user is not speaking, or is speaking to someone else, what is the computer to make of it? Project LISTEN's Reading Tutor is speech-aware software that strives to teach children to read. Because it is useful to know what the child is doing when reading, we are investigating some potential uses of computer vision. By recording and analyzing video of the Tutor in use, we measured the frequency of events that cannot be detected by speech alone. These include how often the child is visually distracted, and how often the teacher or another student provides assistance. This information helps us assess how vision might enhance the effectiveness of the Reading Tutor. 


[AAAI CAHM 97] G. S. Aist and J. Mostow. A time to be silent and a time to speak: Time-sensitive communicative actions in a reading tutor that listens. AAAI Fall Symposium on Communicative Actions in Humans and Machines. Boston, MA, November, 1997. Not for citation. Download paper in pdf format.

Abstract: Timing is important in discourse, and key in tutoring. Communicative actions that are too late or too early may be infelicitous. How can an agent engage in temporally appropriate behavior? We present a domain-independent architecture that models elapsed time as a critical factor in understanding the discourse. Our architecture also allows for "invisible experiments" where the agent varies its behavior and studies the effects of its behavior on the discourse. This architecture has been instantiated and is in use in an oral reading tutor that listens to children read aloud and helps them. 


[PUI 97] G. S. Aist and J. Mostow. When Speech Input is Not an Afterthought: A Reading Tutor that Listens. Proceedings of the Workshop on Perceptual User Interfaces, Banff, Canada, October, 1997.  Reprinted in Proceedings of the Conference on Automated Learning and Discovery (CONALD98), June 11-13, 1998, Carnegie Mellon University, Pittsburgh, PA. Download paper in pdf format.

Abstract: Project LISTEN's Reading Tutor listens to children read aloud, and helps them. The first extended in-school use of the Reading Tutor suggests that for this task speech input can be natural, compelling, and effective. 


[CALL 97] G. S. Aist and J. Mostow. Adapting Human Tutorial Interventions for a Reading Tutor that Listens: Using Continuous Speech Recognition in Interactive Educational Multimedia. In CALL'97 Conference on Multimedia. Exeter, England, September, 1997. Download paper in pdf format.

Abstract: Human tutors make use of a wide range of input and output modalities, such as speech, vision, gaze, and gesture. Computer tutors are typically limited to keyboard and mouse input. Project LISTEN's Reading Tutor listens to children read aloud, and helps them. Why should a computer tutor listen? A computer tutor that listens can give help and give praise naturally and unobtrusively. In this paper, we address the following questions: When and how should a computer tutor that listens help students? When and how should a computer tutor that listens praise students? We examine how the advantages and disadvantages of speech recognition helped shape the design and implementation of the Reading Tutor. Despite its limitations, speech recognition enables the Reading Tutor to provide patient, unobtrusive, and natural assistance for reading out loud. 


[ISGW97 CRLT] J. Mostow. Collaborative Research on Learning Technologies: An Automated Reading Assistant That Listens. Proceedings of the NSF Interactive Systems Grantees Workshop (ISGW97), Stevenson, Washington, August, 1997. 


[ISGW97 IS] J. Mostow. Guiding Spoken Dialogue with Computers by Responding to Prosodic Cues. Proceedings of the NSF Interactive Systems Grantees Workshop (ISGW97), Stevenson, Washington, August, 1997. 


[ISGW97 KIDS] J. Mostow and M. Eskenazi. A Database of Children's Speech. Proceedings of the NSF Interactive Systems Grantees Workshop (ISGW97), Stevenson, Washington, August, 1997. 


[LDC KIDS] M. Eskenazi and J. Mostow. The CMU KIDS Speech Corpus. Corpus of children's read speech digitized and transcribed on two CD-ROMs, with assistance from Multicom Research and David Graff. Published by the Linguistic Data Consortium, University of Pennsylvania. August, 1997. 


[IAAI97] J. Mostow. Artificial Intelligence and Education. Invited talk at the Ninth National Conference on Innovative Applications of Artificial Intelligence (IAAI-97). Providence, RI, July, 1997. 


[AAAI97] J. Mostow and G. Aist. The Sounds of Silence: Towards Automated Evaluation of Student Learning in a Reading Tutor that Listens. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97). American Association for Artificial Intelligence, Providence, RI, July, 1997. Pages 355-361. Click here for presentation slides. Download paper in pdf format.

Abstract: We propose a paradigm for ecologically valid, authentic, unobtrusive, automatic, data-rich, fast, robust, and sensitive evaluation of computer-assisted student performance. We instantiate this paradigm in the context of a Reading Tutor that listens to children read aloud, and helps them. We introduce inter-word latency as a simple prosodic measure of assisted reading performance. Finally, to validate the measure and analyze performance improvement, we report initial experimental results from the first extended in-school deployment of the Reading Tutor. 


[1997 video] J. Mostow. Pilot Evaluation of Project LISTEN's Reading Tutor (5-minute video). July, 1997. Presented at the Fourteenth National Conference on Artificial Intelligence (AAAI-97) and the Ninth National Conference on Innovative Applications of Artificial Intelligence (IAAI-97). Providence, RI.


[EDMEDIA 97] J. Mostow and G. Aist. Project LISTEN: A Reading Tutor that Listens. In World Conference on Educational Multimedia and Hypermedia. Calgary, Canada, June, 1997. Live demonstration. 


[MS 97] G. S. Aist. A General Architecture for a Real-Time Discourse Agent and a Case Study in Oral Reading Tutoring. May, 1997. Master's Project in Computational Linguistics at Carnegie Mellon University, supervised by J. Mostow. 


[AAAI CMMII 97] G. S. Aist. Challenges for a mixed initiative spoken dialog system for oral reading tutoring. In Computational Models for Mixed Initiative Interaction: Working Notes of the AAAI 1997 Spring Symposium. March, 1997. Download paper in pdf format.

Abstract: Deciding when a task is complete and deciding when to intervene and provide assistance are two basic challenges for an intelligent tutoring system. This paper describes these decisions in the context of Project LISTEN, an oral reading tutor that listens to children read aloud and helps them. We present theoretical analysis and experimental results demonstrating that supporting mixed-initiative interaction produces better task-completeness decisions than either system-only or user-only initiative. We describe some desired characteristics of a solution to the intervention decision, and specify possible evaluation criteria for such a solution. 


[CAETI 96 video] J. Mostow. A Reading Tutor that Listens (5-minute video). November, 1996. Presented at the DARPA CAETI Community Conference, November 19-22, 1996, Berkeley, CA.


[JASA 96] M. Eskenazi.  KIDS:  A database of children's speech. Journal of the Acoustical Society of America 100:4(2), December 1996. 

Abstract:  We have collected a database of children reading age- and reading-level-appropriate text aloud. This (labelled) data, to be distributed in the near future, was primarily intended to be used in CMU's LISTEN tutor, which employs speech recognition to monitor children's reading and then help correct errors. The speaker population was therefore chosen to represent good and poor readers and to incorporate the dialects of the speakers for whom the reading coach is intended. Phonemic balance could not be achieved (although it has been calculated), since the primary concern in recording children reading is to present sentences that can effectively be read by first through third graders. The text is a series of sentences we adapted from text in the Weekly Reader series; most of the adaptation concerned the absence of the accompanying images. The text was chosen for its intrinsic interest and widespread use.  Several trial recording sessions allowed us to develop a protocol that kept extraneous noises produced by the children to a minimum. We will discuss this and other problems inherent in recording children reading. Novel techniques developed for labelling this kind of speech will also be presented.  This work was funded by NSF Grant No. IRI-9528984. 


[UIST 95] J. Mostow, A. Hauptmann, and S. Roth. Demonstration of a Reading Coach that Listens. In Proceedings of the Eighth Annual Symposium on User Interface Software and Technology, pp. 77-78. Sponsored by ACM SIGGRAPH and SIGCHI in cooperation with SIGSOFT, Pittsburgh, PA, November, 1995. Download paper in ps format.

Abstract: Project LISTEN stands for "Literacy Innovation that Speech Technology ENables." We will demonstrate a prototype automated reading coach that displays text on a screen, listens to a child read it aloud, and helps where needed. We have tested successive prototypes of the coach on several dozen second graders. Mostow et al. [AAAI94] report implementation details and evaluation results. Here we summarize its functionality, the issues it raises in human-computer interaction, and how it addresses them. We are redesigning the coach based on our experience, and will demonstrate its successor at UIST '95. 


[NSF ISGW 95] J. Mostow & M. Eskenazi, summary of NSF project, November 1995, Cambridge, MA: "Guiding Spoken Dialogue with Computers by Responding to Prosodic Cues." In R. Jacobs, Proceedings of NSF Interactive Systems Program Grantees Workshop.


[AAAI 94] J. Mostow, S. Roth, A. G. Hauptmann, and M. Kane, "A Prototype Reading Coach that Listens", Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), American Association for Artificial Intelligence, Seattle, WA, August 1994, pp. 785-792. Recipient of the AAAI-94 Outstanding Paper Award. Download paper in pdf format.

Abstract: We report progress on a new approach to combating illiteracy: getting computers to listen to children read aloud. We describe a fully automated prototype coach for oral reading. It displays a story on the screen, listens as a child reads it, and decides whether and how to intervene. We report on pilot experiments with low-reading second graders to test whether these interventions are technically feasible to automate and pedagogically effective to perform. By adapting a continuous speech recognizer, we detected 49% of the misread words, with a false alarm rate under 4%. By incorporating the interventions in a simulated coach, we enabled the children to read and comprehend material at a reading level 0.6 years higher than what they could read on their own. We show how the prototype uses the recognizer to trigger these interventions automatically. 


[AAAI 94 video] J. Mostow, S. Roth, A. Hauptmann, M. Kane, A. Swift, L. Chase, and B. Weide, "A Reading Coach that Listens (6-minute video)", Video Track of the Twelfth National Conference on Artificial Intelligence (AAAI94), American Association for Artificial Intelligence, Seattle, WA, August 1994. Download paper in pdf format.


[ARPA HLT 94] A. G. Hauptmann, J. Mostow, S. F. Roth, M. Kane, and A. Swift, "A Prototype Reading Coach that Listens: Summary of Project LISTEN." In C. Weinstein (ed.), Proceedings ARPA Workshop on Human Language Technology, March 1994, Plainsboro, NJ, page 237. Morgan Kaufmann Publishers, Inc. Download paper in pdf format.


[Eurospeech 93] A. G. Hauptmann, L. L. Chase, and J. Mostow, "Speech Recognition Applied to Reading Assistance for Children: A Baseline Language Model", Proceedings of the 3rd European Conference on Speech Communication and Technology (EUROSPEECH93), Berlin, September 1993, pp. 2255-2258. 

Abstract: We describe an approach to using speech recognition in assisting children's reading. A state-of-the-art speaker independent continuous speech recognizer designed for large vocabulary dictation is adapted to the task of identifying substitutions and omissions in a known text. A baseline language model for this new task is detailed and evaluated against a corpus of children reading graded passages. We are able to identify words missed by a reader with an average false positive rate of 39% and a corresponding false negative rate of 37%. These preliminary results are encouraging for our long-term goal of providing automated coaching for children learning to read. 


[Video 93] J. Mostow, S. Roth, A. Hauptmann, M. Kane, A. Swift, L. Chase, and B. Weide, "Getting Computers to Listen to Children Read: A New Way to Combat Illiteracy (7-minute video)", Overview and research methodology of Project LISTEN as of July 1993. 


[AAAI 93] J. Mostow, A. G. Hauptmann, L. L. Chase, and S. Roth, "Towards a Reading Coach that Listens: Automated Detection of Oral Reading Errors", Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI93), American Association for Artificial Intelligence, Washington, DC, July 1993, pp. 392-397. Download paper in pdf format.

Abstract: What skill is more important to teach than reading? Unfortunately, millions of Americans cannot read. Although a large body of educational software exists to help teach reading, its inability to hear the student limits what it can do. 

This paper reports a significant step toward using automatic speech recognition to help children learn to read: an implemented system that displays a text, follows as a student reads it aloud, and automatically identifies which words he or she missed. We describe how the system works, and evaluate its performance on a corpus of second graders' oral reading that we have recorded and transcribed.
