Project LISTEN
A Reading Tutor that Listens
Last updated: August 11, 2011

Research Basis for Project LISTEN’s Reading Tutor

Jack Mostow, Director, Project LISTEN

Carnegie Mellon University School of Computer Science

RI-NSH 4213, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890

www.cs.cmu.edu/~listen

November 11, 2003; revised August 10, 2011

Abstract

This document summarizes two types of published research underlying Project LISTEN’s automated Reading Tutor. Intervention studies measured the Reading Tutor’s effectiveness. Other research, others’ as well as our own, served to guide its development. The cited Project LISTEN publications can be downloaded from www.cs.cmu.edu/~listen except where precluded by copyright or not yet in print.

Acknowledgements

The work described here was supported in part by NSF under ITR/IERI Grant REC-0326153, by the Institute of Education Sciences, U.S. Department of Education through Grant R305A080628 to Carnegie Mellon University, and by the Heinz Endowments. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Heinz Endowments.

Special thanks to the co-principal investigators who helped formulate the IERI proposals from which portions of this document are excerpted, especially reading expert Professor Rollanda O’Connor.

1. Summary of intervention studies

Speech-recognition-based, computer-guided oral reading has demonstrated usability, user acceptance, assistive effectiveness, and even pre- to post-test gains [Cole et al., 1999; Mostow et al., 1994; Nix et al., 1998; Russell et al., 1996; Williams, 2002; Williams et al., 2000] – but the proof of the pudding is whether it significantly increases learning gains over gains that children make otherwise. Even with barely 20 minutes of use per day, successive versions of the Reading Tutor have produced substantially higher comprehension gains than current practices in controlled studies lasting several months. To ensure that results were due to the Reading Tutor intervention, we compared different treatments within the same classrooms and randomized treatment assignment, stratifying by pretest scores within class. We used valid and reliable measures [Woodcock, 1998] to measure gains from pre- to post-test. We computed effect size as the difference in gains between the Reading Tutor and current practice, divided by the average standard deviation in gains of the two groups. Effect sizes for passage comprehension were substantial compared to other studies [NRP, 2000]: 0.60 for 63 students in grades 2, 4, and 5 at a low-income urban school [Mostow and Aist, 2001; Mostow et al., 2003b]; 0.48 for 66 third graders at a lower-middle class urban school [Aist et al., 2001; Mostow et al., 2001; Mostow et al., 2003a]; and 0.66 for 52 first graders at two suburban Blue Ribbon Schools of Excellence [Mostow et al., 2002b; Mostow and Beck, under revision].
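The effect size computation described above can be sketched in a few lines. This is only an illustration under stated assumptions: "average standard deviation" is read here as the arithmetic mean of the two groups' sample standard deviations of gains (a pooled standard deviation is another common choice), and the function name and the gain values are hypothetical, not data from any of the studies.

```python
from statistics import mean, stdev

def gain_effect_size(treatment_gains, control_gains):
    """Difference in mean gains divided by the average of the two
    groups' sample standard deviations of gains (an assumed reading
    of "average standard deviation")."""
    difference = mean(treatment_gains) - mean(control_gains)
    average_sd = (stdev(treatment_gains) + stdev(control_gains)) / 2
    return difference / average_sd

# Hypothetical pre- to post-test gains, for illustration only.
tutor_gains = [6, 8, 10, 12, 14]
control_gains = [2, 4, 6, 8, 10]
print(round(gain_effect_size(tutor_gains, control_gains), 2))  # prints 1.26
```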

1.1. Pilot study (1996-97)

The Reading Tutor achieved dramatic results in the first pilot study of extended use long enough to demonstrate significant learning. During the 1996-97 school year, a pilot group of low-reading third graders used the Reading Tutor one at a time in a small office under the individual supervision of a school aide. According to school-administered pre- and post-tests, six third graders who started almost three years below grade level averaged two years of progress in under eight months of use [Aist and Mostow, 1997].

1.2. Within-classroom comparison (1998)

In spring 1998, we did our first controlled study of the Reading Tutor in classroom settings at Fort Pitt Elementary [Mostow et al., 2003b]. All 72 students in 3 classrooms (grades 2, 4, and 5) that had not previously used the Reading Tutor were independently pre-tested on the Word Attack, Word Identification, and Passage Comprehension subtests of the Woodcock Reading Mastery Test [Woodcock, 1987]. We split each class into 3 matched treatment groups – Reading Tutor, commercial reading software, or regular classroom activities, including other software use. We assigned students to treatments randomly, matched within classroom by pretest scores. Even though the study lasted only 4 months, and actual usage was a fraction of the planned daily 20-25 minutes, students who used the 1998 version of the Reading Tutor significantly outgained their matched classmates in comprehension (effect size .60, p = .002), progressing faster than their national cohort. (No other differences were significant, and commercial software fell in between.) As the principal said, “these children were closing the gap.”
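One way to carry out random assignment matched within classroom by pretest scores is sketched below. It is an assumption about the general procedure, not the study's actual code: students in one classroom are ranked by pretest score, cut into blocks the size of the treatment list, and the treatments are shuffled within each block. The function name, student labels, and scores are hypothetical.

```python
import random

def assign_matched(students, treatments, seed=None):
    """Assign treatments at random within blocks of similar pretest score.

    students: list of (name, pretest_score) pairs for ONE classroom.
    treatments: list of treatment labels.
    Returns a dict mapping each student name to a treatment.
    """
    rng = random.Random(seed)
    # Rank students so that each block holds classmates with similar scores.
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    block_size = len(treatments)
    assignment = {}
    for start in range(0, len(ranked), block_size):
        block = ranked[start:start + block_size]
        order = list(treatments)
        rng.shuffle(order)  # random order of treatments for this block
        # A final short block receives a random subset of the treatments.
        for (name, _score), treatment in zip(block, order):
            assignment[name] = treatment
    return assignment

# Hypothetical classroom of six students with made-up pretest scores.
classroom = [("s0", 95), ("s1", 90), ("s2", 85), ("s3", 80), ("s4", 75), ("s5", 70)]
groups = ["Reading Tutor", "commercial software", "regular activities"]
print(assign_matched(classroom, groups, seed=0))
```

Blocking on pretest score keeps the treatment groups comparable at the outset, so that differences in gains are less likely to reflect differences in starting level.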

1.3. Comparison to human tutors (1999-2000)

In 1999-2000, we evaluated the new, mixed story choice version of the Reading Tutor at a second school in a lower-middle class community near Pittsburgh. This year-long study of 131 second and third graders in 12 classrooms compared three daily 20-minute treatments. (a) 58 students in 6 classrooms used the 1999-2000 version of the Reading Tutor. Students took daily turns using one shared Reading Tutor in their classroom while the rest of their class received regular instruction. (b) 34 students in the other 6 classrooms were pulled out daily for one-on-one tutoring by certified teachers. To control for materials, the human tutors used the same set of stories as the Reading Tutor. (c) 39 students served as in-classroom controls, receiving regular instruction without tutoring. We pre- and post-tested students in word identification, word attack, word comprehension, passage comprehension, and fluency.

To our surprise, human tutors beat the Reading Tutor only in Word Attack (effect size .55). Third graders in both the computer- and human-tutored conditions outgained the control group in Word Comprehension (effect sizes of .56 and .72, respectively) and Passage Comprehension (effect sizes of .48 and .55, respectively) [Aist et al., 2001; Mostow et al., 2001]. No other differences in gains were significant.

1.4. Equal-time comparison to Sustained Silent Reading (2000-2001)

According to the National Reading Panel, “the amount of gain attributable to reading alone should be the baseline comparison against which the efficacy of instructional procedures is tested. If an instructional method does better than reading alone, it would be safe to conclude that method works” [NRP, 2000, Ch. 3, p. 27]. A 7-month study of 178 students in grades 1-4 at two Blue Ribbon Schools of Excellence compared two treatments, each provided in daily 20-minute sessions. 88 students did Sustained Silent Reading (SSR) as already implemented in their classrooms (including teacher read-aloud in grade 1 until students were ready for independent reading practice). 90 students in 10-computer labs used the 2000-2001 version of Project LISTEN’s Reading Tutor. The Reading Tutor group significantly outgained their statistically matched SSR classmates in phonemic awareness, rapid letter naming, word identification, word comprehension, passage comprehension, fluency, and spelling – especially in grade 1, where effect sizes for between-treatment differences in gains ranged from .20 to .72 [Mostow et al., 2002b].

1.5. Effectiveness for English language learners [this section added 6/6/05 and updated 8/11/11]

2004 marked the first independent, third-party, controlled evaluation of the Reading Tutor [Poulsen, 2004]. This two-month pilot study included 34 second through fourth grade Hispanic students from four bilingual education classrooms. The study compared the efficacy of the 2004 version of the Project LISTEN Reading Tutor against the standard practice of Sustained Silent Reading (SSR). This study was undertaken to obtain some initial indication as to whether the tutor would also be effective within a population of English language learners.

The study employed a crossover design where each participant spent one month in each of the treatment conditions. The experimental treatment consisted of 25 minutes per day using the Reading Tutor within a small pullout lab setting. Students in the control treatment remained in the classroom where they participated in established reading instruction activities. Dependent variables consisted of the school district’s curriculum based measures for fluency, sight word recognition, and comprehension.

The Reading Tutor group outgained the control group on every measure during both halves of the crossover experiment. Within-subject results from a paired t-test indicate that these gains were significant for both fluency measures (p < .001) and marginally significant for one sight word measure (p = .056). Effect sizes were 0.55 for timed sight words, a robust 1.16 for total fluency, and an even larger 1.27 for fluency controlled for word accuracy. These dramatic results, observed during a one-month treatment, indicate that this technology may have much to offer English language learners.
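For readers unfamiliar with the within-subject analysis in a crossover design, the sketch below computes a paired t statistic from each student's gain in the two conditions. It is a generic textbook calculation, not the study's analysis script; the function name and the gain values are hypothetical, and a p-value would still have to be looked up from the t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(tutor_gains, control_gains):
    """Paired t statistic for per-student gains under two conditions.

    Both lists must be in the same student order. Returns the t
    statistic and its degrees of freedom (n - 1).
    """
    if len(tutor_gains) != len(control_gains):
        raise ValueError("each student needs a gain in both conditions")
    # Within-subject difference: gain with the Reading Tutor minus gain without.
    diffs = [t - c for t, c in zip(tutor_gains, control_gains)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical one-month gains for four students, for illustration only.
t_stat, dof = paired_t([5, 7, 6, 9], [3, 4, 5, 4])
print(t_stat, dof)
```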

Two additional groups of Canadian researchers conducted independent evaluations of the Reading Tutor with English language learners and, as of June 2005, were analyzing the data.

A 10-week study by Kenneth Reeder, Margaret Early, Maureen Kendrick, Jon Shapiro, and Jane Wakefield at the University of British Columbia [CTV, 2006; D’Silva et al., 2005; Reeder et al., 2004; Reeder et al., 2005; Reeder et al., 2007; Reeder et al., 2009] involved 77 students from five Vancouver elementary schools, grades 2-6 (ages 7-12 years). Their home languages were Hindi (14), Mandarin (21), Spanish (21), and English (21: 11 using the Reading Tutor, and 10 in a human tutoring program). Gains by the Reading Tutor group matched gains by the human tutoring group on most reading measures, and interviews showed that the Reading Tutor had a favorable affective impact.

A 12-week study by Esther Geva and Todd Cunningham at the University of Toronto [Cunningham, 2006; Cunningham and Geva, 2005] involved 104 ESL students in grades 4-6 at eight schools. The study compared three treatments: the Reading Tutor; Kurzweil 3000, which reads aloud to the student and provides vocabulary support; and regular ESL classroom instruction. Analysis of data from 77 students in grades 4-6 found pre- to posttest gains on some measures of language and literacy skill, with no significant differences among conditions. The study did not measure oral reading fluency.

An 18-week crossover study in Accra, Ghana [Korsah et al., 2010] provided the Reading Tutor as a supplemental intervention for 89 children in 3 schools varying in affluence. It found treatment effect sizes of over 1 standard deviation (considered large) for fluency gains at the two poorer schools and for spelling gains at one of them, but no significant differences between treatments at the most affluent school.

A 10-week crossover study in Bangalore, India [Weber and Bali, 2010] focused on 62 low-income elementary school students at 3 schools. This population had little or no exposure to English outside of school. Overall, the students averaged significantly higher gains in oral reading fluency (but not in spelling) over the 5 weeks during which they used the Reading Tutor than over the other 5 weeks.

2. Summary of underlying research

Why does the Reading Tutor improve comprehension? Theoretically, students who recognize words effortlessly can devote more attention to comprehension [LaBerge and Samuels, 1974], and the relationship between rate of oral reading and reading comprehension is strong through the elementary years [Pinnell et al., 1995]. Until word identification becomes automatic, the cognitive load it imposes consumes limited mental resources, such as attention and short-term memory, that are needed to comprehend the sentence and its relationship to the surrounding context [Perfetti, 1992].

However, decoding practice by itself does not necessarily improve fluency or comprehension. Some studies found that teaching children to recognize isolated words quickly gave no advantage in reading comprehension [Fleischer et al., 1979], or that comprehension did not improve unless readers recognized the words nearly as fast in context as in lists [Levy et al., 1997]. Thus fluency makes a unique contribution to comprehension over that made by word identification [Ehri and McCormick, 1998; O'Connor et al., 2002; Shankweiler et al., 1999].

Guided oral reading provides opportunities to practice word identification and comprehension in context. There is ample evidence that one of the major differences between good and poor readers is the amount of time they spend reading. Poor readers are unlikely to practice on their own. Students who need the most practice spend the least amount of time actually reading [Allington, 1977]. How time is spent reading matters too [Mostow et al., 2002a]. Poor readers tend to reread the same easy stories over and over [Aist, 2002a]. Modifying the Reading Tutor to take turns picking stories exposed students to more new vocabulary than they saw when they chose the stories [Aist, 2002a; Aist, 2002b; Aist and Mostow, 2003; Mostow et al., 2003b].

The Reading Tutor aims for the zone of proximal development [Doolittle, 1997] by dynamically updating its estimate of the student’s reading level and picking stories accordingly. The stories it picks are somewhat harder than those students choose when it is their turn [Mostow et al., 2003a].

The Reading Tutor scaffolds key processes in reading – and tests its own scaffolding. Scaffolding provides information at the “teachable moments” when it is needed. For example, explicit vocabulary instruction is important but time-consuming [Beck et al., 2002]. Explaining unfamiliar words and concepts in context can remediate deficits in vocabulary and background knowledge [Elley, 1989], so we added support for vocabulary acquisition by presenting short “factoids” – comparisons to other words [Aist, 2001b; Aist, 2002a]. An automated experiment embedded in the Reading Tutor tested the effectiveness of reading a factoid just before a new word in a story, compared to simply encountering the word in context without a factoid. The outcome variable was performance on a multiple-choice question, presented the next day the student used the Reading Tutor. Analysis of over 3,000 randomized trials showed that factoids helped on rare, single-sense words, and that they helped third graders more than second graders [Aist, 2000; Aist, 2001a; Aist, 2001b]. By acquiring predictive models of the effects of tutorial actions, embedded experiments can inform a decision-theoretic approach to tutoring [Beck, 2001; Beck, 2002; Beck and Woolf, 2000; Beck and Woolf, 2001; Beck et al., 2000; Murray et al., revisions under review].

The zone of proximal development depends on tutorial scaffolding as well as on student proficiency [Murray and Arroyo, 2002], so the Reading Tutor lets the student read as much as possible, but helps as much as necessary. It provides spoken and graphical assistance when it notices the student click for help, hesitate, get stuck, skip a word, make a mistake, or encounter a word likely to be misread [Mostow and Aist, 1999]. Its “visual speech” [Massaro, 1998] uses talking-mouth videoclips of phonemes to scaffold phonemic awareness. The Reading Tutor assists word identification by previewing new words [Mostow, to appear] and reading hard words aloud. Its word attack hints include rhyming and sounding out. It supports vocabulary acquisition by explaining new words [Aist, 2001b; Aist, 2002a; Mostow et al., 2003c]. It scaffolds comprehension by reading hard sentences aloud and by asking questions [NRP, 2000] – “cloze” items [Mostow et al., 2002c] and generic “who-what-where” questions, which at first appeared to boost comprehension of nearby sentences in an embedded experiment [Beck et al., 2003]. The Reading Tutor bolsters motivation by listening attentively, “backchanneling” [Aist and Mostow, 1999], giving encouragement [Aist et al., 2002], and praising good or improved performance [Mostow and Aist, 1999]. By reducing frustration [Betts, 1946] and making a wide range of authentic, engaging text cognitively accessible to the child, scaffolding helps address the motivational issues of confidence, challenge, curiosity, and control pivotal to effective tutoring [Lepper and Chabay, 1988; Lepper et al., 1993]. Poor readers’ listening comprehension is far above their independent reading level [Curtis, 1980; Spache, 1981], so reading hard words and sentences to them reduces frustration and repairs comprehension failures caused by lack of automaticity in word identification.

One approach to improving automaticity is repeated reading, in which students read a passage or page of text until their reading rate increases by a given amount, usually 25% or more [Samuels, 1979]. A recent review of the repeated reading literature [Meyer and Felton, 1999] recommended that poor readers practice building fluency for 10-20 minutes per day over a long duration, engage in reading aloud, and use text at their instructional level. However, improving word recognition accuracy and comprehension can require assistance to remediate errors [McCoy and Pany, 1986; Young et al., 1996] – which requires listening to the student read aloud.