Abstracts 2004

Time: 9:00 am

Speaker: Jonathan Brown

Retrieval of Authentic Documents for Reader-Specific Lexical Practice

 

When a teacher gives a reading assignment in today's language learning classrooms, all of the students are almost always reading the same text. Although students have different reading levels, it is impractical for a single teacher to seek out unique texts matched to each student's abilities. In this presentation, I describe REAP, a system designed to assign each student individualized readings by combining new techniques in reading difficulty estimation[1] and detailed student and curriculum modeling[2] with the large amount of authentic materials on the Web. REAP is designed to be used as an additional resource in teacher-led classes, and by reading comprehension researchers for testing hypotheses on how to improve reading skills for L1 as well as L2 learners. I describe how researchers can use this tool to get fine-grained control over the selection of reading materials, so that they can more easily test these new learning hypotheses.

Vocabulary acquisition is the primary factor we use in matching texts to a student's abilities. These abilities are modeled as a histogram of words. We also model each desired curriculum level as a histogram of words, learned from a corpus of texts that the students would normally read. Differences between the student model and that of the next desired skill level indicate where the student needs to focus. The system can also prioritize different criteria during the search. For instance, the system can retrieve documents based solely on the vocabulary terms needed to progress toward the next level, thereby focusing on curriculum. REAP can also take into account other goals, such as student interests, special topics, or an upcoming test, all represented as word histograms. This allows teachers and researchers to decide what they want the students to focus on for each session.
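
The histogram comparison described above can be illustrated with a minimal sketch. It is not REAP's implementation: the tokenizer, the function names (histogram, vocabulary_gap, score) and the scoring rule are assumptions made for illustration only. It models the student and the next curriculum level as word histograms, takes the difference, and ranks candidate documents by how much of that missing vocabulary they contain.

```python
from collections import Counter

def histogram(text):
    """Word histogram: lowercase whitespace tokens, surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)

def vocabulary_gap(student, level):
    """Words the next level uses more often than the student has seen them (assumed rule)."""
    return Counter({w: c - student[w] for w, c in level.items() if c > student[w]})

def score(document, gap):
    """Fraction of a document's tokens that belong to the gap vocabulary (assumed rule)."""
    doc = histogram(document)
    total = sum(doc.values())
    if total == 0:
        return 0.0
    return sum(c for w, c in doc.items() if w in gap) / total

# Toy data, invented for the example.
student = histogram("the cat sat on the mat")
level = histogram("the cat chased the mouse across a garden")
gap = vocabulary_gap(student, level)

docs = [
    "the cat sat on the mat again",
    "a mouse ran across the garden",
]
ranked = sorted(docs, key=lambda d: score(d, gap), reverse=True)
print(ranked[0])
```

Other goals mentioned in the abstract (student interests, an upcoming test) could be added in the same way, as further histograms whose scores are combined with weights chosen by the teacher or researcher.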

[1] K. Collins-Thompson and J. Callan. (2004.) "A language modeling approach to predicting reading difficulty." In Proceedings of the HLT/NAACL 2004 Conference. Boston.

[2] J. Brown and M. Eskenazi. (2004.) "Retrieval of Authentic Documents for Reader-Specific Lexical Practice." In Proceedings of InSTIL/ICALL Symposium 2004. Venice, Italy.


Time: 9:30 am

Speaker: Wen Wu

Incremental Detection of Text on Road Signs from Video

 

Automatic detection of text from video is an essential task for video indexing and understanding. In this talk, we focus on the task of automatically detecting text on road signs from video. Text on road signs carries much useful information necessary for safe driving and efficient navigation. Automatically detecting text on road signs can help keep a driver aware of the traffic situation and the surrounding environment. Such a multimedia system can reduce the driver's cognitive load and enhance driving safety, which is especially useful for elderly drivers with weak visual acuity.

In this talk, I will present a fast and robust framework for incrementally detecting text on road signs from natural scene video. The new framework makes two main contributions. First, the framework applies a Divide-and-Conquer strategy to decompose the original task into two sub-tasks, that is, localization of road signs and detection of text. Corresponding algorithms for the two sub-tasks are proposed and they are smoothly incorporated into a unified framework through a real-time feature tracking algorithm. Second, the framework provides a novel way for text detection from video by integrating 2D features in each video frame (e.g., color, edges, texture) with 3D information available in a video sequence (e.g., object structure). The feasibility of the proposed framework has been evaluated on 22 video sequences captured from a moving vehicle. The new framework gives an overall text detection rate of 88.9% and false hit rate of 9.2%, which makes it possible for it to be applied to a driving assistant system and other tasks of text detection from video.

Reference:

W. Wu, X. Chen and J. Yang. Incremental Detection of Text on Road Signs from Video with Application to a Driving Assistant System. To appear in ACM Multimedia, New York, USA, 2004. (Oral Presentation).


Time: 10:30 am

Speaker: Kenji Sagae

Using Dependencies for Easy, Fast and Accurate Grammatical/Functional Analysis

 

Modern statistical syntactic parsers have achieved very high levels of accuracy over the past ten years, and we have begun to see their impact on several areas of language technologies, such as question answering, machine translation, and semantic-role labeling. Because the Penn Treebank (PTB) is widely used for training such parsers, it is common to associate PTB-style constituent trees with statistical parsing. However, there are instances where other syntactic representations would be easier to use, and just as useful (if not more so). One such instance is the assignment of grammatical relations (or even PTB function tags) to words. In this case, dependencies are not only more intuitive to understand and faster to annotate, but also easier to process and largely just as effective.

I will discuss a simple representation based on lexical dependencies, which I have been using in the syntactic analysis of parent-child dialogs. I will present a simple deterministic algorithm for dependency parsing, and show that the accuracy of the dependencies it produces is very close to the accuracy of current PTB constituent statistical parsers (91% vs. 93%). Although PTB constituent parsers have a slight edge, they are quite complex. I will show that a dependency parser that performs almost as well can be surprisingly simple and fast.
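
To give a feel for how small a deterministic dependency parser can be, here is a sketch of a generic shift-reduce (arc-standard style) parser. It is not the algorithm presented in the talk, which the abstract does not specify. The RULES table is a hypothetical hand-written stand-in for whatever decision procedure a real parser would learn from data, and the POS tags and example sentence are invented.

```python
SHIFT, LEFT, RIGHT = "shift", "left", "right"

# Hypothetical rules keyed on the POS tags of the two topmost stack items.
# A real parser would replace this table with a trained classifier.
RULES = {
    ("DET", "NOUN"): LEFT,    # determiner depends on the noun to its right
    ("NOUN", "VERB"): LEFT,   # subject noun depends on the verb to its right
    ("VERB", "NOUN"): RIGHT,  # object noun depends on the verb to its left
}

def choose_action(tags, stack, buffer_empty):
    """Pick exactly one action per step, with no search or backtracking."""
    if len(stack) >= 2:
        action = RULES.get((tags[stack[-2]], tags[stack[-1]]))
        if action:
            return action
    # Fallbacks: shift while input remains, otherwise attach rightwards.
    return RIGHT if buffer_empty else SHIFT

def parse(tags):
    """Return a head index for every word; -1 marks the root."""
    heads = [-1] * len(tags)
    stack, nxt = [], 0
    while nxt < len(tags) or len(stack) > 1:
        action = choose_action(tags, stack, nxt == len(tags))
        if action == SHIFT:
            stack.append(nxt)
            nxt += 1
        elif action == LEFT:
            dep = stack.pop(-2)      # second-from-top becomes a dependent of the top
            heads[dep] = stack[-1]
        else:
            dep = stack.pop()        # top becomes a dependent of the item below it
            heads[dep] = stack[-1]
    return heads

def attachment_accuracy(predicted, gold):
    """Fraction of words whose predicted head matches the gold head."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# "the child eats the cookie"
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
heads = parse(tags)
print(heads)
print(attachment_accuracy(heads, [1, 2, -1, 4, 2]))
```

Each word is shifted once and reduced at most once, so the number of steps grows linearly with sentence length, which is one reason parsers of this family can be fast.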

I will also discuss how these dependencies can be used to determine PTB function tags (such as subject, predicate, temporal, beneficiary, locative, etc.). The current state of the art in assigning function tags to text is the work of Blaheta (2000, 2003), which uses (among other features) PTB parse tree nodes. I will present results that are very similar using no constituent information, only dependencies. Both methods achieve an overall accuracy of about 87% in function tagging (not counting "NULL" tags). Blaheta's method is slightly better on tags classified as "grammatical" (subject, predicate, etc.), while the dependency approach is slightly better on "form/function" tags (temporal, locative, manner, etc.).

This approach to function tagging can also be used to label all dependency arcs, when training data is available. In fact, a relatively small training corpus (less than 10,000 words) can be used to produce a system that assigns a grammatical relation label to every dependency arc with an accuracy of about 90% in a corpus of parent-child dialogs.


Time: 11:00 am

Speaker: Guy Lebanon

Hyperplane Margin Classifiers on the Multinomial Manifold

 


Time: 11:30 am

Speaker: Antoine Raux

Maximum Likelihood Adaptation of Semi-Continuous HMMs by Latent Variable Decomposition of State Distributions

 

Hidden Markov Models, the single most used method for speech recognition, involve two types of parameters: transition probabilities, which model the temporal aspect of speech, and output distribution parameters (usually means, variances and weights of Gaussian mixtures), which capture the spectral properties of sub-phonemic units, each unit being equivalent to a state in the model. In Continuous Density HMMs (CDHMMs), each state has its own output distribution, independent of that of other states. While this makes for powerful models, it implies the use of a large number of Gaussians, since there are typically on the order of several thousand states and tens or hundreds of Gaussians per mixture. This requires a large amount of training data and makes the use of such models computationally expensive. On the other hand, in Semi-Continuous HMMs (SCHMMs), all the states share a single set of Gaussians and only the mixture weights depend on the state. Compared to CDHMMs, SCHMMs are more compact in size, require less data to train well and result in comparable recognition performance with much faster decoding speeds. Nevertheless, the use of SCHMMs in large vocabulary speech recognition systems has declined considerably in recent years. A significant factor that has contributed to this is that systems that use SCHMMs cannot be easily adapted to new acoustic (environmental or speaker) conditions. While maximum likelihood (ML) adaptation techniques have been very successful for CDHMMs, these have not worked to a usable degree for SCHMMs. In this talk, I will present a new framework for supervised ML adaptation of SCHMMs, built upon the paradigm of Probabilistic Latent Semantic Analysis (PLSA). We use PLSA to decompose the probability distribution of each Gaussian given the state (i.e. the mixture weights) according to a latent variable. The decomposition is performed using a variant of the Expectation Maximization algorithm. 
I will show how our approach is equivalent to smoothing the mixture weight matrix obtained by retraining the original model on a small amount of adaptation data. Experiments on non-native speech recognition in the framework of the Let's Go spoken dialogue system demonstrate the effectiveness of this method.
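
As a rough illustration of the decomposition idea, the sketch below factors a state-by-Gaussian matrix through a latent variable z, so that P(g|s) is approximated by the sum over z of P(z|s) P(g|z). It uses the standard PLSA EM updates, not the variant the talk refers to, and the count matrix, function names and parameter values are assumptions made for the example.

```python
import random

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def plsa(counts, n_latent, iters=50, seed=0):
    """Fit P(g|s) ~= sum_z P(z|s) P(g|z) to counts[s][g] with standard PLSA EM."""
    rng = random.Random(seed)
    n_states, n_gauss = len(counts), len(counts[0])
    p_z_s = [normalize([rng.random() + 0.1 for _ in range(n_latent)])
             for _ in range(n_states)]
    p_g_z = [normalize([rng.random() + 0.1 for _ in range(n_gauss)])
             for _ in range(n_latent)]
    for _ in range(iters):
        # Small floors keep every probability strictly positive.
        new_zs = [[1e-12] * n_latent for _ in range(n_states)]
        new_gz = [[1e-12] * n_gauss for _ in range(n_latent)]
        for s in range(n_states):
            for g in range(n_gauss):
                if counts[s][g] == 0:
                    continue
                # E-step: posterior over z for this (state, Gaussian) pair.
                post = normalize([p_z_s[s][z] * p_g_z[z][g] for z in range(n_latent)])
                # M-step accumulators: expected counts.
                for z in range(n_latent):
                    new_zs[s][z] += counts[s][g] * post[z]
                    new_gz[z][g] += counts[s][g] * post[z]
        p_z_s = [normalize(r) for r in new_zs]
        p_g_z = [normalize(r) for r in new_gz]
    return p_z_s, p_g_z

def mixture_weights(p_z_s, p_g_z):
    """Recombine the factors into a smoothed state-by-Gaussian weight matrix."""
    n_gauss = len(p_g_z[0])
    return [[sum(pz[z] * p_g_z[z][g] for z in range(len(pz))) for g in range(n_gauss)]
            for pz in p_z_s]

# Toy occupancy counts (4 states x 3 shared Gaussians), invented for the example.
counts = [[8, 1, 1], [7, 2, 1], [1, 1, 8], [0, 2, 8]]
p_z_s, p_g_z = plsa(counts, n_latent=2)
weights = mixture_weights(p_z_s, p_g_z)
for row in weights:
    print([round(w, 3) for w in row])
```

Because every state's weights are built from the same small set of latent distributions, a Gaussian that was never observed for a state in the toy counts still receives a nonzero weight, which is the smoothing effect the abstract alludes to.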


Time: 1:30 pm

Speaker: Yee Man (Betty) Cheng

Language Technologist's Approach to Understanding G-Protein-GPCR Interaction

 

String alignments and n-grams are commonly used in language technology applications, such as machine translation, information retrieval, speech recognition and synthesis. In machine translation, alignment can yield high accuracy if the source and target languages have similar word order. However, if the two languages have very different word order, getting a correct alignment can be difficult and an n-gram based MT system may perform better. Likewise, a correct alignment of protein sequences can yield high accuracy in prediction problems. But segments or "words" in the protein sequence can shuffle in their linear order while preserving their orientation in 3D space and therefore the protein's function or "meaning" as well.

The superfamily of proteins in this study, G-protein coupled receptors (GPCRs), is important in pharmacological research, as its members are the target of approximately 60% of current drugs on the market (Muller, 2000). Coupling with G-proteins, these receptors regulate many of the cell's reactions to external stimuli. Abnormalities in this regulation can lead to cancer, Alzheimer's, Parkinson's and other diseases. Identifying the type of G-proteins that can bind to a particular GPCR can provide information on the causes and symptoms of the disease the receptor is involved in.

Previous studies on predicting the family of G-proteins that can couple to a given GPCR sequence have focused on the intracellular domains of the receptor sequence, using alignment-based features (Cao et al., 2003; Qian et al., 2003), n-gram features (Moller et al., 2001), or physicochemical properties of the amino acids (Henriksson, 2003). Based on the roles of alignments and n-grams in MT and their analogy to the protein language, we have chosen to combine alignment and n-gram information in a hybrid prediction method using a k-nearest neighbours (k-NN) classifier on sequence alignment similarity and a k-NN classifier on Euclidean distance of n-gram counts. Our method outperforms the current state of the art in precision, recall and F1. Systematic experiments with our prediction method validated biologists' hypothesis that most of the coupling specificity information resides in the 2nd and 3rd intracellular loops of the receptor, while providing evidence for a new hypothesis that the information is more localized to the beginning of the 2nd intracellular loop.
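
The n-gram half of the hybrid method can be sketched as follows. This is only an illustration of a k-NN classifier over Euclidean distance between n-gram count vectors. The sequences are invented toy strings rather than real receptor domains, the function names and the choices n=2 and k=3 are assumptions, and the alignment-based classifier and the way the two are combined are not shown.

```python
from collections import Counter
import math

def ngram_counts(seq, n=2):
    """Count overlapping n-grams of residues in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def euclidean(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def knn_predict(query, training, k=3, n=2):
    """Majority label among the k training sequences closest to the query."""
    q = ngram_counts(query, n)
    nearest = sorted(training,
                     key=lambda item: euclidean(q, ngram_counts(item[0], n)))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: (sequence, G-protein family label). Invented data.
training = [
    ("AKRRAKRR", "Gs"), ("KRRAKRRA", "Gs"), ("RRAKRRAK", "Gs"),
    ("LLVLLVLL", "Gi/o"), ("VLLVLLVL", "Gi/o"), ("LVLLVLLV", "Gi/o"),
]
print(knn_predict("AKRRAK", training))
print(knn_predict("LLVLL", training))
```

Unlike an alignment score, the n-gram count vector ignores where in the sequence a segment occurs, which is why it tolerates the segment shuffling described earlier in the abstract.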

Cao, J., R. Panetta, et al. (2003). "A naive Bayes model to predict coupling between seven transmembrane domain receptors and G-proteins." Bioinformatics 19(2): 234-40.

Henriksson, A. (2003). Prediction of G-protein Coupling of GPCRs - A Chemometric Approach. Engineering Biology. Linkoping, Linkoping University: 79.

Moller, S., J. Vilo, et al. (2001). "Prediction of the coupling specificity of G protein coupled receptors to their G proteins." Bioinformatics 17 Suppl 1: S174-81.

Muller, G. (2000). "Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach." Curr Med Chem 7(9): 861-88.

Qian, B., O. S. Soyer, et al. (2003). "Depicting a protein's two faces: GPCR classification by phylogenetic tree-based HMMs." FEBS Lett 554(1-2): 95-9.


Time: 2:00 pm

Speaker: John Kominek

On the Road to High Quality Universal Speech Synthesis

 

Machine Translation has the Vauquois Triangle -- a famous high-level perspective that delineates the major approaches to MT, as well as their limitations. You can have either universality (through an Interlingua) or high quality (Direct translation), but not both. In between, trying to find a happy medium, reside Transfer techniques.

The field of Speech Synthesis also has such a triangle, with similarly frustrating trade-offs: either high quality or full flexibility, but not both. In this talk I begin by drawing the corresponding parallels, explaining where the three major approaches fit in, and their historical development. These three are unit-selection, spectrogram-based, and articulatory synthesis.

By directly employing segments of recorded speech, unit-selection synthesis can achieve excellent voice quality, but at the expense of flexibility. A universal synthesizer, ideally, can mimic any person in any language, in a full range of styles. Achieving this, though, demands precise modeling of the human vocal tract and articulators -- as yet an unsolved problem. In between, spectrogram-based synthesizers offer good controllability, but do not sound as natural as unit-selection techniques.

Two paths can thus be taken on the road to high quality universal synthesis. One can start with a flexible synthesizer and attempt to make it sound better. Or one can start with a good sounding synthesizer and try to make it more flexible. This talk will follow the second path.

To illustrate, we tackle the problem of "accent transformation" -- changing the accent of one person to sound more like that of another. This is made possible using CMU's recently created "Arctic Speech Databases," a parallel corpus of carefully spoken English sentences. Editions exist for American, Canadian, Scottish, Indian, and Japanese accented English. Grafting a new accent onto an existing voice is desirable for localizing a synthesizer to match the accent of a target region. Moving in the opposite direction, one can make a native voice sound foreign, hence "exotic".


Time: 2:30 pm

Speaker: Nikesh Garera

Towards a Personal Briefing Assistant

 

The preparation of summary reports from raw information is a common task in research projects. A tool that highlights useful items for a summary would allow report writers to be more productive, by reducing the time needed to assess individual items. It has further potential benefit in that it can be used to create user-specific or audience-specific digests. In the latter case, multiple tailored reports could in principle be generated from the same input information. With this motivation, we present a design of an adaptive system that learns to extract important items from weekly interviews by observing the behavior of human summary authors.

Our application scenario involves a report writer producing digests on a week-to-week basis and our goal is to make this person more efficient over time. We propose to do this by presenting the writer with successively better ordered lists of items (such that digest-worthy items appear at the top of the ordered list).

We identified salient features used for learning in this new domain by studying the corpus of project interviews. This corpus consisted of weekly progress interviews of project members collected over a period of 4 months. The features were then annotated in the corpus and were used as parameters in a regression model. This model is incrementally trained from user input and is used to reorder items in successive weeks. We measure the user effort in terms of how far down the user has to go in the list in order to select all important items in a weekly set.
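
The effort measure described above, together with the average precision used in the evaluation, can be written down in a few lines. This is a sketch under assumptions: the function names and the toy item lists are invented, and the regression model that produces the ordering is not shown.

```python
def effort_depth(ranked, important):
    """How far down the list the user must go to reach every important item.

    Returns a 1-based depth, 0 if nothing is important, or None if an
    important item is missing from the list.
    """
    remaining = set(important)
    if not remaining:
        return 0
    for depth, item in enumerate(ranked, start=1):
        remaining.discard(item)
        if not remaining:
            return depth
    return None

def average_precision(ranked, important):
    """Mean of the precision values at the rank of each important item."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in important:
            hits += 1
            total += hits / rank
    return total / len(important) if important else 0.0

# Toy weekly item set: the writer considers "b" and "d" digest-worthy.
important = {"b", "d"}
original_order = ["a", "b", "c", "d", "e"]
reordered = ["b", "d", "a", "c", "e"]

print(effort_depth(original_order, important), average_precision(original_order, important))
print(effort_depth(reordered, important), average_precision(reordered, important))
```

In this toy case the reordered list halves the depth the writer has to scan (from 4 to 2) and doubles average precision (from 0.5 to 1.0).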

In our evaluation study, 7 expert subjects (project members, managers) were asked to create 5-item summaries for 12 successive weeks, using a selection interface. The results with the assistance of our system show an improvement in average precision by a factor of more than 2.21 by the end of the learning period as compared to the baseline of no learning. Other evaluation metrics also show significant improvement. A low inter-rater agreement (Kappa=0.26) indicates that the subjects are selecting different items and the learned models are individual. Moreover, the different feature weights in the regression models for each subject identify their summarization differences. We also report our ongoing work of automatic feature extraction to make this approach domain independent.

The talk will include a short demonstration of our system showing how the learned models can be used to populate a template for a standard quarterly report.


Time: 3:30 pm

Speaker: Luo Si

Federated Search in Uncooperative Environments

 

Conventional search engines such as Google or AltaVista are effective when an information source allows its contents to be crawled and indexed in a centralized database. However, a large amount of information cannot be crawled and searched by conventional search engines, due either to intellectual property protection or to frequent information updates. This type of information is valuable. For example, hidden Web contents that cannot be searched by conventional search engines have been estimated to be 2-50 times larger than the visible Web and are often created and maintained by professionals.

Federated search provides a solution to the search problem for information that cannot be searched by conventional search engines. It includes three sub-problems: i) acquiring information about the contents of each information source (resource representation), ii) ranking the sources and selecting a small number of them for a given query (resource ranking), and iii) merging the results returned from the selected sources into a single ranked list (result-merging).
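
The three sub-problems can be pictured as a small pipeline. The sketch below is deliberately naive and is not one of the algorithms proposed in this work: the sampled-term representation, the query-term-frequency ranking rule, the min-max score normalisation, and all names and data are assumptions chosen only to show how the pieces fit together.

```python
from collections import Counter

def represent(sample_docs):
    """(i) Resource representation: term counts over documents sampled from a source."""
    return Counter(w for doc in sample_docs for w in doc.lower().split())

def rank_sources(query, representations, top_n=2):
    """(ii) Resource ranking: order sources by relative query-term frequency."""
    terms = query.lower().split()

    def score(name):
        rep = representations[name]
        total = sum(rep.values()) or 1
        return sum(rep[t] for t in terms) / total

    return sorted(representations, key=score, reverse=True)[:top_n]

def merge(results_by_source):
    """(iii) Result merging: rescale each source's scores to [0, 1], then sort globally."""
    merged = []
    for source, results in results_by_source.items():
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        merged.extend((doc, (s - lo) / span, source) for doc, s in results)
    return sorted(merged, key=lambda r: r[1], reverse=True)

# Toy sources; in an uncooperative setting the samples would come from probe queries.
samples = {
    "news": ["stock market falls", "market rally today"],
    "bio": ["protein folding study", "gene expression data"],
    "sports": ["football match today", "market for players"],
}
representations = {name: represent(docs) for name, docs in samples.items()}
selected = rank_sources("market", representations)
print(selected)

# Source-specific scores are on different scales, hence the rescaling in merge().
merged = merge({
    "news": [("n1", 12.0), ("n2", 10.0), ("n3", 8.0)],
    "sports": [("s1", 0.9), ("s2", 0.3)],
})
print([doc for doc, _, _ in merged])
```

Each stage here is tuned in isolation, which is the kind of design the unified utility maximization framework described below is meant to improve on.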

This work addresses federated search problems in uncooperative environments such as the Web, where information sources cannot be assumed to share their contents or use the same type of search engine. Empirically effective solutions are proposed for the full range of federated search sub-problems, including new algorithms for information source estimation, resource selection and results merging.

Furthermore, a unified utility maximization framework is proposed to combine the separate solutions into effective systems for different federated search applications. This is the first probabilistic framework for integrating the different components of a federated search system. This more unified view of the federated search task provides a new opportunity to utilize available information. It enables us to configure individual components globally to get the desired overall results for different applications, which is superior to simply combining individually effective solutions, as in previous research.

This work advances the state of the art of federated search. The stronger theoretical foundation, the better empirical results, and the better modeling of real-world applications make the new research a bridge that turns federated search from a cool research topic into a much more practical tool.

Related references:

Si, L. & Callan, J. (2002a). Using sampled data and regression to merge search engine results. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Si, L. & Callan, J. (2003a). Relevant document distribution estimation method for resource selection. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Si, L. & Callan, J. (2003b). A Semi-Supervised learning method to merge search engine results. ACM Transactions on Information Systems, 21(4).

Si, L. & Callan, J. (2004). The effect of database size distribution on resource selection algorithms. In Distributed Multimedia Information Retrieval, LNCS 2924, Springer.

Si, L. & Callan, J. (2004). Unified Utility Maximization for Distributed Information Retrieval in Uncooperative Environments. In Proceedings of the 13th International Conference on Information and Knowledge Management, ACM.


Time: 4:00 pm

Speaker: Satanjeev (Bano) Banerjee

Automatically Detecting the Structure of Human Meetings

 

We are interested in automatically extracting the structure of meetings between humans. Such structure includes the state of a meeting (presentation, discussion, etc.), the roles of each meeting participant (presenter, discussion participant, observer, etc.), the onset/offset boundaries of agenda items, and the onset/offset boundaries of regions of decisions (such as action items). In this talk we will discuss our current research into detecting these various aspects of human meetings.

In particular, we will present a simple taxonomy of meeting states and participant roles. We trained a decision tree classifier that learns to detect these states and roles from simple speech-based features such as the number of speakers and the lengths of utterances and speech-overlaps. This classifier detects meeting states 18% absolute more accurately than a random classifier, and detects participant roles 10% absolute more accurately than a majority classifier. We will then report on the effect of adding more advanced features such as the words in the utterances as output by an automatic speech recognizer, as well as features drawn from other modalities such as the body positions and face directions of the various participants relative to each other as output by a camera-image processor.
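
To make the "absolute" comparison against a baseline concrete, the sketch below computes a majority-class baseline and the gain over it in percentage points. The labels and predictions are invented toy data, not the results reported above, and the function names are assumptions.

```python
from collections import Counter

def accuracy(predicted, gold):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def majority_baseline(train_labels, gold):
    """Accuracy of always predicting the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return accuracy([most_common] * len(gold), gold)

# Toy participant-role labels. For brevity the majority label is taken from
# the same list; in practice it would come from separate training data.
gold = ["participant"] * 6 + ["presenter"] * 3 + ["observer"]
predicted = ["participant"] * 6 + ["presenter", "presenter", "participant", "participant"]

classifier_acc = accuracy(predicted, gold)
baseline_acc = majority_baseline(gold, gold)
# Absolute gain: difference in accuracy, expressed in percentage points.
gain_points = round(100 * (classifier_acc - baseline_acc), 1)
print(classifier_acc, baseline_acc, gain_points)
```

An absolute gain is the plain difference between the two accuracies, as opposed to a relative gain, which would divide that difference by the baseline accuracy.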

Finally, we will present initial research on agenda item and decision region boundary detection. Unlike meeting state and participant role detection, the problem of detecting agenda items and decision regions does not easily lend itself to a typical machine learning approach, since there are no clear pre-defined classes. However, preliminary observations of recorded meeting data suggest that different agenda items usually differ markedly both in the pattern of words used in discussing them and in the identities of the participants involved in those discussions. We will report on our ongoing research, in which we draw upon ideas from the realm of topic tracking and leverage the above characteristics to perform agenda item/decision region detection.