The Information Retrieval group studies a wide range of issues related to finding, organizing, analyzing, and communicating information.
BioInformatics: Discovering patterns in, and automatically mapping between, protein sequences, folding structures, and biological functions is a new line of research in which we actively collaborate with biologists. [Yang]
Cross-Media Question Answering: The saying "a picture is worth a thousand words" certainly applies to graphical data representations for comparative analysis, trend detection, and related tasks. Our new venture is to use information extraction techniques to acquire text-image parallel data from the Web, to learn cross-media associations from those data, to exploit probabilistic reasoning across words and pictures, and to answer questions using both. [Yang]
Distributed Information Retrieval: Searching in a networked environment containing thousands of searchable databases that are in different formats and languages, are served by different types of search engines, and are controlled by other people. [Callan]
Information Access in K-12 Education: Most IR tools are designed for adults. This project develops search interfaces, information access tools, and lesson plans that are customized for use by students in K-12 education. [Callan]
Information Filtering: Automatically monitoring a stream of documents (e.g., news stories, newsgroups, etc.) to find just those stories that are interesting to you. Learning from examples what kinds of documents you find interesting. [Callan, Yang]
Question Answering: Typical IR systems return a set of documents, or perhaps a set of queries. LTI Question Answering software extracts information from documents in large, open-domain corpora to answer questions in subject areas that are not known in advance. [Callan]
Text Categorization: Using machine learning algorithms (regression models, nearest neighbor classification, support vector machines, hidden Markov models, etc.) to automatically classify documents into a pre-defined taxonomy of categories (such as the Yahoo! hierarchy) is an open challenge in IR. We have developed state-of-the-art classification systems, and are currently focusing on solving large-scale hierarchical classification and hypertext classification problems in real-world applications. [Yang]
Topic Detection & Tracking (TDT): Adapting supervised and unsupervised learning techniques to temporally ordered data streams (such as newswire data or radio or television broadcasts) to automatically detect novel events, track new developments in events of interest to the user, and filter important information for the user's attention is part of our current research. [Yang]
Translingual Information Retrieval: Using queries in one language (such as English) to search for documents in different languages (such as German, Italian, Chinese, etc.). We use statistical techniques to learn mappings between any language pair, using bilingual parallel text as the training data. We are also investigating approaches to mining the web to automatically acquire real-world parallel corpora on which we train our translingual search engines. [Yang]
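As a rough illustration of the nearest neighbor classification mentioned under Text Categorization, the sketch below labels a short text by a similarity-weighted vote among its most similar training documents, using TF-IDF weights and cosine similarity. It is not the group's system: the function names, the crude whitespace tokenizer, the smoothed IDF formula, and the four-document training set are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document.

    Returns the vectors together with the IDF table so that queries can be
    weighted with the same statistics.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an illustrative choice, not the only possible one).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, train_texts, train_labels, k=3):
    """Return the label with the largest summed similarity among the k nearest documents."""
    vectors, idf = tfidf_vectors([tokenize(t) for t in train_texts])
    # Terms never seen in training carry no weight and are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(
        ((cosine(q, v), label) for v, label in zip(vectors, train_labels)),
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data, for illustration only.
TRAIN_TEXTS = [
    "stock market shares fall",
    "bank raises interest rates",
    "team wins football match",
    "striker scores late goal",
]
TRAIN_LABELS = ["finance", "finance", "sports", "sports"]

if __name__ == "__main__":
    print(knn_classify("football team scores goal", TRAIN_TEXTS, TRAIN_LABELS))
```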
Faculty | Visitors | Grad Students | Undergrads | Staff | Alumni |
---|---|---|---|---|---|
Jamie Callan, Yiming Yang | M. Elena Renda | Doug Baker, Ulas Bardak, Kevyn Collins-Thompson, Krzysztof Czuba, Fan Li, Jie Lu, Paul Ogilvie, Monica Rogati, Luo Si, Ashwin Tengli, Jian Zhang, Yi Zhang | | Bryan Kisiel, Nian-li Ma, Thi Nhu Truong | Brian Archibald, Tom Ault, Rafael A. Calvo, Shoubin Dong, Yibing Geng, Charles W. Lattimer, Danny Lee, Xin Liu, Eric Ochlschlager, Thomas Pierce |