CNPq: IMiMD - Indexing and Data Mining in Multimedia Databases

 
Christos Faloutsos
Department of Computer Science
Carnegie Mellon Univ.
Pittsburgh, PA 15213
Phone: (412)-268.1457
Fax: (412)-268.5576
Email: christos@cs.cmu.edu
WWW page: http://www.cs.cmu.edu/~christos

Keywords

Data mining, spatial access methods, metric access methods, multimedia

Project Award Information

Project Summary

This project focuses on indexing multimedia data and on developing new tools to find patterns and correlations in such data. Multimedia objects can often be mapped to n-dimensional points through feature extraction. Otherwise, they can be treated as metric data, provided a pairwise distance function is available. Our methods apply equally to multimedia, metric, and spatial data. Typical questions include: "find video clips similar to a given video clip"; "how strong is the correlation (or anti-correlation) between the locations of schools and the locations of libraries?"; and "how many schools are within 5 miles of a library?".
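To make the metric-data setting concrete, here is a minimal Python sketch of a range query that uses nothing but a pairwise distance function, applied to the school/library question above. It assumes a plain scan with a single pivot and triangle-inequality pruning, a generic metric-indexing trick; it is not the OMNI or Slim-tree method from the project's publications. `PivotIndex`, `range_query`, and `count_within` are hypothetical names, and the coordinates are assumed to be planar and measured in miles.

```python
import math
from typing import Any, Callable, List, Sequence

# Any symmetric distance that satisfies the triangle inequality works here;
# the index never inspects coordinates directly.
Distance = Callable[[Any, Any], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (or map points)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical single-pivot index for range queries on metric data."""

    def __init__(self, objects: List[Any], dist: Distance, pivot_pos: int = 0):
        self.objects = objects
        self.dist = dist
        self.pivot = objects[pivot_pos]
        # Precompute d(o, pivot) once per object at build time.
        self.to_pivot = [dist(o, self.pivot) for o in objects]

    def range_query(self, query: Any, radius: float) -> List[Any]:
        """Return all objects o with dist(query, o) <= radius."""
        d_qp = self.dist(query, self.pivot)
        hits = []
        for obj, d_op in zip(self.objects, self.to_pivot):
            # Triangle inequality: dist(q, o) >= |d(q, p) - d(o, p)|.
            # If this lower bound exceeds the radius, o cannot qualify,
            # so the (possibly expensive) distance call is skipped.
            if abs(d_qp - d_op) > radius:
                continue
            if self.dist(query, obj) <= radius:
                hits.append(obj)
        return hits


def count_within(schools: List[Any], libraries: List[Any],
                 radius: float, dist: Distance) -> int:
    """Count schools that have at least one library within `radius`."""
    index = PivotIndex(libraries, dist)
    return sum(1 for s in schools if index.range_query(s, radius))


if __name__ == "__main__":
    libraries = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical (x, y) in miles
    schools = [(3.0, 4.0), (10.0, 4.0), (20.0, 20.0)]
    print(count_within(schools, libraries, 5.0, euclidean))  # prints 2
```

A single pivot keeps the sketch short; using more pivots, or a tree-structured metric index, prunes more candidates before any distance is computed.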

This is a joint project with Prof. Caetano Traina from the University of Sao Paulo, Brazil.

Goals, Objectives, and Targeted Activities

For indexing, the goals are (a) to provide formulas to estimate the selectivity of similarity queries and (b) to build faster search structures. For data mining, the goals are to provide tools for detecting spatial correlations and to develop fast visualization algorithms for spatial and multimedia datasets.
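As an illustration of a power-law selectivity formula, here is a Python sketch under stated assumptions: if the number of point pairs within distance r follows PC(r) ≈ K·r^D2 (D2 being the correlation fractal dimension), then the average number of neighbors within r of a query drawn from the data is about 2·PC(r)/N. The sketch assumes Euclidean points, queries that follow the data distribution, and brute-force O(N²) pair counting. The function names are hypothetical, and the project's own estimation formulas may differ in detail.

```python
import bisect
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def pair_counts(points: Sequence[Sequence[float]], radii: Sequence[float]) -> List[int]:
    """PC(r): number of unordered point pairs within Euclidean distance r."""
    dists = sorted(math.dist(a, b) for a, b in combinations(points, 2))
    # bisect_right returns how many sorted distances are <= r.
    return [bisect.bisect_right(dists, r) for r in radii]


def fit_power_law(radii: Sequence[float], counts: Sequence[int]) -> Tuple[float, float]:
    """Least-squares fit of log PC(r) = log K + D2 * log r; returns (D2, K)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    d2 = sxy / sxx
    return d2, math.exp(my - d2 * mx)


def estimated_neighbors(r: float, n: int, d2: float, k: float) -> float:
    """Expected number of points within r of a query point drawn from the data.

    Each of the PC(r) unordered pairs gives a neighbor to both endpoints,
    so the average per point is 2 * PC(r) / N.
    """
    return 2.0 * k * r ** d2 / n


if __name__ == "__main__":
    # Hypothetical data: 1,000 points on a diagonal line in the unit square.
    # The embedding dimension is 2, but the intrinsic dimension is 1.
    pts = [(i / 1000, i / 1000) for i in range(1000)]
    radii = [0.02, 0.04, 0.08, 0.16]
    d2, k = fit_power_law(radii, pair_counts(pts, radii))
    print(f"D2 ~ {d2:.2f}")  # close to 1, not 2
    print(estimated_neighbors(0.05, len(pts), d2, k))
```

On the line example the fitted D2 comes out near 1 even though the points live in two dimensions, which is the kind of skew a uniformity assumption misses. Pair counting is quadratic here for clarity; box-counting approaches estimate the same exponent in roughly linear time.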

Indication of Success

We have already achieved several of the above goals:


Long-range results: This is an exploratory project whose aim is to show that power laws and fractals are the correct tools for data mining and pattern discovery in large spatial and temporal datasets. This contrasts with textbook approaches, which rely on the uniformity and independence assumptions and on Gaussian and Poisson distributions; although these assumptions are easy to analyze, they are clearly unrealistic for the overwhelming majority of real datasets.
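A worked contrast, under the simplifying assumptions of N points in the unit E-dimensional cube and a small radius r (boundary effects ignored): uniformity predicts that pair counts grow as r^E, where V_E is the volume of the unit E-ball, while self-similar data follows a power law whose exponent D_2, the correlation fractal dimension, can be smaller than E and need not be an integer.

```latex
PC_{\text{unif}}(r) \approx \binom{N}{2} V_E \, r^{E}
\qquad \text{vs.} \qquad
PC(r) = K \, r^{D_2}, \quad D_2 \le E

\frac{PC(r)}{PC_{\text{unif}}(r)} \propto r^{D_2 - E} \longrightarrow \infty
\quad \text{as } r \to 0 \text{ whenever } D_2 < E
```

For points along a line in the plane (E = 2, D_2 = 1), the uniform model therefore underestimates the number of close pairs, and hence the selectivity of small-radius queries, by a factor that grows roughly as 1/r.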

Project Impact

Project References

The following refereed publications since March 2001 acknowledge the NSF support:
  1. [Filho01] Roberto F. Santos Filho, Agma Traina, Caetano Traina Jr., and Christos Faloutsos. Similarity Search without Tears: The OMNI Family of All-purpose Access Methods. ICDE 2001, Heidelberg, Germany, April 2-6, 2001.
  2. [Pan01] Jia-Yu Pan and Christos Faloutsos. VideoGraph: A New Tool for Video Mining and Classification. JCDL '01.
  3. [Traina01] Agma Traina, Caetano Traina, Spiros Papadimitriou, and Christos Faloutsos. Tri-Plots: Scalable Tools for Multidimensional Data Mining. KDD 2001, San Francisco, CA, August 2001.
  4. [Bi01] Zhiqiang Bi, Christos Faloutsos, and Flip Korn. The "DGX" Distribution for Mining Massive, Skewed Data. KDD 2001, San Francisco, CA, August 2001. ("Best Paper Runner-Up" Award.)
  5. [Traina02] Caetano Traina, Agma Traina, Christos Faloutsos, and Bernhard Seeger. Fast Indexing and Visualization of Metric Datasets using Slim-trees. IEEE TKDE, 14(2), pp. 244-260, March-April 2002.

Area Background

The project requires familiarity with spatial and metric access methods, as well as with multimedia databases.
 

Area References:

GPRA performance criteria

Discoveries at and across the frontiers of science and engineering: The project straddles many areas: databases (spatial/metric access methods), machine vision (e.g., for face and image indexing), and fractals (several real metric datasets are self-similar, a property that enables better analysis).

Connections between discoveries and their use in the service of society: Retrieval by multimedia content has numerous applications: medical image retrieval, automatic video processing, scientific databases, to name a few.