15-826(A) - MULTIMEDIA DATABASES AND DATA MINING
INSTRUCTOR: Christos Faloutsos
UNITS: 12
SPRING 2002 - Subject to change.
DESCRIPTION
The course covers advanced algorithms for learning, analysis, data management
and visualization of large datasets. Topics include indexing for text and DNA
databases, searching medical and multimedia databases by content, fundamental
signal processing methods, compression, fractals in databases, data mining,
privacy and security issues, rule discovery and data visualization.
TOPICS TO BE COVERED
1. Database topics:
o Traditional databases: Advanced hashing and multi-key access methods, for
main-memory and for disk-based data.
o Text databases: indexing text and DNA strings, clustering, information filtering,
LSI (singular value decomposition).
o Multimedia databases: Searching by content in signals: Time sequences, photographs
and medical images, video clips, feature extraction, continuous media storage
and delivery.
2. Tools:
o Fundamental signal processing methods: Discrete Fourier Transform, wavelets,
JPEG and MPEG compression.
o Singular Value Decomposition: revisited
o Fractals in databases: Self-similarity/non-uniformity of real datasets, fractal
dimensions, selectivity using fractals and multifractals, fractal image compression,
self-similarity in web-traffic patterns.
3. Data mining:
o Review of statistical methods
o Review of AI methods
o Database methods - Massive datasets: Association rules; Frequent sets; single-pass
learning algorithms; information compression and reconstruction; Sampling; condensed
data representations; Datacubes; Cube-trees; function finding.
o Security and Privacy Protection: Datafly, Scrub, Mu-Argus, and k-Similar.
o Visualization of large data sets
4. Overview of recent topics: Mobile databases; Active Disks for data mining;
Web databases; Future directions.
PREREQUISITES
Introductory database course 15-415 (familiarity with
B-trees and hashing), or permission of the instructor.
TEXT
Copies of the instructor's transparencies and notes, as well as copies of selected
articles, will be made available. The required text is:
* Christos Faloutsos, Searching Multimedia Databases by Content, Kluwer Academic Publishers, 1996.
Recommended, but not required, texts:
* William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P.
Flannery, Numerical Recipes in C, Cambridge University Press, 1992, 2nd Edition.
* Korth, H., Silberschatz, A., Database System Concepts, 2nd edition, McGraw-Hill
Inc., 1991.
* Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan
Kaufmann, 2000.
METHOD OF EVALUATION
The course involves:
* A midterm (20%)
* Homework (10%)
* A project (40%)
* A final exam (30%)
Projects will be carried out in teams of 1-3. A detailed handout about the project will be distributed at the beginning of the course, along with a list of suggested projects. The goal of the project is to give the participants the opportunity to tackle a large, interesting problem, which may lead to a publication.