Carnegie Mellon University
15-721 Database System Design and Implementation
Spring 2003 - C. Faloutsos
Syllabus
OVERVIEW
A Database Management System (DBMS) is a software system designed to efficiently
store, retrieve, manipulate, and query large amounts of data. Since the
introduction of the relational data model in 1970, the database management
system industry has grown to $100 billion a year and is growing
by more than 25% every year. With new and emerging internet applications
placing new requirements on DBMS design and implementation, the database
market is expected to grow even faster, and database design and implementation
techniques are constantly evolving to meet these requirements.
Nevertheless, there is a basic and fairly standard set of techniques
that have been developed to support high-performance database systems.
These techniques include B+trees, hash-based join algorithms, hierarchical
and two-phase locking, two-phase commit, write-ahead logging, recovery
using shadow pages, and several query optimization strategies. These techniques
interact, and the interaction often imposes restrictions: the design and implementation
choices made in one module of the system may affect its interaction
with other modules, and can place constraints on decisions at other levels that
may not be immediately apparent.
The goal of this course is to investigate the traditional techniques
and their interactions by studying several seminal papers and survey studies
in the area. The course also involves a large project.
TOPICS COVERED
- Introduction to Database Systems - Goals and functions of a DBMS, transactions,
  reference architecture for a relational DBMS, etc.
- "The Roots" - Overviews of the classic relational systems: System R and
  INGRES.
- Architectural Foundations - Performance, availability, and reliability
  characteristics of hardware and operating systems that impact the design
  of a DBMS.
- Buffer Management - Memory management for multi-user systems, the DBMIN algorithm,
  implications of transaction semantics.
- Access Paths and Indexes - Structures that are optimized for disk-resident
  data, e.g., B+trees and linear hashing.
- Query Processing - Access path selection, join methods, optimization techniques,
  sub-query and view processing.
- Benchmarking and Performance - TPC and Wisconsin benchmarks, performance
  measurement, and performance tuning.
- Concurrency Control - Locking techniques, lock manager implementation,
  comparison of pessimistic and optimistic techniques, concurrent access
  to search structures, deadlock handling.
- Logging and Crash Recovery - Write-ahead logging, the ARIES recovery system,
  shadow-based techniques, media failure.
- Data mining for warehouses, emerging internet applications, and new data
  representation standards.
PREREQUISITES
The course is intended for graduate students and advanced undergraduates.
A good background in DBMS fundamentals is required, so 15-415 (or
equivalent) is a desired prerequisite. Students should be comfortable
with the relational model, SQL, and the basic functions of database systems.
Students should also be capable of implementing a large, complex system
on UNIX in C or C++.
UNIVERSITY UNITS: 12
TEXT
Required:
- Readings in Database Systems, Third Edition - edited by Michael Stonebraker
  and Joe Hellerstein, Morgan Kaufmann Publishers, March 1998. Several of the
  papers in this book are available through the ACM Digital Library.
Recommended (but NOT required):
- Gray, J., and Reuter, A., Transaction Processing: Concepts and Techniques,
  Morgan Kaufmann, 1993.
- The Benchmark Handbook for Database and Transaction Processing Systems, Second
  Edition - edited by Jim Gray, Morgan Kaufmann Publishers, 1993.
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
  Morgan Kaufmann, 2000.
- Christos Faloutsos, Searching Multimedia Databases by Content,
  Kluwer Academic Publishers, 1996.
If you have not taken a database course before, good introductory database
textbooks are:
- A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts,
  4th edition, McGraw-Hill.
- R. Ramakrishnan and J. Gehrke, Database Management Systems.
METHOD OF EVALUATION
The grading is as follows:
Project      | 50%
Midterm Exam | 20%
Final Exam   | 20%
Homeworks    | 10%
Projects will be carried out in teams. A detailed handout about the project
will be distributed at the beginning of the course, along
with a list of suggested projects. The goal of the project is to give
the participants the opportunity to tackle a large, interesting problem,
which may lead to a publication.