Overview

A Database Management System (DBMS) is a software system designed to efficiently store, retrieve, manipulate, and query large amounts of data. Since the introduction of the relational data model in 1970, the database management system industry has grown to $100 billion dollars a year and increases by more that 25% every year. With the new and emerging internet applications posing new requirements in the DBMS design and implementation, the database market is expected to grow even faster, and database design and implementation techniques are constantly evolving to meet the new requirements.

Nevertheless, there is a basic and fairly standard set of techniques that have been developed to support high-performance database systems. These techniques include: B+trees, hash-based join algorithms, hierarchical and two-phase locking, two-phase commit, write-ahead-logging, recovery using shadow pages, and several query optimization strategies. The synergy amongst different techniques often poses restrictions; the design and implementation choices made at a certain module of the system may affect its interaction with others, and can place constraints on decisions at other levels that may not be immediately apparent.

The goal of this course is to investigate the traditional techniques and their interactions by studying several seminal papers and survey studies in the area while building a complete, scaled-down, relational database management system. Because the focus of the course is on the design as well as the implementation of DBMSs an emphasis will be placed on a large, semester-long DBMS implementation project. This intensive project will count for a substantial portion of the grade.

Units: 12

Topics Covered

  • Introduction to Database systems - Goals and functions of DBMS, Transactions, Reference architecture for a relational DBMS, etc.
  • "The Roots" - Overviews of the classic relational systems: System R, and INGRES.
  • Architectural Foundations - Performance, availability, and reliability characteristics of hardware and operating systems that impact the design of a DBMS.
  • Buffer Management - Memory management for multi-user systems, DBMin algorithm, implications of transaction semantics.
  • Access Paths and Indexes - Structures that are optimized for disk-resident data: e.g., B+trees and Linear hashing
  • Query Processing - access path selection, join methods, optimization techniques, sub-query and view processing.
  • Benchmarking and Performance - TPC and Wisconsin benchmarks, performance measurement, and performance tuning.
  • Concurrency Control - Locking techniques, lock manager implementation, comparison of pessimistic and optimistic techniques, concurrent access to search structures, deadlock handling.
  • Logging and Crash Recovery - Write-Ahead-Logging, the ARIES recovery system, shadow-based techniques, media failure.
  • Data mining for warehouses, emerging internet applications, and new data representation standards

Prerequisites

The course is intended for graduate students and advanced undergraduates. A good background in DBMS fundamentals is required, therefore, 15-415 (or equivalent) is a desired prerequisite. Students should be comfortable with the relational model, SQL, and the basic functions of database systems. Students should also be capable of implementing a large, complex system on UNIX in C or C++.

Evaluation

This course is being offered for 12 credits. The grading is as follows:
Project60%
Midterm Exam15%
Final Exam20%
Class Participation5%

Finding information or people