15-721 Assignment 2:
Heap Files
Out: September 24th -- Due: October 8th
This assignment has two major parts. You have to do the following:
- Familiarize yourself with the heap file and heap file page
interfaces. You should already be familiar with the buffer manager
and disk space manager interfaces from the previous assignment.
- Implement the HeapFile class--your solution for this
part should be contained in heapfile.C (along with any
modifications you may want to make in heapfile.h).
- Implement the HFPage class--your solution for this
part should be contained in hfpage.[Ch].
The main contact person for this assignment is Spiros Papadimitriou.
Please direct your questions, complaints and/or flames to
spapadim+@cs.cmu.edu!
From the previous assignment, you should be familiar with how the
buffer manager works. Make sure that you know how the heap file is
organized and works. You should also be familiar with the slotted
page model and how variable length records can be stored in such a
page. In addition, HTML documentation is available for Minibase
(mirrored locally on the course webpages).
There are four main classes with which you should be familiar:
HeapFile, HFPage, BufferMgr and
DB. A heap file is seen as a collection of records.
Internally, records are stored on a collection of HFPage
objects.
Copy all files from
/afs/cs/academic/class/15721-f01/mini_hwk/assign/Heap/src
into your own local working directory and study them carefully. The
files are:
- Makefile: A sample Makefile for you to compile your
project. Study this before you make changes. You should not need
to modify this, but if you do (for instance, if you want to split
the solution in more files than we suggest), please make sure you
hand it in as well!
- Several header files. You have to look at:
- heapfile.h: The class interface for the heap file,
which you will implement in the first part.
- hfpage.h: The class interface for individual heap
file pages, which you will implement in the second part.
- buf.h, db.h and new_error.h:
The buffer manager, db class and error protocol; you should
aready be familiar with these.
First of all, create the files for your solution, ie.
heapfile.C and hfpage.C. Then do not forget to run
`make depend'! You need to run this only once (or
whenever you change included files etc.) to produce
Makefile.depend.
Run `make heap' (or just `make'). This will compile
the .C files in your directory, and create a program called
heap. Run this command, and you should see output from
running all tests.
Besides the default heap target, the provided Makefile also
offers the following options:
- solution: This will produce an executable that has our
solution object files (located in libwith.a in the
assignment directories, in case you are curious). You can use this
as a reference implementation.
- heapfile: This will link in your code for the heapfile
(in heapfile.C) and our reference solution for the heap
file pages, so you can test your heap file implementation in
isolation.
- hfp: Same as heapfile, but for the heap file
page implementation.
Note that if you submit a modified Makefile, your default target must
be heap and produce an executable of the same
name--our automated testing scripts will expect this, so please
follow this!
Finally, as in the previous assignment, you should use the GNU
version of `make'!
In this part you will implement the overall heap file manager. You
can ignore concurrency control and recovery issues, and concentrate on
implementing the interface routines.
A heap file is an unordered set of records that supports the following
set of operations:
- Heap files can be created and destroyed.
- Existing heap files can be opened and closed.
- Records can be inserted and deleted.
- Each record is uniquely identified by a record ID (RID). A
specific record can be retrieved by using its record ID.
The above correspond to the public member functions we expect you to
implement--private members are entirely up to you. Note that the
destructor should simply call deleteFile(), which in turn
permanently wipes out the heap file from the database.
There are two main pieces of code to be written:
- The free space manager: You must deal with free space
intelligently, using a directory based structure (recommended) or a
linked list to identify pages with room for records. When a record
is deleted, you must update your information about available free
space, and when a record is inserted, you must try to use free space
on existing pages before allocating a new page.
- Implement the class Scan (instances of which are
returned by openScan()), which performs a scan on the heap
file. A scan iterator supports a simple getNext()
interface. Successive calls to the getNext() function
return successive elements in the heap file. You have to maintain
state information (which depends on your particular implementation
of your heapfile) in the scan class in order to achieve this effect.
In this part, you will implement the page structure for the heap file
layer (ie. the member function for the HFPage class). You
will be given libraries for the lower layers (Buffer Manager and Disk
Space Manager), code for the HeapFile layer, and some driver routines
to test that code.
Have a look at the file hfpage.h. It contains the interfaces
for the HFPage class. This class implements a "heap-file
page" object. Note that the protected data members of the page are
given to you. All you should need to do is implement the public member
functions. You should put all your code into the file hfpage.C.
A note on the slot directory: The first entry of the slot directory
(i.e., HFPage::slot[0]) is right at the end of the data area
of the page. If you want to add more slots to the page (and you will),
you will have to put them in the data area of the page, which means
two things:
- New slots will be at negative offsets in the slot directory
array, i.e. the next slot after 0 will be found at
HFPage::slot[-1], the one after that at
HFPage::slot[-2], etc.
- In order to add a record to a page, there has to be room for the record
itself in the data area, and also room for a new slot in the data
area (unless there happens to be pre-allocated slot that's empty).
Here is an overview of the API you have to implement:
- void HFPage::init(PageId pageNo): This member function is
used to initialize a new heap file page with page number
pageNo. It should set the following data members to
reasonable defaults: nextPage, prevPage,
slotCnt, curPage, freePtr,
freeSpace. (The nextPage and prevPage data
members are used for keeping track of pages in a HeapFile.)
A good default unknown value for a PageId is -1.
Note that freePtr is an offset into the data array, not
a pointer.
- void HFPage::dumpPage(): A debugging utility you can
use if you wish to print out the contents of a page. This member
function is not required for the assignment, but it's probably a
good idea to do it for your own debugging use.
- PageId HFPage::getNextPage(): This member function
should return the page ID stored in the nextPage data
member.
- PageId HFPage::getPrevPage() This member function
should return the page ID stored in the prevPage data
member.
- void HFPage::setNextPage(): This member function sets
the nextPage data member.
- void HFPage::setPrevPage(): This member function sets
the prevPage data member.
- Status HFPage::insertRecord(char * recPtr, int reclen,
RID& rid): This member function should add a new record to the
page. It returns OK if everything went OK, and
DONE if sufficient space does not exist on the page for the
new record. If it returns OK, it should set rid to
be the RID of the new record (otherwise it can leave rid
untouched.)
The Status enumerated type is defined in
new_error.h if you're curious about it. You may want to
look over that file (located in the assignment's include directory
in the course directories) and handle errors in a more informative
manner than suggested here.
The RID structure is defined to be:
typedef struct RID {
PageId pageNo;
int slotNo;
int operator== (const RID rid) const
{ return pageNo==rid.pageNo; slotNo==rid.slotNo; };
int operator!= (const RID rid) const
{ return pageNo!=rid.pageNo; slotNo!=rid.slotNo; };
};
- Status HFPage::deleteRecord(const RID& rid): This
member function deletes the record with RID rid from the
page. It returns OK if everything goes OK, or FAIL
otherwise (what could go wrong here?). This routine must compact
the remaining records on the page into a contiguous area of memory,
and make sure that the slot directory remains correct. You should
leave a ``hole'' in the slot array for the slot which pointed to the
deleted record, if necessary, to make sure that the rids of the
remaining records do not change.
- Status HFPage::firstRecord(RID& firstRid): This
routine should set firstRid to be the RID of the ``first''
record on the page. The order in which you return records from a
page is entirely up to you. If you find a first record, return
OK, else return DONE.
- Status HFPage::nextRecord(RID& curRid, RID& nextRid):
Given a current RID, curRid, this member function stores
the next RID on the page in the nextRid variable. Again,
the order your return records is up to you, but do make sure you
return each record exactly once if someone continually calls
nextRecord! If you find a next RID, return OK,
else return DONE.
- Status HFPage::getRecord(RID rid, char *recPtr, int&
recLen): Given a rid, this routine copies the associated record
into the memory address *recPtr. You may assume that the
memory pointed to *recPtr has been allocated by the caller.
recLen is set to the number of bytes that the record
occupies. If all goes well, return OK, else return
FAIL.
- Status HFPage::returnRecord(RID& rid, char*& recPtr,
int& recLen) This routine is very similar to
HFPage::getRecord, except in this case you do not copy the
record into a caller-provided pointer, but instead you set the
caller's recPtr to point directly to the record on the
page. Again, return either OK or FAIL.
- Status HFPage::returnOffset(RID rid, int& offset):
Given a RID, this routine returns the offset into
HFPage::data where the associated record is stored.
- bool HFPage::empty(void): Returns true if the page has
no records in it, and false otherwise.
- void HFPage::compact_slot_dir(): Remember that the
slot directory can only be compacted by removing a consecutive
sequence of empty slots that are at the very end of the slot
directory--otherwise, the RIDs of remaining records might be
changed!
You should turn in copies of your code (ie. at least the files
heapfile.[Ch] and hfpage.[Ch]). Please use the same
scheme as for the previous assignment, ie. copy your files into:
/afs/cs/academic/class/15721-f01/handin/[AndrewID]/Heap.
If you have not created any more files and/or modified the Makefile,
then you should be able to use the `make handin' shorthand.
If anything goes terribly wrong with the AFS handin directories, you
can email your files to one of the TAs before the deadline is
over. The uuencode target is provided once again, should you
need it (although standard MIME attachments are preferred).
The assignment is due at midnight on October 8th. The solution will
be made public after that time, and solutions turned in after that
time will not receive any credit. So be sure to turn in whatever you
have, even if it is not working fully, at that time.
This document was generated using the
LaTeX2HTML translator Version 99.1 release (March 30, 1999)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.