15-721 Assignment 5:
External Sorting

Out: November 7th -- Due: November 26th

Introduction

In this assignment, you will implement a general multi-way external sorting algorithm. You will work with the interface to the (supplied versions of the) buffer manager, the heap file class and the HFPage class.

Note that the sorting code in Minibase is more sophisticated than this assignment requires; so follow the description of sorting in this handout, rather than the HTML description of sorting in Minibase. The main contact person for this assignment is stavros@cs.cmu.edu.

Where to Find Makefiles, Code, etc.

Copy all the files from /afs/cs/academic/class/15721-f01/mini_hwk/assign/ExtSort/src into your own local directory. The files of interest to you are:

Other useful include files: heapfile.h, scan.h, tuple_utils.h, tuple.h, hfpage.h, buf.h and new_error.h are found in /afs/cs/academic/class/15721-f01/mini_hwk/assign/ExtSort/include.

As in previous assignments, the makefile has two targets: a default target which produces an executable named exsort with your code linked in and a target solution which produces an executable with the reference solution. Also, the makefile has the usual handin, uuencode, etc. targets. Finally, do not forget to make depend!

What You Have to Implement

You have to implement the following:

class Sort {
public:

Sort(
char*    inFile,                // Name of unsorted heapfile.

char*    outFile,               // Name of sorted heapfile.
 
int      len_in,                // Number of fields in input records.
  
AttrType  in[],                 // Array containing field types of input records.
                                // i.e. index of in[] ranges from 0 to (len_in - 1)

short    str_sizes[],           // Array containing field sizes of input records.

int      fld_no,                // The number of the field to sort on.
                                // fld_no ranges from 0 to (len_in - 1).

TupleOrder  sort_order,         // ASCENDING, DESCENDING
  
int         amt_of_buf,         // Number of buffer pages available for sorting.

Status&     s                   // Status of call
); 

~Sort();                        // destructor

};

Internal Design

You have to sort a file of fixed size records, stored in a heap file on disk, using a given amount of main memory. Let the amount of memory that is given be SORTPGNUM pages.

Pass 1: To simplify matters, we will assume that the first phase of sorting is performed by dynamically allocating this amount of main memory. Read the data from the unsorted heap file, one record at a time, into this `sorting area.' Do this until the sorting area is full (i.e., you've used up the available memory).

Sort this set of records by calling the qsort library function. Your call to qsort will take a pointer to your sorting area, the number of records contained therein, the size of each record, and a pointer to your comparison function. (The ordering function that you pass to qsort should call the CompareTupleWithTuple function that we supply in the tuple_utils module.) qsort does a generalized quick-sort, calling your comparison function to determine the order of any two records it needs to compare. Your comparison function should take two pointers to Tuples, and return -1, 0, or 1 in the manner of strcmp. See the man page on qsort for more information.

After calling qsort, write the sorted run out to a temporary heap file.

  1. Use one temporary file per run.
  2. To write the run out, simply create a new HeapFile object with a unique name, and repeatedly call the insertRecord method. The fact that all the records are the same size means that the HeapFile will maintain your sorted order. You will not need to keep track of the pages that are pinned and unpinned, but the idea is that only one page needs to be dirtied at a time.
  3. Destroy the temporary HeapFile object when you have finished writing it out. This closes the file, and unpins the file's header page.
  4. Remember the name you use for each temporary file. You will need it to re-open the file during the next pass.

Passes 2 and above: In the merging passes, you will primarily use frames in the buffer pool, rather than additional main memory, unlike Pass 1. Nonetheless, to reflect the fact that you are allowed to use SORTPGNUM pages of main memory, you should use SORTPGNUM-1 buffer frames as input buffers, and one buffer frame as an output buffer. The simplest way to do this is to open scans on SORTPGNUM-1 (heap files containing) runs from the previous pass. Heap file scans pin only one page at a time; as the scan `moves off' a page, the page is unpinned by the scan. Thus, as you scan the input runs, you have at most one page pinned per run. As for the output, you should create a new temporary file, and insert records into them from the input runs, `merging' the input runs. Again, as above, the HeapFile will preserve your order and will only use one page at a time.

The temporary files used to hold input runs should be deleted (with the deleteFile method) after the pass in which they are merged. The sorting is complete when you are left with just one file.

An important note on memory usage: We have deliberately simplified this assignment by providing you the HeapFile class. In the process, we have fudged the issue of memory usage a little.

A HeapFile object keeps one header page pinned for the duration of its existence, in addition to the page that holds the current record you are reading or writing. So the amount of buffer-pool memory you will actually use to read or write using a HeapFile is two pages at a time, rather than one page at a time, as we have assumed. In addition, the interface to read a record from a HeapFile requires you to copy the record to some extra memory--this also slightly increases the actual memory you will use during the merge passes.

The point is that your sorting code would be able to actually use only SORTPGNUM pages, simply by replacing HeapFile with a more space-efficient record-file class.

What to Turn In, and When

You should turn in copies of your code (ie. files sort.[Ch] and possibly Makefile if you need to modify it).

The assignment is due at midnight on November 26th. The solution will be made public after that time, and solutions turned in after that time will not receive any credit. So be sure to turn in whatever you have, even if it is not working fully, at that time.

About this document ...

15-721 Assignment 5:
External Sorting

This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.47)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -no_navigation -split 0 ExtSort.tex

The translation was initiated by on 2001-11-07


2001-11-07