Carnegie Mellon University
15721 Database System Design and Implementation
Spring 2003 - C. Faloutsos
Homework 2 - Due: 4/10
0) Reminders:
0.1) Time estimates
- Rough time estimate: approximately 6 days:
  - 1 day: to become familiar with R-tree code
  - 3 days: to design and implement the 'counter' per page
  - 2 days: testing and debugging
1) R-trees and 'count' queries [100pts]
Use the R-tree package provided, and augment it with counters. The goal
is to accelerate 'count' queries, like 'how many galaxies are
inside the rectangle (xlow, xhigh, ylow, yhigh, zlow, zhigh)?'. Thus,
for every R-tree node, make sure you store a counter with the number of
points (or rectangles) in that sub-tree. Of course, you'll need to modify
the insertion and deletion routines appropriately, to maintain the correct
counts.
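To illustrate why per-sub-tree counters speed up 'count' queries, here is a minimal sketch in C. It is not the DRtree code: the `Rect` and `Node` layouts, the constants `MAX_DIM` and `MAX_FANOUT`, and the function names are all hypothetical, and the real package's structures will differ. The idea shown is that when an entry's bounding rectangle lies entirely inside the query range, its stored counter can be added directly, without descending into that sub-tree.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIM    8   /* hypothetical cap on dimensionality */
#define MAX_FANOUT 4   /* hypothetical node capacity */

/* Hypothetical rectangle: low[d] <= high[d] in every dimension d. */
typedef struct {
    double low[MAX_DIM];
    double high[MAX_DIM];
} Rect;

/* Hypothetical node layout (an assumption, not the DRtree structs). */
typedef struct Node {
    int is_leaf;
    int n_entries;
    Rect mbr[MAX_FANOUT];            /* bounding rectangle of each entry  */
    long count[MAX_FANOUT];          /* data rectangles under each entry;
                                        1 for every leaf entry            */
    struct Node *child[MAX_FANOUT];  /* unused (NULL) in leaves           */
} Node;

/* Returns 1 if `inner` lies entirely inside `outer` (boundaries included). */
static int rect_contains(const Rect *outer, const Rect *inner, int dim)
{
    int d;
    for (d = 0; d < dim; d++)
        if (inner->low[d] < outer->low[d] || inner->high[d] > outer->high[d])
            return 0;
    return 1;
}

/* Returns 1 if `a` and `b` share at least one point. */
static int rect_overlaps(const Rect *a, const Rect *b, int dim)
{
    int d;
    for (d = 0; d < dim; d++)
        if (a->high[d] < b->low[d] || a->low[d] > b->high[d])
            return 0;
    return 1;
}

/* Counts the data rectangles that overlap query `q`. For points stored as
 * degenerate rectangles, overlap and containment are the same thing. */
long count_query(const Node *node, const Rect *q, int dim)
{
    long total = 0;
    int i;

    assert(dim > 0 && dim <= MAX_DIM);
    if (node == NULL)
        return 0;
    for (i = 0; i < node->n_entries; i++) {
        const Rect *r = &node->mbr[i];
        if (!rect_overlaps(r, q, dim))
            continue;                     /* disjoint: prune this entry   */
        if (node->is_leaf || rect_contains(q, r, dim))
            total += node->count[i];      /* use the stored counter       */
        else
            total += count_query(node->child[i], q, dim);  /* partial overlap */
    }
    return total;
}
```

Under this layout, an insertion or deletion would adjust by +1 or -1 the counter of every entry on the path from the root to the affected leaf, and a node split would need the counters of the two resulting entries recomputed from the entries each node ends up holding.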
1.1) Details
- INPUT DATA: at http://www-2.cs.cmu.edu/~christos/courses/721.S03/HW2/3d-100k.txt.gz
  FYI, it is 6.5 MB uncompressed (100,000 3-d points).
- CODE: The code for the R-Tree is at http://www-2.cs.cmu.edu/~christos/courses/721.S03/HW2/DRtree.tar.gz
  It is in C; unpack it with 'gunzip' and 'tar xvf', then do 'make demo'.
  This creates the bin/DRmain program and runs it on some small datasets.
  - IGNORE the compiler 'warnings' and the 'make: Fatal error'.
  - Do 'bin/DRmain' and insert some points of your own, to become familiar
    with the package.
  - The program expects rectangles: treat points as degenerate rectangles
    (x_low = x_high etc).
  - The first line of the data file is header info.
  - In case of unexplained errors, do 'make spotless' to delete all
    libraries etc.
  - Use 'DRmain -float' to insert floats instead of integers.
- SPECS: We want to implement 'count' queries. Currently the R-tree
  package supports 's' for range search, 'i' for insertion etc. You have
  to implement 'c' for 'count queries'; then, your program should:
  - read the xlow, xhigh etc coordinates of the desired range
  - print the count of galaxies in the given range
  - IMPORTANT: your code should handle any dimensionality,
    like the current R-tree does.
  - IMPORTANT: your code should explicitly store the counts of
    rectangles in each sub-tree.
- If you run out of disk space, contact the instructor, and try your algorithms
  on the first 9,999 galaxies.
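For the testing and debugging phase, a brute-force reference count can be handy for checking the answers of the 'c' command. The sketch below is in C and makes several assumptions that are not stated in this handout: that each data line holds whitespace-separated coordinates (one point per line), that range boundaries are inclusive, and that the query is typed in the order xlow xhigh ylow yhigh and so on. All function names are hypothetical and none of them belong to the DRtree package.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Reads one query range as "low high" pairs, one pair per dimension
 * (assumed input order). Returns 1 on success, 0 on bad or short input. */
int read_query(FILE *in, int dim, double *low, double *high)
{
    int d;
    for (d = 0; d < dim; d++) {
        if (fscanf(in, "%lf %lf", &low[d], &high[d]) != 2)
            return 0;
        if (low[d] > high[d])
            return 0;
    }
    return 1;
}

/* Skips the header line, then loads up to max_pts points of `dim`
 * coordinates each into pts (row-major). Returns the number of complete
 * points read. The whitespace-separated layout is an assumption. */
size_t load_points(FILE *in, int dim, double *pts, size_t max_pts)
{
    size_t n = 0;
    int c, d;

    while ((c = fgetc(in)) != EOF && c != '\n')
        ;                                   /* discard the header line */
    while (n < max_pts) {
        for (d = 0; d < dim; d++)
            if (fscanf(in, "%lf", &pts[n * (size_t)dim + (size_t)d]) != 1)
                return n;
        n++;
    }
    return n;
}

/* Expresses a point as a degenerate rectangle: low == high everywhere. */
void point_to_rect(const double *pt, int dim, double *low, double *high)
{
    int d;
    assert(dim > 0);
    for (d = 0; d < dim; d++)
        low[d] = high[d] = pt[d];
}

/* Linear scan: counts the points inside [qlow, qhigh], boundaries included. */
long brute_force_count(const double *pts, size_t n, int dim,
                       const double *qlow, const double *qhigh)
{
    long total = 0;
    size_t i;
    int d;

    for (i = 0; i < n; i++) {
        int inside = 1;
        for (d = 0; d < dim && inside; d++) {
            double v = pts[i * (size_t)dim + (size_t)d];
            if (v < qlow[d] || v > qhigh[d])
                inside = 0;
        }
        total += inside;
    }
    return total;
}
```

Because the scan touches every point, it is only meant as a correctness check on small prefixes of the data; the counts it returns should match what the counter-augmented R-tree reports for the same range.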
3) What to Turn In
- [70pts] Hard copy: a printout of your source code (you are
  welcome to give only the parts dealing with 'count queries')
- [30pts] Hard copy: results (= galaxy counts) from your program,
  applied on several ranges - we'll announce the query ranges later.
For your information (no points for this part)
If you want really large datasets, and/or are interested in astronomy,
check the Sloan Digital Sky Survey (SDSS).
The project expects to have half a billion galaxies, with their coordinates,
spectra, images and much more.
Last modified by Christos Faloutsos, 3/16/2003