CARNEGIE MELLON UNIVERSITY
15-826/10-603 - Multimedia databases and data mining
Spring 2002
Homework 1 - Due: Feb. 5, in class.
General comments:
-
Expected time effort for this homework
-
Q1: 30' to figure out the queries; 1-5 hours to run them on the RDBM,
depending on previous experience.
-
Q2: 30'-1h to type and debug the code for the z-ordering; 1h-2h to study
the code for the Hilbert ordering; 3h-12h to design and debug the code
for the inverse-Hilbert ordering.
-
Points are in [bold pts] and add to 100.
-
Please turn in a TYPED
report.
Q1: SQL [45 pts]
Use SQL to find statistics for the vocabulary of 'Moby Dick' (from the
fascinating Gutenberg project). For your
convenience, the text is already transformed into one word per line, at
http://www.cs.cmu.edu/~christos/courses/826.S02/HW1/MOBY-DICK.txt.gz
with the MS-DOS newline convention. Load the data on a database system
of your choice (MS-Access is available on the clusters). For each
query give
both
the
resulting table as well as the
corresponding
SQL statement (cut and paste your SQL query: e.g.,
in MS-Access, use the 'SQL view' option)
We want to know
-
How many vocabulary words are in the text
-
[8 pts] for the query (you may use 'views' - in which case hand in all
the necessary material)
-
[7 pts] for the result
-
A list of vocabulary words, along with the occurrence frequency, sorted
in decreasing frequency order
-
[8 pts] for the query
-
[7 pts] for the results (to save paper, give only the top 20 most frequent
words and their frequencies)
-
How many words appear only once?
-
[0pts] Before doing any computations, guess the percentage of vocabulary
words that appear once only
-
[8 pts] for the query (again, you may use views, and/or your previous queries)
-
[7 pts] for the answer
-
[0pts] What is the actual percentage of vocabulary words that appear only
once? How close was your guess?
Q2: Hilbert and z-order [55pts]
Write the code to compute the z-value and the hilbert-value of a 2-d point,
as well as the inverses. The command-line syntax should be, e.g. for the
'zorder' version:
zorder -n <order-of-curve> <xvalue> <yvalue>
Thus:
zorder -n 2 0 0 # should return '0'
zorder -n 3 0 1 # should return '1'
horder -n 2 0 0 # should return '0'
Write four programs to do these computations; hand in your source code
on hard copy.
-
[5 pts] zorder should return the z-value of the given (x,y) point.
-
[5 pts] izorder should give the inverse:
izorder -n 5 0 # should return the 'x'
and 'y' values, ie, 0 0
izorder -n 2 15 # should return '3 3'
-
[10 pts] horder should compute the hilbert order of the given
point - specifically
horder -n 2 0 0 # should return '0'
horder -n 1 0 1 # should return '1'
horder -n 2 0 1 # should return '3'
STRONG HINTS: use or modify the code by Jagadish
[SIGMOD 90]; see the algorithm by Roseman+
[PODS 89]
-
[15 pts] ihorder should do the inverse - for example
ihorder -n 2 0 # should return '0 0'
ihorder -n 1 1 # should return '0 1'
ihorder -n 2 3 # should return '0 1'
-
[5pts] Give the results of your programs on the input file
http://www.cs.cmu.edu/~christos/courses/826.S02/HW1/inp.txt
Make sure you echo the input, so that it is clear which answer refers
to which question
-
[5 pts] Using your izorder, plot a z-curve of order 7 (128x128
grid) and hand in the plot.
-
[10 pts] Using your ihorder, plot a Hilbert curve of order 7,
and hand in the plot.
NOTES:
-
It is fine to copy and/or modify code from the web, or from existing papers
- but you have to make sure that your programs follow all the above
specifications.
-
Make sure you detect arithmetic overflows
Christos Faloutsos, 1/23/2002