CARNEGIE MELLON UNIVERSITY

15-826/10-603 - Multimedia databases and data mining

Spring 2002

Homework 1 - Due: Feb. 5, in class.

General comments:

Q1: SQL [45 pts]

Use SQL to find statistics for the vocabulary of 'Moby Dick' (from the fascinating Gutenberg project). For your convenience, the text is already transformed into one word per line, at
http://www.cs.cmu.edu/~christos/courses/826.S02/HW1/MOBY-DICK.txt.gz
with the MS-DOS newline convention. Load the data on a database system of your choice (MS-Access is available on the clusters).  For each query give both the resulting table  as well as the corresponding SQL statement (cut and paste your SQL query: e.g., in MS-Access, use the 'SQL view' option)

We want to know

  1. How many vocabulary words are in the text
    1. [8 pts] for the query (you may use 'views' - in which case hand in all the necessary material)
    2. [7 pts] for the result
  2. A list of vocabulary words, along with the occurrence frequency, sorted in decreasing frequency order
    1. [8 pts] for the query
    2. [7 pts] for the results (to save paper, give only the top 20 most frequent words and their frequencies)
  3. How many words appear only once?
    1. [0pts] Before doing any computations, guess the percentage of vocabulary words that appear once only
    2. [8 pts] for the query (again, you may use views, and/or your previous queries)
    3. [7 pts] for the answer
    4. [0pts] What is the actual percentage of vocabulary words that appear only once? How close was your guess?

Q2: Hilbert and z-order [55pts]

Write the code to compute the z-value and the hilbert-value of a 2-d point, as well as the inverses. The command-line syntax should be, e.g. for the 'zorder' version:
zorder -n <order-of-curve> <xvalue> <yvalue>
Thus:
zorder -n 2 0 0 # should return '0'
zorder -n 3 0 1 # should return '1'
horder -n 2 0 0 # should return '0'
Write four programs to do these computations; hand in your source code on hard copy.
  1. [5 pts] zorder should return the z-value of the given (x,y) point.
  2. [5 pts] izorder should give the inverse:
  3. [10 pts] horder should compute the hilbert order of the given point - specifically
  4. STRONG HINTS: use or modify the code by Jagadish [SIGMOD 90]; see the algorithm by Roseman+ [PODS 89]
  5. [15 pts] ihorder should do the inverse - for example
  6. [5pts] Give the results of your programs on the input file
    1. http://www.cs.cmu.edu/~christos/courses/826.S02/HW1/inp.txt
    Make sure you echo the input, so that it is clear which answer refers to which question
  7. [5 pts] Using your izorder, plot a z-curve of order 7 (128x128 grid) and hand in the plot.
  8. [10 pts] Using your ihorder, plot a Hilbert curve of order 7, and hand in the plot.


NOTES:


Christos Faloutsos, 1/23/2002