Increase in Similarity for Randomly Generated Data
For each leaf we randomly chose 60 values representing 60 time points
(or 60 different experiments). Each value was either -1 (representing a
decrease in gene expression), 0 (no change), or 1 (an increase).
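The data generation step can be sketched as follows. This is a minimal sketch that assumes each of the 60 values is drawn independently and uniformly from {-1, 0, 1}. The text does not state the distribution, and the function name random_profiles is hypothetical.

```python
import random


def random_profiles(n_leaves, n_points=60, seed=None):
    """Return n_leaves profiles, each a list of n_points values from {-1, 0, 1}.

    Assumption: values are i.i.d. and uniform over {-1, 0, 1}.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1, 0, 1)) for _ in range(n_points)]
            for _ in range(n_leaves)]


data = random_profiles(5, seed=0)
print(len(data), len(data[0]))  # 5 60
```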
For each randomly generated data set, we computed the similarity matrix
of the leaves and then hierarchically clustered them.
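This section does not say which similarity measure or linkage rule was used. The sketch below assumes Pearson correlation as the similarity and average linkage (repeatedly merging the two clusters with the highest mean pairwise similarity). The tree is represented as nested 2-tuples of leaf indices. All function names here are hypothetical, and the quadratic-per-merge search is kept naive for clarity.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def similarity_matrix(profiles):
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def average_linkage(sim):
    """Agglomerative clustering; returns a binary tree of nested tuples over leaves 0..n-1."""
    clusters = {i: (i, [i]) for i in range(len(sim))}  # id -> (subtree, member leaves)
    next_id = len(sim)
    while len(clusters) > 1:
        best, pair = None, None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                ma, mb = clusters[a][1], clusters[b][1]
                avg = sum(sim[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
                if best is None or avg > best:
                    best, pair = avg, (a, b)
        a, b = pair
        (ta, ma), (tb, mb) = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = ((ta, tb), ma + mb)
        next_id += 1
    return next(iter(clusters.values()))[0]


def leaf_order(tree):
    """Left-to-right leaf order of a nested-tuple tree."""
    if isinstance(tree, int):
        return [tree]
    return leaf_order(tree[0]) + leaf_order(tree[1])


rng = random.Random(1)
profiles = [[rng.choice((-1, 0, 1)) for _ in range(60)] for _ in range(8)]
sim = similarity_matrix(profiles)
tree = average_linkage(sim)
print(leaf_order(tree))  # some permutation of 0..7
```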
Denote by Tr the resulting tree, by S(Tr) the sum of the similarities
between adjacent leaves in the current linear ordering of Tr, and by D(Tr)
the same sum after applying our optimal ordering algorithm.
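To make S(Tr) and D(Tr) concrete, here is a small sketch that scores a linear ordering and finds the best ordering reachable by flipping the two children of internal nodes. It uses a straightforward, deliberately unoptimized dynamic program over (first leaf, last leaf) pairs of each subtree, plus an exponential brute force for checking. It is not presented as the authors' algorithm. It assumes a symmetric similarity matrix and a tree given as nested 2-tuples of leaf indices; all names are hypothetical.

```python
def adjacent_similarity(order, sim):
    """Sum of similarities between consecutive leaves in a linear order."""
    return sum(sim[a][b] for a, b in zip(order, order[1:]))


def optimal_leaf_order(tree, sim):
    """Best (score, order) over all orders obtainable by flipping internal nodes.

    Assumes sim is symmetric, so reversing an order does not change its score.
    """

    def solve(v):
        # Maps (first_leaf, last_leaf) -> (best score, best order) for subtree v.
        if isinstance(v, int):
            return {(v, v): (0.0, [v])}
        left, right = solve(v[0]), solve(v[1])
        best = {}
        for (i, h), (s_l, o_l) in left.items():
            for (k, j), (s_r, o_r) in right.items():
                # Left subtree first (i ... h), then right subtree (k ... j).
                score = s_l + sim[h][k] + s_r
                if (i, j) not in best or score > best[(i, j)][0]:
                    best[(i, j)] = (score, o_l + o_r)
        # Right-subtree-first orders are reversals of these, with equal scores.
        for (i, j), (score, order) in list(best.items()):
            best[(j, i)] = (score, order[::-1])
        return best

    return max(solve(tree).values(), key=lambda entry: entry[0])


def all_orders(tree):
    """Every order reachable by flips (exponential; only for checking small trees)."""
    if isinstance(tree, int):
        return [[tree]]
    orders = []
    for a in all_orders(tree[0]):
        for b in all_orders(tree[1]):
            orders.extend([a + b, b + a])
    return orders


# Toy symmetric similarity matrix and tree (illustrative values only).
sim = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.4],
    [0.1, 0.8, 0.4, 1.0],
]
tree = ((0, 1), (2, 3))
s = adjacent_similarity([0, 1, 2, 3], sim)  # current order
d, best = optimal_leaf_order(tree, sim)
print(round(s, 3), round(d, 3), best)  # 0.9 1.5 [1, 0, 2, 3]
```

Flipping the left subtree places leaves 0 and 2 (similarity 0.9) next to each other, which is where the gain over the current order comes from.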
We define the relative increase in similarity as
I = (D(Tr) - S(Tr))/S(Tr). The next figure shows how I changes as a
function of the number of leaves, n.
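The relative increase I, and its average over repeated random trials, can be computed directly from the two sums. In this minimal sketch the function names are hypothetical, the (S, D) pairs are made-up inputs rather than results from the experiments, and the guard against S(Tr) <= 0 is an added assumption, since the ratio is not meaningful there.

```python
def relative_increase(s_current, d_optimal):
    """I = (D(Tr) - S(Tr)) / S(Tr); only meaningful when S(Tr) > 0."""
    if s_current <= 0:
        raise ValueError("I is undefined for S(Tr) <= 0")
    return (d_optimal - s_current) / s_current


def mean_increase(pairs):
    """Average I over repeated random trials, given (S(Tr), D(Tr)) pairs."""
    values = [relative_increase(s, d) for s, d in pairs]
    return sum(values) / len(values)


# Hypothetical (S, D) values, not results from the text.
print(round(relative_increase(0.9, 1.5), 3))  # 0.667
print(round(mean_increase([(1.0, 1.5), (2.0, 2.5)]), 3))  # 0.375
```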
As can be seen, even for a large number of leaves (1500), I is on average
quite large, indicating that optimal ordering has a substantial impact on
the similarity of neighboring leaves in the linear ordering.