Complexes clustering
On this page we present results obtained using the 79 experiments
from Michael Eisen's clustering paper.
This dataset is a combination of several independent time series experiments.
We used the MIPS complexes database and looked at the 70 top-level
complexes reported in that database. Each of these complexes is a
collection of genes known to participate in forming the same complex. We used
the 979 genes that appear in Eisen's dataset and are categorized by the MIPS
complexes.
As indicated in the Eisen paper, when genes are compared across a number of
non-identical conditions, noise that is
present in a single observation does not contribute significantly to
the resulting similarity. Thus, we expect genes that participate in the
construction of the same complex to have similar expression patterns in this
large dataset. The following figures compare the results of hierarchical
clustering with (right) and without (left)
optimal leaf ordering. The smaller figures are
enlargements of the same cluster in the two figures. The numbers to the
right of the small figures represent the complex to which each gene belongs.
Click here for a table that translates
the numbers of the seven largest complexes to the complex names in the MIPS
database (the rest of the numbers were assigned in the order in which the
complexes appear in the MIPS database: 10 to the first, 20 to the second, etc.).
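To make the comparison concrete, the following is a minimal sketch of what optimal leaf ordering optimizes. It assumes the objective is the sum of similarities between adjacent leaves, and it searches every flip of the tree exhaustively rather than using the efficient algorithm behind the figures. The names leaf_orders, adjacent_similarity and best_order, the 4-gene tree, and the integer similarity values are all hypothetical and chosen only for illustration.

```python
def leaf_orders(tree):
    """Yield every leaf order reachable by flipping children of internal nodes.

    A leaf is any non-tuple value; an internal node is a (left, right) tuple.
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for left_order in leaf_orders(left):
        for right_order in leaf_orders(right):
            yield left_order + right_order   # children kept as drawn
            yield right_order + left_order   # children flipped


def adjacent_similarity(order, sim):
    """Sum the similarity of each pair of neighbouring leaves in the order."""
    return sum(sim[a][b] for a, b in zip(order, order[1:]))


def best_order(tree, sim):
    """Exhaustively pick the tree-consistent order with the highest score."""
    return max(leaf_orders(tree), key=lambda order: adjacent_similarity(order, sim))


# Hypothetical 4-gene example; the similarities are made-up integers.
sim = [
    [0, 9, 7, 2],
    [9, 0, 1, 3],
    [7, 1, 0, 8],
    [2, 3, 8, 0],
]
tree = ((0, 1), (2, 3))  # genes 0 and 1 form one subtree, genes 2 and 3 the other

default = [0, 1, 2, 3]
best = best_order(tree, sim)
print(adjacent_similarity(default, sim))    # score of the order as drawn
print(best, adjacent_similarity(best, sim))  # score after choosing the flips
```

The clustering tree itself is unchanged: only the left/right orientation of each internal node differs between the two orders. A binary tree with n leaves has 2^(n-1) such orientations, so this exhaustive search is only practical for very small trees.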
As can be seen, with optimal ordering,
genes that belong to the same complex (640) are
grouped much more tightly together. This can help the user determine not only
the cluster but also which genes are at the 'center' of the cluster.
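One simple way to quantify how tightly a complex is grouped is the length of the shortest run of consecutive leaves that contains all of its genes. The sketch below assumes that measure; the function name complex_span, the gene names g1 to g6, and both leaf orders are hypothetical and are not taken from the figures.

```python
def complex_span(order, labels, complex_id):
    """Return the length of the shortest leaf window holding every gene of a complex.

    order      -- list of gene names in leaf order
    labels     -- dict mapping gene name to complex number
    complex_id -- the complex number to measure
    """
    positions = [i for i, gene in enumerate(order) if labels.get(gene) == complex_id]
    if not positions:
        raise ValueError(f"no gene in the order belongs to complex {complex_id}")
    return max(positions) - min(positions) + 1


# Hypothetical labels: three genes in complex 640, three in complex 10.
labels = {"g1": 640, "g2": 640, "g3": 640, "g4": 10, "g5": 10, "g6": 10}

plain_order = ["g1", "g4", "g2", "g5", "g3", "g6"]    # members interleaved
optimal_order = ["g4", "g1", "g2", "g3", "g5", "g6"]  # members contiguous

print(complex_span(plain_order, labels, 640))    # 5
print(complex_span(optimal_order, labels, 640))  # 3
```

A span equal to the number of genes in the complex means its members are contiguous in the ordering; larger values mean genes from other complexes are interleaved with them.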
This demonstrates that with optimal ordering one can
arrive at clusters whose 'center' (i.e. the genes that appear in the
middle of the cluster) is a better representation of the cluster. When a user
picks the clusters in hierarchical clustering, at least some of the genes
in each cluster are not highly correlated with the cluster itself
(since the number of clusters is limited and every gene is assigned to
at least one cluster). With optimal ordering, such genes are usually placed
on the 'borders' of the clusters (since they are not highly correlated with
genes in the center of the cluster). Thus, the notion of 'center' gets a new
meaning. Genes that are placed in the center of the cluster by the optimal
ordering algorithm are highly correlated with the other genes in
the cluster and thus with the cluster itself.
These are the genes the user should focus on.
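The idea of a gene being highly correlated with its cluster can be illustrated by ranking cluster members by their mean correlation with the other members. This is a sketch only: it assumes Pearson correlation as the similarity measure, which may differ from the measure used for the figures, and the function names and the four expression profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_correlation(gene, cluster, profiles):
    """Average correlation of one gene with every other gene in the cluster."""
    others = [g for g in cluster if g != gene]
    return sum(pearson(profiles[gene], profiles[g]) for g in others) / len(others)


def rank_by_centrality(cluster, profiles):
    """Sort cluster members from most to least correlated with the rest."""
    return sorted(cluster,
                  key=lambda g: mean_correlation(g, cluster, profiles),
                  reverse=True)


# Hypothetical expression profiles: a, b and c rise together, d does not.
profiles = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],
    "c": [1.0, 2.0, 3.0, 5.0],
    "d": [4.0, 1.0, 3.0, 2.0],
}
cluster = ["a", "b", "c", "d"]

ranking = rank_by_centrality(cluster, profiles)
print(ranking[-1])  # the gene least correlated with the rest of the cluster
```

In this toy cluster, gene d has the lowest mean correlation and is ranked last, which matches the kind of gene that the text expects to end up on a cluster border.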
| Hierarchical clustering | Optimal ordering |
The two files needed to view the full results in TreeView
(including the gene names and group assignments) are:
Click here for the .CDT file.
Click here for the .GTR file.