Tractable Group Detection on Large Link Data Sets
by Jeremy Kubica, Andrew Moore and Jeff Schneider
BibTeX:
@InProceedings{kubicaKgroups,
author = "Jeremy Kubica and Andrew Moore and Jeff Schneider",
title = "Tractable Group Detection on Large Link Data Sets",
booktitle = "The Third IEEE International Conference on Data Mining",
month = "November",
year = "2003",
pages = "573--576",
editor = "Xindong Wu and Alex Tuzhilin and Jude Shavlik",
publisher = "IEEE Computer Society"
}
Abstract:
Discovering underlying structure from co-occurrence data is
an important task in a variety of fields, including insurance,
intelligence, criminal investigation, epidemiology, human resources,
and marketing. Previously, Kubica et al. presented the group
detection algorithm (GDA), an algorithm for finding underlying
groupings of entities from co-occurrence data. This algorithm is
based on a probabilistic generative model and produces coherent groups
that are consistent with prior knowledge. Unfortunately, the
optimization used in GDA is slow, potentially making it infeasible
for many large data sets. To address this, we present k-groups, an algorithm
that uses an approach similar to that of k-means to significantly
accelerate the discovery of groups while retaining GDA's probabilistic
model. We compare the performance of GDA and k-groups on a variety of
data, showing that k-groups' sacrifice in solution quality is
significantly offset by its increase in speed.