12:00, Wed 28 May 1997, WeH 7220

Statistical Learning Algorithms Based on Bregman Distances
John Lafferty

We present a class of statistical learning algorithms formulated in terms of minimizing Bregman distances, a family of generalized entropy measures associated with convex functions. The inductive learning scheme is akin to growing a decision tree, with the Bregman distance filling the role of the impurity function in tree-based classifiers. Our approach has two components. In the feature selection step, each linear constraint in a pool of candidate features is evaluated by the reduction in Bregman distance that would result from adding it to the model. In the constraint satisfaction step, all of the parameters are adjusted to minimize the Bregman distance subject to the chosen constraints. We introduce a new iterative estimation algorithm for carrying out both the feature selection and constraint satisfaction steps, and outline a proof of its convergence.

Informal Note: Lately, I often ask machine learning types if they've heard of Bregman distances. No one I've ever asked has! This family of "distances" might be worth knowing about, however, in light of Csiszár's results on axiomatic characterizations of statistical inference problems. They are also widely used in certain inverse problems that come up in computerized tomography. This is based on a paper I'll present at the upcoming Canadian Workshop on Information Theory at the Fields Institute in Toronto.
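
For readers who haven't seen the definition (it isn't stated in the abstract above, but it is standard): the Bregman distance generated by a differentiable, strictly convex function f is

    D_f(p, q) = f(p) - f(q) - \langle \nabla f(q), \, p - q \rangle,

the gap between f at p and its first-order Taylor approximation around q. Taking f(p) = \sum_i p_i \log p_i (negative entropy) on the probability simplex gives D_f(p, q) = \sum_i p_i \log(p_i / q_i), the Kullback-Leibler divergence, which is the sense in which these are "generalized entropy measures"; taking f(p) = \|p\|^2 recovers squared Euclidean distance.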
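To make the two-step scheme in the abstract concrete, here is a minimal sketch, assuming a finite sample space, binary indicator features, and the KL divergence as the Bregman distance. The function names are hypothetical, and this is only an illustration of the general idea, not the paper's algorithm: feature selection scores each candidate by the drop in KL divergence its constraint alone would buy, and constraint satisfaction then re-fits all chosen constraints by cyclic I-projections, Bregman's classical method for linear constraints.

    import numpy as np

    def kl(p, q):
        """KL divergence between strictly positive distributions."""
        return float(np.sum(p * np.log(p / q)))

    def project(q, mask, target):
        """Closed-form I-projection of q onto the single linear constraint
        sum(q[mask]) == target: rescale mass inside and outside the mask."""
        r = q.copy()
        inside = q[mask].sum()
        r[mask] *= target / inside
        r[~mask] *= (1.0 - target) / (1.0 - inside)
        return r

    def satisfy_all(q, constraints, n_sweeps=100):
        """Constraint satisfaction by cyclic I-projections (Bregman's
        classical method): project onto each constraint in turn."""
        for _ in range(n_sweeps):
            for mask, target in constraints:
                q = project(q, mask, target)
        return q

    def greedy_select(p_emp, masks, n_rounds=2):
        """Feature selection: score each candidate indicator feature by
        the reduction in KL(p_emp || model) its constraint alone would
        give, add the best one, then re-fit all chosen constraints."""
        q = np.full_like(p_emp, 1.0 / len(p_emp))  # start from uniform
        chosen = []
        for _ in range(n_rounds):
            base = kl(p_emp, q)
            gains = [base - kl(p_emp, project(q, m, p_emp[m].sum()))
                     for m in masks]
            best = int(np.argmax(gains))
            chosen.append(best)
            constraints = [(masks[i], p_emp[masks[i]].sum()) for i in chosen]
            q = satisfy_all(np.full_like(p_emp, 1.0 / len(p_emp)), constraints)
        return chosen, q

    # Usage over a 4-point sample space with two candidate features.
    p_emp = np.array([0.4, 0.3, 0.2, 0.1])
    masks = [np.array([True, True, False, False]),
             np.array([True, False, True, False])]
    chosen, q = greedy_select(p_emp, masks)
    print(chosen, q, kl(p_emp, q))

For the KL/indicator-feature case, the projection onto a single constraint has the closed form used above: rescale the model's mass inside and outside the feature's support so the constraint holds exactly.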