# Logistic Regression, AdaBoost and Bregman Distances

@article{Collins2004LogisticRA, title={Logistic Regression, AdaBoost and Bregman Distances}, author={Michael Collins and Robert E. Schapire and Yoram Singer}, journal={Machine Learning}, year={2004}, volume={48}, pages={253-285} }

We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms are iterative and can be…
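The Bregman-distance framework in the abstract can be made concrete with a minimal sketch (the helper names `bregman`, `F_sq`, `F_ent` are ours, not from the paper). A Bregman distance is generated by a convex function F via B_F(p, q) = F(p) − F(q) − ⟨∇F(q), p − q⟩; choosing F(x) = ‖x‖² recovers squared Euclidean distance, while F(x) = Σᵢ xᵢ log xᵢ − xᵢ gives the unnormalized relative entropy relevant to the boosting setting.

```python
import numpy as np

def bregman(F, gradF, p, q):
    """B_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(gradF(q), p - q)

# F(x) = ||x||^2 generates squared Euclidean distance.
F_sq = lambda x: np.dot(x, x)
g_sq = lambda x: 2 * x

# F(x) = sum x log x - x generates unnormalized relative entropy.
F_ent = lambda x: np.sum(x * np.log(x) - x)
g_ent = lambda x: np.log(x)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

print(bregman(F_sq, g_sq, p, q))   # equals ||p - q||^2
print(bregman(F_ent, g_ent, p, q)) # equals sum p*log(p/q) - p + q
```

Both identities follow by expanding the definition, which is the sense in which the two learning problems become instances of one optimization.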

#### 631 Citations

A comparison of numerical optimizers for logistic regression

- Mathematics
- 2004

Logistic regression is a workhorse of statistics and is closely related to methods used in Machine Learning, including the Perceptron and the Support Vector Machine. This note compares eight…

Bregman Divergences and Surrogates for Learning

- Mathematics, Medicine
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2009

This paper addresses the problem for a wide set which lies at the intersection of classification-calibrated surrogates and those of Murata et al. (2004), and gives a minimization algorithm provably converging to the minimum of any such surrogate.

On the Convergence Properties of Optimal AdaBoost

- Computer Science, Mathematics
- ArXiv
- 2012

This paper establishes the convergence of "Optimal AdaBoost," a term coined by Rudin, Daubechies, and Schapire in 2004, and proves the convergence, with the number of rounds, of the classifier itself, its generalization error, and its resulting margins for fixed data sets, under certain reasonable conditions.

Surrogate maximization/minimization algorithms for AdaBoost and the logistic regression model

- Mathematics, Computer Science
- ICML
- 2004

This paper solves the boosting problem by proposing SM algorithms for the corresponding optimization problem and derives an SM algorithm that can be shown to be identical to the algorithm proposed by Collins et al. (2002) based on Bregman distance.

Intrinsic Geometries in Learning

- Mathematics, Computer Science
- ETVC
- 2008

This paper shows that many learning algorithms, including various boosting algorithms for linear separators, the most popular top-down decision-tree induction algorithms, and some on-line learning algorithms, are spawns of a generalization of Amari's natural gradient to some particular non-Riemannian spaces.

Ensembles of Partially Trained SVMs with Multiplicative Updates

- Mathematics, Computer Science
- IJCAI
- 2007

This paper shows that the multiplicative update of SVM can be formulated as a Bregman projection problem, and the learning rate can then be adapted automatically, and it is shown that the proposed ensemble has efficient training and comparable or even better accuracy than the best-tuned soft-margin SVM.

Boosting With the L2 Loss

- Mathematics
- 2003

This article investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2-loss function. Like other boosting…
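The L2Boost idea above can be sketched in a few lines: with the L2 loss, the negative functional gradient at the current fit is simply the residual vector, so each round refits the base learner to the residuals. A minimal illustration (the componentwise least-squares base learner, the shrinkage factor `nu`, and the round count are our assumptions, not details from the article):

```python
import numpy as np

def l2boost(X, y, rounds=50, nu=0.1):
    """Functional gradient descent with L2 loss and a
    componentwise least-squares base learner."""
    n, d = X.shape
    coef = np.zeros(d)
    fit = np.zeros(n)
    for _ in range(rounds):
        r = y - fit                          # negative gradient of L2 loss
        scores = X.T @ r
        norms = np.sum(X ** 2, axis=0)
        betas = scores / norms               # per-coordinate LS fit to r
        sse = np.sum(r ** 2) - betas * scores
        j = np.argmin(sse)                   # coordinate that best fits r
        coef[j] += nu * betas[j]             # shrunken update
        fit += nu * betas[j] * X[:, j]
    return coef

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.normal(size=200)
print(l2boost(X, y))  # weight concentrates on coordinates 0 and 3
```

The shrinkage factor trades per-round progress for the regularization behavior that makes such boosting variants robust.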

Bundle Methods for Regularized Risk Minimization

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2010

The theory and implementation of a scalable and modular convex solver that solves all these estimation problems is described; the solver can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as L1 and L2 penalties.

Boosting as a Regularized Path to a Maximum Margin Classifier

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2004

It builds on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector, and shows that as the constraint is relaxed the solution converges (in the separable case) to an "l1-optimal" separating hyperplane.

Information Geometry of U-Boost and Bregman Divergence

- Mathematics, Computer Science
- Neural Computation
- 2004

It is observed that the two adjacent and the initial classifiers are associated with a right triangle in the scale via the Bregman divergence, called the Pythagorean relation, which leads to a mild convergence property of the U-Boost algorithm as seen in the expectation-maximization algorithm.

#### References

Showing 1–10 of 106 references

Additive models, boosting, and inference for generalized divergences

- Mathematics, Computer Science
- COLT '99
- 1999

A framework for designing incremental learning algorithms derived from generalized entropy functionals based on the use of Bregman divergences together with the associated class of additive models constructed using the Legendre transform is presented.

Duality and Auxiliary Functions for Bregman Distances

- Mathematics
- 2001

In this paper, the authors formulate and prove a convex duality theorem for minimizing a general class of Bregman distances subject to linear constraints. The duality result is then used…

Special Invited Paper-Additive logistic regression: A statistical view of boosting

- Mathematics
- 2000

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data…

Improved Boosting Algorithms Using Confidence-rated Predictions

- Computer Science
- COLT' 98
- 1998

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…

A decision-theoretic generalization of on-line learning and an application to boosting

- Computer Science
- EuroCOLT
- 1995

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems.

Statistical Learning Algorithms Based on Bregman Distances

- Mathematics
- 1997

We present a class of statistical learning algorithms formulated in terms of minimizing Bregman distances, a family of generalized entropy measures associated with convex functions. The inductive…

Boosting as entropy projection

- Mathematics, Computer Science
- COLT '99
- 1999

It is shown how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis.
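The orthogonality constraint in this entropy-projection view can be checked numerically. A small sketch (binary margins in {−1, +1} and the variable names are our assumptions): after AdaBoost's multiplicative update with the usual choice of alpha, the weighted error of the current weak hypothesis under the new distribution is exactly 1/2, i.e. the new distribution carries no information the current hypothesis can exploit.

```python
import numpy as np

def adaboost_reweight(d, margins):
    """One AdaBoost reweighting step.
    margins[i] = y_i * h(x_i), in {-1, +1}."""
    eps = np.sum(d[margins < 0])               # weighted error of h
    alpha = 0.5 * np.log((1 - eps) / eps)
    d_new = d * np.exp(-alpha * margins)       # multiplicative update
    return d_new / d_new.sum(), alpha

d = np.full(8, 1 / 8)
margins = np.array([1, 1, 1, -1, 1, -1, 1, 1])  # h errs on two examples
d_new, alpha = adaboost_reweight(d, margins)
print(np.sum(d_new[margins < 0]))  # -> 0.5: h's error under d_new is 1/2
```

That the updated error is exactly 1/2 for any starting distribution is the "orthogonal to the mistake vector" condition described above.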

Soft Margins for AdaBoost

- Mathematics, Computer Science
- Machine Learning
- 2004

It is found that AdaBoost asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns that are, interestingly, very similar to Support Vectors.

Exponentiated Gradient Versus Gradient Descent for Linear Predictors

- Computer Science, Mathematics
- Inf. Comput.
- 1997

The bounds suggest that the losses of the algorithms are in general incomparable, but EG(+/-) has a much smaller loss if only a few components of the input are relevant for the predictions, which is quite tight already on simple artificial data.
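The contrast between the two families of updates can be illustrated on a single squared-loss step (the learning rate, simplex normalization, and function names are our assumptions): gradient descent updates additively, while exponentiated gradient updates multiplicatively and renormalizes, which is what favors targets where only a few input components matter.

```python
import numpy as np

def gd_update(w, x, y, eta):
    """Additive gradient-descent step for squared loss."""
    g = 2 * (np.dot(w, x) - y) * x
    return w - eta * g

def eg_update(w, x, y, eta):
    """Multiplicative exponentiated-gradient step, weights
    kept on the probability simplex."""
    g = 2 * (np.dot(w, x) - y) * x
    w_new = w * np.exp(-eta * g)
    return w_new / w_new.sum()

w = np.full(4, 0.25)
x = np.array([1.0, 0.0, 0.0, 0.0])  # only the first component is relevant
print(gd_update(w, x, y=1.0, eta=0.1))
print(eg_update(w, x, y=1.0, eta=0.1))
```

Both steps increase the weight on the single relevant component; the multiplicative form is what yields EG's logarithmic dependence on the dimension in the sparse-target regime discussed above.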

Arcing the edge

- 1997

Recent work has shown that adaptively reweighting the training set, growing a classifier using the new weights, and combining the classifiers constructed to date can significantly decrease…