Columbia Technology Ventures

Martingale boosting for accelerated machine learning with misclassification noise

This technology is a boosting algorithm for accelerated machine learning in the presence of misclassification noise.

Unmet Need: Machine learning algorithms for high-accuracy prediction in noisy data sets

Although boosting algorithms can reduce predictive error, they perform poorly when the training data set contains errors or noise, such as mislabeled examples. This poor performance often results from over-fitting. Each boosting round reweights or resamples the training set to emphasize the examples that earlier classifiers got wrong, so later rounds can concentrate on examples that are simply noise. There is therefore a need for boosting procedures that keep good predictive performance on noisy data sets.
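The reweighting effect described above can be seen in a few lines. The sketch below uses the standard AdaBoost update, a conventional booster included only for contrast and not the Columbia technology. It runs on a hypothetical ten-example training set in which example 0 is assumed to carry a flipped label. The weak learners' mistake sets are also invented for illustration.

```python
import math

def adaboost_reweight(weights, mistakes):
    """One standard AdaBoost round: up-weight examples in `mistakes`, down-weight the rest."""
    eps = sum(weights[i] for i in mistakes)      # weighted error of this base classifier
    if not 0 < eps < 0.5:
        raise ValueError("base classifier must beat random guessing")
    alpha = 0.5 * math.log((1 - eps) / eps)       # vote weight of the base classifier
    new = [w * math.exp(alpha if i in mistakes else -alpha) for i, w in enumerate(weights)]
    z = sum(new)                                  # normalizer so the weights sum to 1
    return [w / z for w in new]

n = 10
NOISY = 0                  # hypothetical: example 0 has a wrong (flipped) label
weights = [1.0 / n] * n
history = [weights[NOISY]]
for t in range(1, 4):
    # Hypothetical weak learners: each predicts the TRUE label of example 0,
    # so it disagrees with the noisy label, and also makes one ordinary error on example t.
    weights = adaboost_reweight(weights, {NOISY, t})
    history.append(weights[NOISY])

print([round(h, 3) for h in history])   # [0.1, 0.25, 0.4, 0.449]
```

After three rounds the single mislabeled example carries about 45% of the total weight, so the next base classifier is trained largely to fit a label that is wrong.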

The Technology: Boosting algorithm with optimal accuracy despite noise-ridden data

The Martingale boosting algorithm combines simple base predictors into a more sophisticated aggregate predictor for automated learning systems. Learning proceeds in stages. At each stage, the algorithm sorts the training examples into bins and chooses a separate base classifier for each bin, and the resulting structure supports noise-tolerant, probability-based prediction. The approach is relatively simple and easy to understand, significantly reduces predictive error, and achieves optimal accuracy despite noise in the data.
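The stage-and-bin procedure can be sketched as follows. This is a minimal illustration of the basic martingale boosting scheme as described in the published literature (Long and Servedio), not the patented implementation, and it omits the noise-tolerant modifications. Several details are assumptions that the listing does not specify:

- An example's bin at stage t is the number of earlier base classifiers that labeled it positive.
- Each bin's classifier is trained on a class-balanced reweighting of the examples in that bin.
- The final label is positive when an example ends in a bin of index at least T/2.
- The names `train_stump`, `martingale_boost`, and `predict`, the one-dimensional threshold base learner, and the toy data are all hypothetical.

```python
import random

def train_stump(xs, ys):
    """Hypothetical base learner: a 1-D threshold fit on a class-balanced reweighting.

    Positives and negatives each get total weight 1/2, mirroring the
    balanced distribution assumed at each bin.
    """
    pos = sum(ys)
    neg = len(ys) - pos
    if pos == 0 or neg == 0:                 # only one class reached this bin
        const = 1 if pos else 0
        return lambda x: const
    w = [0.5 / pos if y else 0.5 / neg for y in ys]
    best = None
    for theta in sorted(set(xs)):
        for sign in (1, -1):                 # predict 1 when sign * (x - theta) > 0
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (sign * (xi - theta) > 0) != (yi == 1))
            if best is None or err < best[0]:
                best = (err, theta, sign)
    _, theta, sign = best
    return lambda x: int(sign * (x - theta) > 0)


def martingale_boost(xs, ys, T, learner=train_stump):
    """Train T stages; stage t has bins 0..t, each with its own base classifier."""
    bins = [0] * len(xs)                     # current bin of every training example
    stages = []
    for t in range(T):
        stage = []
        for j in range(t + 1):
            idx = [i for i, b in enumerate(bins) if b == j]
            if idx:
                stage.append(learner([xs[i] for i in idx], [ys[i] for i in idx]))
            else:                            # empty bin: default to predicting 0
                stage.append(lambda x: 0)
        # An example predicted positive moves one bin to the right.
        bins = [b + stage[b](x) for b, x in zip(bins, xs)]
        stages.append(stage)
    return stages


def predict(stages, x):
    """Route x through the stages; output 1 if it ends in a bin j >= T/2."""
    j = 0
    for stage in stages:
        j += stage[j](x)
    return int(j >= len(stages) / 2)


# Hypothetical noise-free demo: the target is an interval, which no single
# threshold stump can represent on its own.
rng = random.Random(0)
xs = [rng.random() for _ in range(300)]
ys = [int(0.25 < x < 0.75) for x in xs]
model = martingale_boost(xs, ys, T=5)
grid = [i / 100 for i in range(100)]
acc = sum(predict(model, x) == int(0.25 < x < 0.75) for x in grid) / len(grid)
print(f"grid accuracy: {acc:.2f}")
```

In this toy run, the first stage splits off one side of the interval. A classifier in the next stage's upper bin then separates the remaining negatives, and from then on each bin holds a single class. Positive examples therefore drift to high bin indices while negatives stay low, and the final threshold at T/2 separates them.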

Applications:

  • Natural language processing
  • Information retrieval
  • Speech processing
  • Behavior prediction
  • Face recognition
  • Handwriting recognition

Advantages:

  • Significantly reduces predictive error
  • Achieves optimal accuracy despite noise in the training data
  • Simple and easy to understand

Lead Inventor:

Roger N. Anderson, Ph.D.
