This method involves an efficient algorithm that uses a discriminative probabilistic model to find optimal solutions useful in data mining, parsing, entity recognition, and general information extraction. It is able to improve the efficiency of CRF optimization by employing novel bounds on the conditional likelihood. These bounds allow for the derivation of a new optimization technique similar to Expectation-Maximization algorithms, replacing the need for gradient and second order optimization methods.
Frequently in contemporary datasets fields are left blank by human annotation. These missing or hidden training data plague standard optimization methods by creating conditional likelihood with local optima, thus yielding suboptimal solutions. Generally, this occurs because most currently available optimization tools are not specialized to handle conditional likelihood of the data. This technology utilizes a technique derived from conditional “Expectation-Maximization” algorithms, which improves its efficiency and ability to properly interpret local optima created by incomplete data. This translates to an algorithm with an improved ability to label and parse sequential data, decode web pages, and solve computer vision tasks. Tasks such as named entity recognition and information extraction are also improved.
This technology has been tested with computational modeling.
Patent Pending (US 20120317060)
Tech Ventures Reference: IR M11-115