Medical Text Conversion with Software Tool, AQUA: An Acquisitive Analyzer for Biomedical Text

Lead Inventors: Stephen Bennett Johnson, David Campbell, Eneida Mendonca, Robert Duffy, Chunhua Weng.

Biomedical Software to Convert Free-Form Prose from Medical Reports Is Costly and Time Consuming to Develop

Biomedical natural language parsing software converts free-form prose from medical reports to a structured form that can be processed by information retrieval and other analytic programs. To produce this structured form, the software parses the input text according to its syntax and extracts the essential facts. Most programs parse with the help of a "semantic lexicon," a complex set of parsing rules. This lexicon must be developed by expert linguists with biomedical know-how, which makes its construction both costly and time consuming. Further, a distinct semantic lexicon must be developed for each particular type of input data. This invention addresses these limitations with an efficient algorithm that generates its own parsing rules.

AcQUisitive Analyzer for Biomedical Text Uses Machine Learning Algorithm

AQUA, the AcQUisitive Analyzer for Biomedical Text, uses a machine learning algorithm, transformation-based learning, that automatically generates parsing rules from sample "training sets" specific to a type of biomedical input text. A scoring function is applied iteratively to select and build the set of parsing rules according to how accurately each rule transforms and parses the training set text. The training sets are manually parsed based on syntax alone, a job that requires only grammatical knowledge, not medical or linguistic expertise. Each training set is specific to a domain of medical text, and the AQUA algorithm can be applied to any type of input for which a sufficient training set is available.
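
The iterative rule selection described above can be sketched roughly as follows. This is a toy illustration of transformation-based learning in general, not AQUA's implementation: the rule template (relabel every token of a given word class), the LEFT/RIGHT/ROOT attachment labels, the word-class tags, the two training sentences, and the names `Rule`, `score`, `learn`, and `parse` are all assumptions made for the example.

```python
from typing import NamedTuple

# One training sentence: (word-class tags, hand-assigned attachment labels).
# Both the tag set and the label set are hypothetical.
Sentence = tuple[list[str], list[str]]


class Rule(NamedTuple):
    tag: str  # fire on tokens of this word class...
    old: str  # ...whose current label is `old`...
    new: str  # ...and relabel them as `new`


def apply_rule(rule: Rule, tags: list[str], labels: list[str]) -> list[str]:
    """Return a copy of `labels` with the rule applied wherever it fires."""
    return [rule.new if (t == rule.tag and l == rule.old) else l
            for t, l in zip(tags, labels)]


def score(rule: Rule, corpus: list[Sentence], current: list[list[str]]) -> int:
    """Net gain on the training set: labels fixed minus labels broken."""
    gain = 0
    for (tags, gold), labels in zip(corpus, current):
        for t, l, g in zip(tags, labels, gold):
            if t == rule.tag and l == rule.old:
                if g == rule.new:
                    gain += 1      # rule corrects this token
                elif g == l:
                    gain -= 1      # rule would break a correct token
    return gain


def learn(corpus: list[Sentence], default: str = "LEFT",
          min_gain: int = 1) -> list[Rule]:
    """Greedily pick the best-scoring rule until none improves the corpus."""
    current = [[default] * len(tags) for tags, _ in corpus]
    rules: list[Rule] = []
    while True:
        # Propose one candidate rule for every token that is still wrong.
        candidates = {
            Rule(tag, label, gold_label)
            for (tags, gold), labels in zip(corpus, current)
            for tag, label, gold_label in zip(tags, labels, gold)
            if label != gold_label
        }
        if not candidates:
            break
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda r: score(r, corpus, current))
        if score(best, corpus, current) < min_gain:
            break
        rules.append(best)
        current = [apply_rule(best, tags, labels)
                   for (tags, _), labels in zip(corpus, current)]
    return rules


def parse(rules: list[Rule], tags: list[str], default: str = "LEFT") -> list[str]:
    """Label a new sentence by replaying the learned rules in order."""
    labels = [default] * len(tags)
    for rule in rules:
        labels = apply_rule(rule, tags, labels)
    return labels


# Hypothetical training set: "the patient improved", "fever resolved quickly".
TRAINING: list[Sentence] = [
    (["DET", "NOUN", "VERB"], ["RIGHT", "RIGHT", "ROOT"]),
    (["NOUN", "VERB", "ADV"], ["RIGHT", "ROOT", "LEFT"]),
]
```

Calling `learn(TRAINING)` returns an ordered rule list, and `parse(rules, tags)` replays it on unseen input. A realistic rule template would also condition on neighboring words or tags; the single-token template here is kept deliberately small so the select, score, and apply loop is easy to follow.
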

Applications:
• Adaptable software tool to convert medical reports (e.g., discharge reports, pathology reports) from natural language prose to formats more amenable to linked systems such as literature or text searches, decision support, or other advanced analytics programs.
• Improve the accuracy and relevance of internet searches on healthcare topics.
• Interpret health-related questions so that computer-generated or artificial-intelligence-based systems can provide answers in a "medical Q&A" application or website.
• Connect medical reports and narratives to electronic health records, and simplify information flow between medical professionals and patients.
• Extract and flag current medications and relevant patient information for use by pharmacists.

Advantages:
• Can be applied to virtually any type of biomedical input text without changing the core software, simply by using a training set of sentences from the desired input domain.
• No linguistic expertise is needed to prepare the training set of sentences used to teach the algorithm new forms of input.

Patent Status: Software Copyright

Licensing Status: Available for Licensing

Publications:
David Campbell, Stephen Johnson. A transformational-based learner for dependency grammars in discharge summaries. Proceedings of the Association for Computational Linguistics Workshop on Natural Language Processing in the Biomedical Domain, 37-44. Philadelphia, July 2002.