Lead Inventors:
Hong Yu, Ph.D.;
James J. Cimino, M.D.;
Shih-fu Chang, Ph.D.
Search engine for medical database uses images and natural language processing:
The U.S. National Library of Medicine maintains a publicly searchable database (PubMed) of 14 million articles across 4,800 biomedical journals. This knowledge base of clinical and basic science is growing rapidly; it has doubled in size over the last decade. On-demand access to reports is a critical tool for researchers and clinicians, yet identifying relevant information in such a large repository is a major challenge. The most common method is a keyword search using the PubMed engine, which returns a list of articles. The investigator must then read the abstracts and follow up by downloading full-text versions. A variety of data mining techniques, most involving natural language processing (NLP), have been developed to summarize the article database and speed up searches. Few, if any, are able to incorporate visual data, such as images and figures, into their summaries. A comprehensive tool for information retrieval must include these, since the bulk of evidence is represented visually. The potential market for such a tool is enormous: PubMed handles 250 million keyword searches a year and has an estimated user base of 25 million people.
Medical search engine uses image processing technology to include biomedical images in search results:
A web-based Question Answering with Biomedical images (QuABI) system was developed. It applies image processing and NLP technologies to handle the visual modality in conjunction with article text. The user specifies a question, which is passed to a module that identifies its semantic meaning. A search engine then matches the question to relevant parts of articles and figures and returns a summary, which may draw on various forms of evidence to answer the question. A unique aspect of this search engine is that it identifies images that are similar to each other based on color, edges, and textures. Search results can therefore use visual evidence to corroborate findings, a form of analysis that is not currently available.
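The idea of ranking images by visual similarity can be illustrated with a minimal sketch. It is not the QuABI implementation, whose algorithms are not described here; it assumes only the color cue (edges and textures are omitted) and uses histogram intersection, a common general-purpose similarity measure. The function names color_histogram, histogram_intersection, and rank_by_similarity are hypothetical, and images are represented simply as lists of (R, G, B) pixel tuples.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram (illustrative helper)."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(
    query: Sequence[Pixel], candidates: Dict[str, Sequence[Pixel]]
) -> List[Tuple[str, float]]:
    """Rank candidate images from most to least similar to the query."""
    q_hist = color_histogram(query)
    scored = [
        (name, histogram_intersection(q_hist, color_histogram(pixels)))
        for name, pixels in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Tiny synthetic "images" standing in for article figures.
    query = [(250, 10, 10)] * 4
    figures = {
        "mostly_red": [(250, 10, 10)] * 3 + [(10, 10, 250)],
        "all_blue": [(10, 10, 250)] * 4,
    }
    for name, score in rank_by_similarity(query, figures):
        print(name, round(score, 2))
```

A production system would likely combine several descriptors (for example, edge and texture features alongside color) and index them for fast lookup across millions of figures; the sketch only shows the ranking principle.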
Applications:
• Answering ad hoc biological questions, e.g., "Does p53 interact with RAD51?"
• Answering medical questions to aid diagnosis, e.g., "What does a poisonous spider bite look like?"
• Queries based on similarity to an image, e.g., a picture of a patient's rash uploaded by a physician.
Advantages:
• Expansion of search space to millions of pieces of visual evidence
• Rapid storyboard representation of medical/biological topics using figures from articles
• Summarization of similarities and highlighting of differences between articles
• Improved diagnosis of disease using photographic evidence
• Constant, automatic updating and refinement of the visual database using machine learning techniques
• Significant advantage over the free keyword-based search offered by PubMed; could be marketed as a premium subscription service for clinicians and pharmaceutical companies
Patent Status: Patent Pending (US 11/506,060)
Licensing Status: Available for Licensing and Sponsored Research Support