Over 100 hours of video are uploaded to the Internet every minute, providing a vast amount of information for activities such as data mining and training machine-learning algorithms. This technology is a system that applies a classification algorithm to an unconstrained video file. It builds upon the IBM-Columbia multimedia event recounting (MER) system and addresses the problem of automated video browsing and text summarization. The algorithm takes a video file as input and identifies clusters of segments that are close in time and correspond to a particular event. It then generates textual outputs that classify the visual, action, and audio characteristics of each segment in the video. These textual outputs can be used to automatically classify vast amounts of unconstrained video content, making video information amenable to both simple querying and advanced data mining.
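The pipeline can be pictured roughly as follows. The sketch below is a minimal, hypothetical illustration in Python; the segment representation, clustering rule, and labeling scheme are assumptions for illustration and are not the actual MER implementation. Per-segment semantic-classifier scores are grouped into temporally close clusters, and each cluster is summarized by its dominant labels.

    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class Segment:
        start: float                 # segment start time in seconds
        end: float                   # segment end time in seconds
        scores: Dict[str, float]     # semantic-classifier scores, e.g. {"visual:kitchen": 0.9}


    def cluster_segments(segments: List[Segment], max_gap: float = 2.0) -> List[List[Segment]]:
        """Group segments that are close in time into candidate event clusters."""
        clusters: List[List[Segment]] = []
        for seg in sorted(segments, key=lambda s: s.start):
            if clusters and seg.start - clusters[-1][-1].end <= max_gap:
                clusters[-1].append(seg)
            else:
                clusters.append([seg])
        return clusters


    def describe(cluster: List[Segment], threshold: float = 0.5) -> str:
        """Summarize a cluster by the labels whose average score clears a threshold."""
        totals: Dict[str, float] = {}
        for seg in cluster:
            for label, score in seg.scores.items():
                totals[label] = totals.get(label, 0.0) + score
        strong = sorted(lbl for lbl, total in totals.items() if total / len(cluster) >= threshold)
        span = f"{cluster[0].start:.0f}-{cluster[-1].end:.0f}s"
        return f"{span}: " + (", ".join(strong) if strong else "(no confident labels)")


    # Example: two nearby cooking segments merge into one event; a later meal scene forms another.
    segments = [
        Segment(0, 5, {"visual:kitchen": 0.9, "action:chopping": 0.7}),
        Segment(5, 10, {"visual:kitchen": 0.8, "audio:sizzling": 0.6}),
        Segment(40, 45, {"visual:table": 0.7, "action:eating": 0.8}),
    ]
    for cluster in cluster_segments(segments):
        print(describe(cluster))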
Currently, videos can be indexed only through user-supplied text captions; a video that lacks this extra information cannot be indexed at all. Using a set of design decisions based on the close connection between the ontology of semantic classifiers and the functional aspects of natural language, this technology can automatically summarize video files in one-sixth the usual time and produce text captions that convey what a video is actually about.
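To illustrate how such generated captions could stand in for user-supplied ones when indexing a collection, the following hypothetical sketch builds a simple inverted index over the textual outputs; the label format and video identifiers are assumptions for illustration only.

    from collections import defaultdict
    from typing import Dict, Iterable, Set


    def build_index(captions: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
        """Map each generated caption label to the set of videos it appears in."""
        index: Dict[str, Set[str]] = defaultdict(set)
        for video_id, labels in captions.items():
            for label in labels:
                index[label].add(video_id)
        return index


    # Illustrative generated captions for two videos (identifiers are made up).
    captions = {
        "vid001": ["visual:kitchen", "action:chopping", "audio:sizzling"],
        "vid002": ["visual:beach", "action:surfing"],
    }
    index = build_index(captions)
    print(index["action:surfing"])   # -> {'vid002'}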
The algorithm is informed throughout by human psychology and user studies, and has been validated with user input and training videos.
Patent Pending (WO/2005/072239)
Tech Ventures Reference: IR CU14303