This technology is a series of algorithms that transform sound into a smooth two-dimensional surface approximating the peaks of the original spectro-temporal pattern.
As computers and computational devices become increasingly prevalent, the need for voice recognition and control grows. For computational speech recognition, the computer must accurately distinguish human speech from background noise and recognize individual words. Because both background noise and speech patterns vary widely, no single technique for speech recognition is sufficient. The ability to mathematically characterize complex sounds, so that individual words can be told apart from each other and from background noise, is necessary for speech recognition and may also be useful for the design of computer-synthesized sounds.
The invention comprises a series of algorithms that use Frequency Domain Linear Prediction (FDLP) and Perceptual Linear Prediction (PLP) to transform complex sounds into a two-dimensional surface of time and frequency. The result is a smooth, continuous spectral profile at each time step. This technology can be used for numerous applications, including removal of background noise and computational speech recognition.
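As a rough illustration of the frequency-domain linear prediction idea, the sketch below fits an all-pole model to the discrete cosine transform of a short signal and reads the model's response as a smooth temporal envelope. It is a minimal sketch under stated assumptions, not the inventor's implementation: the function names, the model order, and the impulse test signal are all hypothetical, the perceptual (PLP) stage is omitted, and only a single band is shown. A full time-frequency surface would presumably repeat a step like this across frequency bands.

```python
import math

def dct_ii(x):
    """Unnormalized DCT-II of a real sequence (O(N^2), fine for short frames)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def autocorrelation(c, order):
    """Autocorrelation of c at lags 0..order (autocorrelation method of LP)."""
    n = len(c)
    return [sum(c[i] * c[i + m] for i in range(n - m)) for m in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations.

    Returns (a, err), where a[0] == 1 and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable input: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def allpole_envelope(a, gain, n_points):
    """Evaluate gain / |A(e^{jw})|^2 at n_points angles w = pi*(i+0.5)/n_points."""
    env = []
    for i in range(n_points):
        w = math.pi * (i + 0.5) / n_points
        re = sum(a[k] * math.cos(w * k) for k in range(len(a)))
        im = -sum(a[k] * math.sin(w * k) for k in range(len(a)))
        env.append(gain / (re * re + im * im))
    return env

def fdlp_envelope(x, order):
    """Hypothetical helper: smooth temporal envelope of x via LP on its DCT.

    Because the prediction runs along the frequency axis, the all-pole
    response is indexed by time: entry i corresponds to sample i of x.
    """
    c = dct_ii(x)
    r = autocorrelation(c, order)
    a, err = levinson_durbin(r, order)
    return allpole_envelope(a, err, len(x))

if __name__ == "__main__":
    # Assumed test signal: a single click at sample 40 of a 64-sample frame.
    signal = [0.0] * 64
    signal[40] = 1.0
    envelope = fdlp_envelope(signal, order=2)
    peak = max(range(len(envelope)), key=envelope.__getitem__)
    print("envelope peaks near sample", peak)
```

With the click placed at sample 40, the envelope should peak close to that sample. Raising the model order lets the envelope follow more peaks in the signal at the cost of smoothness, which is the trade-off a peak-approximating surface has to balance.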
Daniel P.W. Ellis, Ph.D.
IR M04-009, M04-066
Licensing Contact: Greg Maskel