Columbia Technology Ventures

Generative Disco: AI for text-to-video for music visualization

This technology is interactive AI software for creating visuals for music and audio.

Unmet Need: User control of text-to-video generation for music visualization

Creating music visualizations can be complex and time-consuming. Although automated video generation exists, current AI algorithms often do not give users granular control over the audiovisual output. Without this control, it is difficult to create visuals that align with the underlying message of a piece of music. When that message is not portrayed accurately, creators struggle to build coherent narratives and often must turn to more resource-intensive manual approaches.

The Technology: Accessible AI platform for generating music visuals with fine user control

This technology is an interactive text-to-video AI platform called Generative Disco. Users select specific intervals of music and define start and end image prompts for them. The algorithm combines these with design prompts to generate video visuals that more closely match the user-defined theme of the music. The platform has broad applications in the audio-visual field: it can visualize music as well as other audio, such as speeches and performances.
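
The interval-and-prompt workflow described above could be represented roughly as follows. This is a minimal illustrative sketch, not the Generative Disco implementation: the `Interval` class, the `frame_schedule` function, the frame rate, and the linear blend between the start and end prompts are all hypothetical assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A user-selected stretch of music with its two prompts (hypothetical structure)."""
    start_s: float      # interval start, in seconds
    end_s: float        # interval end, in seconds
    start_prompt: str   # prompt describing the first frame of the interval
    end_prompt: str     # prompt describing the last frame of the interval


def frame_schedule(interval: Interval, fps: int = 12) -> list[tuple[float, float]]:
    """Return (timestamp, blend) pairs for the frames in one interval.

    blend runs from 0.0 (start prompt only) to 1.0 (end prompt only).
    Linear blending is an assumption made for illustration only.
    """
    duration = interval.end_s - interval.start_s
    n_frames = max(2, round(duration * fps))  # always include both endpoints
    step = duration / (n_frames - 1)
    return [
        (interval.start_s + i * step, i / (n_frames - 1))
        for i in range(n_frames)
    ]


# Example: a two-second interval rendered at 4 frames per second.
chorus = Interval(30.0, 32.0, "a neon city at dusk", "a neon city dissolving into stars")
schedule = frame_schedule(chorus, fps=4)
```

Each pair in `schedule` gives a frame's timestamp and how far it sits between the two prompts. A generative model would consume those values to produce the frames, and that step is outside the scope of this sketch.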

Applications:

  • Music visualization tool for music videos and performance visuals
  • Audio visualization for speeches and performances
  • Video generation for animation, motion graphics, film, and social media
  • Research tool for creating image databases and synthetic datasets for machine learning models
  • Research tool that generalizes to visualizing any kind of waveform input (e.g., music files, speech, EEG recordings)

Advantages:

  • Gives users interactive, fine-grained control over their music input
  • Uses GPT-4 to support brainstorming and prompt exploration
  • Supports design patterns that help increase the coherence and motion quality of text-to-video for music visualization
  • Provides a user interface and full-stack generative system that is easy to use and expressive

Lead Inventor:

Vivian Liu

Related Publications:

Tech Ventures Reference: