Columbia Technology Ventures

Scalable software and hardware for deep neural network development

This technology comprises software and hardware for the efficient initialization and optimization of deep neural network (DNN) parameters, enabling the training of large-scale, optimally robust DNN models.

Unmet Need: Resource-efficient method for training large-scale neural networks in deep learning

To perform increasingly complicated tasks, deep neural networks (DNNs) have grown larger and more complex. With current randomization-based weight initialization and update methods, training the many parameters of these large-scale DNNs demands extensive computational resources, data, and time, which can prevent deep learning models under development from reaching target performance. Even when sufficient resources are available, current optimization methods do not guarantee that the final model will be robust and perform optimally at its task.

The Technology: Weights-distribution approach for resource-efficient optimization of deep neural networks

This technology is software that initializes and trains DNNs by exploiting the log-normal distribution of node connection weights that characterizes optimally robust DNNs. This log-normal distribution was empirically verified in real-world large-scale DNNs of different architectures and sizes, ranging from millions to billions of parameters. By initializing a DNN's weights from this distribution, the model starts closer to its theoretical optimal state and therefore requires less training. During the initial stages of training, tuning the three parameters that define the log-normal weight distribution, rather than all model parameters, both optimizes the model and conserves computational resources. Constraining the weights to a log-normal distribution during iterative parameter updates also reduces the chance of the model becoming stuck at suboptimal parameters. In parallel, the technology's hardware uses log-normally distributed, tunable connections between its layers to accelerate DNN training. A minimal code sketch of the weights-distribution idea follows.
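The sketch below (PyTorch) illustrates the general idea only, under stated assumptions: the brief does not specify which three parameters define the weight distribution, so a shifted (three-parameter) log-normal is assumed here, with weight magnitudes reparameterized through fixed noise so that gradients flow to the three distribution parameters alone during the early training phase. All names (LogNormalLinear, mu, log_sigma, shift) are hypothetical and do not reflect the technology's actual implementation.

import torch
import torch.nn as nn

class LogNormalLinear(nn.Module):
    """Linear layer whose weight magnitudes follow a shifted log-normal.

    Early in training, only the three distribution parameters
    (mu, log_sigma, shift) are tuned; individual weights can be
    unfrozen in a later phase.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        # Three parameters defining the weight distribution (assumed form).
        self.mu = nn.Parameter(torch.tensor(-2.0))        # location of log-weights
        self.log_sigma = nn.Parameter(torch.tensor(0.0))  # sigma = exp(log_sigma) > 0
        self.shift = nn.Parameter(torch.tensor(0.0))      # shift of the log-normal
        # Fixed base noise and signs; the distribution parameters reshape
        # these into actual weights (a reparameterization trick).
        self.register_buffer("eps", torch.randn(out_features, in_features))
        self.register_buffer(
            "sign",
            torch.randint(0, 2, (out_features, in_features)).float() * 2 - 1,
        )
        self.bias = nn.Parameter(torch.zeros(out_features))

    def weight(self):
        # Magnitudes exp(mu + sigma * eps) are log-normal by construction.
        sigma = self.log_sigma.exp()
        magnitude = torch.exp(self.mu + sigma * self.eps) + self.shift
        return self.sign * magnitude

    def forward(self, x):
        return nn.functional.linear(x, self.weight(), self.bias)

# Phase 1: optimize only the 3 distribution parameters (cheap).
layer = LogNormalLinear(784, 256)
dist_params = [layer.mu, layer.log_sigma, layer.shift]
opt = torch.optim.Adam(dist_params, lr=1e-2)

x = torch.randn(32, 784)
y = layer(x)            # weights realized from the current distribution
loss = y.pow(2).mean()  # placeholder loss for illustration
loss.backward()
opt.step()
# Phase 2 (not shown): unfreeze individual weights for fine-grained training.

In a later phase, individual weights could be unfrozen for conventional fine-grained training; per the brief, keeping updates close to the log-normal family is what reduces the chance of settling at suboptimal parameters.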

Applications:

  • Generative AI or large language models for chatbots
  • Image recognition for healthcare, facial recognition, and search
  • Self-driving cars
  • Natural language processing for digital assistants
  • Deep learning for research in fields with large amounts of data
  • DNNs for predictive analytics

Advantages:

  • Scalable to large DNNs
  • Generalizable to different network architectures
  • Efficient use of computational resources, data, and time
  • Cost-effective

Lead Inventor:

Venkat Venkatasubramanian, Ph.D.

Patent Information:

Patent Pending

Related Publications:

Tech Ventures Reference: