This technology comprises software and hardware for efficiently initializing and optimizing deep neural network (DNN) parameters, enabling the training of large-scale, optimally robust DNN models.
To perform more complicated tasks, deep neural networks (DNNs) have been growing larger and more complex. With current randomization-based weight initialization and update methods, training the many parameters of these large-scale DNNs requires extensive computational resources, data, and time, which can prevent deep learning models under development from reaching target performance. At times, even with sufficient resources, current optimization methods do not guarantee that the final model will be robust and perform optimally on its task.
This technology is software that initializes and trains DNNs by utilizing the log-normal distribution of node connection weights that characterizes optimally robust DNNs. This log-normal distribution was empirically verified on real-world large-scale DNNs of different architectures and sizes, ranging from millions to billions of parameters. By initializing the weights of a DNN with this distribution, the model starts at a state closer to its theoretical optimum and therefore requires less training. During the initial stages of training, tuning the three parameters that define the log-normal weight distribution, rather than all model parameters, both optimizes the model and conserves computational resources. When iteratively updating model parameters, constraining the weights to the log-normal distribution also reduces the chance of the model becoming trapped at suboptimal parameters. In parallel, the technology's hardware uses log-normally distributed, tunable connections between its layers to accelerate DNN training.
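The software's initialization step can be pictured with a short sketch. The Python/PyTorch code below is a minimal, hypothetical illustration, not the patented implementation: it re-initializes a model's linear-layer weights with log-normally distributed magnitudes. The function name, the distribution parameters (mu, sigma), the scale factor, and the random sign assignment are assumptions introduced for illustration only.

import torch
import torch.nn as nn

def lognormal_init_(module, mu=0.0, sigma=1.0, scale=1e-2):
    # Hypothetical sketch: re-initialize Linear-layer weights with log-normally
    # distributed magnitudes and random signs. The values of mu, sigma, and
    # scale are illustrative placeholders, not the distribution parameters
    # identified by the underlying research.
    if isinstance(module, nn.Linear):
        with torch.no_grad():
            # Positive magnitudes drawn from LogNormal(mu, sigma).
            magnitudes = torch.distributions.LogNormal(mu, sigma).sample(module.weight.shape)
            # Random +/-1 signs, assuming the log-normal law describes weight magnitudes.
            signs = torch.randint(0, 2, module.weight.shape).float() * 2 - 1
            module.weight.copy_(scale * signs * magnitudes)
            if module.bias is not None:
                module.bias.zero_()

# Example: apply the initializer to a small feed-forward network.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(lognormal_init_)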
Venkat Venkatasubramanian, Ph.D.
Patent Pending
IR CU24101
Licensing Contact: Greg Maskel