This technology comprises software and hardware for efficiently initializing and optimizing deep neural network (DNN) parameters, enabling the training of large-scale, optimally robust DNN models.
To perform more complicated tasks, deep neural networks (DNNs) have been growing larger and more complex. With current randomization-based weight initialization and update methods, training the many parameters of these large-scale DNNs requires extensive computational resources, data, and time, which can prevent deep learning models under development from reaching target performance. At times, even with sufficient resources, current optimization methods do not guarantee that the final model will be robust and perform optimally on its task.
This technology is software that initializes and trains DNNs by utilizing the log-normal distribution of node connection weights that characterizes optimally robust DNNs. This log-normal distribution was empirically verified on real-world large-scale DNNs of different architectures and sizes, ranging from millions to billions of parameters. By initializing the weights of a DNN with this distribution, the model starts at a state closer to its theoretical optimum and therefore requires less training. During the initial stages of training, tuning the three parameters that define the log-normal weight distribution, rather than all model parameters, both optimizes the model and conserves computational resources. When iteratively updating model parameters, constraining the weights to the log-normal distribution also reduces the chance of the model becoming trapped at suboptimal parameters. In parallel, the technology's hardware uses log-normally distributed, tunable connections between its layers to accelerate DNN training.
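The software's initialization step can be pictured with a short sketch. The Python/PyTorch code below is a minimal, hypothetical illustration, not the patented implementation: it re-initializes a model's linear-layer weights with log-normally distributed magnitudes. The function name, the distribution parameters (mu, sigma), the scale factor, and the random sign assignment are assumptions introduced for illustration only.

import torch
import torch.nn as nn

def lognormal_init_(module, mu=0.0, sigma=1.0, scale=1e-2):
    # Hypothetical sketch: re-initialize Linear-layer weights with log-normally
    # distributed magnitudes and random signs. The values of mu, sigma, and
    # scale are illustrative placeholders, not the distribution parameters
    # identified by the underlying research.
    if isinstance(module, nn.Linear):
        with torch.no_grad():
            # Positive magnitudes drawn from LogNormal(mu, sigma).
            magnitudes = torch.distributions.LogNormal(mu, sigma).sample(module.weight.shape)
            # Random +/-1 signs, assuming the log-normal law describes weight magnitudes.
            signs = torch.randint(0, 2, module.weight.shape).float() * 2 - 1
            module.weight.copy_(scale * signs * magnitudes)
            if module.bias is not None:
                module.bias.zero_()

# Example: apply the initializer to a small feed-forward network.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(lognormal_init_)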
Venkat Venkatasubramanian, Ph.D.
Patent Pending
IR CU24101
Licensing Contact: Greg Maskel