Date of Completion
11-4-2020
Embargo Period
11-4-2020
Keywords
Neural Network, Parallelization, Network Pruning, Model Compression
Major Advisor
Sanguthevar Rajasekaran
Associate Advisor
Jinbo Bi
Associate Advisor
Qian Yang
Field of Study
Computer Science and Engineering
Degree
Doctor of Philosophy
Open Access
Open Access
Abstract
Deep neural networks (DNNs) have achieved significant success in many applications, such as computer vision, natural language processing, robotics, and self-driving cars. With the growing demand for more complex real-world applications, increasingly complicated neural networks have been proposed. However, high-capacity models lead to two major problems: long training times and high inference latency, making these networks hard to train and infeasible to deploy for time-sensitive applications or on resource-limited devices. In this work, we propose multiple techniques to accelerate training and inference as well as to improve model performance.
The first technique we study is model parallelization for generative adversarial networks (GANs). Multiple orthogonal generators with shared memory are employed to capture the whole data distribution space. This method not only improves model performance but also alleviates the mode collapse problem that is common in GANs. The second technique we investigate is automatic network pruning. To reduce the floating-point operations (FLOPs) to a proper level without compromising accuracy, we propose a more general and easy-to-use pruning method, which prunes the network by optimizing a set of trainable auxiliary parameters instead of the original weights. Weakly coupled gradient update rules are proposed to keep the updates consistent with the pruning task. The third technique is to remove the redundancy of a complicated model based on the needs of the application. We treat chemical reaction prediction as a translation problem and apply a low-capacity neural translation model to it. The fourth technique combines knowledge distillation with Differentiable Architecture Search to stabilize and improve the search procedure; intermediate representations as well as the output logits are transferred from the teacher network to the student network. As an application of these speedup techniques, we introduce neural network pruning into Materials Genomics. We propose attention-based AutoPrune for kernel pruning of a continuous-filter neural network for molecular property prediction, achieving better performance with a more compact model.
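To make the auxiliary-parameter pruning idea concrete, below is a minimal PyTorch sketch, assuming per-output-channel gates passed through a sigmoid and an L1-style sparsity penalty as a stand-in for a FLOPs budget. The class name AuxiliaryGatedConv, the gate threshold, and the penalty weight are illustrative assumptions, not the dissertation's exact formulation.

import torch
import torch.nn as nn

class AuxiliaryGatedConv(nn.Module):
    """Convolution whose output channels are gated by trainable auxiliary scores.
    Illustrative sketch only; details are assumptions, not the exact method."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        # One auxiliary score per output channel; pruning decisions are driven
        # by optimizing these scores rather than the convolution weights.
        self.aux = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        gate = torch.sigmoid(self.aux).view(1, -1, 1, 1)  # soft mask in (0, 1)
        return self.conv(x) * gate

    def channel_mask(self, threshold=0.5):
        # Channels whose gate falls below the threshold become pruning candidates.
        return torch.sigmoid(self.aux) >= threshold


# Usage sketch: update only the auxiliary parameters against a task loss plus an
# L1-style penalty on the gates that pushes unneeded channels toward zero.
layer = AuxiliaryGatedConv(3, 16, 3)
optimizer = torch.optim.Adam([layer.aux], lr=1e-2)
x = torch.randn(4, 3, 32, 32)
task_loss = layer(x).pow(2).mean()          # placeholder for the real task loss
sparsity = torch.sigmoid(layer.aux).sum()   # encourages gates to shut off
(task_loss + 1e-2 * sparsity).backward()
optimizer.step()
print(layer.channel_mask())

Freezing the convolution weights in the optimizer while training only the auxiliary scores mirrors the separation between pruning decisions and the original weights described above.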
Recommended Citation
Xiao, Xia, "Speedup Techniques for Deep Neural Networks" (2020). Doctoral Dissertations. 2640.
https://digitalcommons.lib.uconn.edu/dissertations/2640