Running Sparse and Low-Precision Neural Networks: When Algorithm Meets Hardware
The fast growth of the computational cost of training and testing deep neural networks (DNNs) has inspired a variety of acceleration techniques. Reducing topological complexity and simplifying the data representation of neural networks are two approaches popularly adopted in the deep learning community: many connections in DNNs can be pruned, and the precision of synaptic weights can be reduced, with no or minimal impact on inference accuracy. However, these algorithm-level techniques often ignore the practical impact on hardware design, such as the increase in random accesses to the memory hierarchy and the constraints of memory capacity. Conversely, a limited understanding of the computational needs at the algorithm level may lead to unrealistic assumptions during hardware design. In this talk, we will discuss this mismatch and show how it can be resolved through an interactive design practice spanning both the software and hardware levels.
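As a concrete illustration of the two simplifications the abstract mentions, the sketch below applies magnitude-based pruning and symmetric uniform quantization to a dense weight matrix. This is a minimal NumPy sketch under common assumptions, not code from the talk; the function names, the 90% sparsity level, and the 8-bit width are hypothetical choices for illustration only.

```python
import numpy as np

# Illustrative sketch (not from the talk): magnitude-based pruning and
# symmetric uniform quantization, the two network simplifications the
# abstract refers to. All names and parameters here are hypothetical.

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_uniform(weights: np.ndarray, bits: int) -> np.ndarray:
    """Quantize weights onto a symmetric uniform grid with 2**bits levels."""
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    if scale == 0:
        return weights.copy()
    return np.round(weights / scale) * scale

w = np.random.randn(256, 256).astype(np.float32)
w_sparse = prune_by_magnitude(w, sparsity=0.9)  # 90% of weights zeroed
w_low = quantize_uniform(w_sparse, bits=8)      # 8-bit symmetric quantization
```

Note that the pruned matrix above is still stored densely; a real deployment would pack the surviving nonzeros into a compressed format such as CSR, and the irregular index lookups such formats require are precisely the random memory accesses the abstract flags as a hardware-level concern.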