Yiyu Shi, Professor, University of Notre Dame, United States

Hardware/Software Co-Design Towards TinyML

In the past few years, powered by the strong need for edge intelligence, there has been increasing interest in deploying deep neural networks on tiny hardware with limited computing power and energy (tinyML). A fundamental question needs to be addressed: given a specific edge intelligence task, what are the optimal neural architecture and the tailor-made hardware in terms of accuracy and efficiency? Earlier approaches attempted to address this question through hardware-aware neural architecture search (NAS), where a fixed hardware design such as a microcontroller or lightweight CPU is taken into consideration when designing neural architectures. However, we believe that the most powerful and elegant solutions should come from hardware that allows customization, such as FPGAs, ASICs, or Computing-in-Memory accelerators. For these platforms, we are the first to establish the concept of software/hardware co-design, which simultaneously explores the neural architecture and the hardware design to identify the best pairs that maximize both test accuracy and hardware efficiency. In this talk, we will present novel co-exploration frameworks for neural architectures and various hardware platforms including FPGA, ASIC, and Computing-in-Memory, all of which are the first in the literature. We will demonstrate that our co-exploration concept greatly opens up the design freedom and pushes forward the Pareto frontier between hardware efficiency and test accuracy for better design tradeoffs in tinyML.
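The Pareto-frontier idea at the heart of the co-exploration abstract above can be sketched with a small example: each candidate pairs a neural architecture with a hardware design and is scored by test accuracy (higher is better) and latency (lower is better); the frontier keeps only non-dominated pairs. The candidate names and numbers below are illustrative assumptions, not results from the talk.

```python
def pareto_frontier(candidates):
    """Return the candidates not dominated by any other candidate.

    A candidate (acc, lat) is dominated if some other candidate has
    accuracy >= acc and latency <= lat, with at least one strictly better.
    """
    frontier = []
    for name, acc, lat in candidates:
        dominated = any(
            (a >= acc and l <= lat) and (a > acc or l < lat)
            for _, a, l in candidates
        )
        if not dominated:
            frontier.append((name, acc, lat))
    return frontier

# Hypothetical architecture/hardware pairs: (name, accuracy %, latency ms)
pairs = [
    ("net-A/fpga-1", 92.0, 8.0),
    ("net-B/fpga-2", 90.0, 5.0),
    ("net-C/fpga-1", 89.0, 9.0),   # dominated by net-A/fpga-1
    ("net-D/fpga-3", 94.0, 12.0),
]
print(pareto_frontier(pairs))
```

Co-exploration widens the search so that both coordinates of each point can move, which is what "pushing forward the Pareto frontier" refers to.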
Hardware/Software Co-Design Towards Quantum Advantages

Despite the pursuit of quantum advantages in various applications, the power of quantum computers in executing neural networks has mostly remained unknown, primarily due to a missing tool that effectively designs neural networks suitable for quantum circuits. In this talk, I will present our open-source neural network and quantum circuit co-design framework, QuantumFlow, to address this issue. In QuantumFlow, we represent data as unitary matrices to exploit quantum power by encoding n = 2^k inputs into k qubits, and as random variables to seamlessly connect layers without measurement. Coupled with a novel algorithm, the cost complexity of the unitary-matrix-based neural computation can be reduced from O(n) in classical computing to O(polylog(n)) in quantum computing. I will further demonstrate results on the MNIST dataset using IBM quantum processors, which show that QuantumFlow can achieve an accuracy of 94.09% with a cost reduction of 10.85× against the classical computer. This is the first time that quantum advantage has been demonstrated practically on the inference of deep neural networks, and the tool was accessed over 1,200 times within its first month of release.

Hardware-aware Machine Learning for Biomedical Applications

With the prevalence of deep neural networks, machine intelligence has recently demonstrated performance comparable with, and in some cases superior to, that of human experts in medical imaging and computer-assisted intervention. Such accomplishments can be largely credited to ever-increasing computing power, as well as a growing abundance of medical data.
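Returning to the QuantumFlow abstract above: packing n = 2^k classical inputs into k qubits corresponds to amplitude encoding, where the (normalized) inputs become the amplitudes of a k-qubit state. The following is a minimal classical simulation of that normalization step only, written for illustration; it is not QuantumFlow's actual implementation.

```python
import math

def amplitude_encode(values):
    """Map n = 2^k classical inputs to the amplitude vector of a k-qubit
    state (classical simulation; amplitudes must have unit L2 norm)."""
    n = len(values)
    k = int(math.log2(n))
    if 2 ** k != n:
        raise ValueError("input length must be a power of two")
    norm = math.sqrt(sum(v * v for v in values))
    return k, [v / norm for v in values]

# 4 inputs fit into 2 qubits; each amplitude here is 0.5
k, state = amplitude_encode([1.0, 1.0, 1.0, 1.0])
print(k, state)
```

The exponential packing (k qubits for 2^k values) is the source of the O(n) to O(polylog(n)) cost reduction claimed in the abstract.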
As larger clusters of faster computing nodes become available at lower cost and in smaller form factors, more data can be used to train deeper neural networks with more layers and neurons, which usually translates to higher performance and, at the same time, higher computational complexity. For example, the widely used 3D U-Net for medical image segmentation has more than 16 million parameters and needs about 4.7×10^13 floating-point operations to process a 512×512×200 3D image. The large sizes and high computational complexity of neural networks have brought about various issues that need to be addressed by joint efforts between hardware designers and medical practitioners towards hardware-aware learning. In this talk, I will present novel solutions for the data acquisition and data processing stages of medical image computing, respectively, using hardware-oriented schemes for lower latency, smaller memory footprint, and higher performance on embedded platforms. I will discuss how our hardware-aware machine learning approaches led to real-time MRI segmentation for prosthetic valve implantation assistance and enabled the world's first AI-assisted telementoring of cardiac surgery on April 3, 2019.
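The operation counts quoted above follow from how convolution cost multiplies out across kernel taps, channels, and voxels. A back-of-the-envelope sketch, assuming a standard 3D convolution with "same"-sized output and counting a multiply-add as 2 FLOPs; the kernel size and channel counts below are illustrative assumptions, not the 3D U-Net's actual configuration.

```python
def conv3d_flops(kernel, c_in, c_out, volume):
    """Approximate FLOPs of one 3D convolution layer: one multiply-add
    per kernel tap, per input channel, per output channel, per output
    voxel (a multiply-add counted as 2 FLOPs)."""
    d, h, w = volume
    return 2 * (kernel ** 3) * c_in * c_out * d * h * w

# Illustrative first layer on a 512 x 512 x 200 volume: already tens of
# billions of FLOPs, before any of the deeper, wider layers are counted
print(conv3d_flops(kernel=3, c_in=1, c_out=32, volume=(512, 512, 200)))
```

Summing such terms over every layer of a deep 3D network is what pushes the total toward the 10^13 range, which motivates the hardware-oriented schemes described in the talk.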