Presentation Type
Lecture

Machine Learning for VLSI Reliability, Power and Thermal Analysis (Tutorial)

Presenter

Sheldon Tan

Country
USA
Affiliation
University of California, Riverside


Abstract

Machine learning, especially deep learning, has recently gained much attention due to its breakthrough performance in various cognitive applications. Machine learning for electronic design automation (EDA) is also gaining significant traction, as it provides new computing and optimization paradigms for many challenging and complex design automation problems. Today’s chip designers and EDA developers face many challenges in advanced technologies, from the technology and physical levels to the circuit and multi-core chip levels. Given the complexity of both modeling and online thermal/power/reliability control in advanced technologies, there is great potential in using the latest advances in machine learning to tackle these hard problems and to develop intelligent runtime management schemes. In this tutorial, I present several novel machine-learning-based solutions to the aforementioned EDA challenges. I will first focus on novel full-chip thermal and power map estimation techniques for commercial multi-core processors, using recurrent neural network (RNN) and generative adversarial network (GAN) methods. I will show, for the first time, the new capability of real-time full-chip thermal tracking for commercial microprocessors. I will also show how to obtain accurate power density maps and thermal maps for those commercial chips with practical heat sinks, in support of more advanced thermal/power/reliability management. I will also present the recently proposed ThermGAN approach, which uses generative adversarial learning to estimate full-chip thermal maps and compares favorably with the existing RNN-based methods. For dynamic thermal/reliability management, I will present a recently proposed deep reinforcement learning (DRL) based control method, which is based on the reliability of workload-dependent true hotspot identification and modeling of commercial multi-core processors.
For electromigration (EM) induced VLSI interconnect reliability modeling and optimization, I will present the recently proposed EMGraph approach, which applies graph convolutional networks (GCNs) that consider both node and edge embedding features to estimate the transient EM stress of multi-segment interconnect trees.

Description