Sheldon Tan
Professor, University of California, Riverside, United States
Region 6 (Western U.S.)
2022–2023
Talk(s):
Data-Driven Deep Learning for Full-Chip Thermal and Power Estimation for Commercial Multi-core Systems
Machine learning, especially deep learning, has recently gained much attention due to its breakthrough performance in various cognitive tasks. Machine learning for electronic design automation (EDA) is also gaining traction, as it provides new techniques for many challenging design automation problems of a complex nature. Thermal and power modeling and online regulation for many-core and embedded systems have been studied intensively, but real-time full-chip thermal estimation for commercial multi-core processors remains a challenging problem. In this talk, I will first show that a data-driven deep learning approach can build accurate transient thermal maps of commercial off-the-shelf multi-core microprocessors from the existing on-chip thermal sensors and real-time utilization information. The new models use only the real-time, high-level chip utilization and thermal sensor information already available on commercial chips, without requiring any additional physical sensors. I will explore different deep neural network (DNN) architectures, such as recurrent neural networks (RNNs) and generative adversarial networks (GANs), and show how to frame full-chip thermal map learning for commercial many-core processors as a supervised, data-driven learning problem for these networks. Our work is further enabled by an infrared thermography system that provides clear thermal maps of commercial multi-core CPUs and many-core GPUs while the chips operate under nominal working conditions.
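The supervised framing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the models from the talk:

- A plain least-squares fit stands in for the RNN/GAN.
- The sliding window of recent utilization and sensor readings is one plausible way to give a model temporal context.
- The names `build_dataset`, `fit_linear` and `predict`, the array shapes, and the synthetic data are all hypothetical.

```python
import numpy as np

def build_dataset(util, sensors, maps, window):
    """Turn time series into supervised (input, target) pairs.

    util:    (T, n_cores)   per-core utilization
    sensors: (T, n_sensors) on-chip thermal sensor readings
    maps:    (T, H, W)      reference thermal maps, e.g. from IR thermography
    Each input row concatenates the last `window` time steps of utilization
    and sensor data; each target row is the flattened thermal map at the last step.
    """
    feats = np.hstack([util, sensors])                     # (T, F)
    T = feats.shape[0]
    X = np.stack([feats[t - window + 1 : t + 1].ravel()    # (T - window + 1, window * F)
                  for t in range(window - 1, T)])
    Y = maps[window - 1 :].reshape(T - window + 1, -1)     # (T - window + 1, H * W)
    return X, Y

def fit_linear(X, Y):
    """Least-squares stand-in for the DNN (an assumption): solve [X, 1] @ W ~= Y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict(W, X):
    """Predict flattened thermal maps for new input rows."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

if __name__ == "__main__":
    # Synthetic demo: 4 cores, 2 sensors, a 3x3 "thermal map" (not real chip data).
    rng = np.random.default_rng(0)
    T = 200
    util, sensors = rng.random((T, 4)), rng.random((T, 2))
    mixing = rng.random((6, 9))
    maps = (np.hstack([util, sensors]) @ mixing).reshape(T, 3, 3)
    X, Y = build_dataset(util, sensors, maps, window=3)
    W = fit_linear(X, Y)
    print("max abs error:", np.abs(predict(W, X) - Y).max())
```

In practice, the linear model would be replaced by a recurrent or adversarial network, and the targets would come from measured infrared thermal maps. The dataset-construction step could stay much the same.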
How Machine Learning Reshapes VLSI Interconnect Reliability Modeling, Optimization and Management
As machine learning, especially deep learning, has proven effective at capturing spatial and temporal dynamic behaviors, it brings new opportunities for addressing difficult interconnect reliability tasks. In this talk, I will look at emerging machine learning and deep learning approaches for VLSI interconnect reliability modeling, optimization, and dynamic management. I first present an approach based on generative adversarial learning to model hydrostatic stress in multi-segment interconnects, in which stress modeling is treated as a time-varying 2D image-to-image conversion problem; the resulting solution provides an order-of-magnitude speedup over existing numerical methods and a 10X speedup over the state-of-the-art semi-analytic method. Second, based on the observation that VLSI multi-segment interconnect trees can naturally be viewed as graphs, I will present a new graph convolutional network (GCN) model, called EMGraph, which considers both node and edge embedding features to estimate the transient electromigration (EM) stress of interconnect trees. The new method achieves a 10X speedup over GAN-based EM analysis, with transferable knowledge for predicting stress on new interconnect trees. To mitigate long-term aging effects due to NBTI, EM, and HCI, I further present an accuracy-reconfigurable stochastic computing (ARSC) framework for dynamic reliability and power management. Unlike existing stochastic computing work, where the accuracy versus power/energy trade-off is fixed at design time, the new ARSC design can change the accuracy, or bit-width, of the data at run time. It can therefore accommodate long-term aging effects by slowing the system clock frequency at the cost of accuracy while maintaining computing throughput.
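ARSC trades accuracy against bit-stream length at run time. That trade-off can be illustrated with generic unipolar stochastic computing. There, a value in [0, 1] is encoded as the probability of a 1 in a random bitstream, and multiplication is a bitwise AND of two independent streams.

The sketch below is not the ARSC design itself. The function names are hypothetical, and the one-bit-per-clock-cycle assumption behind the clock estimate is illustrative.

```python
import random

def to_stream(p, n, rng):
    """Unipolar encoding: each of the n bits is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b, n, seed=0):
    """Stochastic multiplication: AND two independent streams, decode as fraction of ones.

    Both streams come from one seeded generator, drawn one after the other, so
    they are (pseudo-)independent; correlated streams would bias the product.
    """
    rng = random.Random(seed)
    stream_a = to_stream(a, n, rng)
    stream_b = to_stream(b, n, rng)
    return sum(x & y for x, y in zip(stream_a, stream_b)) / n

def clock_for_throughput(ops_per_second, n):
    """Assumes one bit per clock cycle, so one operation takes n cycles."""
    return ops_per_second * n

if __name__ == "__main__":
    exact = 0.5 * 0.8
    for n in (16, 64, 256, 1024, 4096):
        est = sc_multiply(0.5, 0.8, n)
        print(f"n={n:5d}  estimate={est:.4f}  |error|={abs(est - exact):.4f}  "
              f"clock for 1 Mop/s = {clock_for_throughput(1e6, n) / 1e6:.0f} MHz")
```

Shorter streams give larger errors: the standard error of the estimate scales roughly as 1/sqrt(n). Under the one-bit-per-cycle assumption, however, halving n lets the clock run at half the frequency for the same number of operations per second. That is the kind of knob a run-time accuracy reconfiguration can turn as devices age.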
Recent Advances in Electromigration Reliability Modeling and Full-Chip EM-Induced IR Analysis
Electromigration (EM) remains the top killer of copper-based interconnects in current and near-future advanced VLSI technologies. As technology scales, the current density allowed by EM continues to decrease, while the current density required to drive the gates increases. The 2015 ITRS predicts that the EM lifetime of VLSI interconnects will be halved with each technology generation. The most important observation from our recent studies is that, to be more accurate and less conservative in advanced technologies, EM analysis needs to consider multiple wire segments in the same metal layer together. Existing current-density-based EM checks can lead to significant over-design and lost design optimization opportunities. Our lab's recent mission is to move this widely adopted industry design practice toward new-generation EM modeling, assessment, and design techniques. In this talk, I will present recent research work from my lab (VSCLAB) at UC Riverside. I will cover newly proposed physics-based EM models, especially the three-phase EM model, and full-chip EM-induced IR drop analysis techniques. I will first present the recently proposed three-phase EM model, which describes the EM failure process and post-voiding resistance change much more accurately. I then introduce two new EM immortality check methods for general multi-segment interconnect wires, which can be viewed as extensions of the Blech product to multi-branch interconnects. Then I will present a novel fast finite difference method (FDM) for EM stress analysis based on frequency-domain model order reduction techniques.
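For context on what a fast FDM accelerates, the sketch below solves the widely used one-dimensional Korhonen stress equation, dσ/dt = d/dx[κ(dσ/dx + G)]. It treats a single wire segment with blocking ends and uses a plain explicit finite-volume scheme. Here κ stands for the effective stress diffusivity and G for the electric-field driving term, which is proportional to eZρj/Ω.

This is a baseline, not the frequency-domain model order reduction method from the talk. The following are assumptions:

- nondimensional parameters, with κ, G and L all set to 1;
- the sign convention, with tensile stress positive and building up at x = 0;
- the function name.

```python
import numpy as np

def korhonen_fdm(kappa, G, L, n_cells, t_end, cfl=0.4):
    """Explicit finite-volume solution of the 1D Korhonen equation

        d(sigma)/dt = d/dx [ kappa * (d(sigma)/dx + G) ]

    on one segment [0, L] with blocking ends (zero flux at x = 0 and x = L)
    and zero initial stress. Returns cell-center positions and stresses.
    """
    dx = L / n_cells
    dt_max = cfl * dx * dx / kappa            # explicit stability: dt <= dx^2 / (2 kappa)
    n_steps = max(1, int(np.ceil(t_end / dt_max)))
    dt = t_end / n_steps
    x = (np.arange(n_cells) + 0.5) * dx
    sigma = np.zeros(n_cells)
    # Face values of kappa * (d(sigma)/dx + G), proportional to the atomic flux;
    # flux[0] and flux[-1] stay 0 to model the blocking ends.
    flux = np.zeros(n_cells + 1)
    for _ in range(n_steps):
        flux[1:-1] = kappa * ((sigma[1:] - sigma[:-1]) / dx + G)
        sigma += dt * (flux[1:] - flux[:-1]) / dx
    return x, sigma

if __name__ == "__main__":
    # Nondimensional demo: kappa = G = L = 1, so the steady-state peak is G*L/2 = 0.5.
    for t in (0.01, 0.05, 0.2, 1.0):
        x, sigma = korhonen_fdm(1.0, 1.0, 1.0, 50, t)
        print(f"t={t:<5}  stress at x=0: {sigma[0]:.4f}  total: {sigma.sum():.1e}")
```

The stress converges to the linear profile σ(x) = G(L/2 - x). Its peak, GL/2 at the cathode end, is what the classic Blech product bounds. The explicit time step must satisfy dt ≤ dx²/(2κ), which is one reason faster methods are attractive for large interconnect structures.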
On top of this, I will present the recently proposed coupled EM-IR drop analysis tool, EMspice, for full-chip EM-induced IR drop analysis of power delivery networks, and show how it integrates with the Synopsys ICC design flow. EMspice incorporates the latest EM modeling and analysis techniques for multi-segment interconnects. These include EM immortality checks that consider both the nucleation and incubation phases, the interaction between IR drop and EM aging effects in the post-voiding phase, EM recovery effects, and temporal temperature effects. Last but not least, I will present recent work on EM-aware power grid design and optimization, which exploits the recently developed EM modeling and assessment techniques for multi-segment interconnect wires.
Machine Learning for VLSI Reliability, Power and Thermal Analysis (Tutorial)
Machine learning, especially deep learning, has recently gained much attention due to its breakthrough performance in various cognitive applications. Machine learning for electronic design automation (EDA) is also gaining significant traction, as it provides new computing and optimization paradigms for many challenging design automation problems of a complex nature. Today's chip designers and EDA developers face many challenges in advanced technologies, from the technology and physical levels to the circuit and multi-core chip levels. Given the complex nature of both the modeling and the online thermal/power/reliability control challenges in advanced technologies, there is great potential in using the latest advances in machine learning to tackle these hard problems and develop intelligent runtime management schemes. In this tutorial, I present several novel machine-learning-based solutions to the aforementioned EDA challenges.
I will first focus on novel full-chip thermal and power map estimation techniques for commercial multi-core processors using recurrent neural network (RNN) and generative adversarial network (GAN) methods. I will show, for the first time, real-time full-chip thermal tracking for commercial microprocessors. I also show how to obtain accurate power density maps and thermal maps for these commercial chips with practical heat sinks, enabling more advanced thermal/power/reliability management. I also present the recently proposed ThermGAN approach, which uses adversarial generative learning to estimate full-chip thermal maps and compares favorably with existing RNN-based methods. For dynamic thermal/reliability management, I will present a recently proposed deep reinforcement learning (DRL)-based control method that builds on workload-dependent true hotspot identification and reliability modeling for commercial multi-core processors. For electromigration (EM)-induced VLSI interconnect reliability modeling and optimization, I will present the recently proposed EMGraph approach. It applies graph convolutional networks (GCNs) with both node and edge embedding features to estimate the transient EM stress of multi-segment interconnect trees.
EM-Aware Design: from Physics to System Level (Tutorial)
In this tutorial, I will present some recent research work from my lab (VSCLAB) at UC Riverside. First, I will review a recently proposed physics-based three-phase EM model for multi-segment interconnect wires, which consists of nucleation, incubation, and growth phases to fully model the EM failure process in typical copper damascene interconnects. The new model predicts EM failure behavior more accurately for multi-segment wires, such as interconnects with reservoir and sink segments.
Second, I will present newly proposed fast aging acceleration techniques for efficient EM failure detection and validation of practical VLSI chips. I will present novel configurable reservoir/sink-structured interconnect designs, in which the current in the sink segment can be activated or deactivated dynamically during operation. In this way, the stress in the interconnect wires can be increased and their lifetime reduced significantly. Afterward, I will present compact dynamic EM models for general multi-segment interconnect wires and a voltage-based EM immortality check algorithm for general interconnect trees. Then I will present a fast 2D numerical stress analysis technique based on Krylov subspace and finite difference time domain (FDTD) methods for general interconnect wire structures. The proposed method can achieve a 100X speedup over the plain FDTD method and can be applied to any interconnect structure in all EM wear-out phases. Then I will focus on system-level dynamic reliability management (DRM) techniques based on the newly proposed physics-based EM models. I will show several recent EM-aware DRM works for lifetime optimization of dark-silicon systems, embedded/real-time systems, and 3D ICs, where the goal is to improve TSV reliability. Last but not least, I will present work that considers the impact on EM of spatial temperature gradients caused by Joule heating. Metal atom migration induced by spatial temperature gradients, also called thermomigration (TM), has been shown to become as significant as EM itself as technology advances. I will show how, for the first time, TM effects can be considered in both existing EM immortality checks and semi-analytic approaches.
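The voltage-based immortality check for interconnect trees mentioned above can be sketched under common simplifying assumptions:

- all segments share one material;
- terminals block atom flux;
- the initial stress is zero;
- only the nucleation phase is checked.

Under these assumptions, the steady-state stress at node k is σ_k = β(V_E - V_k). Here V_k comes from an ordinary IR-drop solve, β = e|Z|/Ω, and V_E is the volume-weighted average voltage of the tree. The function names and data layout are hypothetical. The sketch ignores incubation, growth, and thermomigration effects.

```python
def steady_state_stress(segments, voltages, beta):
    """Steady-state hydrostatic stress at every node of a connected interconnect tree.

    segments: list of (node_a, node_b, length, area) tuples
    voltages: dict mapping node -> voltage from an IR-drop solve
    beta:     positive constant e*|Z|/Omega; tensile (positive) stress builds up
              on the low-voltage (cathode) side
    Returns a dict mapping node -> sigma_k = beta * (V_E - V_k).
    """
    # Voltage is linear along each segment, so a segment contributes
    # area * length * (V_a + V_b) / 2 to the volume-weighted voltage sum.
    weighted = sum(area * length * (voltages[a] + voltages[b]) / 2
                   for a, b, length, area in segments)
    volume = sum(area * length for _, _, length, area in segments)
    v_e = weighted / volume           # atom conservation fixes the stress offset
    return {node: beta * (v_e - v) for node, v in voltages.items()}

def is_immortal(segments, voltages, beta, sigma_crit):
    """Nucleation-phase check: immortal if no node reaches the critical stress."""
    stress = steady_state_stress(segments, voltages, beta)
    return max(stress.values()) < sigma_crit

if __name__ == "__main__":
    # Hypothetical three-terminal tree (normalized units): A-B trunk, B-C and B-D branches.
    tree = [("A", "B", 2.0, 1.0), ("B", "C", 1.0, 1.0), ("B", "D", 1.5, 0.5)]
    volts = {"A": 1.0, "B": 0.6, "C": 0.2, "D": 0.4}
    print(steady_state_stress(tree, volts, beta=1.0))
    print("immortal:", is_immortal(tree, volts, beta=1.0, sigma_crit=0.5))
```

For a single segment, the peak stress reduces to βρjL/2, so the check collapses to the familiar Blech product condition jL < 2Ωσ_crit/(e|Z|ρ).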
Reliable Power Grid Network Design and Optimization Considering Physics-Based EM Models (Tutorial)
Long-term reliability problems, such as electromigration (EM)-induced failures in integrated circuits, are expected to grow rapidly as feature sizes shrink in new technology nodes, and novel solutions are needed to address reliability at different levels. This talk presents a new power grid network design and optimization technique that considers a new EM immortality constraint based on the EM void saturation volume for multi-segment interconnects. Traditional EM models treat void formation as failure. However, this is quite a conservative assumption, as a void may never grow large enough to change the wire resistance significantly. By considering the saturation volume for multi-segment interconnect wires, we can remove this conservativeness from EM-aware on-chip power grid design. Together with another newly proposed immortality constraint for the EM nucleation phase of multi-segment wires, I will show that both EM immortality constraints can be naturally integrated into the existing programming-based power grid optimization framework. To mitigate the overly conservative nature of the optimization formulation, I will further explore two strategies. The first is sizing up failed wires to meet one of the immortality conditions, subject to design rules. The second is considering EM-induced aging effects on power supply networks over a target lifetime, which allows some short-lifetime wires to fail and optimizes the remaining wires. Last but not least, I will present our recent work on using deep neural networks (DNNs) to model full-chip EM-induced IR drop. This work leverages the differentiability of DNNs for fast sensitivity calculation in sensitivity-guided, full-chip EM-aware power grid optimization.
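A toy network shows the benefit of using a DNN's differentiability for sensitivity calculation. One backward pass returns the derivative of the output with respect to every input, whereas central finite differences need two forward passes per input.

In the sketch below, the small random tanh network is a hypothetical stand-in for a trained full-chip EM-induced IR drop model. Treating its inputs as wire widths is an illustrative assumption; the architecture used in the work is not specified here.

```python
import numpy as np

def mlp_forward(params, x):
    """Toy surrogate: scalar output y = w2 . tanh(W1 x + b1) + b2."""
    W1, b1, w2, b2 = params
    h = np.tanh(W1 @ x + b1)
    return float(w2 @ h + b2)

def mlp_input_gradient(params, x):
    """Backpropagation: dy/dx = W1^T [w2 * (1 - tanh^2(W1 x + b1))], one pass for all inputs."""
    W1, b1, w2, _ = params
    h = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - h * h))

def finite_difference_gradient(params, x, eps=1e-6):
    """Reference sensitivities from central differences: 2 * len(x) forward passes."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (mlp_forward(params, x + step) - mlp_forward(params, x - step)) / (2 * eps)
    return grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_hidden = 8, 16            # hypothetical: 8 wire widths, 16 hidden units
    params = (rng.normal(size=(n_hidden, n_in)), rng.normal(size=n_hidden),
              rng.normal(size=n_hidden), 0.1)
    widths = rng.random(n_in)
    g = mlp_input_gradient(params, widths)
    print("sensitivities:", np.round(g, 4))
    print("max |backprop - finite diff|:",
          np.abs(g - finite_difference_gradient(params, widths)).max())
    print("most sensitive input:", int(np.argmax(np.abs(g))))
```

In a framework such as PyTorch or JAX, the same input gradient would come from automatic differentiation (for example `torch.autograd.grad` or `jax.grad`). A sensitivity-guided optimizer could then, for instance, widen the wires with the largest sensitivities.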