Cyber-physical systems (CPS) are engineered systems that are built from, and depend upon, the seamless integration of computation and physical components [NSF]. Embedded systems comprising hardware and software are the major enabling technology for these cyber-physical systems. Today, CPSs can be found in security-sensitive areas such as aerospace, automotive, energy, healthcare, manufacturing, transportation, entertainment, and consumer appliances. Compared to traditional information processing systems, the tight interactions between cyber and physical components in CPSs and the closed-loop control from sensing to actuation give rise to new vulnerabilities at the boundaries between the various layers and domains. In this CEDA distinguished lecture, Prof. Al Faruque will discuss how new vulnerabilities emerge at the intersection of various components and subsystems and their hardware, software, and physical layers. Several recent examples from various cyber-physical systems will be presented in this talk. Understanding these new vulnerabilities requires a very different set of methodologies and tools, and defending against them also demands new hardware/software co-design approaches. The talk will highlight recent developments in this regard. The major goal of this talk is to highlight the research challenges involved and the need for novel scientific solutions from the larger research community.
Recently, machine learning, and especially deep learning, has been gaining much attention due to its breakthrough performance in various cognitive tasks. Machine learning for electronic design automation is also gaining traction, as it provides new techniques for many challenging design automation problems of a complex nature.
In this talk, I will present our recent work from VSCLAB at UC Riverside on machine learning-based thermal map and power density map estimation methods for commercial multi-core CPUs. First, I will look at the real-time full-chip thermal map estimation problem for commercial multi-core processors. In our work, instead of using traditional functional-unit powers as input, the new models are based directly on real-time high-level chip utilization and on-chip thermal sensor information of commercial chips, without assuming any additional physical sensors. We first framed the problem as the static or transient mapping between chip utilizations and thermal maps. To build the transient thermal model, we utilized temporally aware long short-term memory (LSTM) neural networks with system-level variables such as chip frequency, voltage, and instruction counts as inputs. Instead of a pixel-wise heatmap estimation, we applied a 2D spatial discrete cosine transformation (DCT) to the heatmaps so that they can be expressed with just a few dominant DCT coefficients. Second, we explored generative learning for the full-chip thermal map estimation problem. In our work, we treated the thermal modeling problem as an image-generation problem using generative neural networks. The resulting thermal map estimation method, called ThermGAN, can provide tool-accurate full-chip transient thermal maps from the given performance-monitor traces of commercial off-the-shelf multi-core processors. Third, I will present a new full-chip power map estimation method for commercial multi-core processors.
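To make the DCT step concrete, here is a minimal sketch (not the authors' code; the 64×64 map size and the 8×8 coefficient block are illustrative assumptions) of how a full-chip thermal map can be compressed into a few dominant 2D-DCT coefficients and reconstructed, which is what lets a model predict a handful of coefficients rather than every pixel:

```python
# Minimal sketch of DCT-based heatmap compression: keep only a few low-frequency
# 2D-DCT coefficients of a thermal map and reconstruct a smooth approximation.
import numpy as np
from scipy.fft import dctn, idctn

def compress_thermal_map(heatmap: np.ndarray, keep: int = 8) -> np.ndarray:
    """Keep only the top-left `keep` x `keep` block of low-frequency DCT coefficients."""
    coeffs = dctn(heatmap, norm="ortho")        # 2D spatial DCT of the thermal map
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0                    # dominant (low-frequency) coefficients
    return coeffs * mask

def reconstruct_thermal_map(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 2D DCT recovers a smooth approximation of the original heatmap."""
    return idctn(coeffs, norm="ortho")

if __name__ == "__main__":
    heatmap = np.random.rand(64, 64) * 30 + 40  # stand-in for a 64x64 thermal map (deg C)
    approx = reconstruct_thermal_map(compress_thermal_map(heatmap, keep=8))
    print("RMSE:", np.sqrt(np.mean((heatmap - approx) ** 2)))
```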
In this lecture, I will talk about two overarching research goals we have been pursuing for several years. The first goal is to explore the limits of energy per operation when running AI algorithms such as deep learning (DL). In-memory computing (IMC) is a non-von Neumann compute paradigm that keeps alive the promise of 1 fJ/operation for DL. Attributes such as synaptic efficacy and plasticity can be implemented in place by exploiting the physical attributes of memory devices such as phase-change memory. I will provide an overview of the most advanced IMC chips based on phase-change memory integrated in a 14nm CMOS technology node. The second goal is to develop algorithmic and architectural building blocks for a more efficient and general AI. I will introduce the paradigm of the neuro-vector-symbolic architecture (NVSA), which could address problems such as continual learning and visual abstract reasoning. I will also showcase the role of IMC in realizing some of the critical compute blocks for NVSA.
The continued scaling of the horizontal and vertical physical features of silicon-based complementary metal-oxide-semiconductor (CMOS) transistors, termed “More Moore”, has a limited runway and will eventually be replaced with “Beyond CMOS” technologies. There has been a tremendous effort to follow Moore’s law, but it is currently approaching atomistic and quantum-mechanical physics boundaries. This has led to active research in other non-CMOS technologies such as memristive devices, carbon nanotube field-effect transistors, quantum computing, etc. Several of these technologies have been realized as practical devices with promising gains in yield, integration density, runtime performance, and energy efficiency. Their eventual adoption largely relies on the continued research of Electronic Design Automation (EDA) tools catering to these specific technologies. Indeed, some of these technologies present new challenges to the EDA research community, which are being addressed through a series of innovative tools and techniques. In this tutorial, we will cover two phases of the EDA flow, logic synthesis and technology mapping, for two types of emerging technologies, namely in-memory computing and quantum computing.
30-minute demo, followed by a 10-minute Q&A
The CAD for Trust and Assurance website is an academic dissemination effort by researchers in the field of hardware security. The goal is to assemble information on all CAD for trust/assurance activities in academia and industry in one place and share them with the broader community of researchers and practitioners in a timely manner, with an easy-to-search and easy-to-access interface. We’re including information on many major CAD tools the research community has developed over the past decade, including open-source license-free or ready-for-licensing tools, associated metrics, relevant publications, and video demos. We are also delighted to announce a series of virtual CAD for Assurance tool training webinars starting in February 2021.
Additional information on these webinars is available at the CAD for Assurance website at https://cadforassurance.org/.
Data transfer between processors and memory is a major bottleneck in improving application performance on traditional computing hardware. This talk will showcase several representative cross-layer IMC-based design efforts.
Placement for very large-scale integrated (VLSI) circuits is one of the most important steps for design closure. We propose DREAMPlace, a novel GPU-accelerated placement framework, by casting the analytical placement problem as the equivalent of training a neural network. Implemented on top of the widely adopted deep learning toolkit PyTorch, with customized key kernels for wirelength and density computations, DREAMPlace achieves around a 40× speedup in global placement without quality degradation compared to the state-of-the-art multithreaded placer RePlAce. We believe this work will open up new directions for revisiting classical EDA problems with advancements in AI hardware and software.
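As an illustration of what "casting placement as neural-network training" means, the following sketch uses a toy net list and a log-sum-exp wirelength only (the real DREAMPlace adds density terms and custom CUDA kernels): cell coordinates become trainable parameters and a differentiable wirelength proxy is minimized with a standard PyTorch optimizer.

```python
# Minimal sketch: analytical placement as gradient-based "training" in PyTorch.
import torch

torch.manual_seed(0)
num_cells, gamma = 100, 0.1
pos = torch.nn.Parameter(torch.rand(num_cells, 2))             # (x, y) of each movable cell
nets = [torch.randint(0, num_cells, (4,)) for _ in range(50)]   # toy 4-pin nets

def lse_wirelength(pos, nets, gamma):
    """Smooth (log-sum-exp) approximation of half-perimeter wirelength."""
    total = 0.0
    for net in nets:
        p = pos[net]                                            # pin positions of this net
        for d in range(2):                                      # x and y directions
            total += gamma * (torch.logsumexp(p[:, d] / gamma, 0)
                              + torch.logsumexp(-p[:, d] / gamma, 0))
    return total

opt = torch.optim.Adam([pos], lr=0.01)                          # same optimizers as NN training
for step in range(200):
    opt.zero_grad()
    loss = lse_wirelength(pos, nets, gamma)                     # + density penalty in the real flow
    loss.backward()
    opt.step()
print("final wirelength proxy:", loss.item())
```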
In routing tree construction, both wirelength (WL) and pathlength (PL) are important. Among all methods, PD-II and SALT are the two most prominent ones. However, neither PD-II nor SALT always dominates the other in terms of both WL and PL for all nets. In addition, estimating the best parameters for both algorithms is still an open problem. In this paper, we model the pins of a net as a point cloud and formalize a set of special properties of such point clouds. Considering these properties, we propose a novel deep neural network architecture, TreeNet, to obtain an embedding of the point cloud. Based on the obtained cloud embedding, an adaptive workflow is designed for routing tree construction. Experimental results show that the proposed TreeNet is superior to other mainstream point-cloud models on classification tasks. Moreover, the proposed adaptive workflow for routing tree construction outperforms SALT and PD-II in terms of both efficiency and effectiveness.
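For readers unfamiliar with the two objectives, the sketch below (toy pin locations, not TreeNet) computes the total wirelength (WL) and the maximum source-to-sink pathlength (PL) of a routing tree given as parent pointers, and shows why neither a star nor a chain topology dominates on both metrics:

```python
# Minimal sketch of the WL/PL trade-off for a rectilinear routing tree.
from typing import List, Tuple

def manhattan(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def wl_and_max_pl(pins: List[Tuple[int, int]], parent: List[int]) -> Tuple[int, int]:
    """parent[i] is the index of pin i's parent; parent[0] = -1 marks the source."""
    wl = sum(manhattan(pins[i], pins[parent[i]]) for i in range(len(pins)) if parent[i] >= 0)
    def pathlength(i: int) -> int:
        return 0 if parent[i] < 0 else manhattan(pins[i], pins[parent[i]]) + pathlength(parent[i])
    return wl, max(pathlength(i) for i in range(len(pins)))

# Toy 4-pin net: a star from the source has larger WL but shorter PL than a chain.
pins = [(0, 0), (4, 0), (4, 3), (0, 3)]
print("star :", wl_and_max_pl(pins, [-1, 0, 0, 0]))   # WL=14, PL=7
print("chain:", wl_and_max_pl(pins, [-1, 0, 1, 2]))   # WL=11, PL=11
```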
Intermittently executing deep neural network (DNN) inference powered by ambient energy paves the way for sustainable and intelligent edge applications. Neural architecture search (NAS) has achieved great success in automatically finding highly accurate networks with low latency. However, we observe that NAS attempts to improve inference latency primarily by maximizing data reuse, so the derived solutions may be inefficient when deployed on intermittent systems: the inference may not satisfy an end-to-end latency requirement and, more seriously, may be unsafe given an insufficient energy budget. This work proposes iNAS, which introduces intermittent execution behavior into NAS. To generate accurate neural networks and corresponding intermittent execution designs that are safe and efficient, iNAS finds the right balance between data reuse and the costs related to progress preservation and recovery, while ensuring the power-cycle energy budget is not exceeded. The solutions found by iNAS and an existing HW-NAS were evaluated on a Texas Instruments device under intermittent power, across different datasets, energy budgets, and latency requirements. Experimental results show that in all cases the iNAS solutions safely meet the latency requirements and substantially improve end-to-end inference latency compared to the HW-NAS solutions.
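A minimal sketch of the safety condition described above, with hypothetical per-layer energy numbers rather than the actual iNAS cost model: every power cycle must fit recovery, computation, and progress preservation inside the available energy budget.

```python
# Minimal sketch: an intermittent execution design is "safe" only if no single
# power cycle needs more energy than the budget harvested per cycle.
from dataclasses import dataclass
from typing import List

@dataclass
class LayerPlan:
    compute_energy_uj: float      # energy to execute this layer tile
    preserve_energy_uj: float     # energy to checkpoint intermediate results
    recover_energy_uj: float      # energy to restore state after a power loss

def is_safe(plan: List[LayerPlan], budget_uj: float) -> bool:
    """Unsafe if any single power cycle exceeds the energy budget."""
    return all(
        step.recover_energy_uj + step.compute_energy_uj + step.preserve_energy_uj <= budget_uj
        for step in plan
    )

plan = [LayerPlan(120.0, 25.0, 10.0), LayerPlan(180.0, 40.0, 10.0)]
print(is_safe(plan, budget_uj=250.0))   # True: both cycles fit the 250 uJ budget
```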
Active LiDAR and stereo vision are the most commonly used depth-sensing techniques in autonomous vehicles. Each of them alone has weaknesses in terms of density and reliability and thus cannot perform well in all practical scenarios. Recent works use deep neural networks (DNNs) to exploit their complementary properties, achieving superior depth sensing. However, these state-of-the-art solutions are not satisfactory in real-time responsiveness due to the high computational complexity of DNNs. In this paper, we present FastFusion, a fast deep stereo-LiDAR fusion framework for real-time high-precision depth estimation. FastFusion provides an efficient two-stage fusion strategy that leverages a binary neural network to integrate stereo-LiDAR information as input and uses cross-based LiDAR trust aggregation to further fuse the sparse LiDAR measurements in the back end of stereo matching. More importantly, we present a GPU-based acceleration framework that provides a low-latency implementation of FastFusion, gaining both accuracy improvement and real-time responsiveness. In our experiments, we demonstrate the effectiveness and practicability of FastFusion, which obtains a significant speedup over state-of-the-art baselines while achieving comparable accuracy in depth sensing.
Mobile platforms must satisfy the contradictory requirements of fast response time and minimum energy consumption as a function of dynamically changing applications. To address this need, the systems-on-chip (SoCs) at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime is challenging for two reasons. First, the large configuration space prohibits exhaustive solutions. Second, control policies designed offline are at best sub-optimal, since many potential new applications are unknown at design time. We address these challenges by proposing an online imitation learning approach. Our key idea is to construct an offline policy and adapt it online to new applications to optimize a given metric (e.g., energy). The proposed methodology leverages the supervision enabled by power-performance models learned at runtime. We demonstrate its effectiveness on a commercial mobile platform with 16 diverse benchmarks. Our approach successfully adapts the control policy to an unknown application after executing less than 25% of its instructions.
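The sketch below illustrates the online imitation-learning loop under toy assumptions (random workload features and a stand-in oracle; the real work uses runtime power-performance models and actual SoC knobs): an offline policy is trained first and then incrementally adapted to an unseen application with supervised updates.

```python
# Minimal sketch: offline policy + online supervised adaptation toward an oracle.
import numpy as np
from sklearn.linear_model import SGDClassifier

CONFIGS = np.arange(8)                        # e.g. (active cores, V/F level) pairs, flattened
rng = np.random.default_rng(0)

def oracle_best_config(features):
    """Stand-in for runtime power-performance models picking the best knob setting."""
    return int(features.sum() * len(CONFIGS)) % len(CONFIGS)

# Offline phase: imitate the oracle on known workloads.
X_off = rng.random((500, 4))                  # workload features (utilization, IPC, ...)
y_off = np.array([oracle_best_config(x) for x in X_off])
policy = SGDClassifier()                      # simple linear policy trained with SGD
policy.partial_fit(X_off, y_off, classes=CONFIGS)

# Online phase: adapt to a previously unseen application, one control epoch at a time.
for epoch in range(200):
    x = rng.random((1, 4)) * 2.0              # unseen application => shifted feature range
    chosen = policy.predict(x)[0]             # the SoC would be configured with `chosen` here
    label = oracle_best_config(x[0])          # supervision from the learned runtime models
    policy.partial_fit(x, [label])            # imitation update toward the oracle's choice
```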
DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source, full-stack DNN accelerator generator. Gemmini generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders-of-magnitude speedups over high-performance CPUs on various DNN benchmarks.
Accurate power modeling is crucial for energy-efficient CPU design and runtime management. An ideal power modeling framework needs to be accurate yet fast, achieve high temporal resolution (ideally cycle-accurate) with low runtime computational overheads, and be easily extensible to diverse designs through automation. Simultaneously satisfying such conflicting objectives is challenging and largely unattained despite significant prior research. In this talk, I will introduce our work APOLLO, which has multiple key attributes. First, it supports fast and accurate design-time power model simulation, handling benchmarks of millions of cycles in minutes with an emulator. Second, it incorporates an unprecedentedly low-cost runtime on-chip power meter in the CPU RTL for per-cycle power tracing. Third, the development process of this method is fully automated and applies to any given design solution. This method has been validated on the high-volume commercial microprocessors Neoverse N1 and Cortex-A77.
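The on-chip power-meter idea can be illustrated with the following sketch, which uses synthetic toggle data and Lasso as a stand-in for APOLLO's actual proxy-selection algorithm: a sparse weighted sum over a few RTL signal toggle counts approximates per-cycle power, so only those few signals need to be monitored in hardware.

```python
# Minimal sketch: select a small set of per-cycle signal toggles whose weighted
# sum tracks per-cycle power (sparse regression as the selection mechanism).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
cycles, signals = 5000, 200
toggles = rng.integers(0, 2, size=(cycles, signals)).astype(float)  # per-cycle toggle activity
true_w = np.zeros(signals)
true_w[rng.choice(signals, 10, replace=False)] = rng.random(10) * 5
power = toggles @ true_w + rng.normal(0, 0.1, cycles)               # synthetic per-cycle power

model = Lasso(alpha=0.05).fit(toggles, power)                       # sparse proxy selection
proxies = np.flatnonzero(model.coef_)                               # the few signals to tap on chip
rmse = np.sqrt(np.mean((model.predict(toggles) - power) ** 2))
print(f"selected {proxies.size} of {signals} signals, RMSE = {rmse:.3f}")
```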
The microarchitecture design of a processor has become increasingly difficult due to the large design space and time-consuming verification flow. Previously, researchers relied on prior knowledge and cycle-accurate simulators to analyze the performance of different microarchitecture designs, but lacked sufficient discussion of methodologies to strike a good balance between power and performance. This work proposes an automatic framework to explore microarchitecture designs of the RISC-V Berkeley Out-of-Order Machine (BOOM), termed BOOM-Explorer, achieving a good trade-off between power and performance. Firstly, the framework utilizes an advanced microarchitecture-aware active learning (MicroAL) algorithm to generate a diverse and representative initial design set. Secondly, a Gaussian process model with deep kernel learning functions (DKL-GP) is built to characterize the design space. Thirdly, correlated multi-objective Bayesian optimization is leveraged to explore Pareto-optimal designs. Experimental results show that BOOM-Explorer can search for designs that dominate prior art and designs developed by senior engineers in terms of power and performance within a much shorter time.
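For intuition, the following sketch shows a stripped-down version of such a design-space-exploration loop, with a random toy design space, a plain Gaussian process, and scalarized expected improvement standing in for DKL-GP and the correlated multi-objective acquisition used by BOOM-Explorer:

```python
# Minimal sketch of a Bayesian-optimization loop over a candidate design space.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
space = rng.random((300, 6))                       # candidate microarchitecture encodings

def evaluate(x):                                   # stand-in for a slow simulation run:
    perf, power = x[:3].sum(), x[3:].sum()         # returns a scalarized power/perf cost
    return power - perf

idx = list(rng.choice(len(space), 5, replace=False))   # small initial design set
y = [evaluate(space[i]) for i in idx]

for it in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(space[idx], y)
    mu, sigma = gp.predict(space, return_std=True)
    best = min(y)
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    ei[idx] = -np.inf                                      # do not re-simulate known points
    nxt = int(np.argmax(ei))
    idx.append(nxt)
    y.append(evaluate(space[nxt]))

print("best scalarized cost found:", min(y))
```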
We propose a novel hardware and software co-exploration framework for efficient neural architecture search (NAS). Different from existing hardware-aware NAS, which assumes a fixed hardware design and explores only the NAS space, our framework simultaneously explores both the architecture search space and the hardware design space to identify the best neural architecture and hardware pairs that maximize both test accuracy and hardware efficiency. Such a practice greatly opens up the design freedom and pushes forward the Pareto frontier between hardware efficiency and test accuracy for better design tradeoffs. The framework iteratively performs a two-level (fast and slow) exploration. Without lengthy training, the fast exploration can effectively fine-tune hyperparameters and prune inferior architectures in terms of hardware specifications, which significantly accelerates the NAS process. Then, the slow exploration trains candidates on a validation set and updates a controller using reinforcement learning to maximize the expected accuracy together with the hardware efficiency. In this article, we demonstrate that the co-exploration framework can effectively expand the search space to incorporate models with high accuracy, and we theoretically show that the proposed two-level optimization can efficiently prune inferior solutions to better explore the search space. The experimental results on ImageNet show that the co-exploration NAS can find solutions with the same accuracy but 35.24% higher throughput and 54.05% higher energy efficiency compared with hardware-aware NAS.
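The two-level idea can be sketched as follows, with random stand-ins for the search spaces and evaluators and no actual RL controller: the fast level prunes architecture/hardware pairs that miss the hardware specification without any training, and the slow level evaluates the survivors with a joint accuracy/efficiency reward.

```python
# Minimal sketch of two-level (fast/slow) hardware/architecture co-exploration.
import random
random.seed(0)

def sample_pair():                       # one point in the joint (architecture, hardware) space
    return {"depth": random.randint(4, 20), "width": random.choice([16, 32, 64]),
            "pe_array": random.choice([8, 16, 32])}

def fast_hw_estimate(p):                 # cheap analytical latency proxy, no training needed
    return p["depth"] * p["width"] / p["pe_array"]

def slow_evaluate(p):                    # stand-in for training + validation accuracy
    return 0.5 + 0.02 * p["depth"] + random.uniform(-0.05, 0.05)

LATENCY_SPEC, LAMBDA = 40.0, 0.01
best, best_reward = None, float("-inf")
for _ in range(200):
    pair = sample_pair()
    latency = fast_hw_estimate(pair)
    if latency > LATENCY_SPEC:           # fast level: prune without any training
        continue
    reward = slow_evaluate(pair) + LAMBDA * (LATENCY_SPEC - latency)   # joint reward
    if reward > best_reward:             # the real framework feeds this to an RL controller
        best, best_reward = pair, reward
print(best, round(best_reward, 3))
```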
In the world of IoT, both humans and objects are continuously connected, collecting and communicating data, in a rising number of applications including Industry 4.0, biomedical devices, environmental monitoring, and smart homes and offices. Local computation at the edge has become a necessity to limit data traffic. Additionally, embedding AI processing at the edge adds potentially high levels of smart autonomy to these IoT 2.0 systems. Progress in nanoelectronic technology allows this to be done with power- and hardware-efficient architectures and designs. This keynote gives an overview of key solutions, but also describes the main limitations and risks, exploring the edge of edge AI.
“This video material was produced for and used at the DATE2022 event. EDAA vzw, the owner of the copyright for this material, has authorized free reuse with the inclusion of this paragraph.”
ACRC WEBINAR by Prof. Anupam Chattopadhyay from Nanyang Technological University, Singapore
"AI, Machine Learning, Deep Learning: Where Are the Real Opportunities for the EDA Industry?" presented by Kurt Keutzer on Thursday, December 9, 2021, at the 58th DAC.
Joe Costello gives his keynote, "When the Winds of Change Blow, Some People Build Walls and Some People Build Windmills," on Wednesday, December 8, 2021, at the 58th DAC.
Keynote speaker Bill Dally gives his presentation, "GPUs, Machine Learning, and EDA," on Tuesday, December 7, 2021, at the 58th DAC.
As practical limits for transistor miniaturization are reached, alternative approaches for improving integrated-circuit functionality and energy efficiency at acceptable cost will be necessary to meet growing demand for information and communication technology. This presentation will cover some examples of semiconductor device innovation to enable ubiquitous information systems in the future.
Jeff Dean gives his keynote, "The Potential of Machine Learning for Hardware Design," on Monday, December 6, 2021, at the 58th DAC.
Within the past decade, the number of IoT devices introduced in the market has increased dramatically, and this trend is expected to continue at a rapid pace. However, the massive deployment of IoT devices has led to significant security and privacy concerns, given that security is often treated as an afterthought for IoT systems. Security issues may arise at different levels, from deployment issues that leave devices exposed to the internet with default credentials, to implementation issues where manufacturers incorrectly employ existing protocols or develop proprietary communication protocols whose soundness has not been examined. While existing cybersecurity and network security solutions can help protect IoT, they are often hampered by the limited on-board/on-chip resources of IoT devices. To mitigate this problem, researchers have developed multiple solutions based on either a top-down approach (relying on the cloud for IoT data processing and authentication) or a bottom-up approach (leveraging hardware modifications for efficient cybersecurity protection). In this talk, I will first introduce the emerging security and privacy challenges in the IoT domain. I will then focus on bottom-up solutions for IoT protection and present our recent research efforts in microarchitecture-supported IoT runtime attack detection and device attestation. The developed methods will lead to a design-for-security flow towards trusted IoT devices and their applications.
EDA tools have been used routinely in the digital design flow for decades, but despite valiant efforts from the research community, analog design has stubbornly resisted automation. Several recent developments are helping turn the tide, driving wider adoption of automation tools within the analog design flow. This talk explains the reasons for this change, and then describes recent efforts in analog layout automation with particular focus on the ALIGN (Analog Layout, Intelligently Generated from Netlists) project. ALIGN is a joint university-industry effort that is developing an open-source analog layout flow, leveraging a blend of traditional algorithmic methods with machine learning based approaches. ALIGN targets a wide variety of designs – low frequency analog circuits, wireline circuits for high-speed links, RF/wireless circuits, and power delivery circuits – under a single framework. The flow is structured modularly and is being built to cater to a wide range of designer expertise: the novice designer could use it in “push-button” mode, automatically generating GDSII layout from a SPICE netlist, while users with greater levels of expertise could bypass parts of the flow to incorporate their preferences and constraints.
The talk will present an overview of both the technical challenges and logistical barriers to building an open-source tool flow while respecting the confidentiality requirements of secured IP information. Finally, the application of ALIGN to a variety of designs will be demonstrated.
Data transfer between processors and memory is a major bottleneck in improving application-level performance. This is particularly true for data-intensive tasks such as many machine learning and security applications. In-memory computing, where certain data processing is performed directly in the memory array, can be an effective solution to address this bottleneck. Associative memory (AM), a type of memory that can efficiently “associate” an input query with appropriate data words/locations in the memory, is a powerful in-memory computing core. Nonetheless, harnessing the benefits of AM requires cross-layer efforts spanning from devices and circuits to architectures and systems. In this talk, I will showcase several representative cross-layer AM-based design efforts. In particular, I will highlight how different non-volatile memory technologies (such as RRAM, FeFET memory, and Flash) can be exploited to implement various types of AM (e.g., exact and approximate match, ternary and multi-bit data representation, and different distance functions). I will use several popular machine learning and security applications to demonstrate how they can profit from these different AM designs. End-to-end (from device to application) evaluations will be analyzed to reveal the benefits contributed by each design layer, which can serve as a guide for future research efforts.
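To make the AM operations concrete, here is a software sketch (plain NumPy over random binary words, not a circuit model) of the two basic queries: exact match, as in a binary CAM, and approximate match under a Hamming-distance function.

```python
# Minimal sketch of associative-memory semantics: compare a query against every
# stored word and return the exact hit, or the nearest entry under Hamming distance.
import numpy as np

rng = np.random.default_rng(0)
memory = rng.integers(0, 2, size=(1024, 64), dtype=np.uint8)   # 1024 stored 64-bit words

def exact_match(query: np.ndarray) -> np.ndarray:
    """Indices of rows identical to the query (what a binary CAM reports)."""
    return np.flatnonzero((memory == query).all(axis=1))

def approximate_match(query: np.ndarray) -> int:
    """Row with the smallest Hamming distance to the query (approximate search)."""
    distances = (memory != query).sum(axis=1)                  # per-row Hamming distance
    return int(np.argmin(distances))

query = memory[42].copy()
query[:3] ^= 1                                                 # flip 3 bits: no exact hit
print(exact_match(query), approximate_match(query))            # [] and (very likely) 42
```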
As semiconductor technology enters the sub-14nm era, geometry and process, voltage, and temperature (PVT) variability in devices can affect the performance, functionality, and power of circuits, especially in new Artificial Intelligence (AI) accelerators. This is where predictive failure analytics is extremely critical: it can identify the failure issues related to logic and memory circuits and drive the circuits into an energy-efficient operating region.
This talk describes how key statistical techniques and new algorithms can be effectively used to analyze and build robust circuits. These algorithms can be used to analyze decoders, latches, and volatile as well as non-volatile memories. In addition, how these methodologies can be extended to “reliability prediction” and “hardware corroboration” is demonstrated. Logistic-regression-based machine learning techniques are employed to model the circuit response and speed up the simulation of importance-sample points. To avoid overfitting, a cross-validation-based regularization framework for ordered feature selection is demonstrated.
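A minimal sketch of this modeling step, on synthetic pass/fail data rather than real circuit simulations: a cross-validation-regularized logistic regression is fit to the circuit's pass/fail response over variation samples and then used to screen candidate importance-sample points before expensive simulation.

```python
# Minimal sketch: CV-regularized logistic regression as a surrogate for the
# circuit's pass/fail response, used to filter importance-sampling candidates.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 12))                         # normalized PVT variation parameters
fail = (1.4 * X[:, 0] - 1.1 * X[:, 3] + 0.1 * rng.normal(size=2000)) > 3.0  # synthetic failures

# Cross-validated choice of the regularization strength C guards against overfitting.
model = LogisticRegressionCV(Cs=10, cv=5, max_iter=2000).fit(X, fail)

candidates = rng.normal(scale=2.0, size=(10000, 12))    # shifted importance-sampling proposals
p_fail = model.predict_proba(candidates)[:, 1]
worth_simulating = candidates[p_fail > 0.5]             # only these go to the circuit simulator
print(f"{fail.mean():.3%} failures in training data; {len(worth_simulating)} candidates kept")
```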
Also described are techniques to generate accurate parasitic capacitance models along with PVT variations for sub-22nm technologies and their incorporation into a physics-based statistical analysis methodology for accurate Vmin analysis. In addition, extensions of these techniques based on machine learning, e.g., KNN, are highlighted. Finally, the talk summarizes important open issues in this field.
The recent artificial intelligence (AI) boom has been driven primarily by the confluence of three forces: algorithms, data, and the computing power enabled by modern integrated circuits (ICs), including specialized AI accelerators. This talk will present a closed-loop perspective on synergistic AI and agile IC design with two main themes, AI for IC and IC for AI. As semiconductor technology enters the era of extreme scaling, IC design and manufacturing complexities become extremely high. More intelligent and agile IC design technologies are needed than ever to optimize performance, power, manufacturability, design cost, etc., and deliver equivalent scaling to Moore’s Law. This talk will present some recent results leveraging modern AI and machine learning advancements with domain-specific customizations for agile IC design and manufacturing closure. Meanwhile, customized ICs, including those with beyond-CMOS technologies, can drastically improve AI performance and energy efficiency by orders of magnitude. I will present some recent results on hardware/software co-design for high-performance and energy-efficient optical neural networks. Closing the virtuous cycle between AI and IC holds great potential to significantly advance the state of the art of both.
Side-Channel Analysis - Debdeep Mukhopadhyay (IIT Kharagpur)
30-min demo with a 10-min Q&A
CAD Tool: https://cadforassurance.org/tools/sca/exp-fault/

MIMI - Patanjali SLPSK and Jonathan Cruz (UF)
30-min demo with a 10-min Q&A
CAD Tool: https://cadforassurance.org/tools/ic-trust-verification/mimi/
Artificial Intelligence (AI) ubiquitously impacts almost all research communities, including electronic design automation (EDA). Many scholars with mathematical and modeling backgrounds have shifted their focus to applying AI technologies to their research or to working directly on AI problems. As a researcher with Ph.D. training in EDA and circuit design, I started my AI-related research in the late 2000s with neuromorphic computing, which implements hardware to accelerate the computation of biologically plausible learning models. In this talk, I will review the development of my research from neuromorphic computing to a broader scope of AI, including machine learning accelerator designs, neural network quantization and pruning, neural architecture search, federated learning, and neural network robustness, privacy, and security, and how I benefit from my EDA background.