With the globalization of microelectronics manufacturing, the security of the IC supply chain has become a major consideration for modern hardware systems. As an example, large SoCs often integrate third-party intellectual property (3PIP) blocks, which may be procured from untrusted entities that can potentially compromise the security of the entire system. Over the past two decades, researchers have unearthed several vulnerabilities stemming from such security risks, including Hardware Trojan insertion, IC counterfeiting, overproduction, and reverse engineering. These vulnerabilities could cause billions of dollars in economic loss and pose critical threats to national security. To address these concerns, researchers in academia, government, and industry have proposed countermeasures at various levels, from RTL to layout to devices, including logic obfuscation, IC camouflaging, and IC watermarking. This panel will discuss these challenges and potential solutions and forecast new frontiers for the hardware security community to explore.
The increasing prevalence of chronic diseases, an aging population, and a shortage of healthcare professionals have prompted the widespread adoption of mobile and implantable devices to effectively manage various health conditions. In recent years, there has been growing interest in leveraging the rapid advances in artificial intelligence (AI) to enhance the performance of these devices, resulting in better patient outcomes, reduced healthcare costs, and improved patient autonomy. Due to privacy, security, and safety considerations, inference must often be performed at the edge, with limited hardware resources. This is compounded by inter-patient and intra-patient variability, heavy dependence on medical domain knowledge, and a lack of diversified training data. In this talk, we will demonstrate how techniques such as hardware and neural architecture co-design, personalized meta-learning, and fairness-aware pruning can transform the landscape of mobile and implantable devices. Additionally, we will showcase the world's first smart Implantable Cardioverter Defibrillator (ICD) design enabled by our research.
"This video material was produced for and used at the DATE 2023 conference. EDAA vzw, the owner of the copyright for this material, has authorized free reuse with the inclusion of this paragraph."
The emergence of “Very Large-Scale Integration (VLSI)” in the late 1970s created a groundswell of feverish innovation. Inspired by the vision laid out in Mead and Conway’s “Introduction to VLSI Design”, numerous researchers embarked on ventures to unleash the capabilities offered by integrated circuit technology. The introduction of design rules, separating manufacturing from design, combined with an intermediate abstraction language (CIF) and a silicon brokerage service (MOSIS), gave access to silicon to a large population of eager designers. The magic, however, expanded well beyond these circuit enthusiasts and attracted a whole generation of software experts to help automate the design process, giving rise to concepts such as layout generation, logic synthesis, and silicon compilation. It is hard to overestimate the impact that this revolution has had on information technology and society at large.
About fifty years later, Integrated Circuits are everywhere. Yet the process of creating these amazing devices feels somewhat tired. CMOS scaling, the engine behind the evolution in complexity over all these decades, is slowing down and will most likely peter out in about a decade. Innovation in design tools and methodologies has slowed as well. As a consequence, the lure of IC design and design tool development has faded, causing a talent shortage worldwide. Yet, at the same time, this moment of transition offers a world of opportunity and excitement. Novel technologies and devices, integrated into three-dimensional artifacts, are emerging and opening the door for truly transformational applications such as brain-machine interfaces and swarms of nanobots. Machine learning, artificial intelligence, and optical and quantum computing present novel models of computation surpassing the instruction-set processor paradigm. With this comes the need to re-invent the design process once again, explicitly exploiting the capabilities offered by this next generation of computing systems. In summary, it is time to put the magic in design again.
Check out the DATE 2024 Call for Papers.
Cyber-physical systems (CPS) are engineered systems that are built from, and depend upon, the seamless integration of computation and physical components [NSF]. Embedded systems comprising hardware and software are the major enabling technology for these cyber-physical systems. Today, CPSs can be found in security-sensitive areas such as aerospace, automotive, energy, healthcare, manufacturing, transportation, entertainment, and consumer appliances. Compared to traditional information processing systems, the tight interactions between cyber and physical components in CPSs and the closed-loop control from sensing to actuation give rise to new vulnerabilities at the boundaries between the various layers and domains. In this CEDA distinguished lecture, Prof. Al Faruque will discuss how new vulnerabilities are emerging at the intersection of various components and subsystems and their hardware, software, and physical layers. Several recent examples from various cyber-physical systems will be presented in this talk. Understanding these new vulnerabilities requires a very different set of methodologies and tools. Defending against them also demands new hardware/software co-design approaches. The talk will highlight recent developments in this regard. The major goal of this talk is to highlight various research challenges and the need for novel scientific solutions from the larger research community.
Recently, machine learning, and especially deep learning, has gained much attention due to its breakthrough performance in various cognitive tasks. Machine learning for electronic design automation is also gaining traction, as it provides new techniques for many challenging design automation problems of a complex nature.
In this talk, I will present our recent work from VSCLAB at UC Riverside on machine learning-based thermal map and power density map estimation methods for commercial multi-core CPUs. First, I will look at the real-time full-chip thermal map estimation problem for commercial multi-core processors. In our work, instead of using traditional functional-unit powers as input, the new models are based directly on real-time, high-level chip utilization and on-chip thermal sensor information of commercial chips, without assuming any additional physical sensors. We first framed the problem as the static or transient mapping between chip utilizations and thermal maps. To build the transient thermal model, we utilized temporal-aware long short-term memory (LSTM) neural networks with system-level variables such as chip frequency, voltage, and instruction counts as inputs. Instead of pixel-wise heatmap estimation, we applied a 2D spatial discrete cosine transformation (DCT) to the heatmaps so that they can be expressed with just a few dominant DCT coefficients. Second, we explored generative learning for the full-chip thermal map estimation problem. In this work, we treated thermal modeling as an image-generation problem using generative neural networks. The resulting thermal map estimation method, called ThermGAN, can provide tool-accurate full-chip transient thermal maps from the given performance-monitor traces of commercial off-the-shelf multi-core processors. Third, I will present a new full-chip power map estimation method for commercial multi-core processors.
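To make the DCT idea above concrete, the following is a minimal sketch (not the authors' code) of how a full-chip thermal map can be represented by a handful of dominant 2D-DCT coefficients; the map size, hotspot, and number of retained coefficients are arbitrary choices for illustration.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_heatmap(heatmap, k=8):
    """Keep only the k x k lowest-frequency 2D-DCT coefficients of a thermal map."""
    coeffs = dctn(heatmap, norm="ortho")          # full 2D DCT
    truncated = np.zeros_like(coeffs)
    truncated[:k, :k] = coeffs[:k, :k]            # dominant (low-frequency) block
    return truncated[:k, :k], idctn(truncated, norm="ortho")

# Synthetic 64x64 "thermal map": a smooth hotspot plus mild noise.
y, x = np.mgrid[0:64, 0:64]
heatmap = 50 + 30 * np.exp(-((x - 20) ** 2 + (y - 40) ** 2) / 200.0)
heatmap += 0.5 * np.random.default_rng(0).standard_normal(heatmap.shape)

coeffs, recon = compress_heatmap(heatmap, k=8)
err = np.abs(recon - heatmap).max()
print(f"64x64 map -> {coeffs.size} coefficients, max abs error = {err:.2f} C")
```

A learning model then only needs to predict the small coefficient block rather than every pixel of the heatmap.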
In this lecture, I will talk about two overarching research goals we have been pursuing for several years. The first goal is to explore the limits of energy per operation when running AI algorithms such as deep learning (DL). In-memory computing (IMC) is a non-von Neumann compute paradigm that keeps alive the promise of 1 fJ/operation for DL. Attributes such as synaptic efficacy and plasticity can be implemented in place by exploiting the physical attributes of memory devices such as phase-change memory. I will provide an overview of the most advanced IMC chips based on phase-change memory integrated in the 14nm CMOS technology node. The second goal is to develop algorithmic and architectural building blocks for a more efficient and general AI. I will introduce the paradigm of neuro-vector-symbolic architecture (NVSA), which could address problems such as continual learning and visual abstract reasoning. I will also showcase the role of IMC in realizing some of the critical compute blocks for NVSA.
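For readers unfamiliar with how a crossbar computes in place, here is a simplified sketch of analog matrix-vector multiplication, where weights become device conductances and column currents accumulate the dot products; the conductance range and noise level are assumed values for illustration and do not reflect any particular chip.

```python
import numpy as np

def analog_mvm(weights, x, g_max=25e-6, noise_sigma=0.02, rng=None):
    """Approximate W @ x on a crossbar: weights -> differential conductance pairs,
    inputs -> voltages, outputs -> summed column currents (Ohm's + Kirchhoff's laws)."""
    rng = rng or np.random.default_rng(0)
    scale = np.abs(weights).max()
    g_pos = np.clip(weights, 0, None) / scale * g_max   # positive conductances
    g_neg = np.clip(-weights, 0, None) / scale * g_max  # negative conductances
    # Device-to-device programming noise (assumed Gaussian, relative to g_max)
    g_pos += rng.normal(0, noise_sigma * g_max, g_pos.shape)
    g_neg += rng.normal(0, noise_sigma * g_max, g_neg.shape)
    i_out = g_pos @ x - g_neg @ x                        # differential column currents
    return i_out * scale / g_max                         # map currents back to weight units

rng = np.random.default_rng(1)
W, x = rng.standard_normal((4, 8)), rng.standard_normal(8)
print("exact :", W @ x)
print("analog:", analog_mvm(W, x, rng=rng))
```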
The continued scaling of the horizontal and vertical physical features of silicon-based complementary metal-oxide-semiconductor (CMOS) transistors, termed “More Moore”, has a limited runway and will eventually be replaced with “Beyond CMOS” technologies. There has been a tremendous effort to follow Moore’s law, but it is currently approaching atomistic and quantum-mechanical physics boundaries. This has led to active research in other non-CMOS technologies such as memristive devices, carbon nanotube field-effect transistors, quantum computing, etc. Several of these technologies have been realized in practical devices with promising gains in yield, integration density, runtime performance, and energy efficiency. Their eventual adoption is largely reliant on continued research into Electronic Design Automation (EDA) tools catering to these specific technologies. Indeed, some of these technologies present new challenges to the EDA research community, which are being addressed through a series of innovative tools and techniques. In this tutorial, we will cover two phases of the EDA flow, logic synthesis and technology mapping, for two types of emerging technologies, namely in-memory computing and quantum computing.
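As a toy illustration of technology mapping for in-memory computing, the sketch below schedules a NOR-only realization of XOR, in the spirit of memristive logic styles (e.g., MAGIC) whose native in-array operation is NOR; the netlist and scheduling are illustrative and not taken from the tutorial.

```python
# Toy technology mapping: realize XOR with a NOR-only operation schedule,
# as a stand-in for mapping logic onto NOR-native in-memory arrays.
def run_nor_schedule(schedule, inputs):
    """Execute a list of (dst, src_a, src_b) NOR operations on a dict of named cells."""
    cells = dict(inputs)
    for dst, a, b in schedule:
        cells[dst] = int(not (cells[a] or cells[b]))  # each step is one in-array NOR
    return cells

# 5-NOR realization of XOR(a, b)
XOR_SCHEDULE = [
    ("n1", "a", "b"),     # n1 = NOR(a, b)
    ("n2", "a", "n1"),    # n2 = !a & b
    ("n3", "b", "n1"),    # n3 = a & !b
    ("n4", "n2", "n3"),   # n4 = XNOR(a, b)
    ("y",  "n4", "n4"),   # y  = NOT(n4) = XOR(a, b)
]

for a in (0, 1):
    for b in (0, 1):
        y = run_nor_schedule(XOR_SCHEDULE, {"a": a, "b": b})["y"]
        assert y == (a ^ b)
print("XOR mapped to 5 in-memory NOR operations")
```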
30-minute demo, 10-minute Q&A.
The CAD for Trust and Assurance website is an academic dissemination effort by researchers in the field of hardware security. The goal is to assemble information on all CAD for trust/assurance activities in academia and industry in one place and share it with the broader community of researchers and practitioners in a timely manner, with an easy-to-search and easy-to-access interface. We are including information on many major CAD tools the research community has developed over the past decade, including open-source license-free or ready-for-licensing tools, associated metrics, relevant publications, and video demos. We are also delighted to announce a series of virtual CAD for Assurance tool training webinars starting in February 2021.
Additional information on these webinars is available at the CAD for Assurance website at https://cadforassurance.org/.
Data transfer between processors and memory is a major bottleneck in improving application performance on traditional computing hardware. This talk will showcase several representative cross-layer IMC-based design efforts.
Placement for very large-scale integrated (VLSI) circuits is one of the most important steps for design closure. We propose a novel GPU-accelerated placement framework, DREAMPlace, by casting the analytical placement problem as a neural-network training problem. Implemented on top of the widely adopted deep learning toolkit PyTorch, with customized key kernels for wirelength and density computations, DREAMPlace achieves around 40× speedup in global placement without quality degradation compared to the state-of-the-art multithreaded placer RePlAce. We believe this work will open up new directions for revisiting classical EDA problems with advancements in AI hardware and software.
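The following is a minimal sketch of the core idea of treating cell coordinates as trainable parameters and minimizing a differentiable wirelength proxy with a deep-learning optimizer; the smoothing constant, spreading penalty, and toy netlist are assumptions for illustration and do not reproduce DREAMPlace's actual kernels.

```python
import torch

# Toy "placement as training": cell (x, y) coordinates are the trainable parameters,
# the loss is a smooth wirelength proxy (log-sum-exp per net) plus a crude spreading term.
torch.manual_seed(0)
num_cells, nets = 6, [(0, 1, 2), (2, 3), (3, 4, 5), (0, 5)]
pos = torch.nn.Parameter(torch.rand(num_cells, 2))      # optimized like network weights
opt = torch.optim.Adam([pos], lr=0.05)
gamma = 0.05                                             # log-sum-exp smoothing

def smooth_hpwl(p):
    total = 0.0
    for net in nets:
        q = p[list(net)]
        for d in range(2):                               # x and y directions
            total = total + gamma * (torch.logsumexp(q[:, d] / gamma, 0)
                                     + torch.logsumexp(-q[:, d] / gamma, 0))
    return total

def spreading(p):
    d = torch.cdist(p, p) + torch.eye(num_cells)         # avoid div-by-zero on diagonal
    return (1.0 / d).triu(1).sum()                       # penalize overlapping cells

for step in range(200):
    opt.zero_grad()
    loss = smooth_hpwl(pos) + 0.01 * spreading(pos)
    loss.backward()
    opt.step()

print("final smoothed wirelength:", smooth_hpwl(pos).item())
```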
In routing tree construction, both wirelength (WL) and pathlength (PL) are important. Among existing methods, PD-II and SALT are the two most prominent, but neither always dominates the other in terms of both WL and PL for all nets. In addition, estimating the best parameters for both algorithms remains an open problem. In this paper, we model the pins of a net as a point cloud and formalize a set of special properties of such point clouds. Considering these properties, we propose a novel deep neural network architecture, TreeNet, to obtain an embedding of the point cloud. Based on the obtained cloud embedding, an adaptive workflow is designed for routing tree construction. Experimental results show that the proposed TreeNet is superior to other mainstream point-cloud models on classification tasks. Moreover, the proposed adaptive workflow for routing tree construction outperforms SALT and PD-II in terms of both efficiency and effectiveness.
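Since PD-II builds on the classic Prim-Dijkstra trade-off, a minimal sketch of that underlying construction helps illustrate the WL/PL tension the paper targets; the pin locations and the use of rectilinear distance are illustrative assumptions, and this is not the paper's PD-II or TreeNet code.

```python
def prim_dijkstra(pins, alpha=0.3):
    """Grow a routing tree from pins[0] (the driver).
    Each new pin minimizes alpha * pathlength(parent) + dist(parent, pin):
    alpha = 0 yields a Prim MST (min wirelength), alpha = 1 leans toward
    a shortest-path tree (min pathlength)."""
    dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])   # rectilinear distance
    in_tree, parent, pathlen = {0}, {}, {0: 0}
    while len(in_tree) < len(pins):
        _, i, j = min((alpha * pathlen[i] + dist(pins[i], pins[j]), i, j)
                      for i in in_tree for j in range(len(pins)) if j not in in_tree)
        parent[j], pathlen[j] = i, pathlen[i] + dist(pins[i], pins[j])
        in_tree.add(j)
    wl = sum(dist(pins[j], pins[i]) for j, i in parent.items())
    return parent, wl, max(pathlen.values())

pins = [(0, 0), (3, 1), (1, 4), (5, 5), (2, 2)]            # pins[0] is the driver
for alpha in (0.0, 0.5, 1.0):
    _, wl, pl = prim_dijkstra(pins, alpha)
    print(f"alpha={alpha}: wirelength={wl}, max pathlength={pl}")
```

Sweeping alpha exposes the WL/PL trade-off that makes per-net algorithm and parameter selection worthwhile.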
Intermittently executing deep neural network (DNN) inference powered by ambient energy paves the way for sustainable and intelligent edge applications. Neural architecture search (NAS) has achieved great success in automatically finding highly accurate networks with low latency. However, we observe that NAS attempts to improve inference latency primarily by maximizing data reuse, but the derived solutions, when deployed on intermittent systems, may be inefficient, such that the inference may not satisfy an end-to-end latency requirement and, more seriously, may be unsafe given an insufficient energy budget. This work proposes iNAS, which introduces intermittent execution behavior into NAS. To generate accurate neural networks and corresponding intermittent execution designs that are safe and efficient, iNAS finds the right balance between data reuse and the costs related to progress preservation and recovery, while ensuring the power-cycle energy budget is not exceeded. The solutions found by iNAS and an existing HW-NAS were evaluated on a Texas Instruments device under intermittent power, across different datasets, energy budgets, and latency requirements. Experimental results show that in all cases the iNAS solutions safely meet the latency requirements and substantially improve the end-to-end inference latency compared to the HW-NAS solutions.
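A simplified sketch of the kind of feasibility check such a search must perform is shown below: every atomic execution segment, plus the cost of preserving its progress, must fit in one power-cycle energy budget, and each additional power cycle adds recovery and charging overhead to the end-to-end latency. All numbers are invented for illustration.

```python
# Simplified safety check for intermittent inference: every atomic segment,
# together with the energy to preserve its progress, must fit in one power cycle.
def is_safe(segment_energies_uj, preserve_cost_uj, budget_uj):
    worst = max(e + preserve_cost_uj for e in segment_energies_uj)
    return worst <= budget_uj, worst

def end_to_end_latency_ms(segment_times_ms, num_power_cycles, recovery_ms, off_time_ms):
    # Each power cycle adds a recovery (restore) cost plus energy-harvesting dead time.
    return sum(segment_times_ms) + num_power_cycles * (recovery_ms + off_time_ms)

# Hypothetical per-segment costs for two candidate designs (uJ / ms per segment).
reuse_heavy = dict(energy=[180, 220, 260], time=[3.0, 3.4, 4.1])   # bigger tiles
balanced    = dict(energy=[90, 110, 120, 100], time=[2.0, 2.2, 2.4, 2.1])

budget, preserve, recover, off = 250, 40, 1.5, 20   # uJ, uJ, ms, ms (assumed)
for name, d in [("reuse-heavy", reuse_heavy), ("balanced", balanced)]:
    safe, worst = is_safe(d["energy"], preserve, budget)
    lat = end_to_end_latency_ms(d["time"], num_power_cycles=len(d["energy"]),
                                recovery_ms=recover, off_time_ms=off)
    print(f"{name}: safe={safe} (worst cycle {worst} uJ), latency={lat:.1f} ms")
```

In this toy example the reuse-heavy design is unsafe because its largest segment plus preservation cost exceeds the power-cycle budget, which is exactly the failure mode iNAS guards against.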
Active LiDAR and stereo vision are the most commonly used depth-sensing techniques in autonomous vehicles. Each of them alone has weaknesses in terms of density and reliability and thus cannot perform well in all practical scenarios. Recent works use deep neural networks (DNNs) to exploit their complementary properties, achieving superior depth sensing. However, these state-of-the-art solutions fall short on real-time responsiveness due to the high computational complexity of DNNs. In this paper, we present FastFusion, a fast deep stereo-LiDAR fusion framework for real-time high-precision depth estimation. FastFusion provides an efficient two-stage fusion strategy that leverages a binary neural network to integrate stereo-LiDAR information as input and uses cross-based LiDAR trust aggregation to further fuse the sparse LiDAR measurements in the back end of stereo matching. More importantly, we present a GPU-based acceleration framework providing a low-latency implementation of FastFusion, gaining both accuracy improvement and real-time responsiveness. In the experiments, we demonstrate the effectiveness and practicability of FastFusion, which obtains a significant speedup over state-of-the-art baselines while achieving comparable accuracy on depth sensing.
Mobile platforms must satisfy the contradictory requirements of fast response time and minimum energy consumption as a function of dynamically changing applications. To address this need, the systems-on-chip (SoCs) at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime is challenging for two reasons. First, the large configuration space prohibits exhaustive solutions. Second, control policies designed offline are at best sub-optimal, since many potential new applications are unknown at design time. We address these challenges by proposing an online imitation learning approach. Our key idea is to construct an offline policy and adapt it online to new applications to optimize a given metric (e.g., energy). The proposed methodology leverages the supervision enabled by power-performance models learned at runtime. We demonstrate its effectiveness on a commercial mobile platform with 16 diverse benchmarks. Our approach successfully adapts the control policy to an unknown application after executing less than 25% of its instructions.
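A toy sketch of the online-adaptation idea is given below: an offline policy is trained on known workloads, then incrementally updated by imitating an oracle derived from a (here, invented) power-performance model on the states the policy actually visits. The features, frequency levels, and oracle are assumptions for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
FREQ_LEVELS = np.array([0.6, 1.0, 1.4, 1.8, 2.2])   # GHz, illustrative knob values

def oracle_action(features):
    """Stand-in for supervision from a runtime power-performance model:
    pick the lowest frequency level that still meets the workload's demand."""
    demand = 0.8 * features[0] + 1.2 * features[1]    # invented demand model (GHz)
    feasible = np.where(FREQ_LEVELS >= demand)[0]
    return feasible[0] if len(feasible) else len(FREQ_LEVELS) - 1

def sample_workload(phase_shift=0.0):
    # features: [compute intensity, memory intensity], both in [0, 1)
    return rng.random(2) * (1.0 - phase_shift) + phase_shift * rng.random(2)

# "Offline" policy trained on known applications.
X_off = np.array([sample_workload() for _ in range(500)])
y_off = np.array([oracle_action(x) for x in X_off])
policy = SGDClassifier(random_state=0)
policy.partial_fit(X_off, y_off, classes=np.arange(len(FREQ_LEVELS)))

# Online adaptation: run the policy on a new application, query the oracle on the
# states the policy visits, and update the policy incrementally (DAgger-style).
early_mismatch = late_mismatch = 0
for t in range(300):
    x = sample_workload(phase_shift=0.5)              # "unknown" application
    a_policy, a_oracle = policy.predict([x])[0], oracle_action(x)
    early_mismatch += int(a_policy != a_oracle) if t < 50 else 0
    late_mismatch += int(a_policy != a_oracle) if t >= 250 else 0
    policy.partial_fit([x], [a_oracle])               # imitate the oracle online
print(f"mismatches vs. oracle: first 50 steps = {early_mismatch}, last 50 steps = {late_mismatch}")
```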
DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source, full-stack DNN accelerator generator. Gemmini generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders-of-magnitude speedups over high-performance CPUs on various DNN benchmarks.
Accurate power modeling is crucial for energy-efficient CPU design and runtime management. An ideal power modeling framework needs to be accurate yet fast, achieve high temporal resolution (ideally cycle-accurate) with low runtime computational overhead, and be easily extensible to diverse designs through automation. Simultaneously satisfying such conflicting objectives is challenging and largely unattained despite significant prior research. In this talk, I will introduce our work APOLLO, which has multiple key attributes. First, it supports fast and accurate design-time power model simulation, handling benchmarks of millions of cycles in minutes with an emulator. Second, it incorporates an unprecedentedly low-cost runtime on-chip power meter in the CPU RTL for per-cycle power tracing. Third, the development process of this method is fully automated and applies to any given design. This method has been validated on the high-volume commercial microprocessors Neoverse N1 and Cortex-A77.
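A simplified sketch of the proxy-selection idea behind such per-cycle power meters follows: fit a sparse linear model from per-cycle RTL toggle activity to power, so that only a small set of signals needs to be monitored on chip. The sketch uses L1 regularization on synthetic data purely for illustration; APOLLO's actual selection method and data differ.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
num_cycles, num_signals = 2000, 400

# Synthetic per-cycle toggle activity for RTL signals; "true" power depends on a few of them.
toggles = rng.integers(0, 2, size=(num_cycles, num_signals)).astype(float)
true_weights = np.zeros(num_signals)
true_weights[rng.choice(num_signals, 12, replace=False)] = rng.uniform(0.5, 2.0, 12)
power = toggles @ true_weights + 0.05 * rng.standard_normal(num_cycles)  # per-cycle power

# Sparse linear model: signals with nonzero weights become the on-chip power proxies.
model = Lasso(alpha=0.02).fit(toggles, power)
proxies = np.flatnonzero(np.abs(model.coef_) > 1e-3)
pred = model.predict(toggles)
rel_err = np.abs(pred - power).mean() / power.mean()

print(f"selected {len(proxies)} proxy signals out of {num_signals}")
print(f"mean relative per-cycle error: {rel_err:.2%}")
```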
The microarchitecture design of a processor has become increasingly difficult due to the large design space and time-consuming verification flow. Previously, researchers relied on prior knowledge and cycle-accurate simulators to analyze the performance of different microarchitecture designs, but lacked sufficient discussion of methodologies to strike a good balance between power and performance. This work proposes an automatic framework to explore microarchitecture designs of the RISC-V Berkeley Out-of-Order Machine (BOOM), termed BOOM-Explorer, achieving a good trade-off between power and performance. First, the framework utilizes an advanced microarchitecture-aware active learning (MicroAL) algorithm to generate a diverse and representative initial design set. Second, a Gaussian process model with deep kernel learning functions (DKL-GP) is built to characterize the design space. Third, correlated multi-objective Bayesian optimization is leveraged to explore Pareto-optimal designs. Experimental results show that BOOM-Explorer can search for designs that dominate prior art and designs developed by senior engineers in terms of power and performance, within a much shorter time.
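To illustrate the surrogate-plus-acquisition loop at the heart of such frameworks, here is a minimal Bayesian-optimization sketch over a synthetic design space using an off-the-shelf Gaussian process and a scalarized power-performance objective; BOOM-Explorer's microarchitecture-aware initialization, deep kernels, and multi-objective acquisition are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Synthetic design space: each row is a normalized microarchitecture parameter vector
# (e.g., ROB size, issue width, cache ways); simulate() stands in for a slow simulator.
designs = rng.random((300, 4))
def simulate(d):
    perf = 1.5 * d[0] + d[1] + 0.5 * d[2] + 0.1 * rng.standard_normal()
    power = d[0] ** 2 + 0.8 * d[1] + 0.3 * d[3] + 0.05 * rng.standard_normal()
    return perf, power

def score(perf, power, w=0.6):          # scalarized power-performance trade-off
    return w * perf - (1 - w) * power

# Initial random sample, then a Bayesian-optimization loop with an upper-confidence-bound rule.
idx = list(rng.choice(len(designs), 10, replace=False))
y = [score(*simulate(designs[i])) for i in idx]
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(30):
    gp.fit(designs[idx], y)
    mu, sigma = gp.predict(designs, return_std=True)
    ucb = mu + 1.5 * sigma
    ucb[idx] = -np.inf                  # never re-evaluate a visited design
    nxt = int(np.argmax(ucb))
    idx.append(nxt)
    y.append(score(*simulate(designs[nxt])))

best = idx[int(np.argmax(y))]
print("best design parameters:", np.round(designs[best], 2), "score:", round(max(y), 3))
```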
We propose a novel hardware and software co-exploration framework for efficient neural architecture search (NAS). Different from existing hardware-aware NAS, which assumes a fixed hardware design and explores only the NAS space, our framework simultaneously explores both the architecture search space and the hardware design space to identify the best neural architecture and hardware pairs that maximize both test accuracy and hardware efficiency. Such a practice greatly opens up the design freedom and pushes forward the Pareto frontier between hardware efficiency and test accuracy for better design trade-offs. The framework iteratively performs a two-level (fast and slow) exploration. Without lengthy training, the fast exploration can effectively fine-tune hyperparameters and prune inferior architectures in terms of hardware specifications, which significantly accelerates the NAS process. Then, the slow exploration trains candidates on a validation set and updates a controller using reinforcement learning to maximize the expected accuracy together with the hardware efficiency. In this article, we demonstrate that the co-exploration framework can effectively expand the search space to incorporate models with high accuracy, and we theoretically show that the proposed two-level optimization can efficiently prune inferior solutions to better explore the search space. The experimental results on ImageNet show that the co-exploration NAS can find solutions with the same accuracy but 35.24% higher throughput and 54.05% higher energy efficiency compared with hardware-aware NAS.
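The sketch below illustrates the flavor of the fast exploration step: estimate each candidate's latency and energy from a crude analytical hardware model and prune candidates that violate the specification before any training. The cost model, search grid, and specification are invented for illustration.

```python
import itertools

# Candidate neural architectures: (num_layers, channels, bitwidth) drawn from a small grid.
search_space = list(itertools.product([8, 12, 16], [32, 64, 128], [4, 8]))

def estimate_hw(arch, hw):
    """Crude analytical model (assumed, for illustration): latency grows with MAC count
    and shrinks with hardware parallelism; energy grows with MACs and bitwidth."""
    layers, ch, bits = arch
    macs = layers * ch * ch * 9 * 56 * 56            # 3x3 convs on 56x56 maps, toy estimate
    latency_ms = macs / (hw["pe_count"] * hw["freq_mhz"] * 1e3)
    energy_mj = macs * bits * hw["pj_per_mac_bit"] * 1e-9
    return latency_ms, energy_mj

hw_design = {"pe_count": 256, "freq_mhz": 200, "pj_per_mac_bit": 0.4}
spec = {"latency_ms": 8.0, "energy_mj": 1.0}

survivors = []
for arch in search_space:
    lat, en = estimate_hw(arch, hw_design)
    if lat <= spec["latency_ms"] and en <= spec["energy_mj"]:
        survivors.append((arch, round(lat, 2), round(en, 3)))

print(f"{len(survivors)}/{len(search_space)} candidates survive the fast exploration:")
for arch, lat, en in survivors:
    print(f"  layers={arch[0]}, channels={arch[1]}, bits={arch[2]} -> {lat} ms, {en} mJ")
```

Only the surviving candidates would then enter the slow, training-based exploration.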
In the world of IoT, both humans and objects are continuously connected, collecting and communicating data, in a rising number of applications including Industry 4.0, biomedical devices, environmental monitoring, and smart houses and offices. Local computation at the edge has become a necessity to limit data traffic. Additionally, embedding AI processing in the edge adds potentially high levels of smart autonomy to these IoT 2.0 systems. Progress in nanoelectronic technology allows this to be done with power- and hardware-efficient architectures and designs. This keynote gives an overview of key solutions, but also describes the main limitations and risks, exploring the edge of edge AI.
“This video material was produced for and used at the DATE2022 event. EDAA vzw, the owner of the copyright for this material, has authorized free reuse with the inclusion of this paragraph.”
ACRC WEBINAR by Prof. Anupam Chattopadhyay from Nanyang Technological University, Singapore
"AI, Machine Learning, Deep Learning: Where Are the Real Opportunities for the EDA Industry?" presented by Kurt Keutzer, Thursday, December 9, 2021 at 58th DAC.
Joe Costello gives his keynote, "When the Winds of Change Blow, Some People Build Walls and Some People Build Windmills," on Wednesday, December 8, 2021, at 58th DAC.
Keynote Speaker Bill Dally gives his presentation, "GPUs, Machine Learning, and EDA," on Tuesday, December 7, 2021 at 58th DAC.
As practical limits for transistor miniaturization are reached, alternative approaches for improving integrated-circuit functionality and energy efficiency at acceptable cost will be necessary to meet growing demand for information and communication technology. This presentation will cover some examples of semiconductor device innovation to enable ubiquitous information systems in the future.
Jeff Dean gives Keynote, "The Potential of Machine Learning for Hardware Design," on Monday, December 6, 2021 at 58th DAC.
Within the past decade, the number of IoT devices introduced to the market has increased dramatically, and this trend is expected to continue at a rapid pace. However, the massive deployment of IoT devices has led to significant security and privacy concerns, given that security is often treated as an afterthought for IoT systems. Security issues may arise at different levels, from deployment issues that leave devices exposed to the internet with default credentials, to implementation issues where manufacturers incorrectly employ existing protocols or develop proprietary communication protocols that have not been properly vetted. While existing cybersecurity and network security solutions can help protect IoT, they are often hampered by the limited on-board/on-chip resources of IoT devices. To mitigate this problem, researchers have developed multiple solutions based on either a top-down approach (relying on the cloud for IoT data processing and authentication) or a bottom-up approach (leveraging hardware modifications for efficient cybersecurity protection). In this talk, I will first introduce the emerging security and privacy challenges in the IoT domain. I will then focus on bottom-up solutions for IoT protection and present our recent research efforts in microarchitecture-supported IoT runtime attack detection and device attestation. The developed methods will lead to a design-for-security flow towards trusted IoT and its applications.
EDA tools have been used routinely in the digital design flow for decades, but despite valiant efforts from the research community, analog design has stubbornly resisted automation. Several recent developments are helping turn the tide, driving wider adoption of automation tools within the analog design flow. This talk explains the reasons for this change, and then describes recent efforts in analog layout automation with particular focus on the ALIGN (Analog Layout, Intelligently Generated from Netlists) project. ALIGN is a joint university-industry effort that is developing an open-source analog layout flow, leveraging a blend of traditional algorithmic methods with machine learning based approaches. ALIGN targets a wide variety of designs – low frequency analog circuits, wireline circuits for high-speed links, RF/wireless circuits, and power delivery circuits – under a single framework. The flow is structured modularly and is being built to cater to a wide range of designer expertise: the novice designer could use it in “push-button” mode, automatically generating GDSII layout from a SPICE netlist, while users with greater levels of expertise could bypass parts of the flow to incorporate their preferences and constraints.
The talk will present an overview of both the technical challenges and logistical barriers to building an open-source tool flow while respecting the confidentiality requirements of secured IP information. Finally, the application of ALIGN to a variety of designs will be demonstrated.
Data transfer between processors and memory is a major bottleneck in improving application-level performance. This is particularly true for data-intensive tasks such as many machine learning and security applications. In-memory computing, where certain data processing is performed directly in the memory array, can be an effective solution to address this bottleneck. Associative memory (AM), a type of memory that can efficiently “associate” an input query with appropriate data words/locations in the memory, is a powerful in-memory computing core. Nonetheless, harnessing the benefits of AM requires cross-layer efforts spanning from devices and circuits to architectures and systems. In this talk, I will showcase several representative cross-layer AM-based design efforts. In particular, I will highlight how different non-volatile memory technologies (such as RRAM, FeFET memory, and Flash) can be exploited to implement various types of AM (e.g., exact and approximate match, ternary and multi-bit data representation, and different distance functions). I will use several popular machine learning and security applications to demonstrate how they can profit from these different AM designs. End-to-end (from device to application) evaluations will be analyzed to reveal the benefits contributed by each design layer, which can serve as guides for future research efforts.
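For readers unfamiliar with associative memories, here is a minimal sketch of the exact, ternary (don't-care), and nearest-match lookups that a CAM/TCAM-style AM performs in parallel across its rows; the stored patterns are arbitrary examples.

```python
import numpy as np

# Each stored word is a pattern of 0/1 bits plus a "care" mask (0 = don't-care),
# mimicking a ternary CAM row. A query matches a row if all cared-about bits agree.
bits = np.array([[1, 0, 1, 1, 0, 0],
                 [1, 0, 1, 0, 1, 0],
                 [0, 1, 1, 1, 0, 1]])
care = np.array([[1, 1, 1, 1, 1, 1],      # exact-match row
                 [1, 1, 1, 0, 0, 1],      # two wildcard bit positions
                 [1, 1, 1, 1, 1, 1]])

def tcam_search(query):
    """Return indices of all rows that match the query (conceptually searched in parallel)."""
    mismatches = (bits != query) & (care == 1)
    return np.flatnonzero(~mismatches.any(axis=1))

def nearest_hamming(query):
    """Approximate match: the row with the fewest mismatching cared-about bits."""
    distances = ((bits != query) & (care == 1)).sum(axis=1)
    return int(np.argmin(distances)), distances

print("exact/ternary matches:", tcam_search(np.array([1, 0, 1, 1, 1, 0])))
print("nearest row, distances:", nearest_hamming(np.array([0, 1, 1, 0, 0, 1])))
```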
As semiconductor technology enters the sub-14nm era, geometry, process, voltage, and temperature (PVT) variability in devices can affect the performance, functionality, and power of circuits, especially in new Artificial Intelligence (AI) accelerators. This is where predictive failure analytics is extremely critical: it can identify failure issues related to logic and memory circuits and drive those circuits into an energy-efficient operating region.
This talk describes how key statistical techniques and new algorithms can be effectively used to analyze and build robust circuits. These algorithms can be used to analyze decoders, latches, and volatile as well as non-volatile memories. In addition, how these methodologies can be extended to “reliability prediction” and “hardware corroboration” is demonstrated. Logistic regression-based machine learning techniques are employed for modeling the circuit response and speeding up the simulation of importance sample points. To avoid overfitting, a cross-validation-based regularization framework for ordered feature selection is demonstrated.
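A toy sketch of the logistic-regression idea follows: learn a pass/fail response surface over variation parameters from a modest sample, then spend expensive simulations only on candidates the model places near or inside the predicted failure region. The "circuit" here is a synthetic threshold model, not a real SPICE deck.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def circuit_fails(params):
    """Stand-in for a SPICE run: fail when a weighted combination of variation
    parameters (e.g., Vth shifts) crosses an assumed margin."""
    return (1.4 * params[:, 0] + 0.9 * params[:, 1] - 0.5 * params[:, 2]) > 3.2

# Stage 1: a modest sample of the variation space to learn the pass/fail response surface.
X = rng.normal(0, 1, size=(3000, 3))
y = circuit_fails(X)
clf = LogisticRegression().fit(X, y)

# Stage 2: draw many cheap candidates, but "simulate" only those the model flags as
# likely to fail, which is where the failure statistics must be resolved.
candidates = rng.normal(0, 1, size=(200000, 3))
p_fail = clf.predict_proba(candidates)[:, 1]
flagged = p_fail > 0.05
fails_in_flagged = circuit_fails(candidates[flagged]).sum()

print(f"simulated {flagged.sum()} of {len(candidates)} candidates")
print(f"estimated failure rate: {fails_in_flagged / len(candidates):.4%}")
print(f"reference (simulate everything): {circuit_fails(candidates).mean():.4%}")
```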
Also, techniques to generate accurate parasitic capacitance models along with PVT variations for sub-22nm technologies, and their incorporation into a physics-based statistical analysis methodology for accurate Vmin analysis, are described. In addition, the extension of these techniques based on machine learning, e.g., KNN, is highlighted. Finally, the talk summarizes important issues in this field.