Mapping emerging logic and memory solutions to large language models
A key open question is how an emerging memory device will ultimately affect at-scale workloads. This presentation focuses on end-to-end benchmarking of large language models, assuming ferroelectric-based hybrid memories that can store data in both volatile and non-volatile modes. Having both modes may be invaluable for in-memory computing solutions, whose efficacy can be degraded by write events that reduce device endurance and/or incur high energy and latency. We benchmark DRAM-like 1FeFET-1C solutions in the context of transformer model workloads; these cells allow data to be written either to a non-volatile FeFET or directly to the capacitor, the latter avoiding an FeFET state transition. We compare 1FeFET-1C solutions to (a) FeFET-CMOS hybrids, (b) RRAM-CMOS hybrids, (c) RRAM-only solutions, and (d) a CMOS-based H100 GPU. We also frame our results in the context of MLPerf power studies performed via industry/SRC-supported research [1].

[1] A. Tschand et al., "MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI," 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), Las Vegas, NV, USA, 2025, pp. 1201-1216, doi: 10.1109/HPCA61900.2025.00092.