Paper

Using multi-level cell STT-RAM for fast and energy-efficient local checkpointing

Publication Date:
6 November 2014


Abstract

High reliability, availability, and serviceability are critical for modern large-scale computing systems. As an effective error recovery mechanism, checkpointing is widely used in such systems to survive unexpected failures. Conventional checkpointing schemes, however, are time-consuming because of the limited I/O bandwidth between the DRAM-based main memory and the backup storage. To mitigate the checkpoint overhead, we propose a fast local checkpointing scheme that leverages the unique features of Multi-Level Cell (MLC) STT-RAM. Our experimental results show that the average performance overhead is less than 1% in a multi-programmed four-core process node with a 1-second local checkpoint interval. The evaluation results also demonstrate that using MLC STT-RAM is an energy-efficient solution.

Country
HKG
Affiliation
Hong Kong University of Science & Technology
IEEE Region
Region 10 (Asia and Pacific)
Country
USA
Affiliation
University of California, Santa Barbara
IEEE Region
Region 06 (Western U.S.)