Make Every Bit of Bandwidth Count: AI Systems Innovation for Accelerating AI in the Era of Cloud Computing
The world of computing has changed significantly with the increasing adoption of AI and its wide deployment in the cloud. The computation patterns of modern machine learning models (such as deep neural networks, transformers, and large language models) often require access to large volumes of data, which poses significant challenges for both programming and the underlying computing infrastructure. On the one hand, programmers must manage the complexity of moving data across the different levels of the storage hierarchy (including remote storage nodes). On the other hand, the computing infrastructure grows more complex with each upgrade: larger memory capacity, faster data access, and higher I/O bandwidth. In this talk, we argue that many of these design options are not necessarily best suited to accelerating existing AI workloads. In fact, we demonstrate that there are still ample opportunities to accelerate AI workloads on existing computing infrastructure, provided we can design effective AI system tools to characterize system bottlenecks, together with novel hardware and software techniques that fully utilize the given system's I/O bandwidth. This opens a new set of design optimization options for accelerating AI workloads in the cloud.