AI for HPC Systems

Course Objective
This training program aims to build a practical understanding of how modern AI workloads are developed and executed on high-performance computing (HPC) systems. Participants will gain insight into the hardware, software, and system-level foundations that enable large-scale AI applications, ranging from computer vision and multimodal systems to large language models.
The program provides both foundational knowledge and hands-on experience with AI in HPC environments. Participants will learn how to train, optimize, deploy, and scale AI workloads, and will develop practical skills in distributed computing, model serving, inference optimization, and the AI compiler technologies that bridge machine learning frameworks and modern hardware platforms.
Learning Outcomes
By the end of this training, participants will understand the architecture and operation of modern supercomputing systems and how AI workloads in vision, language, and multimodal domains are executed efficiently on large-scale computing infrastructure. They will be able to work confidently within HPC Linux environments, manage computational resources using Slurm, and scale AI training from single-GPU systems to multi-GPU and multi-node clusters.
Participants will gain practical experience in building LLM-based applications, including retrieval-augmented generation (RAG) pipelines and agentic AI workflows. They will learn techniques for optimizing inference performance through quantization, batching, and model compression. They will also develop an introductory understanding of AI compiler technologies such as MLIR and LLVM, so that they can see how modern AI models are mapped to and optimized for hardware execution.