Hardware-Software Co-Design Course

AI for HPC Systems

Course Objective

This training program aims to build a practical understanding of how modern AI workloads are developed and executed on high-performance computing (HPC) systems. Participants will gain insight into the hardware, software, and system-level foundations that enable large-scale AI applications, ranging from computer vision and multimodal systems to large language models.

The program provides both foundational knowledge and hands-on experience with AI in HPC environments. Participants will learn how to train, optimize, deploy, and scale AI workloads while developing practical skills in distributed computing, model serving, inference optimization, and the AI compiler technologies that bridge machine learning frameworks and modern hardware platforms.

Learning Outcomes

By the end of this training, participants will understand the architecture and operation of modern supercomputing systems and how AI workloads in vision, language, and multimodal domains are executed efficiently on large-scale computing infrastructure. They will be able to work confidently within HPC Linux environments, manage computational resources using Slurm, and scale AI training from single-GPU systems to multi-GPU and multi-node clusters.

Participants will gain practical experience in building LLM-based applications, including retrieval-augmented generation (RAG) pipelines and agentic AI workflows. They will learn techniques for optimizing inference performance through quantization, batching, and model compression. They will also develop an introductory understanding of AI compiler technologies such as MLIR and LLVM, and of how modern AI models are mapped to and optimized for hardware execution.

Training Modules

Module 1: Introduction to Supercomputing Systems

1.1 Overview of High-Performance Computing and its Role in AI

1.2 Cluster Components: Nodes, CPUs, GPUs, Memory, Storage, and Interconnects

1.3 Working in an HPC Linux Environment

1.4 Submitting and Managing Jobs with Slurm

Module 2: AI Vision Systems for HPC Workloads

2.1 Foundations of Deep Learning for Computer Vision

2.2 Detection, Segmentation, and Classification Models

2.3 Data Pipelines and Preprocessing for Large Datasets

2.4 Training and Evaluating Vision Models on HPC Platforms

Module 3: Large Language Models and Multimodal AI

3.1 Transformer Architectures and Foundations of LLMs

3.2 Multimodal Models Combining Vision and Language

3.3 Fine-Tuning and Parameter-Efficient Methods (LoRA)

3.4 Retrieval-Augmented Generation (RAG)

3.5 Agentic AI and Tool-Using Workflows

3.6 Evaluation and Real-World Applications

Module 4: Distributed AI Training on HPC Clusters

4.1 Principles of Distributed and Parallel Training

4.2 Data, Model, and Pipeline Parallelism

4.3 Multi-GPU and Multi-Node Training Frameworks

4.4 Scaling Behavior and Communication Overhead

Module 5: AI Deployment and Inference Optimization

5.1 Model Serving Architectures and Inference Engines

5.2 Quantization, Pruning, and Model Compression

5.3 Optimizing Throughput, Latency, and Batching

5.4 Deployment on Cluster and Edge Targets

Module 6: AI Applications for Industry and Research

6.1 Identifying Valuable AI Use Cases in Industry and Research

6.2 Case Studies in Vision, Language, and Multimodal Systems

6.3 Moving from Prototype to Deployment

Module 7: AI Compilers, MLIR/LLVM, and Runtime Systems

7.1 The Role of Compilers in the AI Software Stack

7.2 Introduction to MLIR and LLVM for AI Workloads

7.3 Graph Optimization Techniques

7.4 Runtime Systems for AI Execution

7.5 Hardware Mapping and Accelerator Support