Hardware-Software Co-Design Course

Hardware-Software Co-Design for AI Accelerators and Compilers on RISC-V Platforms

Course Objective

This course equips students with the knowledge and skills required for end-to-end hardware-software co-design on open RISC-V platforms, with a strong focus on AI/ML workloads. Students build a solid foundation in computer architecture, digital design, and SoC integration using open-source RISC-V hardware. They also learn to design and implement custom AI accelerators and to integrate them into full RISC-V SoC prototypes using LiteX and FPGA platforms.

The course provides hands-on experience in low-level programming, including bare-metal drivers, Linux kernel modules, and user-space interfaces for hardware accelerators. Participants develop expertise in compiler toolchains and MLIR-based frameworks, enabling them to extend compilers, add intrinsics, and design accelerator-specific MLIR dialects. They also deploy AI/ML applications on specialized open hardware using runtime frameworks such as IREE, TVM, and TensorFlow Lite. The course thereby builds a talent pipeline of engineers and researchers who can bridge hardware, compilers, and AI runtimes, aligned with industrial projects and collaborations.

Course Outcomes

Upon completion of this course, students will demonstrate familiarity with the complete hardware/software stack, from RISC-V CPU microarchitecture and SoC buses to operating systems, drivers, and compilers. They will be able to design, implement, and integrate custom AI accelerators into RISC-V SoCs and validate them on FPGA platforms. They will also be able to develop and test bare-metal HALs, Linux kernel drivers, and user-space APIs to control accelerators.

Students will extend LLVM/Clang and MLIR toolchains with custom instructions, intrinsics, and dialects, enabling compiler-level optimization for specialized hardware. They will deploy and optimize AI/ML workloads (e.g., CNNs, RNNs, transformers) on RISC-V + accelerator platforms using IREE/MLIR and runtime frameworks. They will also analyze and evaluate performance, scheduling, and memory trade-offs for AI inference across CPU and custom hardware backends. Finally, students will contribute to industry-relevant open-source projects, CI pipelines, and collaborative research in AI compilers, firmware, and runtime systems.

Course Modules

Module 1: Computer Architecture and RISC-V Foundations

1.1 Digital Systems

1.2 Processor Architecture Basics

1.3 RISC-V ISA and Microarchitecture

1.4 Performance, Pipelines, Memory Hierarchy, and I/O

1.5 SoC Buses, Interconnects, and Memory Mapping

1.6 RVV Introduction (Configuration + Core Instruction Groups)

Module 2: Modern C++ and High-Performance Programming for RISC-V

2.1 Modern RISC-V Toolchains and Build Systems (GCC/Clang/LLD)

2.2 Functions, Pointers, References

2.3 Standard Template Library (STL)

2.4 Object-Oriented Programming

2.5 Template Basics and Miscellaneous

2.6 High-Performance Kernels, STL Utilities, and Benchmark Harnesses

Module 3: Linux, Drivers, and Accelerator Integration

3.1 RISC-V Linux Bring-up, Userspace Execution, and C++ Build Ecosystem

3.2 Device Tree, Memory System, and Accelerator Hardware Contracts

3.3 Custom Accelerator Architectures, SDK Patterns, and Linux Integration

3.4 Profiling and Optimization on Real RISC-V and Accelerator Hardware

Module 4: AI/ML Models and Hardware-Aware Deployment

4.1 Foundations of AI/ML Models and Edge Constraints

4.2 Transformers, LLMs, and Kernel Mapping

4.3 Deployment Frameworks and Execution Paths

4.4 Scheduling and Memory Optimization on Hardware

4.5 Profiling and Optimization of ML Workloads

Module 5: Compiler Design and Accelerator Toolchains

5.1 RISC-V Advanced Toolchain Internals (LLVM/GCC)

5.2 Adding Custom Instructions and Intrinsics

5.3 MLIR Dialect Design for an Accelerator

5.4 Integration with AI Compiler Stacks (IREE/TVM)

5.5 Compiler Optimizations for Accelerator Utilization

5.6 Resource-Aware Compilation (Cost Models + Constraints)

Module 6: System Prototyping Projects

6.1 SoC Bring-up on FPGA (Hello World)

6.2 Integrate Custom Accelerator

6.3 Driver Development and Testing

6.4 Compiler and Toolchain Integration

6.5 Deploy an AI Model on the Accelerator

6.6 Evaluation and Presentation