Hardware-Software Co-Design Course

Hardware-Software Co-Design for AI Accelerators and Compilers on RISC-V Platforms

Course Objective

This course aims to equip students with the knowledge and skills required for end-to-end hardware–software co-design on open RISC-V platforms, with a strong focus on AI/ML workloads. Students will build a solid foundation in computer architecture, digital design, and SoC integration using open-source RISC-V hardware, and will train in the design and implementation of custom AI accelerators, including their integration into full RISC-V SoC prototypes built with LiteX on FPGA platforms.

The course provides hands-on experience in low-level programming, including bare-metal drivers, Linux kernel modules, and user-space interfaces for hardware accelerators. Participants will develop expertise in compiler toolchains and MLIR-based frameworks, enabling them to extend compilers, add intrinsics, and design accelerator-specific MLIR dialects. Through practical deployment of AI/ML applications on specialized open hardware using runtime frameworks such as IREE, TVM, and TensorFlow Lite, the course builds a pipeline of skilled engineers and researchers who can bridge hardware, compilers, and AI runtimes, in alignment with industrial projects and collaborations.

Course Outcomes

Upon completion of this course, students will demonstrate familiarity with the complete hardware/software stack, from RISC-V CPU microarchitecture and SoC buses to operating systems, drivers, and compilers. They will be able to design, implement, and integrate custom AI accelerators into RISC-V SoCs and validate them on FPGA platforms, and to develop and test bare-metal HALs, Linux kernel drivers, and user-space APIs that control those accelerators.

Students will extend LLVM/Clang and MLIR toolchains with custom instructions, intrinsics, and dialects, enabling compiler-level optimization for specialized hardware. They will deploy and optimize AI/ML workloads (e.g., CNNs, RNNs, transformers) on RISC-V + accelerator platforms using IREE/MLIR and runtime frameworks, while analyzing and evaluating performance, scheduling, and memory trade-offs for AI inference across CPU and custom hardware backends. Finally, students will contribute to industry-relevant open-source projects, CI pipelines, and collaborative research in AI compilers, firmware, and runtime systems.

Course Modules

Module 1: Foundations of RISC-V and Digital Design

1.1 Computer Architecture Basics

1.2 RISC-V ISA & Microarchitecture

1.3 Computer Architecture in C++

1.4 Digital Logic Design in Verilog

1.5 SoC Buses and Interconnects (AXI/AHB/TileLink)

1.6 Memory Hierarchy and Addressing in SoCs

1.7 Intro to LiteX SoC Framework

Module 2: Accelerator Design & Integration (Hardware Level)

2.1 Accelerator Development: Hardware & Software

2.2 Integration via Standard Interfaces

2.3 Memory-Mapped I/O & Address Allocation

2.4 SoC Build and Peripheral Integration

Module 3: Software Stack Development (Drivers and OS Integration)

3.1 Bare-Metal Drivers and HAL

3.2 C/C++ for Hardware Control

3.3 Integrating Peripherals into Boot/BIOS

3.4 Linux Kernel Module Development

3.5 Device Tree and Driver Binding

3.6 User-Space Access (mmap/ioctl/sysfs)

3.7 Testing and CI Practices

Module 4: Foundations of AI/ML Workloads and Offloading Strategies

4.1 ML and AI Basics for Edge Devices

4.2 Real-Time and Streaming ML

4.3 Large Language & Vision-Language Models (LLMs/VLMs)

4.4 Model Deployment Frameworks

4.5 Scheduling and Memory Management

4.6 Profiling and Optimization

Module 5: Compiler Toolchains & MLIR for Accelerators

5.1 RISC-V Toolchain Internals

5.2 Adding Custom Instructions/Intrinsics

5.3 MLIR Dialect Design for the Accelerator

5.4 Integration with AI Compiler Stacks (IREE/TVM)

5.5 Compiler Optimizations for Accelerator Utilization

5.6 Resource Awareness in Compilation

Module 6: System Prototyping Projects

6.1 SoC Bring-up on FPGA (Hello World)

6.2 Integrate Custom Accelerator

6.3 Driver Development and Testing

6.4 Compiler and Toolchain Integration

6.5 Deploy an AI Model on the Accelerator

6.6 Evaluation and Presentation