2025-11-17 · HuggingFace

Easily Build and Share ROCm Kernels with Hugging Face

infrastructure

Source: HuggingFace
Date: 2025-11-17
URL: https://huggingface.co/blog/build-rocm-kernels

Summary

Integration tutorial: an end-to-end guide to building, packaging, and sharing custom ROCm GPU kernels on the Hugging Face Hub with the kernels library and the kernel-builder tooling. The workflow uses Nix for reproducible builds, HIP for the GPU code, and PyTorch C++ extensions for the Python bindings. The featured kernel is RadeonFlow GEMM, an FP8 block-wise matrix multiplication for the AMD MI300X and the Grand Prize winner of the AMD Developer Challenge 2025. Once shared, a kernel is loaded from the Hub with get_kernel (from kernels import get_kernel).
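To make "FP8 block-wise matrix multiplication" concrete, here is a minimal pure-Python reference sketch of what such an operation computes. It is not RadeonFlow's HIP implementation. The tensor layout (B stored transposed, one scale per row and K-block for A, one scale per block x block tile for B), the 128-element default block size, and the helper names blockwise_scaled_gemm and dequantize_then_matmul are all assumptions for illustration. Values are plain Python floats standing in for FP8 codes; real FP8 (e.g. E4M3) rounding is not simulated. A reference like this is the kind of thing one might use to sanity-check a kernel's output on small shapes.

```python
"""Pure-Python reference for a block-wise scaled (FP8-style) GEMM.

Assumed layout (an illustration, not taken from the RadeonFlow source):
  a       : M x K quantized values, stored here as plain Python floats
  b       : N x K quantized values (B kept transposed, "NT" layout)
  a_scale : M x (K // block)             one scale per row and K-block
  b_scale : (N // block) x (K // block)  one scale per block x block tile
Result c is M x N: each element of A and B is multiplied by the scale
covering it, and c = A_dequantized @ B_dequantized^T.
"""
from typing import List

Matrix = List[List[float]]


def blockwise_scaled_gemm(a: Matrix, b: Matrix, a_scale: Matrix,
                          b_scale: Matrix, block: int = 128) -> Matrix:
    """Accumulate per K-block, then apply both scales once per block."""
    m_dim, k_dim, n_dim = len(a), len(a[0]), len(b)
    if k_dim % block or n_dim % block:
        raise ValueError("K and N must be multiples of the block size")
    c = [[0.0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0.0
            for kb in range(k_dim // block):
                # Raw dot product of the quantized values inside one K-block.
                lo, hi = kb * block, (kb + 1) * block
                partial = sum(a[m][k] * b[n][k] for k in range(lo, hi))
                # Rescale the partial sum with A's row scale and B's tile
                # scale instead of dequantizing every element up front.
                acc += partial * a_scale[m][kb] * b_scale[n // block][kb]
            c[m][n] = acc
    return c


def dequantize_then_matmul(a: Matrix, b: Matrix, a_scale: Matrix,
                           b_scale: Matrix, block: int = 128) -> Matrix:
    """Naive cross-check: dequantize every element, then compute A @ B^T."""
    a_deq = [[x * a_scale[m][k // block] for k, x in enumerate(row)]
             for m, row in enumerate(a)]
    b_deq = [[x * b_scale[n // block][k // block] for k, x in enumerate(row)]
             for n, row in enumerate(b)]
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b_deq]
            for ra in a_deq]


if __name__ == "__main__":
    # Tiny example with block=2 so the arithmetic is easy to follow.
    a = [[1.0, 2.0, 3.0, 4.0]]            # M=1, K=4
    b = [[1.0, 1.0, 1.0, 1.0],
         [0.0, 1.0, 0.0, 1.0]]            # N=2, K=4
    a_scale = [[0.5, 2.0]]                # M x (K/2)
    b_scale = [[1.0, 3.0]]                # (N/2) x (K/2)
    print(blockwise_scaled_gemm(a, b, a_scale, b_scale, block=2))
    # prints [[43.5, 25.0]]
```

Both functions return the same result; the first mirrors how a fused kernel can typically keep quantized inputs in its inner loop and apply the scales once per K-block, while the second is only practical as a correctness check on small inputs.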

Implications

HF as open-source ML hub. The HF Hub now serves as a distribution layer for GPU kernel code, not just model weights. This matters because custom CUDA/HIP kernels (FlashAttention, quantization kernels, custom GEMMs) are often the performance-critical component of production inference; making them as easy to share as model weights lowers the barrier to reusing kernels across the community.

Open-weights ecosystem health. The focus on the AMD MI300X signals growing investment in ROCm within the open ML community. If custom kernels for AMD hardware become as discoverable and shareable as NVIDIA-optimized ones, AMD’s cost advantage in cloud AI compute (MI300X vs. H100 pricing) becomes accessible to teams that previously couldn’t optimize for non-NVIDIA hardware.
