[all-commits] [llvm/llvm-project] 2f627c: [NVPTX] Support for dense and sparse MMA intrinsic...

Fri Nov 21 04:14:14 PST 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 2f627c1878a3dba594c872773107c556992af3a1
      https://github.com/llvm/llvm-project/commit/2f627c1878a3dba594c872773107c556992af3a1
  Author: Kirill Vedernikov <kvedernikov at nvidia.com>
  Date:   2025-11-21 (Fri, 21 Nov 2025)

  Changed paths:
    M llvm/include/llvm/IR/IntrinsicsNVVM.td
    M llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
    A llvm/test/CodeGen/NVPTX/wmma-ptx88-sm120a.py
    M llvm/test/CodeGen/NVPTX/wmma.py

  Log Message:
  -----------
  [NVPTX] Support for dense and sparse MMA intrinsics with block scaling. (#163561)

This change adds dense and sparse MMA intrinsics with block scaling. The
implementation is based on [PTX ISA version
9.0](https://docs.nvidia.com/cuda/parallel-thread-execution/). Tests for
the new intrinsics target PTX 8.8 and SM 120a and are generated by
`llvm/test/CodeGen/NVPTX/wmma-ptx88-sm120a.py`. The tests have been
verified with ptxas from the CUDA 13.0 release.
Support for the dense MMA intrinsics with block scaling was contributed by
@schwarzschild-radius.
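For readers unfamiliar with the term, "block scaling" means that each run of consecutive elements along the K (reduction) dimension shares one scale factor. The sketch below is a minimal conceptual model of a dense block-scaled multiply-accumulate. It makes several assumptions:

- Scales follow MX-style semantics: a UE8M0 power-of-two exponent with bias 127 and a block size of 32, as in the OCP Microscaling formats.
- The names `ue8m0_to_float` and `block_scaled_mma` are hypothetical.
- Sparsity is not modelled.

The exact operand layouts, scale formats and block sizes used by the new intrinsics are defined by the PTX ISA, not by this sketch.

```python
# Conceptual (hypothetical) reference model of a dense block-scaled MMA:
#   D = C + sum over K-blocks of scale_A * scale_B * (A_block @ B_block)
# ASSUMPTION: MX-style UE8M0 scales (factor = 2**(e - 127)); not the PTX spec.

def ue8m0_to_float(e: int) -> float:
    """Decode an 8-bit exponent-only scale with bias 127 (0xFF would be NaN)."""
    if not 0 <= e <= 254:
        raise ValueError("UE8M0 exponent must be in [0, 254]")
    return 2.0 ** (e - 127)


def block_scaled_mma(a, b, c, scale_a, scale_b, block=32):
    """a: M x K, b: K x N, c: M x N (lists of lists of floats).

    scale_a: M x (K // block) exponents, one per row of A and per K-block.
    scale_b: (K // block) x N exponents, one per K-block and per column of B.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    if k % block:
        raise ValueError("K must be a multiple of the block size")
    d = [row[:] for row in c]  # start from the accumulator C
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kb in range(k // block):
                # One shared scale per block of A's row and B's column.
                sa = ue8m0_to_float(scale_a[i][kb])
                sb = ue8m0_to_float(scale_b[kb][j])
                lo, hi = kb * block, (kb + 1) * block
                partial = sum(a[i][kk] * b[kk][j] for kk in range(lo, hi))
                acc += sa * sb * partial
            d[i][j] += acc
    return d


if __name__ == "__main__":
    # K = 64 -> two blocks; A's second block is scaled by 2**(128 - 127) = 2.
    a = [[1.0] * 64]
    b = [[1.0] for _ in range(64)]
    print(block_scaled_mma(a, b, [[0.0]], [[127, 128]], [[127], [127]]))
```

In the usage example, the first block contributes 32 and the second contributes 2 * 32, so the result is `[[96.0]]`. Because each scale is a power of two, it is cheap to apply per block. This is the motivation behind the low-precision formats that block-scaled MMA targets.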
