[all-commits] [llvm/llvm-project] d3edc9: [MLIR][GPU] subgroup_mma fp64 extension - take 2 (...

Giacomo Castiglioni via All-commits all-commits at lists.llvm.org
Mon Dec 1 04:40:21 PST 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: d3edc94d113d2d30a7a26fa4d72496ac0b9256b8
      https://github.com/llvm/llvm-project/commit/d3edc94d113d2d30a7a26fa4d72496ac0b9256b8
  Author: Giacomo Castiglioni <giacastiglioni at gmail.com>
  Date:   2025-12-01 (Mon, 01 Dec 2025)

  Changed paths:
    M mlir/include/mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h
    M mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
    M mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
    M mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
    M mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
    M mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
    M mlir/test/Dialect/GPU/invalid.mlir
    A mlir/test/Integration/GPU/CUDA/TensorCore/sm80/wmma-matmul-f64.mlir

  Log Message:
  -----------
  [MLIR][GPU] subgroup_mma fp64 extension - take 2 (#169061)

This PR re-lands #165873.

This PR extends the gpu.subgroup_mma_* ops to support the fp64 type.
The extension requires special handling during the lowering to nvvm
because the NVVM load ops for the A and B fragments return a scalar
instead of a struct.

The original PR did not guard the new test on the required
architecture (sm_80), which led to a failure on the CUDA runners with
T4 GPUs.
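
For illustration, a minimal sketch of what the extended ops could look
like with f64 operands. This is not code from the patch: the 8x8x4
tile shape (the fp64 WMMA shape on sm_80), the leadDimension values,
and the function name are assumptions for the example.

```mlir
// Hypothetical sketch: fp64 subgroup_mma matmul with the 8x8x4 tile
// shape used by fp64 WMMA on sm_80. Shapes, leadDimension values, and
// names are illustrative, not copied from the patch.
func.func @wmma_f64(%a: memref<8x4xf64>, %b: memref<4x8xf64>,
                    %c: memref<8x8xf64>) {
  %i = arith.constant 0 : index
  // Load the A and B fragments; for fp64 these lower to NVVM loads
  // that return a scalar rather than a struct.
  %A = gpu.subgroup_mma_load_matrix %a[%i, %i] {leadDimension = 4 : index}
         : memref<8x4xf64> -> !gpu.mma_matrix<8x4xf64, "AOp">
  %B = gpu.subgroup_mma_load_matrix %b[%i, %i] {leadDimension = 8 : index}
         : memref<4x8xf64> -> !gpu.mma_matrix<4x8xf64, "BOp">
  %C = gpu.subgroup_mma_load_matrix %c[%i, %i] {leadDimension = 8 : index}
         : memref<8x8xf64> -> !gpu.mma_matrix<8x8xf64, "COp">
  // D = A * B + C
  %D = gpu.subgroup_mma_compute %A, %B, %C
         : !gpu.mma_matrix<8x4xf64, "AOp">, !gpu.mma_matrix<4x8xf64, "BOp">
         -> !gpu.mma_matrix<8x8xf64, "COp">
  gpu.subgroup_mma_store_matrix %D, %c[%i, %i] {leadDimension = 8 : index}
    : !gpu.mma_matrix<8x8xf64, "COp">, memref<8x8xf64>
  return
}
```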




