[all-commits] [llvm/llvm-project] d3edc9: [MLIR][GPU] subgroup_mma fp64 extension - take 2 (...
Giacomo Castiglioni via All-commits
all-commits at lists.llvm.org
Mon Dec 1 04:40:21 PST 2025
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: d3edc94d113d2d30a7a26fa4d72496ac0b9256b8
https://github.com/llvm/llvm-project/commit/d3edc94d113d2d30a7a26fa4d72496ac0b9256b8
Author: Giacomo Castiglioni <giacastiglioni at gmail.com>
Date: 2025-12-01 (Mon, 01 Dec 2025)
Changed paths:
M mlir/include/mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h
M mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
M mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
M mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
M mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
M mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
M mlir/test/Dialect/GPU/invalid.mlir
A mlir/test/Integration/GPU/CUDA/TensorCore/sm80/wmma-matmul-f64.mlir
Log Message:
-----------
[MLIR][GPU] subgroup_mma fp64 extension - take 2 (#169061)
This PR re-lands #165873.
This PR extends the gpu.subgroup_mma_* ops to support the fp64 type.
The extension requires special handling during the lowering to NVVM
because the load ops for fragments A and B return a scalar instead of a
struct.
The original PR did not guard the new test on the required architecture
(sm80), which led to a failure on the CUDA runners with T4 GPUs.
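
For illustration, a hedged MLIR sketch of what the extended ops enable (the shapes assume the standard m8n8k4 f64 WMMA configuration; this is not code taken from the PR or its test):

```mlir
// Loading f64 A and B fragments with gpu.subgroup_mma_load_matrix.
// For f64, the corresponding NVVM wmma.load intrinsics for fragments
// A and B yield a scalar f64 rather than an LLVM struct, which is the
// special case the lowering has to handle.
func.func @wmma_f64(%a : memref<8x4xf64>, %b : memref<4x8xf64>) {
  %c0 = arith.constant 0 : index
  %fragA = gpu.subgroup_mma_load_matrix %a[%c0, %c0]
      {leadDimension = 4 : index}
      : memref<8x4xf64> -> !gpu.mma_matrix<8x4xf64, "AOp">
  %fragB = gpu.subgroup_mma_load_matrix %b[%c0, %c0]
      {leadDimension = 8 : index}
      : memref<4x8xf64> -> !gpu.mma_matrix<4x8xf64, "BOp">
  return
}
```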