[Mlir-commits] [mlir] f1f194b - [mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774)
llvmlistbot at llvm.org
llvmlistbot at llvm.org
Thu Aug 21 19:46:09 PDT 2025
Author: Yang Bai
Date: 2025-08-21T21:46:06-05:00
New Revision: f1f194bf10e6ce180bbb199fa219c4d1ec67290f
URL: https://github.com/llvm/llvm-project/commit/f1f194bf10e6ce180bbb199fa219c4d1ec67290f
DIFF: https://github.com/llvm/llvm-project/commit/f1f194bf10e6ce180bbb199fa219c4d1ec67290f.diff
LOG: [mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774)
### Problem
PR #142944 introduced a new canonicalization pattern which caused
failures in the following GPU-related integration tests:
-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir
The issue occurs because the new canonicalization pattern can generate
multi-dimensional `vector.from_elements` operations (rank > 1), but the
GPU lowering pipelines were not equipped to handle these during the
conversion to LLVM.
### Fix
This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the
GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:
- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.
Co-authored-by: Yang Bai <yangb at nvidia.com>
Added:
Modified:
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
Removed:
################################################################################
diff --git a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
index 3cfbd898e49e2..e516118f75207 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -532,6 +532,9 @@ void GpuToLLVMConversionPass::runOnOperation() {
// Vector transfer ops with rank > 1 should be lowered with VectorToSCF.
vector::populateVectorTransferLoweringPatterns(patterns,
/*maxTransferRank=*/1);
+ // Transform N-D vector.from_elements to 1-D vector.from_elements before
+ // conversion.
+ vector::populateVectorFromElementsLoweringPatterns(patterns);
if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
return signalPassFailure();
}
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index 317bfc2970cf5..50ac1c60184eb 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -27,6 +27,7 @@
#include "mlir/Dialect/Math/IR/Math.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/NVGPU/IR/NVGPUDialect.h"
+#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"
#include "mlir/Transforms/DialectConversion.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
@@ -369,6 +370,9 @@ struct LowerGpuOpsToNVVMOpsPass final
{
RewritePatternSet patterns(m.getContext());
populateGpuRewritePatterns(patterns);
+ // Transform N-D vector.from_elements to 1-D vector.from_elements before
+ // conversion.
+ vector::populateVectorFromElementsLoweringPatterns(patterns);
if (failed(applyPatternsGreedily(m, std::move(patterns))))
return signalPassFailure();
}
More information about the Mlir-commits
mailing list