[llvm] [mlir] [MLIR][AMDGPU] Adding dynamic size check to avoid subword buffer load (PR #135014)
Zhuoran Yin via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 15 10:24:31 PDT 2025
================
@@ -149,6 +278,8 @@ struct AmdgpuTransferReadToLoadPass final
void runOnOperation() override {
RewritePatternSet patterns(&getContext());
populateAmdgpuTransferReadToLoadPatterns(patterns);
- walkAndApplyPatterns(getOperation(), std::move(patterns));
+ if (failed(applyPatternsGreedily(getOperation(), std::move(patterns)))) {
----------------
jerryyin wrote:
The populated IR becomes much cleaner with the greedy rewriter: it folds the arith computations nicely into a constant. For example, previously with walkAndApplyPatterns I had:
```mlir
%cst = arith.constant 0.000000e+00 : f32
%base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<8x8xf32, #amdgpu.address_space<fat_raw_buffer>> -> memref<f32, #amdgpu.address_space<fat_raw_buffer>>, index, index, index, index, index
%0 = affine.apply #map()[%arg1]
%1 = affine.max #map1()[%strides#0, %sizes#0, %strides#1, %sizes#1]
%c4 = arith.constant 4 : index
%2 = arith.subi %1, %0 : index
%3 = arith.cmpi ule, %2, %c4 : index
%c4_0 = arith.constant 4 : index
%4 = arith.muli %2, %c4_0 : index
%c1 = arith.constant 1 : index
%5 = arith.remui %4, %c1 : index
%c0 = arith.constant 0 : index
%6 = arith.cmpi ne, %5, %c0 : index
%7 = arith.andi %3, %6 : i1
%8 = scf.if %7 -> (vector<4xf32>) {
```
With the greedy rewriter, it figures out that for an f32 load this always evaluates to false (the `arith.remui %4, %c1` is always zero, so the `arith.cmpi ne` and the final `arith.andi` fold to a constant false), leaving just:
```mlir
%false = arith.constant false
%0 = scf.if %false -> (vector<4xf32>) {
```
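For reference, a minimal sketch of the resulting `runOnOperation` with the greedy driver (not the exact PR contents; the `signalPassFailure()` call and the pass base class are assumptions, since the hunk above is truncated inside the `if`):
```cpp
// Needs mlir/Transforms/GreedyPatternRewriteDriver.h for applyPatternsGreedily.
void AmdgpuTransferReadToLoadPass::runOnOperation() {
  RewritePatternSet patterns(&getContext());
  populateAmdgpuTransferReadToLoadPatterns(patterns);
  // Unlike walkAndApplyPatterns, the greedy driver iterates to a fixed point
  // and folds the emitted arith ops into constants along the way.
  if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
    signalPassFailure();
}
```
The trade-off is that the greedy driver may revisit operations until it reaches a fixed point, whereas the walk-based driver makes a single pass; here that extra folding is exactly what produces the simplified `scf.if %false` above.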
https://github.com/llvm/llvm-project/pull/135014