[Mlir-commits] [mlir] [mlir][AMDGPU] Add canonicalizer for folding casts into gather_to_lds (PR #150503)

Thu Jul 24 12:27:23 PDT 2025

================
@@ -130,3 +130,17 @@ func.func @dead_atomic_add(%arg0: memref<4xf32>, %arg1: f32) {
   amdgpu.raw_buffer_atomic_fadd {boundsCheck = true} %arg1 -> %arg0[%c4_i32] : f32 -> memref<4xf32>, i32
   func.return
 }
+
+// -----
+
+// CHECK-LABEL: func @fold_gather_to_lds_of_cast
+func.func @fold_gather_to_lds_of_cast(%global: memref<128x72xf32, 1>, %lds: memref<64x64xf32, 3>) {
+// CHECK-SAME: %[[GLOBAL:[A-Za-z0-9]+]]: memref<128x72xf32, 1>
+  %c0 = arith.constant 0 : index
+  %0 = memref.cast %global : memref<128x72xf32, 1> to memref<?x?xf32, 1>
+  // CHECK: amdgpu.gather_to_lds %[[GLOBAL]]
+  // CHECK-SAME: : f32, memref<128x72xf32, 1>
+  amdgpu.gather_to_lds %0[%c0, %c0], %lds[%c0, %c0]
----------------
kuhar wrote:

Should we also test the case where the fold modifies the dst operand?
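
A destination-side test might look like the sketch below. It assumes the canonicalizer also folds a `memref.cast` feeding the dst operand, which the hunk above does not show. The function name `@fold_gather_to_lds_of_cast_dest` and the `LDS` capture are made up for illustration, and the trailing type list follows the `f32, <src type>, <dst type>` order implied by the existing CHECK-SAME line.

```mlir
// -----

// Hypothetical test: the cast is on the LDS (dst) operand instead of the source.
// CHECK-LABEL: func @fold_gather_to_lds_of_cast_dest
func.func @fold_gather_to_lds_of_cast_dest(%global: memref<128x72xf32, 1>, %lds: memref<64x64xf32, 3>) {
// CHECK-SAME: %[[GLOBAL:[A-Za-z0-9]+]]: memref<128x72xf32, 1>
// CHECK-SAME: %[[LDS:[A-Za-z0-9]+]]: memref<64x64xf32, 3>
  %c0 = arith.constant 0 : index
  // Erase the static shape of the destination; the fold is assumed to undo this.
  %0 = memref.cast %lds : memref<64x64xf32, 3> to memref<?x?xf32, 3>
  // CHECK: amdgpu.gather_to_lds %[[GLOBAL]][{{.*}}], %[[LDS]]
  // CHECK-SAME: : f32, memref<128x72xf32, 1>, memref<64x64xf32, 3>
  amdgpu.gather_to_lds %global[%c0, %c0], %0[%c0, %c0]
    : f32, memref<128x72xf32, 1>, memref<?x?xf32, 3>
  func.return
}
```

If the fold only handles the source, the last CHECK-SAME would instead expect `memref<?x?xf32, 3>` for the dst type.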

https://github.com/llvm/llvm-project/pull/150503