[Mlir-commits] [mlir] [mlir][xegpu] Add dynamic memref support in transpose optimization. (PR #170218)

Wed Dec 3 08:54:30 PST 2025

================
@@ -278,3 +278,32 @@ gpu.func @array_length(%arg0: vector<8x16xf16>, %arg1: memref<256x256xf16>, %arg
   gpu.return
 }
 }
+
+// -----
+// CHECK-LABEL: gpu.func @dynamic_memref(
+// CHECK-SAME:   %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?xf16>, %{{.*}}: vector<8x16xf16>) -> vector<8x16xf32> {
+// CHECK-DAG:     %[[C16:.*]] = arith.constant 16 : index
+// CHECK-DAG:     %[[C32:.*]] = arith.constant 32 : index
+// CHECK-NEXT:    %[[PTR:.*]] = memref.extract_aligned_pointer_as_index %[[ARG0]] : memref<?x?xf16> -> index
+// CHECK-NEXT:    %[[T0:.*]] = arith.index_cast %[[PTR]] : index to i64
+// CHECK-NEXT:    %[[T1:.*]] = xegpu.create_nd_tdesc %[[T0]], shape : [64, %[[C32]]], strides : [%[[C32]], 1] : i64
+// CHECK-SAME:      -> !xegpu.tensor_desc<16x8xi32, #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 1]>>
+// CHECK-NEXT:    %[[T2:.*]] = xegpu.load_nd %[[T1]][%{{.*}}, %[[C16]]]  {layout_result_0 =
+// CHECK-SAME:      #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 1]>} : !xegpu.tensor_desc<16x8xi32,
+// CHECK-SAME:      #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 1]>> -> vector<16x8xi32>
+// CHECK-NEXT:    %{{.*}} = vector.bitcast %[[T2]] {layout_result_0 =
+// CHECK-SAME:      #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>} : vector<16x8xi32> to vector<16x16xf16>
+#a = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>
+#b = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>
+#bt = #xegpu.layout<lane_layout = [1, 16], lane_data = [2, 1]>
+gpu.module @xevm_module {
+gpu.func @dynamic_memref(%arg0: memref<?x?xf16>, %arg1: vector<8x16xf16>) -> vector<8x16xf32> {
+  %c0 = arith.constant 0 : index
+  %c32 = arith.constant 32 : index
+  %0 = xegpu.create_nd_tdesc %arg0, shape : [64, 64], strides : [64, 1] : memref<?x?xf16> -> !xegpu.tensor_desc<16x16xf16, #b>
----------------
charithaintc wrote:

> this is just a test case. in real usecases, these will be kernel arguments meaning they will runtime values.
> 
> > They are typically used when the source is not memref but a pointer as a lowest IR form.
> 
> not really. this form of memref is really useful when we need to test with dynamic batch sizes that are changed from host side. This PR is needed exactly for that btw, in my FA testing I use small batch sizes but for perf runs I use large ones. Dynamic memrefs allow me to do this very easily. I think you are referring to unranked memref case (`memref<*xf16>`). interestingly xegpu op definition does not allow unranked memref for some reason.



https://github.com/llvm/llvm-project/pull/170218