[Mlir-commits] [llvm] [mlir] [XeGPU][Transform] Add XeGPU array length optimization pass (PR #194062)
Md Abdullah Shahneous Bari
llvmlistbot at llvm.org
Fri May 1 17:50:34 PDT 2026
================
@@ -0,0 +1,134 @@
+// RUN: mlir-opt --xegpu-array-length-optimization --split-input-file %s | FileCheck %s
+
+
+// CHECK-LABEL: func.func @test_load_nd_with_extract_slice
+// CHECK-SAME: (%[[ARG0:.*]]: memref<4096x4096xf16>)
+func.func @test_load_nd_with_extract_slice(%arg0: memref<4096x4096xf16>) -> vector<16x16xf16> {
+ %c0 = arith.constant 0 : index
+
+ // CHECK: %[[TDESC:.*]] = xegpu.create_nd_tdesc %[[ARG0]]
+ // CHECK-SAME: memref<4096x4096xf16> -> !xegpu.tensor_desc<32x16xf16, #xegpu.block_tdesc_attr<array_length = 2 : i64>>
+ %tdesc = xegpu.create_nd_tdesc %arg0 : memref<4096x4096xf16> -> !xegpu.tensor_desc<32x32xf16>
+
+ // CHECK: %[[LOAD:.*]] = xegpu.load_nd %[[TDESC]][%{{.*}}, %{{.*}}]
+ // CHECK-SAME: !xegpu.tensor_desc<32x16xf16, #xegpu.block_tdesc_attr<array_length = 2 : i64>> -> vector<64x16xf16>
+ %load = xegpu.load_nd %tdesc[%c0, %c0] : !xegpu.tensor_desc<32x32xf16> -> vector<32x32xf16>
+
+ // Extract first 16x16 block (memory layout: [0:16][0:16])
+ // In memory layout this is first half of FCD
+ // In register layout this stays [0:16][0:16]
+ // CHECK: %[[EXTRACT0:.*]] = vector.extract_strided_slice %[[LOAD]]
+ // CHECK-SAME: {offsets = [0, 0], sizes = [16, 16], strides = [1, 1]}
+ %extract0 = vector.extract_strided_slice %load {offsets = [0, 0], sizes = [16, 16], strides = [1, 1]} : vector<32x32xf16> to vector<16x16xf16>
+
+ return %extract0 : vector<16x16xf16>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @test_load_nd_with_second_extract
+// CHECK-SAME: (%[[ARG0:.*]]: memref<4096x4096xf16>)
+func.func @test_load_nd_with_second_extract(%arg0: memref<4096x4096xf16>) -> vector<16x16xf16> {
+ %c0 = arith.constant 0 : index
+
+ // CHECK: %[[TDESC:.*]] = xegpu.create_nd_tdesc %[[ARG0]]
+ // CHECK-SAME: memref<4096x4096xf16> -> !xegpu.tensor_desc<32x16xf16, #xegpu.block_tdesc_attr<array_length = 2 : i64>>
+ %tdesc = xegpu.create_nd_tdesc %arg0 : memref<4096x4096xf16> -> !xegpu.tensor_desc<32x32xf16>
+
+ // CHECK: %[[LOAD:.*]] = xegpu.load_nd %[[TDESC]][%{{.*}}, %{{.*}}]
+ // CHECK-SAME: !xegpu.tensor_desc<32x16xf16, #xegpu.block_tdesc_attr<array_length = 2 : i64>> -> vector<64x16xf16>
+ %load = xegpu.load_nd %tdesc[%c0, %c0] : !xegpu.tensor_desc<32x32xf16> -> vector<32x32xf16>
+
+ // Extract second 16x16 block (memory layout: [0:16][16:32])
+ // In memory layout this is second half of FCD
+ // In register layout this should be [32:48][0:16] (second array element)
+ // array_index = 16 / 16 = 1
+ // new_offset0 = 0 + (1 * 32) = 32
+ // new_offset1 = 16 % 16 = 0
+ // CHECK: %[[EXTRACT1:.*]] = vector.extract_strided_slice %[[LOAD]]
+ // CHECK-SAME: {offsets = [32, 0], sizes = [16, 16], strides = [1, 1]}
+ %extract1 = vector.extract_strided_slice %load {offsets = [0, 16], sizes = [16, 16], strides = [1, 1]} : vector<32x32xf16> to vector<16x16xf16>
----------------
mshahneo wrote:
I think you mean "offset in FCD is a multiple of sg_size". Addressed.
https://github.com/llvm/llvm-project/pull/194062
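
For readers following the offset arithmetic in the test comments quoted above (array_index, new_offset0, new_offset1), here is a minimal Python sketch of that remapping. It is not the pass's implementation: the helper name remap_slice_offsets and its parameters are hypothetical, and it assumes the extracted slice does not straddle a block boundary along the FCD (fastest-changing dimension).

```python
def remap_slice_offsets(offsets, block_rows, block_cols):
    """Map a 2D slice offset from memory layout to register layout.

    Hypothetical helper for illustration only. It mirrors the arithmetic in
    the test comments: a 32x32 load is rewritten as two 32x16 blocks
    (array_length = 2) stacked along dim 0, giving a 64x16 result.
    Assumes the slice lies entirely within one block along the FCD.
    """
    row, col = offsets
    array_index = col // block_cols               # which array element holds the slice
    new_offset0 = row + array_index * block_rows  # skip the preceding stacked blocks
    new_offset1 = col % block_cols                # column offset within that block
    return [new_offset0, new_offset1]


# The two cases exercised by the quoted tests (block shape 32x16).
first = remap_slice_offsets([0, 0], block_rows=32, block_cols=16)
second = remap_slice_offsets([0, 16], block_rows=32, block_cols=16)
print(first, second)
```

With the 32x16 block shape used in the tests, the first call leaves the offsets unchanged and the second yields [32, 0], matching the CHECK-SAME lines in the hunk.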