[Mlir-commits] [mlir] [MLIR] [XeGPU] Add distribution patterns for vector transpose, bitcast & mask ops in sg to wi pass (PR #187392)

Wed Mar 25 15:49:24 PDT 2026

================
@@ -461,6 +461,88 @@ gpu.func @vector_multi_reduction_dim0_distributed_dim1_reduction(%laneid: index)
   gpu.return
 }
 
+// CHECK-LABEL: gpu.func @vector_transpose
+// CHECK:         %[[SRC:.*]] = "some_op"()
+// CHECK:         %[[CAST:.*]] = builtin.unrealized_conversion_cast %[[SRC]] : vector<16x2xf32> to vector<1x2xf32>
+// CHECK-NEXT:    %[[T:.*]] = vector.transpose %[[CAST]], [1, 0] : vector<1x2xf32> to vector<2x1xf32>
+// CHECK-NEXT:    gpu.return
+gpu.func @vector_transpose() {
+  %cst = "some_op"()
+    {layout_result_0 = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 1], order = [0, 1]>}
+    : () -> (vector<16x2xf32>)
+  %transpose = vector.transpose %cst, [1, 0]
+    {
+      layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>
+    }
+    : vector<16x2xf32> to vector<2x16xf32>
+  gpu.return
+}
+
+// CHECK-LABEL: gpu.func @vector_bitcast
+// CHECK:         %[[SRC:.*]] = "some_op"()
+// CHECK:         %[[CAST:.*]] = builtin.unrealized_conversion_cast %[[SRC]] : vector<4x32xi8> to vector<4x2xi8>
+// CHECK-NEXT:    %[[BC:.*]] = vector.bitcast %[[CAST]] : vector<4x2xi8> to vector<4x1xi16>
+// CHECK-NEXT:    gpu.return
+gpu.func @vector_bitcast() {
+  %cst = "some_op"()
+    {layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 2]>}
+    : () -> (vector<4x32xi8>)
+  %bitcast = vector.bitcast %cst
+    {
+      layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>
+    }
+    : vector<4x32xi8> to vector<4x16xi16>
+  gpu.return
+}
+
+// CHECK-LABEL: gpu.func @create_mask_1d
+//  CHECK-SAME: (%[[M0:.*]]: index)
+//       CHECK:   %[[LANE:.*]] = gpu.lane_id
+//       CHECK:   %[[NEW_BOUND:.*]] = affine.apply
+//       CHECK:   %[[MASK:.*]] = vector.create_mask %[[NEW_BOUND]] : vector<1xi1>
+//       CHECK:   gpu.return
+gpu.func @create_mask_1d(%m0: index) {
+  %mask = vector.create_mask %m0
+    {layout_result_0 = #xegpu.layout<lane_layout = [16], lane_data = [1]>}
+    : vector<16xi1>
+  gpu.return
+}
+
+// CHECK-LABEL: gpu.func @constant_mask_1d
+//       CHECK:   %[[LANE:.*]] = gpu.lane_id
+//       CHECK:   %[[NEW_BOUND:.*]] = affine.apply
+//       CHECK:   %[[MASK:.*]] = vector.create_mask %[[NEW_BOUND]] : vector<1xi1>
+//       CHECK:   gpu.return
+gpu.func @constant_mask_1d() {
----------------
Jianhui-Li wrote:

I agree that create_mask_1d_scalable() can be removed.

Nice to see you added create_mask_2d()!

https://github.com/llvm/llvm-project/pull/187392