[Mlir-commits] [mlir] [mlir][vector] VectorEmulateNarrowType uses deinterleave (PR #94946)

Wed Jun 12 03:24:26 PDT 2024

================
@@ -300,16 +298,25 @@ func.func @aligned_trunci_2d(%a: vector<8x32xi32>) -> vector<8x32xi4> {
 // CHECK-NOT:       vector.shli
 // CHECK-NOT:       vector.ori
 // CHECK:           arith.trunci
+// CHECK:           vector.deinterleave
   %0 = arith.trunci %a : vector<8x32xi32> to vector<8x32xi4>
   return %0 : vector<8x32xi4>
 }
 
+// CHECK-LABEL: func.func @aligned_trunci_nd(
+func.func @aligned_trunci_nd(%a: vector<3x8x32xi32>) -> vector<3x8x32xi4> {
+  // CHECK: arith.trunci
+  // CHECK: vector.deinterleave
+  %0 = arith.trunci %a : vector<3x8x32xi32> to vector<3x8x32xi4>
+  return %0 : vector<3x8x32xi4>
+}
+
 // CHECK-LABEL: func.func @i4_transpose(
 func.func @i4_transpose(%a: vector<8x16xi4>) -> vector<16x8xi4> {
 // CHECK-SAME:    %[[IN:.*]]: vector<8x16xi4>) -> vector<16x8xi4> {
 // CHECK:           %[[EXT:.*]] = vector.interleave
 // CHECK:           %[[TRANS:.*]] = vector.transpose %[[EXT]], [1, 0] : vector<8x16xi8> to vector<16x8xi8>
-// CHECK:           %[[TRUNC:.*]] = arith.trunci %[[TRANS]] : vector<16x8xi8> to vector<16x8xi4>
----------------
mub-at-arm wrote:

Do you mean the IR output or the llvm-lit output? The llvm-lit test fails with the above version of the test, since there is no lowering to deinterleave. Without this patch, the IR output still ends in a plain `arith.trunci`:
```mlir
    %cst = arith.constant dense<4> : vector<8x8xi8>
    %0 = vector.bitcast %arg0 : vector<8x16xi4> to vector<8x8xi8>
    %1 = arith.shli %0, %cst : vector<8x8xi8>
    %2 = arith.shrsi %1, %cst : vector<8x8xi8>
    %3 = arith.shrsi %0, %cst : vector<8x8xi8>
    %4 = vector.interleave %2, %3 : vector<8x8xi8> -> vector<8x16xi8>
    %5 = vector.transpose %4, [1, 0] : vector<8x16xi8> to vector<16x8xi8>
    %6 = arith.trunci %5 : vector<16x8xi8> to vector<16x8xi4>
    return %6 : vector<16x8xi4>
```
With this patch, however, the `arith.trunci` is rewritten to its new deinterleave-based lowering. For example:
```mlir
    %cst = arith.constant dense<4> : vector<16x4xi8>
    %cst_0 = arith.constant dense<15> : vector<16x4xi8>
    %cst_1 = arith.constant dense<4> : vector<8x8xi8>
    %0 = vector.bitcast %arg0 : vector<8x16xi4> to vector<8x8xi8>
    %1 = arith.shli %0, %cst_1 : vector<8x8xi8>
    %2 = arith.shrsi %1, %cst_1 : vector<8x8xi8>
    %3 = arith.shrsi %0, %cst_1 : vector<8x8xi8>
    %4 = vector.interleave %2, %3 : vector<8x8xi8> -> vector<8x16xi8>
    %5 = vector.transpose %4, [1, 0] : vector<8x16xi8> to vector<16x8xi8>
    %res1, %res2 = vector.deinterleave %5 : vector<16x8xi8> -> vector<16x4xi8>
    %6 = arith.andi %res1, %cst_0 : vector<16x4xi8>
    %7 = arith.shli %res2, %cst : vector<16x4xi8>
    %8 = arith.ori %6, %7 : vector<16x4xi8>
    %9 = vector.bitcast %8 : vector<16x4xi8> to vector<16x8xi4>
```
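
To make the arithmetic in that last sequence easier to check, here is a rough Python model of one row of the `deinterleave` / `andi` / `shli` / `ori` / `bitcast` chain. It is only a sketch, not the pass itself: the helper names (`deinterleave`, `pack_i4_pairs`, `unpack_i4`, `trunci_i4`) are made up for illustration, and it assumes the `vector.bitcast` from `i8` to `i4` places the low nibble first, which matches the extraction order in the IR above.

```python
def deinterleave(row):
    """Model of vector.deinterleave: split into even- and odd-indexed elements."""
    return row[0::2], row[1::2]


def pack_i4_pairs(row):
    """Pack a row of i8 values (as Python ints) into bytes holding two i4 each."""
    evens, odds = deinterleave(row)
    packed = []
    for lo, hi in zip(evens, odds):
        low = lo & 0x0F            # arith.andi %res1, 15
        high = (hi << 4) & 0xFF    # arith.shli %res2, 4 (wrapped to 8 bits)
        packed.append(low | high)  # arith.ori
    return packed


def unpack_i4(packed):
    """Model of the final bitcast: read each byte as two signed i4 values.

    Assumption: the low nibble is the first element of each pair.
    """
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, (byte >> 4) & 0x0F):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def trunci_i4(x):
    """Reference: arith.trunci to i4 keeps the low 4 bits, read as signed."""
    nibble = x & 0x0F
    return nibble - 16 if nibble >= 8 else nibble


row = [1, -2, 7, -8, 0, 5, -1, 3]
packed = pack_i4_pairs(row)
assert unpack_i4(packed) == [trunci_i4(x) for x in row]
```

Under that assumption, the packed bytes reinterpret to the same `i4` values that an element-wise `arith.trunci` would produce, which is why the rewrite can drop the `trunci` in favour of the deinterleave sequence.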

https://github.com/llvm/llvm-project/pull/94946