[Mlir-commits] [mlir] [mlir][ArmSME] Use liveness information in the tile allocator (PR #90448)
Andrzej Warzyński
llvmlistbot at llvm.org
Fri May 10 03:18:39 PDT 2024
================
@@ -156,28 +234,123 @@ func.func @run_out_of_tiles_but_avoid_spill(%a: vector<[4]xf32>, %b: vector<[4]x
// -----
-// Incorrect result! Everything other than zero assigned to tile 1 (which means values that are still live are overwritten).
-//
-// CHECK-BAD-LABEL: @avoidable_spill
-// CHECK-BAD: arm_sme.zero {tile_id = 0 : i32}
-// CHECK-BAD: arm_sme.get_tile {tile_id = 1 : i32}
-// CHECK-BAD-COUNT-4: arm_sme.move_vector_to_tile_slice {{.*}} {tile_id = 1 : i32}
+// We should be able to avoid spills like this, but logic handling this case is
+// not implemented yet. Note tile ID >= 16 means a spill/in-memory tile.
+
+// CHECK-LIVE-RANGE-LABEL: @avoidable_spill
+// CHECK-LIVE-RANGE: ========== Coalesced Live Ranges:
+// CHECK-LIVE-RANGE: ^bb2:
+// CHECK-LIVE-RANGE-NEXT: ||     test.some_use
+// CHECK-LIVE-RANGE-NEXT: ||S    arm_sme.move_vector_to_tile_slice
+// CHECK-LIVE-RANGE-NEXT: |||S   arm_sme.move_vector_to_tile_slice
+// CHECK-LIVE-RANGE-NEXT: ||||S  arm_sme.move_vector_to_tile_slice
+// CHECK-LIVE-RANGE-NEXT: |||||S arm_sme.move_vector_to_tile_slice
+// CHECK-LIVE-RANGE-NEXT: ||E||| test.some_use
+// CHECK-LIVE-RANGE-NEXT: || E|| test.some_use
+// CHECK-LIVE-RANGE-NEXT: ||  E| test.some_use
+// CHECK-LIVE-RANGE-NEXT: ||   E test.some_use
+// CHECK-LIVE-RANGE-NEXT: ||     arith.addi
+// CHECK-LIVE-RANGE-NEXT: EE     cf.br
+
+// Note in the live ranges (above) there are two constant live-ins (first two ranges),
+// which gives six overlapping live ranges. The allocator currently will spill the
+// first constant (which results in a real spill at its use); however, this could
+// be avoided by using the knowledge that at the first "test.some_use" there are
+// actually only two live ranges (so we can fix this by duplicating the constant).
+
+// CHECK-LABEL: @avoidable_spill
func.func @avoidable_spill(%a: vector<[4]xf32>, %b: vector<[4]xf32>, %c: vector<[4]xf32>, %d: vector<[4]xf32>) {
+ // CHECK: arm_sme.zero {tile_id = 16 : i32} : vector<[4]x[4]xf32>
%zero = arm_sme.zero : vector<[4]x[4]xf32>
%tile = arm_sme.get_tile : vector<[4]x[4]xf32>
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c10 = arith.constant 10 : index
scf.for %i = %c0 to %c10 step %c1 {
+ // So %zero is spilled here (unnecessarily).
+ // The arm_sme.zero op could be moved into the loop to avoid this.
"test.some_use"(%zero) : (vector<[4]x[4]xf32>) -> ()
%tile_a = arm_sme.move_vector_to_tile_slice %a, %tile, %c0 : vector<[4]xf32> into vector<[4]x[4]xf32>
%tile_b = arm_sme.move_vector_to_tile_slice %b, %tile, %c0 : vector<[4]xf32> into vector<[4]x[4]xf32>
%tile_c = arm_sme.move_vector_to_tile_slice %c, %tile, %c0 : vector<[4]xf32> into vector<[4]x[4]xf32>
%tile_d = arm_sme.move_vector_to_tile_slice %d, %tile, %c0 : vector<[4]xf32> into vector<[4]x[4]xf32>
+ // %zero is still live here (due to the backedge)
"test.some_use"(%tile_a) : (vector<[4]x[4]xf32>) -> ()
"test.some_use"(%tile_b) : (vector<[4]x[4]xf32>) -> ()
"test.some_use"(%tile_c) : (vector<[4]x[4]xf32>) -> ()
"test.some_use"(%tile_d) : (vector<[4]x[4]xf32>) -> ()
}
return
}
+
+// -----
+
+// This test is a follow up to the test of the same name in `tile-allocation-copies.mlir`.
+// This shows the live ranges (which are why we need to split the conditional branch).
+
+// CHECK-LIVE-RANGE-LABEL: @cond_branch_with_backedge
+// CHECK-LIVE-RANGE: ^bb1:
+// CHECK-LIVE-RANGE-NEXT: ||| | arith.cmpi
+// CHECK-LIVE-RANGE-NEXT: EEE E cf.cond_br
+//
+// CHECK-LIVE-RANGE-NEXT: ^[[BB3_COPIES:[[:alnum:]]+]]:
+// CHECK-LIVE-RANGE-NEXT: ||| ES arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: E|| |S arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: E| ||S arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: E |||S arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: EEEE cf.br
+//
+// It is important to note that the first three live ranges in ^bb1 do not end
+// at the `cf.cond_br`; they are live-out via the backedge bb1 -> bb2 -> bb1.
+// This means that if we placed the `arm_sme.copy_tile` ops before the `cf.cond_br`
+// then those live ranges would not end at the copies, resulting in unwanted
+// overlapping live ranges (and hence tile spills).
+//
+// With the conditional branch split and the copies placed in the BB3_COPIES
+// block, the first three live ranges end at the copy operations (as the
+// BB3_COPIES block is on the path out of the loop and has no backedge). This
+// means there are no overlaps and the live ranges all merge, as shown below.
+//
+// CHECK-LIVE-RANGE: ========== Coalesced Live Ranges:
+// CHECK-LIVE-RANGE: ^bb1:
+// CHECK-LIVE-RANGE-NEXT: |||| arith.cmpi
+// CHECK-LIVE-RANGE-NEXT: EEEE cf.cond_br
+//
+// CHECK-LIVE-RANGE-NEXT: ^[[BB3_COPIES]]:
+// CHECK-LIVE-RANGE-NEXT: |||| arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: |||| arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: |||| arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: |||| arm_sme.copy_tile
+// CHECK-LIVE-RANGE-NEXT: EEEE cf.br
+
+// CHECK-LABEL: @cond_branch_with_backedge
+// CHECK-NOT: tile_id = 16
+// CHECK: arm_sme.get_tile {tile_id = 0 : i32} : vector<[4]x[4]xf32>
+// CHECK: arm_sme.get_tile {tile_id = 1 : i32} : vector<[4]x[4]xf32>
+// CHECK: arm_sme.get_tile {tile_id = 2 : i32} : vector<[4]x[4]xf32>
+// CHECK: arm_sme.get_tile {tile_id = 3 : i32} : vector<[4]x[4]xf32>
+// CHECK: arm_sme.move_vector_to_tile_slice {{.*}} {tile_id = 0 : i32} : vector<[4]xf32> into vector<[4]x[4]xf32>
+// CHECK-NOT: tile_id = 16
+func.func @cond_branch_with_backedge(%slice: vector<[4]xf32>) {
----------------
banach-space wrote:
Live ranges for this function:
```
========== Initial Live Ranges:
SME Tile Liveness: @cond_branch_with_backedge
Key:
S - Start
E - End
| - Live
^bb0:
S arm_sme.get_tile
|S arm_sme.get_tile
||S arm_sme.get_tile
|||S arm_sme.get_tile
|||| arith.constant
|||| arith.constant
|||| arith.constant
E|||S arm_sme.copy_tile
EEEE cf.br
^bb1:
||| | arith.cmpi
EEE E cf.cond_br
^bb2:
||| ES arm_sme.copy_tile
E|| |S arm_sme.copy_tile
E| ||S arm_sme.copy_tile
E |||S arm_sme.copy_tile
EEEE cf.br
^bb3:
EEE E cf.br
^bb4:
||| E S arm_sme.move_vector_to_tile_slice
||| | arith.addi
||| ES arm_sme.copy_tile
EEE E cf.br
^bb5:
EEEE func.return
==========
```
Looking at this part specifically:
```
E|||S arm_sme.copy_tile
EEEE cf.br
^bb1:
||| | arith.cmpi
```
Shouldn't the right-most column on the line with `arith.cmpi` be in the same column as where this live range starts (just before `^bb1`):
```
E|||S arm_sme.copy_tile
```
I'm mostly trying to understand the generated output.
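
For anyone else reading these diagrams: each column is one live range, and the number of `S`, `E` or `|` characters on a row is the number of ranges live at that operation. Below is a minimal Python sketch of that reading, applied to the coalesced ranges of `@avoidable_spill` from the test above. The helper names (`range_field`, `live_count`) are hypothetical and not part of MLIR, and the column padding in `ROWS` is written out by hand on the assumption that each range keeps its own column. The tile count of four reflects the four 32-bit ZA tiles SME provides for `vector<[4]x[4]xf32>`.

```python
# Hypothetical helpers for reading a live-range diagram; not part of MLIR.
RANGE_CHARS = set("SE| ")  # characters that can appear in the range columns
LIVE_CHARS = set("SE|")    # a range is live on a row if its column holds one of these


def range_field(row: str) -> str:
    """Return the leading live-range columns of a row (operation name dropped)."""
    end = 0
    while end < len(row) and row[end] in RANGE_CHARS:
        end += 1
    return row[:end]


def live_count(row: str) -> int:
    """Count how many live ranges are live at the operation on this row."""
    return sum(ch in LIVE_CHARS for ch in range_field(row))


# Coalesced live ranges for ^bb2 of @avoidable_spill (padding is an assumption).
ROWS = [
    "||     test.some_use",
    "||S    arm_sme.move_vector_to_tile_slice",
    "|||S   arm_sme.move_vector_to_tile_slice",
    "||||S  arm_sme.move_vector_to_tile_slice",
    "|||||S arm_sme.move_vector_to_tile_slice",
    "||E||| test.some_use",
    "|| E|| test.some_use",
    "||  E| test.some_use",
    "||   E test.some_use",
    "||     arith.addi",
    "EE     cf.br",
]

NUM_32BIT_TILES = 4  # ZA0.S to ZA3.S

overlaps = [live_count(row) for row in ROWS]
print("live ranges per op:", overlaps)
print("peak overlap:", max(overlaps))
print("exceeds available tiles:", max(overlaps) > NUM_32BIT_TILES)
```

The peak overlap is what forces a spill, while the count on the first row shows how few ranges are live at the first `test.some_use`, which is the observation the test comment makes about the spill being avoidable.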
https://github.com/llvm/llvm-project/pull/90448