[Mlir-commits] [mlir] Enable LICM for ops with only read side effects in scf.for (PR #120302)
Arda Unal
llvmlistbot at llvm.org
Mon Jan 6 16:26:47 PST 2025
ardaunal wrote:
I changed the approach as we discussed in [Speculative LICM?](https://discourse.llvm.org/t/speculative-licm/80977).
What is different:
- The loop is no longer wrapped in a guard.
- Ops with only read side effects are hoisted under a guard. The else branch of that guard yields a **ub.poison** op with the same result type(s) as the hoisted op.
- Pure ops are hoisted without a guard, unless some op was already hoisted with a guard; in that case the pure op is hoisted with a guard as well. This is needed to avoid interleaving branches such as:
```mlir
module {
  func.func @test_speculatable_op_with_read_side_effect_success_with_dependents(%arg0: index, %arg1: index, %arg2: index) -> i32 {
    %c0_i32 = arith.constant 0 : i32
    %cst = arith.constant dense<42> : tensor<64xi32>
    %c42 = arith.constant 42 : index
    %0 = "test.always_speculatable_op"() : () -> i32
    %1 = arith.cmpi ult, %arg0, %arg1 : index
    %2 = scf.if %1 -> (i32) {
      %8 = "test.speculatable_op_with_memread"(%cst, %c42) : (tensor<64xi32>, index) -> i32
      scf.yield %8 : i32
    } else {
      %8 = ub.poison : i32
      scf.yield %8 : i32
    }
    %3 = arith.addi %0, %2 : i32
    %4 = arith.cmpi ult, %arg0, %arg1 : index
    %5 = scf.if %4 -> (i32) {
      %8 = "test.speculatable_op_with_memread"(%cst, %c42) : (tensor<64xi32>, index) -> i32
      scf.yield %8 : i32
    } else {
      %8 = ub.poison : i32
      scf.yield %8 : i32
    }
    %6 = arith.addi %3, %5 : i32
    %7 = scf.for %arg3 = %arg0 to %arg1 step %arg2 iter_args(%arg4 = %c0_i32) -> (i32) {
      %8 = arith.index_cast %arg3 : index to i32
      %9 = arith.addi %6, %8 : i32
      scf.yield %9 : i32
    }
    return %7 : i32
  }
}
```
so that CSE and the canonicalizer can do their job and produce the following instead:
```mlir
module {
  func.func @test_speculatable_op_with_read_side_effect_success_with_dependents(%arg0: index, %arg1: index, %arg2: index) -> i32 {
    %0 = ub.poison : i32
    %c0_i32 = arith.constant 0 : i32
    %cst = arith.constant dense<42> : tensor<64xi32>
    %c42 = arith.constant 42 : index
    %1 = "test.always_speculatable_op"() : () -> i32
    %2 = arith.cmpi ult, %arg0, %arg1 : index
    %3 = scf.if %2 -> (i32) {
      %5 = "test.speculatable_op_with_memread"(%cst, %c42) : (tensor<64xi32>, index) -> i32
      %6 = arith.addi %1, %5 : i32
      %7 = "test.speculatable_op_with_memread"(%cst, %c42) : (tensor<64xi32>, index) -> i32
      %8 = arith.addi %6, %7 : i32
      scf.yield %8 : i32
    } else {
      scf.yield %0 : i32
    }
    %4 = scf.for %arg3 = %arg0 to %arg1 step %arg2 iter_args(%arg4 = %c0_i32) -> (i32) {
      %5 = arith.index_cast %arg3 : index to i32
      %6 = arith.addi %3, %5 : i32
      scf.yield %6 : i32
    }
    return %4 : i32
  }
}
```
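For reference, the second form is roughly what running CSE followed by the canonicalizer (e.g. `mlir-opt -cse -canonicalize`) produces from the first: once CSE deduplicates the repeated `arith.cmpi`, the two adjacent `scf.if` ops share a condition and the canonicalizer can merge them into one guard.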
- There is only one new interface function, `moveOutOfLoopWithGuard`, which is implemented by **scf.for** for now. The implementation for **affine.for** should be similar.
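
Since this comment only names the hook, here is a minimal C++ sketch of how an `scf.for` implementation of such a guard-building method could look. The signature, the free-function form, and the name `moveOutOfLoopWithGuardSketch` are assumptions for illustration; they are not the code in the PR.

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/Dialect/UB/IR/UBOps.h"
#include "mlir/IR/Builders.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Illustrative only: hoists `op` (already known to be loop-invariant and to
// have only read side effects) out of `forOp`, guarded by the loop's
// "executes at least once" condition.
static void moveOutOfLoopWithGuardSketch(scf::ForOp forOp, Operation *op) {
  OpBuilder builder(forOp);
  Location loc = op->getLoc();

  // An scf.for body runs at least once iff lb < ub, so this comparison is a
  // safe guard under which the read-only op may execute.
  Value guard = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::ult,
                                              forOp.getLowerBound(),
                                              forOp.getUpperBound());

  // then: yield the hoisted op's results; else: yield ub.poison values of the
  // same types. The scf.if result types are inferred from the yields. The
  // then-yield transiently references `op` while it still sits in the loop;
  // the moveBefore below restores dominance.
  auto ifOp = builder.create<scf::IfOp>(
      loc, guard,
      /*thenBuilder=*/
      [&](OpBuilder &b, Location l) {
        b.create<scf::YieldOp>(l, op->getResults());
      },
      /*elseBuilder=*/
      [&](OpBuilder &b, Location l) {
        SmallVector<Value> poisons;
        for (Type t : op->getResultTypes())
          poisons.push_back(b.create<ub::PoisonOp>(l, t));
        b.create<scf::YieldOp>(l, poisons);
      });

  // Move the op in front of the then-branch yield, then redirect every use
  // outside the guard (e.g. those left in the loop body) to the scf.if
  // results.
  op->moveBefore(ifOp.thenYield());
  for (auto [oldRes, newRes] : llvm::zip(op->getResults(), ifOp.getResults()))
    oldRes.replaceUsesWithIf(newRes, [&](OpOperand &use) {
      return !ifOp->isAncestor(use.getOwner());
    });
}
```

The poison yields in the else branch are what make the hoist sound: the guarded value is only meaningful when the loop body would have executed at least once, which is exactly when its results are used.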
https://github.com/llvm/llvm-project/pull/120302