[PATCH] D108114: [LoopPeel] Peel if it turns invariant loads dereferenceable.

Tue Aug 17 13:22:56 PDT 2021

fhahn added a comment.

In D108114#2949657 <https://reviews.llvm.org/D108114#2949657>, @reames wrote:

> Before reviewing the patch, a high level question.  Fair warning, I am a bit worried about cost heuristic changes here, they tend to be delicate.
>
> Why do we need to peel this?  LICM is generally good at using speculation to hoist, and trivial unswitching is good at versioning the conditions to remove them.  I'd expect to see the motivating case already handled by some combination of existing transforms.  (You might have to iterate them a few times.  So is this working around a pass ordering issue?)

Do you have any pointers to the kind of speculation LICM could be doing here? I could not spot anything that would be applicable after an initial look.

I think the motivating case could indeed be handled by a set of existing passes, which was my first try. I think it would look roughly like the following:

1. run LICM to hoist the invariant loads feeding the branch in the header,
2. run indvars to turn the check fed by the hoisted loads into an invariant check,
3. unswitch,
4. run LICM to hoist the now-unconditional loads feeding the second branch in the unswitched loop,
5. run indvars to turn the second branch condition into an invariant,
6. unswitch.

(There's a small caveat, which I just noticed: this pass order does not catch the specific test case in `peel-multiple-unreachable-exits-for-vectorization.ll`, but it does catch the unreduced motivating `std::vector::at` example.)

The problem with that particular order is that it clashes with the existing order, which is `LICM,Unswitch` in one loop pass manager, followed by `Indvars` and others in a separate loop pass manager. The passes are split across different loop pass managers because we run function passes (InstCombine & SimplifyCFG) in between them. Adjusting the pipeline seems like it would be quite a big shakeup, whereas peeling off the first iteration seemed like a less invasive change and less work for the optimizations overall. Note that we would have to run `LICM,indvars,unswitch` once for each `std::vector::at` call/runtime check.
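For readers without the test case at hand, here is a rough source-level sketch of the shape being discussed. It is a hypothetical reduction, not code from the patch: the names `sum_at` and `sum_at_peeled` are made up for illustration, and the second function only shows by hand what peeling the first iteration looks like. The intuition is that, once the first iteration has run ahead of the loop, the loads feeding each `at` bounds check have already executed unconditionally, so the remaining loop can treat them as safe to hoist.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical reduction of the motivating pattern: each `at` call adds a
// runtime bounds check whose operands are loop-invariant loads from the
// vector object, and whose failure path is a throwing (unreachable) exit.
long sum_at(const std::vector<int>& a, const std::vector<int>& b,
            std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a.at(i) + b.at(i);
    return total;
}

// The same computation with the first iteration peeled by hand. After the
// peeled iteration, both bounds checks have run once before the loop, which
// is the situation the remaining loop iterations can take advantage of.
long sum_at_peeled(const std::vector<int>& a, const std::vector<int>& b,
                   std::size_t n) {
    if (n == 0)
        return 0;
    long total = a.at(0) + b.at(0);  // peeled first iteration
    for (std::size_t i = 1; i < n; ++i)
        total += a.at(i) + b.at(i);
    return total;
}
```

Both functions are meant to behave identically, including throwing `std::out_of_range` when `n` exceeds either vector's size; the peeled form only changes where the first pair of checks executes.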

What do you think? I think eliminating the separation between the two loop pass managers would be beneficial in its own right, but SimplifyCFG and InstCombine seem like a substantial gap to bridge.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108114/new/

https://reviews.llvm.org/D108114