[PATCH] D134803: [LoopPeeling] Support peeling loops with non-latch exits

Wed Oct 19 01:57:10 PDT 2022

nhaehnle added a comment.

The common issue with duplication of convergent instructions is when two threads that previously would execute the same static convergent instruction end up executing different static convergent instructions after a transform. This is clearly not the case with loop peeling, so it really ought to be okay.

That said, there's the underlying meta-problem that we haven't yet adopted a good definition of `convergent`, meaning that given a piece of LLVM IR, it is not possible to tell which threads of a group are supposed to communicate in a given convergent instruction. It is entirely possible that there is some implicit assumption in the code about how convergent instructions behave that happens to get broken by a downstream transform now that the post-loop-peel CFG is different.

One interesting observation is that if we take the current definition of `convergent` as gospel (which we really shouldn't, but for the sake of the argument), then loop peeling with convergent ops is actually forbidden, and peeling loops with additional exits may make the problem worse. Specifically:

  preheader:
    br loop
  loop:
    convergentop
    br i1 cc1, exit, latch
  latch:
    br i1 cc2, loop, exit
  exit:

Peeling once results in:

    convergentop'
    br i1 cc1', exit, latch.peel     <-- X
  latch.peel:
    br i1 cc2, preheader, exit    <-- X
  preheader:
    br loop
  loop:
    convergentop
    br i1 cc1, exit, latch
  latch:
    br i1 cc2, loop, exit
  exit:

The `convergentop` inside the loop now has the two branches marked X as additional control dependencies, which is forbidden by current LangRef. It is possible that this wasn't an issue in the past because the peeled latch branch is most likely constant anyway and gets optimized away, and now there's some new loop that gets peeled with a secondary exit whose condition isn't constant.

Keep in mind that this is all speculation, and personally, I find it bizarre. The definition of convergent that I would prefer doesn't have this issue, but it may well be that something in the downstream PTX flow is sensitive to this somehow today. It would be good to properly root-cause the issue.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134803/new/

https://reviews.llvm.org/D134803