[PATCH] D134803: [LoopPeeling] Support peeling loops with non-latch exits

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 25 09:47:29 PDT 2022


tra added a comment.

In D134803#3881760 <https://reviews.llvm.org/D134803#3881760>, @nikic wrote:

> In D134803#3869970 <https://reviews.llvm.org/D134803#3869970>, @tra wrote:
>
>> As I suspected, the hanging code is ~2.5MB of unoptimized IR, reduced to ~1MB after optimizations. The IR is thread-sync heavy, and it's hard to tell if anything is obviously wrong after it gets further massaged by ptxas into GPU instructions. :-(
>
> In this test case, is there only a single additional loop being peeled? If there are multiple, is it possible to localize which one is the problematic one? Can you identify which of the removed conditions in canPeel() are the ones that are relevant? Does the problematic loop contain convergent instructions as speculated before?

The bug triggers in IR generated at runtime for a fairly convoluted computation on sparse matrices. Attempts to reduce the test case result in substantially different generated IR and make the issue disappear.

To answer your questions:

- There are multiple loops.
- It's hard to impossible to tell which loop causes the problem, because the nature of the test makes it hard to precisely control the generated IR. It's also not feasible to spot the issue by eyeballing the generated SASS. I did manage to do that on a few occasions in the past, but that was with much smaller code.
- Hence I can't tell where things went wrong.
- The code does have a lot of calls to the `@llvm.nvvm.barrier0` intrinsic, which carries the `convergent` attribute and requires control flow to remain 'structured' so that the GPU code can be correctly instrumented to re-converge after divergent conditional branches (see the IR sketch after this list).
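To illustrate the concern, here is a minimal hand-written IR sketch, not taken from the actual test case, of the kind of loop I have in mind: a loop with a non-latch exit whose body calls a convergent barrier. Every thread has to execute the barrier the same number of times under uniform control flow, and that is the property I'm worried about when such a loop gets peeled.

```
; Hand-written sketch, not from the test case: a loop with a non-latch exit
; whose body calls a convergent barrier. All threads must reach the barrier
; the same number of times along converged control flow.
declare void @llvm.nvvm.barrier0() convergent

define void @kernel(i32 %n) {
entry:
  br label %header

header:
  %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
  call void @llvm.nvvm.barrier0()          ; convergent: needs structured CFG
  %early = icmp eq i32 %i, %n              ; non-latch exit
  br i1 %early, label %exit, label %latch

latch:
  %i.next = add nuw i32 %i, 1
  %cont = icmp ult i32 %i.next, 16
  br i1 %cont, label %header, label %exit

exit:
  ret void
}
```

Whether any of the loops in the real test case have exactly this shape is what I can't currently confirm.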

With D136643 <https://reviews.llvm.org/D136643> we should be able to observe exactly what changes in the IR in that test, which may narrow the scope down to a subset of the loops involved.

> Just some suggestions on how the problem could be narrowed a bit further.
>
>> Would it be possible to introduce a cc1 option disabling this kind of peeling, so we have some sort of workaround until we figure out a better way to deal with the issue?
>
> For the record, D136643 <https://reviews.llvm.org/D136643> has the opt option (I assume that's what you meant, a cc1 option would be unusual). I think that patch is too complicated as-is, but generally having a temporary option is fine.

This option is a workaround, not intended for general use. It needs to be a cc1 option because we need to pass it to the GPU cc1 sub-compilation only, while your change keeps working as-is during the host compilation. And yes, the patch is probably one of the more complicated on/off knobs I've seen. :-) If there's an easier way to control this, I'd be all for it, but I'll take whatever works without disrupting other LLVM users.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134803/new/

https://reviews.llvm.org/D134803


