[PATCH] D135451: [TTI] New PPC target hook enableUncondDivisionSpeculation

Tue Oct 11 08:04:40 PDT 2022

bmahjour added a comment.

In D135451#3844084 <https://reviews.llvm.org/D135451#3844084>, @efriedma wrote:

>> As I understand it integer divide by zero is considered undefined behaviour in C while the same may not be true in other languages. Furthermore presence of side effects may be dependent on other configurations including target hardware (eg default treatment on Power vs x86).
>
> The LLVM IR rule is more driven by the behavior of the instructions on various targets. If a target only has a trapping divide, we'd need to wrap it in control flow to implement a non-trapping divide.  And particularly for signed divide, that check isn't cheap. We tend to prefer poison where it makes sense (for example, out-of-bounds shifts).
>
> Frontends can always use control flow to get whatever user-visible behavior they want.  (For example, the Rust divide operator panics if you divide by zero.)
>
>> To allow more flexibility we could either leave the IR neutral and let the optimizer decide based on config info (eg. TTI)
>
> We try to avoid making core IR semantics depend on TTI.  Not that we can completely ignore target differences when writing IR optimizations, but we want to keep IR understandable without reference to target-specific semantics.
>
> I mean, it would be self-consistent to write in LangRef something like "whether division by zero is undefined behavior, or defined to produce a poison value, depends on the current target/current subtarget/bitwidth of the operation/current moon cycle".  But I don't want to go there.  If the rules are the same across all targets, it's easier to understand, and easier to implement tools like Alive2 to validate transforms.

I'm sorry, but I don't quite follow the reasoning behind the current design. On the one hand, the IR rules are driven by how instructions behave on various targets; on the other, you avoid making IR semantics depend on TTI, which is the proper way to query target requirements. Together these seem like a recipe for exactly the problem we are dealing with here, i.e. target-specific assumptions are baked into the IR and cannot be configured.
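
For reference, the control flow a frontend would have to emit around a signed divide to give it defined behavior can be sketched in C as below. This is only an illustration: `checked_sdiv` is a hypothetical helper name, not something from LLVM or from this patch. A signed divide needs two checks, one for a zero divisor and one for the `INT_MIN / -1` overflow case.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: performs x / y only when the result is well
 * defined in C. Returns false (and leaves *quot untouched) otherwise. */
bool checked_sdiv(int x, int y, int *quot) {
    if (y == 0)
        return false;               /* division by zero */
    if (x == INT_MIN && y == -1)
        return false;               /* quotient not representable */
    *quot = x / y;                  /* safe: both UB cases excluded */
    return true;
}
```

An unsigned divide would need only the zero check, which is part of why the signed case is the more expensive one to guard.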

>>> In the original example, instead of trying to make the divide hoistable, you could teach LLVM to peel the first iteration of the loop, then CSE the divide.
>>
>> I don't think that would work in general, for example if the loop had unknown bounds, because peeling in such cases would still require the peeled iteration to be conditionally executed.
>
> Any loop can be peeled (as long as the body doesn't contain some exotic construct that inhibits cloning); it's basically just cloning the loop body.  And if the divide dominates the latch before peeling, it will dominate the peeled loop after peeling.
>
> The "general" problem is really the case where peeling is too expensive.

Consider this loop:

  for (i = 0; i < n; i++) {
    v += x/y;
  }

After peeling:

  if (n > 0) {
    v += x/y;
  }
  for (i = 1; i < n; i++) {
    v += x/y;
  }

The peeled divide, which is guarded by the `if (n > 0)` conditional, does not dominate the divide in the loop body. Even if we try to use control-flow equivalence between that guard and the loop guard, there could still be cases where dominance cannot be safely determined (e.g. non-affine loops).
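
The two forms above can be written out as self-contained C functions to make the guard explicit. This is a minimal sketch: the function names and the `long`/`int` types are assumptions chosen for illustration, since the original snippet does not declare its variables.

```c
/* Original loop: the divide executes only if n > 0. */
long sum_original(long x, long y, int n) {
    long v = 0;
    for (int i = 0; i < n; i++)
        v += x / y;
    return v;
}

/* Peeled form: the first iteration is cloned out of the loop, but it
 * still has to sit under the n > 0 guard, so it remains conditional. */
long sum_peeled(long x, long y, int n) {
    long v = 0;
    if (n > 0)
        v += x / y;             /* peeled iteration, guarded */
    for (int i = 1; i < n; i++)
        v += x / y;             /* remaining iterations */
    return v;
}
```

Both functions return the same value for any input on which the original is defined, and neither executes a divide when `n <= 0`, so a call with `y == 0` and `n == 0` is harmless in both. Executing the divide unconditionally ahead of the guard would change that.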

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135451/new/

https://reviews.llvm.org/D135451