[llvm-dev] Remove zext-unfolding from InstCombine

Thu Jul 21 09:51:41 PDT 2016

Hi Matthias,

I've been fixing similar problems recently (eg,
https://llvm.org/bugs/show_bug.cgi?id=28476 ). I think your example (the
test in zext-or-icmp.ll) is another missed application of DeMorgan's Law:

char foo(char a, char b) {
  return (((a & 1) == 0) | (b == 0));
}

char goo(char a, char b) {
  return !(((a & 1) != 0) & (b != 0));
}

Make sure I didn't botch that translation: the 1st is the C equivalent of
zext-or-icmp.ll, and the 2nd is what we really want it to be optimized to
instead of what's shown in that test case.

define i8 @foo(i8 %a, i8 %b) {
  %and = and i8 %a, 1
  %cmp = icmp eq i8 %b, 0
  %xor = xor i8 %and, 1
  %zext = zext i1 %cmp to i8
  %or = or i8 %zext, %xor
  ret i8 %or
}

define i8 @goo(i8 %a, i8 %b) {
  %cmp = icmp ne i8 %b, 0
  %zext = zext i1 %cmp to i8
  %and = and i8 %zext, %a
  %xor = xor i8 %and, 1
  ret i8 %xor
}

The 2nd function is the better IR (one less logic op), so you need to add a
transform to produce that and get rid of the existing limitation. Any
InstCombine folds that rely on bypassing an earlier fold are wrong IMO.

On Thu, Jul 21, 2016 at 6:07 AM, Matthias Reisinger via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> I have a question regarding a transformation that is carried out in
> InstCombine, which has been introduced by r48715. It unfolds expressions of
> the form `zext(or(icmp, (icmp)))` to `or(zext(icmp), zext(icmp)))` to
> expose pairs of `zext(icmp)`. In a subsequent iteration these `zext(icmp)`
> pairs could then (possibly) be optimized by another optimization (which has
> already been there before r48715), to replace them by bitwise or integer
> operations. The code in question is located in `visitZExt()` in
> InstCombineCasts.cpp:
>
> ```
> if (ICmpInst *ICI = dyn_cast<ICmpInst>(Src))
>   return transformZExtICmp(ICI, CI);
>
> BinaryOperator *SrcI = dyn_cast<BinaryOperator>(Src);
> if (SrcI && SrcI->getOpcode() == Instruction::Or) {
>   // zext (or icmp, icmp) --> or (zext icmp), (zext icmp) if at least one
>   // of the (zext icmp) will be transformed.
>   ICmpInst *LHS = dyn_cast<ICmpInst>(SrcI->getOperand(0));
>   ICmpInst *RHS = dyn_cast<ICmpInst>(SrcI->getOperand(1));
>   if (LHS && RHS && LHS->hasOneUse() && RHS->hasOneUse() &&
>       (transformZExtICmp(LHS, CI, false) ||
>        transformZExtICmp(RHS, CI, false))) {
>     Value *LCast = Builder->CreateZExt(LHS, CI.getType(), LHS->getName());
>     Value *RCast = Builder->CreateZExt(RHS, CI.getType(), RHS->getName());
>     return BinaryOperator::Create(Instruction::Or, LCast, RCast);
>   }
> }
> ```
>
> The first two lines do the actual `zext(icmp)` optimization. The remaining
> lines have been added by r48715 and do the mentioned unfolding. However, in
> `foldCastedBitwiseLogic()` there happens a reverse of the above unfolding,
> which transforms `logic(cast(A), cast(B))` to `cast(logic(A, B))`:
>
> ```
> if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) &&
>     shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
>   Value *NewOp = Builder->CreateBinOp(LogicOpc, Cast0Src, Cast1Src,
>                                       I.getName());
>   return CastInst::Create(CastOpcode, NewOp, DestTy);
> }
> ```
>
> The `(!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))` part has been
> added by r48715 in order to ensure that InstCombine does not enter an
> endless loop, switching back and forth between the two transformations.
> There was also added an InstCombine test case `zext-or-icmp.ll` that would
> enter such an endless loop, if this check would not be present. However, it
> also generally permits `foldCastedBitwiseLogic()` from folding operations
> with two icmps as operands. Something like
>
> ```
> %1 = icmp sgt i64 %a, %b
> %2 = zext i1 %1 to i8
> %3 = icmp slt i64 %a, %c
> %4 = zext i1 %3 to i8
> %5 = and i8 %2, %4
> ```
>
> would not be able to be transformed, even though the contained `zext icmp`
> pairs will also not be able to be optimized by `transformZExtICmp()` above,
> and the casts will survive. Therefore, in r275989 (
> https://reviews.llvm.org/D22511) I tried to narrow the check in
> `foldCastedBitwiseLogic()` to allow the folding of casts in the presence of
> icmps. For the above example this would result in:
>
> ```
> %1 = icmp sgt i64 %a, %b
> %2 = icmp slt i64 %a, %c
> %3 = and i1 %1, %2
> %4 = zext i1 %3 to i8
> ```
>
> Despite `zext-or-icmp.ll` did not enter an endless loop with my change, it
> later turned out the assumptions that I took in r275989 were too optimistic
> and endless loops might still occur, which has been pointed out by Benjamin
> Kramer in
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html
> and it has therefore been reverted.
>
> I apologize in advance for this possibly naive question, but I was
> wondering whether it is feasible to remove the unfolding introduced by
> r48715 in order to enable a cast-folding like on the example above. I've
> also already tested this and run all of LLVM's test cases via the `make
> check` target and it seems that the only test case relies on this unfolding
> is the mentioned `zext-or-icmp.ll`. I don't know if Evan Chang is still
> responsible for this code region, but since he is the original author of
> the commit, I thought it would be safe to include him here (I hope it's the
> right email address).
>
> Thank you in advance for your thoughts on this!
>
> Best regards,
> Matthias
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160721/37232269/attachment-0001.html>