[llvm] r275989 - [InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))

Wed Jul 20 09:06:42 PDT 2016

This test case was indeed enlightning, thank you. Here's what I found (you may also skip the details and directly jump to my conclusion below):

Detailed description
====================

In the provided test case we have the following:

```
 %tobool.i = icmp eq i8 %1, 0
 %2 = or i32 %shr.i.i.i, 1
 %3 = icmp eq i32 %2, 5
 %4 = select i1 %tobool.i, i1 %3, i1 true
 %frombool.i = zext i1 %4 to i8
```

At some point it would be transformed into this:

```
%2 = icmp eq i32 %1, 5
%not.tobool.i = icmp ne i8 %has_double.0.i, 0
%3 = or i1 %2, %not.tobool.i
%frombool.i = zext i1 %3 to i8
```

It now has the form `zext(or(icmp, icmp))`. In `visitZExt()` in InstCombineCasts.cpp this would run throgh the below code:

```
if (SrcI && SrcI->getOpcode() == Instruction::Or) {
  // zext (or icmp, icmp) --> or (zext icmp), (zext icmp) if at least one
  // of the (zext icmp) will be transformed.
  ICmpInst *LHS = dyn_cast<ICmpInst>(SrcI->getOperand(0));
  ICmpInst *RHS = dyn_cast<ICmpInst>(SrcI->getOperand(1));
  if (LHS && RHS && LHS->hasOneUse() && RHS->hasOneUse() &&
      (transformZExtICmp(LHS, CI, false) ||
       transformZExtICmp(RHS, CI, false))) {
    Value *LCast = Builder->CreateZExt(LHS, CI.getType(), LHS->getName());
    Value *RCast = Builder->CreateZExt(RHS, CI.getType(), RHS->getName());
    return BinaryOperator::Create(Instruction::Or, LCast, RCast);
  }
}
```

Now for `transformZExtICmp(%not.tobool.i, %frombool.i, false)` it would see that this `zext icmp` would be able to be eliminated and therefore it hoists the `zext` in front of the `icmp` instructions:

```
%2 = icmp eq i32 %1, 5
%3 = zext i1 %2 to i8
%not.tobool.i = icmp ne i8 %has_double.0.i, 0
%not.tobool.i1 = zext i1 %not.tobool.i to i8
%frombool.i = or i8 %3, %not.tobool.i1
```

Now, in `foldCastedBitwiseLogic()` we have a reverse transformation that tries to unfold expressions like this by sinking the zext after the `or` which could lead to an endless loop switching back and forth between these transformations -- which happens for the given case. As described in r276105, I try to avoid such an endless loop by checking in `shouldOptimizeCast()` if a `zext icmp` pair might possibly be the result of the above unfolding transformation. Therefore, `shouldOptimizeCast()` also uses `transformZExtIcmp()`. Now the interesting part is that in `shouldOptimizeCast()` we get a different result from `transformZExtICmp(%not.tobool.i, %not.tobool.i1, false)` as for the `transformZExtICmp(%not.tobool.i, %frombool.i, false)` from above, despite I originally thought they basically reference the same instructions. However, it seems that the instructions have slightly changed since the unfolding, leading to a different result in `transformZExtICmp`. This means, we now conclude in `foldCastedBitwiseLogic()` that we can safely fold the casts back to `zext(or(icmp, icmp))`. Later, it seems that this would again get unfolded in `visitZExt()`, etc., leading the an endless loop.

Conclusion
==========

To summarize this: For the given test case, I do not yet fully know the exact reason why our check cannot prevent this endless loop. However, it's more ore less apparent that we cannot guarantee after the transformation from `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` that these `zext(icmp)` are not changed in the meantime by another InstCombine optimization before they are passed to `foldCastedBitwiseLogic()`. That means, the results of `transformZExtICmp()` in `shouldOptimizeCast()` here might differ from those in `visitZExt()`. Now I see that the original check in `foldCastedBitwiseLogic()` which generally forbids `icmp` instructions makes a lot of sense.

However, what I do not yet understand is this check in `visitZExt()` that guards the unfolding transformation. Why is it restricted to `or` instructions only? I also wonder if the unfolding really valuable? - which, I admit, might be a naive question :) It is only tested in the `zext-or-icmp.ll`, and when I remove this unfolding it seems that it doesn't influence the other test cases (I ran all LLVM test cases via the check target).

Best,
Matthias

> Am 20.07.2016 um 14:13 schrieb Matthias Reisinger <matthias.j.reisinger at gmail.com>:
> 
> Thank for bringing this to our notice and for providing the test case, I’ll investigate this. I apologize for the inconvenience.
> 
> Best,
> Matthias
> 
>> Am 20.07.2016 um 14:04 schrieb Tobias Grosser <tobias at grosser.es>:
>> 
>> Thank you Benjamin. Sorry for the breakage.
>> 
>> Best,
>> Tobias
>> 
>> On Wed, Jul 20, 2016, at 01:47 PM, Benjamin Kramer via llvm-commits
>> wrote:
>>> This sends 'opt -instcombine' into an infinite loop on the attached
>>> test case, can you take a look? I reverted this change (and r276105)
>>> in r276106.
>>> 
>>> On Tue, Jul 19, 2016 at 6:39 PM, Tobias Grosser via llvm-commits
>>> <llvm-commits at lists.llvm.org> wrote:
>>>> Author: grosser
>>>> Date: Tue Jul 19 11:39:17 2016
>>>> New Revision: 275989
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=275989&view=rev
>>>> Log:
>>>> [InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))
>>>> 
>>>> Summary:
>>>> Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:
>>>> 
>>>>> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.
>>>> 
>>>> Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check:
>>>> 
>>>> `if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`
>>>> 
>>>> This check seems to sort out more cases than necessary because:
>>>> - the reverse transformation is obviously done for `or` instructions only
>>>> - and also not every `zext icmp` pair is necessarily the result of this reverse transformation
>>>> 
>>>> Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).
>>>> 
>>>> As an example, consider the following IR snippet
>>>> 
>>>> ```
>>>> %1 = icmp sgt i64 %a, %b
>>>> %2 = zext i1 %1 to i8
>>>> %3 = icmp slt i64 %a, %c
>>>> %4 = zext i1 %3 to i8
>>>> %5 = and i8 %2, %4
>>>> ```
>>>> 
>>>> which would now be transformed to
>>>> 
>>>> ```
>>>> %1 = icmp sgt i64 %a, %b
>>>> %2 = icmp slt i64 %a, %c
>>>> %3 = and i1 %1, %2
>>>> %4 = zext i1 %3 to i8
>>>> ```
>>>> 
>>>> This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code.
>>>> 
>>>> Reviewers: grosser, vtjnash, majnemer
>>>> 
>>>> Subscribers: majnemer, llvm-commits
>>>> 
>>>> Differential Revision: https://reviews.llvm.org/D22511
>>>> 
>>>> Contributed-by: Matthias Reisinger
>>>> 
>>>> Modified:
>>>>    llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
>>>>    llvm/trunk/test/Transforms/InstCombine/zext.ll
>>>> 
>>>> Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp?rev=275989&r1=275988&r2=275989&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (original)
>>>> +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp Tue Jul 19 11:39:17 2016
>>>> @@ -1212,6 +1212,13 @@ bool InstCombiner::shouldOptimizeCast(Ca
>>>>       isa<CmpInst>(CastSrc) && CI->getDestTy()->isVectorTy())
>>>>     return false;
>>>> 
>>>> +  // Don't optimize the cast if it is a (zext icmp) that can already be
>>>> +  // eliminated.
>>>> +  if (auto *ZExt = dyn_cast<ZExtInst>(CI))
>>>> +    if (auto *ICmp = dyn_cast<ICmpInst>(CastSrc))
>>>> +      if (transformZExtICmp(ICmp, *ZExt, false))
>>>> +        return false;
>>>> +
>>>>   return true;
>>>> }
>>>> 
>>>> @@ -1260,8 +1267,7 @@ Instruction *InstCombiner::foldCastedBit
>>>>   Value *Cast1Src = Cast1->getOperand(0);
>>>> 
>>>>   // fold logic(cast(A), cast(B)) -> cast(logic(A, B))
>>>> -  if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) &&
>>>> -      shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
>>>> +  if (shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
>>>>     Value *NewOp = Builder->CreateBinOp(LogicOpc, Cast0Src, Cast1Src,
>>>>                                         I.getName());
>>>>     return CastInst::Create(CastOpcode, NewOp, DestTy);
>>>> 
>>>> Modified: llvm/trunk/test/Transforms/InstCombine/zext.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/zext.ll?rev=275989&r1=275988&r2=275989&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/Transforms/InstCombine/zext.ll (original)
>>>> +++ llvm/trunk/test/Transforms/InstCombine/zext.ll Tue Jul 19 11:39:17 2016
>>>> @@ -73,3 +73,41 @@ define <2 x i64> @fold_xor_zext_sandwich
>>>>   ret <2 x i64> %zext2
>>>> }
>>>> 
>>>> +; Assert that zexts in logic(zext(icmp), zext(icmp)) can be folded
>>>> +; CHECK-LABEL: @fold_logic_zext_icmp(
>>>> +; CHECK-NEXT:    [[ICMP1:%.*]] = icmp sgt i64 %a, %b
>>>> +; CHECK-NEXT:    [[ICMP2:%.*]] = icmp slt i64 %a, %c
>>>> +; CHECK-NEXT:    [[AND:%.*]] = and i1 [[ICMP1]], [[ICMP2]]
>>>> +; CHECK-NEXT:    [[ZEXT:%.*]] = zext i1 [[AND]] to i8
>>>> +; CHECK-NEXT:    ret i8 [[ZEXT]]
>>>> +define i8 @fold_logic_zext_icmp(i64 %a, i64 %b, i64 %c) {
>>>> +  %1 = icmp sgt i64 %a, %b
>>>> +  %2 = zext i1 %1 to i8
>>>> +  %3 = icmp slt i64 %a, %c
>>>> +  %4 = zext i1 %3 to i8
>>>> +  %5 = and i8 %2, %4
>>>> +  ret i8 %5
>>>> +}
>>>> +
>>>> +; Assert that zexts in logic(zext(icmp), zext(icmp)) are also folded accross
>>>> +; nested logical operators.
>>>> +; CHECK-LABEL: @fold_nested_logic_zext_icmp(
>>>> +; CHECK-NEXT:    [[ICMP1:%.*]] = icmp sgt i64 %a, %b
>>>> +; CHECK-NEXT:    [[ICMP2:%.*]] = icmp slt i64 %a, %c
>>>> +; CHECK-NEXT:    [[AND:%.*]] = and i1 [[ICMP1]], [[ICMP2]]
>>>> +; CHECK-NEXT:    [[ICMP3:%.*]] = icmp eq i64 %a, %d
>>>> +; CHECK-NEXT:    [[OR:%.*]] = or i1 [[AND]], [[ICMP3]]
>>>> +; CHECK-NEXT:    [[ZEXT:%.*]] = zext i1 [[OR]] to i8
>>>> +; CHECK-NEXT:    ret i8 [[ZEXT]]
>>>> +define i8 @fold_nested_logic_zext_icmp(i64 %a, i64 %b, i64 %c, i64 %d) {
>>>> +  %1 = icmp sgt i64 %a, %b
>>>> +  %2 = zext i1 %1 to i8
>>>> +  %3 = icmp slt i64 %a, %c
>>>> +  %4 = zext i1 %3 to i8
>>>> +  %5 = and i8 %2, %4
>>>> +  %6 = icmp eq i64 %a, %d
>>>> +  %7 = zext i1 %6 to i8
>>>> +  %8 = or i8 %5, %7
>>>> +  ret i8 %8
>>>> +}
>>>> +
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>> Email had 1 attachment:
>>> + bugpoint-reduced-simplified.ll
>>>  3k (application/octet-stream)
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160720/e9bd4e78/attachment.html>