[PATCH] D22511: [InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))

Tue Jul 19 06:49:01 PDT 2016

mreisinger created this revision.
mreisinger added reviewers: grosser, vtjnash.
mreisinger added a subscriber: llvm-commits.

Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:

> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.

Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check:

`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`

This check seems to sort out more cases than necessary because:
- the reverse transformation is obviously done for `or` instructions only
- and also not every `zext icmp` pair is necessarily the result of this reverse transformation

Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).

As an example, consider the following IR snippet

```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```

which would now be transformed to

```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```

This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which would for example hinder ScalarEvolution and Polly.

https://reviews.llvm.org/D22511

Files:
  lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
  test/Transforms/InstCombine/zext.ll

Index: test/Transforms/InstCombine/zext.ll
===================================================================

--- test/Transforms/InstCombine/zext.ll
+++ test/Transforms/InstCombine/zext.ll
@@ -73,3 +73,41 @@
   ret <2 x i64> %zext2
 }
 
+; Assert that zexts in logic(zext(icmp), zext(icmp)) can be folded
+; CHECK-LABEL: @fold_logic_zext_icmp(
+; CHECK-NEXT:    %1 = icmp sgt i64 %a, %b
+; CHECK-NEXT:    %2 = icmp slt i64 %a, %c
+; CHECK-NEXT:    %3 = and i1 %1, %2
+; CHECK-NEXT:    %4 = zext i1 %3 to i8
+; CHECK-NEXT:    ret i8 %4
+define i8 @fold_logic_zext_icmp(i64 %a, i64 %b, i64 %c) {
+  %1 = icmp sgt i64 %a, %b
+  %2 = zext i1 %1 to i8
+  %3 = icmp slt i64 %a, %c
+  %4 = zext i1 %3 to i8
+  %5 = and i8 %2, %4
+  ret i8 %5
+}
+
+; Assert that zexts in logic(zext(icmp), zext(icmp)) are also folded accross
+; nested logical operators.
+; CHECK-LABEL: @fold_nested_logic_zext_icmp(
+; CHECK-NEXT:    %1 = icmp sgt i64 %a, %b
+; CHECK-NEXT:    %2 = icmp slt i64 %a, %c
+; CHECK-NEXT:    %3 = and i1 %1, %2
+; CHECK-NEXT:    %4 = icmp eq i64 %a, %d
+; CHECK-NEXT:    %5 = or i1 %3, %4
+; CHECK-NEXT:    %6 = zext i1 %5 to i8
+; CHECK-NEXT:    ret i8 %6
+define i8 @fold_nested_logic_zext_icmp(i64 %a, i64 %b, i64 %c, i64 %d) {
+  %1 = icmp sgt i64 %a, %b
+  %2 = zext i1 %1 to i8
+  %3 = icmp slt i64 %a, %c
+  %4 = zext i1 %3 to i8
+  %5 = and i8 %2, %4
+  %6 = icmp eq i64 %a, %d
+  %7 = zext i1 %6 to i8
+  %8 = or i8 %5, %7
+  ret i8 %8
+}
+
Index: lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
===================================================================
--- lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -1212,6 +1212,13 @@
       isa<CmpInst>(CastSrc) && CI->getDestTy()->isVectorTy())
     return false;
 
+  // Don't optimize the cast if it is a (zext icmp) that can already be
+  // eliminated.
+  if (auto *ZExt = dyn_cast<ZExtInst>(CI))
+    if (auto *ICmp = dyn_cast<ICmpInst>(CastSrc))
+      if (transformZExtICmp(ICmp, *ZExt, false))
+        return false;
+
   return true;
 }
 
@@ -1260,8 +1267,7 @@
   Value *Cast1Src = Cast1->getOperand(0);
 
   // fold logic(cast(A), cast(B)) -> cast(logic(A, B))
-  if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) &&
-      shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
+  if (shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
     Value *NewOp = Builder->CreateBinOp(LogicOpc, Cast0Src, Cast1Src,
                                         I.getName());
     return CastInst::Create(CastOpcode, NewOp, DestTy);


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D22511.64482.patch
Type: text/x-patch
Size: 2583 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160719/12287c5d/attachment.bin>