[llvm] r368685 - [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible

Roman Lebedev via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 13 05:49:06 PDT 2019


Author: lebedevri
Date: Tue Aug 13 05:49:06 2019
New Revision: 368685

URL: http://llvm.org/viewvc/llvm-project?rev=368685&view=rev
Log:
[InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible

Summary:
This is rather unconventional...

As the comment there says, we don't have many folds for xor-of-icmps,
so we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use, we give up.
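
For illustration, here is the kind of xor-of-icmps in question (borrowed
from the tests below). Since `%cond0` implies `%cond1`, the xor is
equivalent to an and-of-icmps with `%cond0` inverted:

  %cond0 = icmp sgt i32 %X, 32767
  %cond1 = icmp sgt i32 %X, -32768
  %res = xor i1 %cond0, %cond1
  ; --> and-of-icmps, with %cond0's predicate inverted:
  %cond0.inv = icmp sle i32 %X, 32767
  %res = and i1 %cond0.inv, %cond1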

As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit math and
a select of the threshold we'll potentially clamp to.
As can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is that the ICmp we need to invert is not single-use, so we give up.
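
Concretely, the blocked variant looks roughly like this (a sketch
reconstructed from that test's old CHECK lines, with names shortened).
The first icmp has a second use in the select of the clamp limit, so it
is not single-use:

  %t0 = icmp sgt i32 %X, 32767
  %t1 = icmp sgt i32 %X, -32768
  %limit = select i1 %t0, i32 32767, i32 -32768
  %dont_clamp = xor i1 %t0, %t1
  %r = select i1 %dont_clamp, i32 %X, i32 %limit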

We indeed can't perform this fold at will; the general rule is that
we should not increase instruction count in InstCombine.

But we wouldn't end up increasing instruction count if we can adapt every other
user to the inverted value. That way the `not` we create **will** get folded,
and in the end the instruction count does not increase.
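
In the clamp sketch above, this plays out roughly as follows (the `.not`
name matches the `Y->getName() + ".not"` in the patch; later folds then
further canonicalize the compares):

  %t0 = icmp sle i32 %X, 32767          ; predicate inverted in place
  %t0.not = xor i1 %t0, true            ; stand-in for the original %t0
  %limit = select i1 %t0.not, i32 32767, i32 -32768
  %dont_clamp = and i1 %t0, %t1         ; the xor became an and
  %r = select i1 %dont_clamp, i32 %X, i32 %limit
  ; the %limit select absorbs %t0.not by swapping its arms, so the
  ; net instruction count is unchanged:
  %limit = select i1 %t0, i32 -32768, i32 32767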

For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S

Thus I'm proposing to be a little bit more insistent in `foldXorOfICmps()`.
The alternatives would be to not create that `not` and instead duplicate the
code to manually invert all users, or to add some even less general combine
to handle some more specific pattern[s].

Reviewers: spatel, nikic, RKSimon, craig.topper

Reviewed By: spatel

Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65530

Modified:
    llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
    llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h
    llvm/trunk/test/Transforms/InstCombine/canonicalize-clamp-with-select-of-constant-threshold-pattern.ll
    llvm/trunk/test/Transforms/InstCombine/xor-of-icmps-with-extra-uses.ll

Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp?rev=368685&r1=368684&r2=368685&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (original)
+++ llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp Tue Aug 13 05:49:06 2019
@@ -2651,7 +2651,11 @@ static Instruction *foldXorToXor(BinaryO
   return nullptr;
 }
 
-Value *InstCombiner::foldXorOfICmps(ICmpInst *LHS, ICmpInst *RHS) {
+Value *InstCombiner::foldXorOfICmps(ICmpInst *LHS, ICmpInst *RHS,
+                                    BinaryOperator &I) {
+  assert(I.getOpcode() == Instruction::Xor && I.getOperand(0) == LHS &&
+         I.getOperand(1) == RHS && "Should be 'xor' with these operands");
+
   if (predicatesFoldable(LHS->getPredicate(), RHS->getPredicate())) {
     if (LHS->getOperand(0) == RHS->getOperand(1) &&
         LHS->getOperand(1) == RHS->getOperand(0))
@@ -2706,14 +2710,35 @@ Value *InstCombiner::foldXorOfICmps(ICmp
     // TODO: If OrICmp is false, the whole thing is false (InstSimplify?).
     if (Value *AndICmp = SimplifyBinOp(Instruction::And, LHS, RHS, SQ)) {
       // TODO: Independently handle cases where the 'and' side is a constant.
-      if (OrICmp == LHS && AndICmp == RHS && RHS->hasOneUse()) {
-        // (LHS | RHS) & !(LHS & RHS) --> LHS & !RHS
-        RHS->setPredicate(RHS->getInversePredicate());
-        return Builder.CreateAnd(LHS, RHS);
+      ICmpInst *X = nullptr, *Y = nullptr;
+      if (OrICmp == LHS && AndICmp == RHS) {
+        // (LHS | RHS) & !(LHS & RHS) --> LHS & !RHS  --> X & !Y
+        X = LHS;
+        Y = RHS;
       }
-      if (OrICmp == RHS && AndICmp == LHS && LHS->hasOneUse()) {
-        // !(LHS & RHS) & (LHS | RHS) --> !LHS & RHS
-        LHS->setPredicate(LHS->getInversePredicate());
+      if (OrICmp == RHS && AndICmp == LHS) {
+        // !(LHS & RHS) & (LHS | RHS) --> !LHS & RHS  --> !Y & X
+        X = RHS;
+        Y = LHS;
+      }
+      if (X && Y && (Y->hasOneUse() || canFreelyInvertAllUsersOf(Y, &I))) {
+        // Invert the predicate of 'Y', thus inverting its output.
+        Y->setPredicate(Y->getInversePredicate());
+        // So, are there other uses of Y?
+        if (!Y->hasOneUse()) {
+          // We need to adapt other uses of Y though. Get a value that matches
+          // the original value of Y before inversion. While this increases
+          // immediate instruction count, we have just ensured that all the
+          // users are freely-invertible, so that 'not' *will* get folded away.
+          BuilderTy::InsertPointGuard Guard(Builder);
+          // Set insertion point to right after the Y.
+          Builder.SetInsertPoint(Y->getParent(), ++(Y->getIterator()));
+          Value *NotY = Builder.CreateNot(Y, Y->getName() + ".not");
+          // Replace all uses of Y (excluding the one in NotY!) with NotY.
+          Y->replaceUsesWithIf(NotY,
+                               [NotY](Use &U) { return U.getUser() != NotY; });
+        }
+        // All done.
         return Builder.CreateAnd(LHS, RHS);
       }
     }
@@ -3038,7 +3063,7 @@ Instruction *InstCombiner::visitXor(Bina
 
   if (auto *LHS = dyn_cast<ICmpInst>(I.getOperand(0)))
     if (auto *RHS = dyn_cast<ICmpInst>(I.getOperand(1)))
-      if (Value *V = foldXorOfICmps(LHS, RHS))
+      if (Value *V = foldXorOfICmps(LHS, RHS, I))
         return replaceInstUsesWith(I, V);
 
   if (Instruction *CastedXor = foldCastedBitwiseLogic(I))

Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h?rev=368685&r1=368684&r2=368685&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h (original)
+++ llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h Tue Aug 13 05:49:06 2019
@@ -139,6 +139,8 @@ static inline Constant *SubOne(Constant
 /// This happens in cases where the ~ can be eliminated.  If WillInvertAllUses
 /// is true, work under the assumption that the caller intends to remove all
 /// uses of V and only keep uses of ~V.
+///
+/// See also: canFreelyInvertAllUsersOf()
 static inline bool IsFreeToInvert(Value *V, bool WillInvertAllUses) {
   // ~(~(X)) -> X.
   if (match(V, m_Not(m_Value())))
@@ -168,6 +170,32 @@ static inline bool IsFreeToInvert(Value
   return false;
 }
 
+/// Given i1 V, can every user of V be freely adapted if V is changed to !V ?
+///
+/// See also: IsFreeToInvert()
+static inline bool canFreelyInvertAllUsersOf(Value *V, Value *IgnoredUser) {
+  // Look at every user of V.
+  for (User *U : V->users()) {
+    if (U == IgnoredUser)
+      continue; // Don't consider this user.
+
+    auto *I = cast<Instruction>(U);
+    switch (I->getOpcode()) {
+    case Instruction::Select:
+    case Instruction::Br:
+      break; // Free to invert by swapping true/false values/destinations.
+    case Instruction::Xor: // Can invert 'xor' if it's a 'not', by ignoring it.
+      if (!match(I, m_Not(m_Value())))
+        return false; // Not a 'not'.
+      break;
+    default:
+      return false; // Don't know, likely not freely invertible.
+    }
+    // So far all users were free to invert...
+  }
+  return true; // Can freely invert all users!
+}
+
 /// Some binary operators require special handling to avoid poison and undefined
 /// behavior. If a constant vector has undef elements, replace those undefs with
 /// identity constants if possible because those are always safe to execute.
@@ -540,7 +568,7 @@ private:
 
   Value *foldAndOfICmps(ICmpInst *LHS, ICmpInst *RHS, Instruction &CxtI);
   Value *foldOrOfICmps(ICmpInst *LHS, ICmpInst *RHS, Instruction &CxtI);
-  Value *foldXorOfICmps(ICmpInst *LHS, ICmpInst *RHS);
+  Value *foldXorOfICmps(ICmpInst *LHS, ICmpInst *RHS, BinaryOperator &I);
 
   /// Optimize (fcmp)&(fcmp) or (fcmp)|(fcmp).
   /// NOTE: Unlike most of instcombine, this returns a Value which should

Modified: llvm/trunk/test/Transforms/InstCombine/canonicalize-clamp-with-select-of-constant-threshold-pattern.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/canonicalize-clamp-with-select-of-constant-threshold-pattern.ll?rev=368685&r1=368684&r2=368685&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/InstCombine/canonicalize-clamp-with-select-of-constant-threshold-pattern.ll (original)
+++ llvm/trunk/test/Transforms/InstCombine/canonicalize-clamp-with-select-of-constant-threshold-pattern.ll Tue Aug 13 05:49:06 2019
@@ -77,11 +77,11 @@ define i32 @t3_select_cond_or_v1(i32 %X)
 
 define i32 @t4_select_cond_xor_v0(i32 %X) {
 ; CHECK-LABEL: @t4_select_cond_xor_v0(
-; CHECK-NEXT:    [[NEED_TO_CLAMP_POSITIVE:%.*]] = icmp sgt i32 [[X:%.*]], 32767
-; CHECK-NEXT:    [[DONT_NEED_TO_CLAMP_NEGATIVE:%.*]] = icmp sgt i32 [[X]], -32768
-; CHECK-NEXT:    [[CLAMP_LIMIT:%.*]] = select i1 [[NEED_TO_CLAMP_POSITIVE]], i32 32767, i32 -32768
-; CHECK-NEXT:    [[DONT_NEED_TO_CLAMP:%.*]] = xor i1 [[NEED_TO_CLAMP_POSITIVE]], [[DONT_NEED_TO_CLAMP_NEGATIVE]]
-; CHECK-NEXT:    [[R:%.*]] = select i1 [[DONT_NEED_TO_CLAMP]], i32 [[X]], i32 [[CLAMP_LIMIT]]
+; CHECK-NEXT:    [[NEED_TO_CLAMP_POSITIVE:%.*]] = icmp slt i32 [[X:%.*]], 32768
+; CHECK-NEXT:    [[CLAMP_LIMIT:%.*]] = select i1 [[NEED_TO_CLAMP_POSITIVE]], i32 -32768, i32 32767
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    [[R:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 [[CLAMP_LIMIT]]
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
   %need_to_clamp_positive = icmp sgt i32 %X, 32767
@@ -110,11 +110,11 @@ define i32 @t4_select_cond_xor_v1(i32 %X
 
 define i32 @t5_select_cond_xor_v2(i32 %X) {
 ; CHECK-LABEL: @t5_select_cond_xor_v2(
-; CHECK-NEXT:    [[DONT_NEED_TO_CLAMP_POSITIVE:%.*]] = icmp slt i32 [[X:%.*]], 32768
-; CHECK-NEXT:    [[NEED_TO_CLAMP_NEGATIVE:%.*]] = icmp slt i32 [[X]], -32767
-; CHECK-NEXT:    [[CLAMP_LIMIT:%.*]] = select i1 [[NEED_TO_CLAMP_NEGATIVE]], i32 -32768, i32 32767
-; CHECK-NEXT:    [[DONT_NEED_TO_CLAMP:%.*]] = xor i1 [[DONT_NEED_TO_CLAMP_POSITIVE]], [[NEED_TO_CLAMP_NEGATIVE]]
-; CHECK-NEXT:    [[R:%.*]] = select i1 [[DONT_NEED_TO_CLAMP]], i32 [[X]], i32 [[CLAMP_LIMIT]]
+; CHECK-NEXT:    [[NEED_TO_CLAMP_NEGATIVE:%.*]] = icmp sgt i32 [[X:%.*]], -32768
+; CHECK-NEXT:    [[CLAMP_LIMIT:%.*]] = select i1 [[NEED_TO_CLAMP_NEGATIVE]], i32 32767, i32 -32768
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    [[R:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 [[CLAMP_LIMIT]]
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
   %dont_need_to_clamp_positive = icmp sle i32 %X, 32767

Modified: llvm/trunk/test/Transforms/InstCombine/xor-of-icmps-with-extra-uses.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/xor-of-icmps-with-extra-uses.ll?rev=368685&r1=368684&r2=368685&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/InstCombine/xor-of-icmps-with-extra-uses.ll (original)
+++ llvm/trunk/test/Transforms/InstCombine/xor-of-icmps-with-extra-uses.ll Tue Aug 13 05:49:06 2019
@@ -7,12 +7,12 @@
 ; %cond0 is extra-used in select, which is freely invertible.
 define i1 @v0_select_of_consts(i32 %X, i32* %selected) {
 ; CHECK-LABEL: @v0_select_of_consts(
-; CHECK-NEXT:    [[COND0:%.*]] = icmp sgt i32 [[X:%.*]], 32767
-; CHECK-NEXT:    [[COND1:%.*]] = icmp sgt i32 [[X]], -32768
-; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 32767, i32 -32768
+; CHECK-NEXT:    [[COND0:%.*]] = icmp slt i32 [[X:%.*]], 32768
+; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 -32768, i32 32767
 ; CHECK-NEXT:    store i32 [[SELECT]], i32* [[SELECTED:%.*]], align 4
-; CHECK-NEXT:    [[RES:%.*]] = xor i1 [[COND0]], [[COND1]]
-; CHECK-NEXT:    ret i1 [[RES]]
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    ret i1 [[TMP1]]
 ;
   %cond0 = icmp sgt i32 %X, 32767
   %cond1 = icmp sgt i32 %X, -32768
@@ -23,12 +23,12 @@ define i1 @v0_select_of_consts(i32 %X, i
 }
 define i1 @v1_select_of_var_and_const(i32 %X, i32 %Y, i32* %selected) {
 ; CHECK-LABEL: @v1_select_of_var_and_const(
-; CHECK-NEXT:    [[COND0:%.*]] = icmp sgt i32 [[X:%.*]], 32767
-; CHECK-NEXT:    [[COND1:%.*]] = icmp sgt i32 [[X]], -32768
-; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 [[Y:%.*]], i32 -32768
+; CHECK-NEXT:    [[COND0:%.*]] = icmp slt i32 [[X:%.*]], 32768
+; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 -32768, i32 [[Y:%.*]]
 ; CHECK-NEXT:    store i32 [[SELECT]], i32* [[SELECTED:%.*]], align 4
-; CHECK-NEXT:    [[RES:%.*]] = xor i1 [[COND0]], [[COND1]]
-; CHECK-NEXT:    ret i1 [[RES]]
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    ret i1 [[TMP1]]
 ;
   %cond0 = icmp sgt i32 %X, 32767
   %cond1 = icmp sgt i32 %X, -32768
@@ -39,12 +39,12 @@ define i1 @v1_select_of_var_and_const(i3
 }
 define i1 @v2_select_of_const_and_var(i32 %X, i32 %Y, i32* %selected) {
 ; CHECK-LABEL: @v2_select_of_const_and_var(
-; CHECK-NEXT:    [[COND0:%.*]] = icmp sgt i32 [[X:%.*]], 32767
-; CHECK-NEXT:    [[COND1:%.*]] = icmp sgt i32 [[X]], -32768
-; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 32767, i32 [[Y:%.*]]
+; CHECK-NEXT:    [[COND0:%.*]] = icmp slt i32 [[X:%.*]], 32768
+; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 [[Y:%.*]], i32 32767
 ; CHECK-NEXT:    store i32 [[SELECT]], i32* [[SELECTED:%.*]], align 4
-; CHECK-NEXT:    [[RES:%.*]] = xor i1 [[COND0]], [[COND1]]
-; CHECK-NEXT:    ret i1 [[RES]]
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    ret i1 [[TMP1]]
 ;
   %cond0 = icmp sgt i32 %X, 32767
   %cond1 = icmp sgt i32 %X, -32768
@@ -58,9 +58,8 @@ define i1 @v2_select_of_const_and_var(i3
 define i1 @v3_branch(i32 %X, i32* %dst0, i32* %dst1) {
 ; CHECK-LABEL: @v3_branch(
 ; CHECK-NEXT:  begin:
-; CHECK-NEXT:    [[COND0:%.*]] = icmp sgt i32 [[X:%.*]], 32767
-; CHECK-NEXT:    [[COND1:%.*]] = icmp sgt i32 [[X]], -32768
-; CHECK-NEXT:    br i1 [[COND0]], label [[BB0:%.*]], label [[BB1:%.*]]
+; CHECK-NEXT:    [[COND0:%.*]] = icmp slt i32 [[X:%.*]], 32768
+; CHECK-NEXT:    br i1 [[COND0]], label [[BB1:%.*]], label [[BB0:%.*]]
 ; CHECK:       bb0:
 ; CHECK-NEXT:    store i32 0, i32* [[DST0:%.*]], align 4
 ; CHECK-NEXT:    br label [[END:%.*]]
@@ -68,8 +67,9 @@ define i1 @v3_branch(i32 %X, i32* %dst0,
 ; CHECK-NEXT:    store i32 0, i32* [[DST1:%.*]], align 4
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[RES:%.*]] = xor i1 [[COND0]], [[COND1]]
-; CHECK-NEXT:    ret i1 [[RES]]
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    ret i1 [[TMP0]]
 ;
 begin:
   %cond0 = icmp sgt i32 %X, 32767
@@ -89,12 +89,11 @@ end:
 ; Can invert 'not'.
 define i1 @v4_not_store(i32 %X, i1* %not_cond) {
 ; CHECK-LABEL: @v4_not_store(
-; CHECK-NEXT:    [[COND0:%.*]] = icmp sgt i32 [[X:%.*]], 32767
-; CHECK-NEXT:    [[NOT_COND0:%.*]] = xor i1 [[COND0]], true
-; CHECK-NEXT:    store i1 [[NOT_COND0]], i1* [[NOT_COND:%.*]], align 1
-; CHECK-NEXT:    [[COND1:%.*]] = icmp sgt i32 [[X]], -32768
-; CHECK-NEXT:    [[RES:%.*]] = xor i1 [[COND0]], [[COND1]]
-; CHECK-NEXT:    ret i1 [[RES]]
+; CHECK-NEXT:    [[COND0:%.*]] = icmp slt i32 [[X:%.*]], 32768
+; CHECK-NEXT:    store i1 [[COND0]], i1* [[NOT_COND:%.*]], align 1
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    ret i1 [[TMP1]]
 ;
   %cond0 = icmp sgt i32 %X, 32767
   %not_cond0 = xor i1 %cond0, -1
@@ -108,14 +107,13 @@ define i1 @v4_not_store(i32 %X, i1* %not
 ; All extra uses are invertible.
 define i1 @v5_select_and_not(i32 %X, i32 %Y, i32* %selected, i1* %not_cond) {
 ; CHECK-LABEL: @v5_select_and_not(
-; CHECK-NEXT:    [[COND0:%.*]] = icmp sgt i32 [[X:%.*]], 32767
-; CHECK-NEXT:    [[COND1:%.*]] = icmp sgt i32 [[X]], -32768
-; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 32767, i32 [[Y:%.*]]
-; CHECK-NEXT:    [[NOT_COND0:%.*]] = xor i1 [[COND0]], true
-; CHECK-NEXT:    store i1 [[NOT_COND0]], i1* [[NOT_COND:%.*]], align 1
+; CHECK-NEXT:    [[COND0:%.*]] = icmp slt i32 [[X:%.*]], 32768
+; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND0]], i32 [[Y:%.*]], i32 32767
+; CHECK-NEXT:    store i1 [[COND0]], i1* [[NOT_COND:%.*]], align 1
 ; CHECK-NEXT:    store i32 [[SELECT]], i32* [[SELECTED:%.*]], align 4
-; CHECK-NEXT:    [[RES:%.*]] = xor i1 [[COND0]], [[COND1]]
-; CHECK-NEXT:    ret i1 [[RES]]
+; CHECK-NEXT:    [[X_OFF:%.*]] = add i32 [[X]], 32767
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X_OFF]], 65535
+; CHECK-NEXT:    ret i1 [[TMP1]]
 ;
   %cond0 = icmp sgt i32 %X, 32767
   %cond1 = icmp sgt i32 %X, -32768



