RFC: min/max/abs IR intrinsics

Sun Apr 26 13:25:00 PDT 2015

Hi James,

I consider this problem an InstCombine deficiency.
Consider the following patch:

diff --git a/lib/Transforms/InstCombine/InstCombineCompares.cpp
b/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 223bba0..4827686 100644
--- a/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -3970,6 +3970,19 @@ Instruction *InstCombiner::visitFCmpInst(FCmpInst
&I) {
     }
   }

+  // Test if the FCmpInst instruction is used exclusively by a select as
+  // part of a minimum or maximum operation. If so, refrain from doing
+  // any other folding. This helps out other analyses which understand
+  // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
+  // and CodeGen. And in this case, at least one of the comparison
+  // operands has at least one user besides the compare (the select),
+  // which would often largely negate the benefit of folding anyway.
+  if (I.hasOneUse())
+    if (SelectInst *SI = dyn_cast<SelectInst>(*I.user_begin()))
+      if ((SI->getOperand(1) == Op0 && SI->getOperand(2) == Op1) ||
+          (SI->getOperand(2) == Op0 && SI->getOperand(1) == Op1))
+        return nullptr;
+
   // Handle fcmp with constant RHS
   if (Constant *RHSC = dyn_cast<Constant>(Op1)) {
     if (Instruction *LHSI = dyn_cast<Instruction>(Op0))
diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp
b/lib/Transforms/InstCombine/InstructionCombining.cpp
index f62941f..6ebc8db 100644
--- a/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -714,6 +714,22 @@ Instruction
*InstCombiner::FoldOpIntoSelect(Instruction &Op, SelectInst *SI) {
         return nullptr;
     }

+    // Test if a CmpInst instruction is used exclusively by a select as
+    // part of a minimum or maximum operation. If so, refrain from doing
+    // any other folding. This helps out other analyses which understand
+    // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
+    // and CodeGen. And in this case, at least one of the comparison
+    // operands has at least one user besides the compare (the select),
+    // which would often largely negate the benefit of folding anyway.
+    if (auto *CI = dyn_cast<CmpInst>(SI->getCondition())) {
+      if (CI->hasOneUse()) {
+        Value *Op0 = CI->getOperand(0), *Op1 = CI->getOperand(1);
+        if ((SI->getOperand(1) == Op0 && SI->getOperand(2) == Op1) ||
+            (SI->getOperand(2) == Op0 && SI->getOperand(1) == Op1))
+          return nullptr;
+      }
+    }
+
     Value *SelectTrueVal = FoldOperationIntoSelectOperand(Op, TV, this);
     Value *SelectFalseVal = FoldOperationIntoSelectOperand(Op, FV, this);

With this patch in mind, running it on:
define i32 @f(float %x) {
  %1 = fcmp olt float %x, 0.000000e+00
  %2 = select i1 %1, float 0.000000e+00, float %x
  %3 = fptoui float %2 to i32
  ret i32 %3
}

results in no change.

On Sun, Apr 26, 2015 at 3:49 PM, James Molloy <james at jamesmolloy.co.uk>
wrote:

> Hi Philip,
>
> Thanks very much for reviewing my proposal. I should say that I generally
> agree with your points, and am still in multiple minds about what the best
> approach would look like.
>
> > I don't see the challenge here.  Matching a compare+select as a min/max
> for the purpose of the cost model under a target specific hook seems quite
> straightforward.  What am I missing?
> > ...
> > Er, not sure I get your point here.  Not having to match two distinct
> families of representation is an advantage, not a disadvantage.  The branch
> form should be getting converted into the select form much earlier in the
> optimizer.  Which cases are you worried about here?
>
> The awkward part is twofold. Firstly, the example I gave in my previous
> message where InstCombine mangles the pattern by pushing an fptoui between
> the icmp and the select. There's two approaches to this I see - first,
> match early and deal with the pattern as an intrinsic so it can't be broken
> up. Second, have a (flexible, deals with edge cases) way of matching at any
> stage in the pipeline - this might be ripping some of the matching logic
> out of InstCombine and exposing it as a utility. This might be a lose in
> compile time though.
>
> Secondly branches. You're right that we speculate branches to selects, but
> only to a small threshold. I upped the threshold so at least we'll get
> "min+min" or "max+min" to be speculated, but larger sequences such as
> "min+min+min" (folding four-input minimum) will still end up with a branch.
>
> I've been doing experiments with how much of the compiler I need to touch
> when inserting these intrinsics, and yes I do agree with you - random
> testing shows the compiler does need quite a bit of teaching, in
> InstCombine at least, to not regress randomly generated simple testcases.
> So I'm starting to come around more to your way of thinking.
>
> The only thing I haven't solved in my mind yet if we don't go down the
> intrinsics route is dealing with branches - we can't just keep upping the
> speculation threshold. Perhaps identify these (and only these) and up the
> speculation threshold in these cases only?
>
> Matching late in CGP would solve the sharing code across backends problem,
> so after your feedback I'm leaning further towards this approach.
>
> Do you have any further comments?
>
> Cheers,
>
> James
>
> On Sun, 26 Apr 2015 at 20:04 Philip Reames <listmail at philipreames.com>
> wrote:
>
>> On 04/23/2015 07:42 AM, James Molloy wrote:
>>
>> Hi all,
>>
>>  I've just started again on trying to solve the problem of getting
>> decent code generation for min, max and abs idioms as used by the
>> programmer and as emitted by the loop vectorizer.
>>
>>  I've been looking at doing this as a target DAGCombine, but actually I
>> think:
>>   1. it's incredibly complex to do at that stage and it limits all the
>> work I do to just one target.
>>   2. It's also much more difficult to test.
>>   3. The loop and SLP vectorizers still don't have a cost model for them
>> - they're just seen as compare+selects.
>>
>> I don't see the challenge here.  Matching a compare+select as a min/max
>> for the purpose of the cost model under a target specific hook seems quite
>> straightforward.  What am I missing?
>>
>>
>>  So my proposal is:
>>   * To add new intrinsics for minimum, maximum and absolute value. These
>> would have signed int/unsigned int/float variants and be valid across all
>> numeric types.
>>   * To add a pass fairly early in the pipeline to idiom recognize and
>> create intrinsics. This would be controllable per-backend - if a backend
>> doesn't have efficient lowering for these operations, perhaps it's best not
>> to do the idiom recognition.
>>
>> I am strongly opposed to this part of the proposal.  I have no problem*
>> adding such intrinsics and matching them late (i.e. CodeGenPrep), but I am
>> deeply concerned about the negative impacts of matching early.  Unless you
>> are volunteering to add support for these intrinsics to *every* pass, I
>> believe doing this is a non-starter.  As a good example, consider what
>> happened recently with the x.with.overflow intrinsics where we were missing
>> important simplifications on induction variable dependent checks due to
>> early canonicalization to a form that the rest of the optimizer didn't
>> understand.
>>
>> More generally, I'm not even sure matching these early would be the right
>> answer even if you were volunteering to update the entire optimizer.  Being
>> able to fold the condition (CSE, etc..) independently of the select and
>> then being able to exploit a dominating branch is extremely powerful at
>> eliminating the min/max operation entirely.  I would be deeply concerned
>> about giving up that power without an incredible compelling reason.
>>
>> * By "no problem", I really mean that I have no opinion here.  I am
>> neither endorsing nor opposing.
>>
>>
>>  The cost model would then fall out in the wash, because we already have
>> a cost model for intrinsics, it would be as simple as adding new
>> instructions.
>>
>>
>>  Because we idiom recognize at the IR stage instead of the SDAG stage,
>> we also wouldn't have to rely on the min/max idioms being in canonical
>> "select" form; we could match a branch sequence also.
>>
>> Er, not sure I get your point here.  Not having to match two distinct
>> families of representation is an advantage, not a disadvantage.  The branch
>> form should be getting converted into the select form much earlier in the
>> optimizer.  Which cases are you worried about here?
>>
>>
>>  What do you think? Is this an acceptable proposal?
>>
>>  Cheers,
>>
>>  James
>>
>>
>> _______________________________________________
>> llvm-commits mailing listllvm-commits at cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150426/f17c4882/attachment.html>