RFC: min/max/abs IR intrinsics

James Molloy james at jamesmolloy.co.uk
Sun Apr 26 12:49:17 PDT 2015


Hi Philip,

Thanks very much for reviewing my proposal. I should say that I generally
agree with your points, and am still in two minds about what the best
approach would look like.

> I don't see the challenge here.  Matching a compare+select as a min/max
for the purpose of the cost model under a target specific hook seems quite
straightforward.  What am I missing?
> ...
> Er, not sure I get your point here.  Not having to match two distinct
families of representation is an advantage, not a disadvantage.  The branch
form should be getting converted into the select form much earlier in the
optimizer.  Which cases are you worried about here?

The awkward part is twofold. The first is the example I gave in my previous
message, where InstCombine mangles the pattern by pushing an fptoui between
the icmp and the select. I see two approaches to this: either match early
and turn the pattern into an intrinsic so it can't be broken up, or provide
a flexible way of matching, one that copes with the edge cases, at any
stage in the pipeline - which might mean ripping some of the matching logic
out of InstCombine and exposing it as a utility. That might be a loss in
compile time, though.
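
To make the first point concrete, here is roughly the kind of thing I mean
(illustrative IR only, not the exact case from my earlier mail). The
canonical form every matcher expects is a compare whose operands feed the
select directly:

  define float @fmin(float %a, float %b) {
    %cmp = fcmp olt float %a, %b
    %min = select i1 %cmp, float %a, float %b
    ret float %min
  }

but once a cast has been hoisted in between, the compare and the select no
longer share operands, and a simple "compare feeding a select of the same
values" matcher gives up:

  define i32 @fmin_then_convert(float %a, float %b) {
    %cmp = fcmp olt float %a, %b
    %ai  = fptoui float %a to i32
    %bi  = fptoui float %b to i32
    %min = select i1 %cmp, i32 %ai, i32 %bi
    ret i32 %min
  }

An intrinsic form, by contrast, is a single call that later passes can't
pull apart (strawman name and signature only; the proposal is for signed
int, unsigned int and float variants):

  declare i32 @llvm.smin.i32(i32, i32)

  define i32 @smin(i32 %a, i32 %b) {
    %min = call i32 @llvm.smin.i32(i32 %a, i32 %b)
    ret i32 %min
  }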

The second is branches. You're right that we speculate branches into
selects, but only up to a small threshold. I raised the threshold so that at
least "min+min" or "max+min" gets speculated, but larger sequences such as
"min+min+min" (folding a four-input minimum) still end up with a branch.

I've been doing experiments to see how much of the compiler I need to touch
when inserting these intrinsics, and yes, I do agree with you - random
testing shows the compiler needs quite a bit of teaching, in InstCombine at
least, not to regress randomly generated simple testcases. So I'm starting
to come around more to your way of thinking.

The only thing I haven't solved in my mind yet, if we don't go down the
intrinsics route, is how to deal with branches - we can't just keep raising
the speculation threshold. Perhaps we could recognize these idioms (and only
these) and raise the threshold just for them?

Matching late, in CodeGenPrepare, would solve the problem of sharing code
across backends, so after your feedback I'm leaning further towards that
approach.

Do you have any further comments?

Cheers,

James

On Sun, 26 Apr 2015 at 20:04 Philip Reames <listmail at philipreames.com>
wrote:

> On 04/23/2015 07:42 AM, James Molloy wrote:
>
> Hi all,
>
>  I've just started again on trying to solve the problem of getting decent
> code generation for min, max and abs idioms as used by the programmer and
> as emitted by the loop vectorizer.
>
>  I've been looking at doing this as a target DAGCombine, but actually I
> think:
>   1. it's incredibly complex to do at that stage and it limits all the
> work I do to just one target.
>   2. It's also much more difficult to test.
>   3. The loop and SLP vectorizers still don't have a cost model for them -
> they're just seen as compare+selects.
>
> I don't see the challenge here.  Matching a compare+select as a min/max
> for the purpose of the cost model under a target specific hook seems quite
> straightforward.  What am I missing?
>
>
>  So my proposal is:
>   * To add new intrinsics for minimum, maximum and absolute value. These
> would have signed int/unsigned int/float variants and be valid across all
> numeric types.
>   * To add a pass fairly early in the pipeline to idiom recognize and
> create intrinsics. This would be controllable per-backend - if a backend
> doesn't have efficient lowering for these operations, perhaps it's best not
> to do the idiom recognition.
>
> I am strongly opposed to this part of the proposal.  I have no problem*
> adding such intrinsics and matching them late (i.e. CodeGenPrep), but I am
> deeply concerned about the negative impacts of matching early.  Unless you
> are volunteering to add support for these intrinsics to *every* pass, I
> believe doing this is a non-starter.  As a good example, consider what
> happened recently with the x.with.overflow intrinsics where we were missing
> important simplifications on induction variable dependent checks due to
> early canonicalization to a form that the rest of the optimizer didn't
> understand.
>
> More generally, I'm not even sure matching these early would be the right
> answer even if you were volunteering to update the entire optimizer.  Being
> able to fold the condition (CSE, etc..) independently of the select and
> then being able to exploit a dominating branch is extremely powerful at
> eliminating the min/max operation entirely.  I would be deeply concerned
> about giving up that power without an incredibly compelling reason.
>
> * By "no problem", I really mean that I have no opinion here.  I am
> neither endorsing nor opposing.
>
>
>  The cost model would then fall out in the wash: because we already have
> a cost model for intrinsics, it would be as simple as adding new
> instructions.
>
>
>  Because we idiom recognize at the IR stage instead of the SDAG stage, we
> also wouldn't have to rely on the min/max idioms being in canonical
> "select" form; we could match a branch sequence also.
>
> Er, not sure I get your point here.  Not having to match two distinct
> families of representation is an advantage, not a disadvantage.  The branch
> form should be getting converted into the select form much earlier in the
> optimizer.  Which cases are you worried about here?
>
>
>  What do you think? Is this an acceptable proposal?
>
>  Cheers,
>
>  James
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>