RFC: min/max/abs IR intrinsics

Sun Apr 26 16:04:06 PDT 2015

On 04/26/2015 12:49 PM, James Molloy wrote:
> Hi Philip,
>
> Thanks very much for reviewing my proposal. I should say that I 
> generally agree with your points, and am still in multiple minds about 
> what the best approach would look like.
>
> > I don't see the challenge here.  Matching a compare+select as a 
> min/max for the purpose of the cost model under a target specific hook 
> seems quite straightforward.  What am I missing?
> > ...
> > Er, not sure I get your point here.  Not having to match two distinct 
> families of representation is an advantage, not a disadvantage.  The 
> branch form should be getting converted into the select form much 
> earlier in the optimizer.  Which cases are you worried about here?
>
> The awkward part is twofold. Firstly, the example I gave in my 
> previous message where InstCombine mangles the pattern by pushing an 
> fptoui between the icmp and the select. There's two approaches to this 
> I see - first, match early and deal with the pattern as an intrinsic 
> so it can't be broken up. Second, have a (flexible, deals with edge 
> cases) way of matching at any stage in the pipeline - this might be 
> ripping some of the matching logic out of InstCombine and exposing it 
> as a utility. This might be a lose in compile time though.
First, you're not going to catch every possible case in existence with 
late matching, but you'll hopefully catch enough.  There will always be 
cases you'll miss by matching late, but there are also cases you'd miss 
by matching early.

For your specific example, I sort of wonder where the transformation is 
actually a profitable one at all.  If you're doing a floating point 
compare, isn't that generally using a different set of flags than an 
integer compare?  I wonder if the domain crossing is more expensive for 
the flags than the floating point value.  If you could convert both the 
select and the compare to use the integer value that would be a clear 
win.  It's not clear to me that the current transform is.

Regardless, reversing that particular transform in CGP is a bit ugly, 
but reasonable.  Is it just a couple of forms you're worried about?  Or 
are there a large number of such patterns showing up in the real world 
examples you're looking at?
>
> Secondly branches. You're right that we speculate branches to selects, 
> but only to a small threshold. I upped the threshold so at least we'll 
> get "min+min" or "max+min" to be speculated, but larger sequences such 
> as "min+min+min" (folding four-input minimum) will still end up with a 
> branch.
I think it would be very reasonable to have a TTI call for whether the 
target supported a min/max instruction and then special casing the 
branch->select transformation when it does.  To say this differently, a 
min/max on a platform with such a (cheap!) instruction should always be 
expressed as a cmp/select.  There's no reason not to perform the 
transformation provided that the min/max has approximately the same cost 
as a mov.  (I say mov only because you might still need the cmp/br 
sequence even after handing the min/max operations.  If you didn't, it's 
even more clearly okay.)

Although, there's a catch in what I just said.  Given movs are generally 
handled via renaming, it's not clear that any min/max instruction can be 
that cheap.  How does that tradeoff work out for the architecture you 
care about?
>
> I've been doing experiments with how much of the compiler I need to 
> touch when inserting these intrinsics, and yes I do agree with you - 
> random testing shows the compiler does need quite a bit of teaching, 
> in InstCombine at least, to not regress randomly generated simple 
> testcases. So I'm starting to come around more to your way of thinking.
>
> The only thing I haven't solved in my mind yet if we don't go down the 
> intrinsics route is dealing with branches - we can't just keep upping 
> the speculation threshold. Perhaps identify these (and only these) and 
> up the speculation threshold in these cases only?
>
> Matching late in CGP would solve the sharing code across backends 
> problem, so after your feedback I'm leaning further towards this approach.
>
> Do you have any further comments?
>
> Cheers,
>
> James
>
> On Sun, 26 Apr 2015 at 20:04 Philip Reames <listmail at philipreames.com 
> <mailto:listmail at philipreames.com>> wrote:
>
>     On 04/23/2015 07:42 AM, James Molloy wrote:
>>     Hi all,
>>
>>     I've just started again on trying to solve the problem of getting
>>     decent code generation for min, max and abs idioms as used by the
>>     programmer and as emitted by the loop vectorizer.
>>
>>     I've been looking at doing this as a target DAGCombine, but
>>     actually I think:
>>       1. it's incredibly complex to do at that stage and it limits
>>     all the work I do to just one target.
>>       2. It's also much more difficult to test.
>>       3. The loop and SLP vectorizers still don't have a cost model
>>     for them - they're just seen as compare+selects.
>     I don't see the challenge here.  Matching a compare+select as a
>     min/max for the purpose of the cost model under a target specific
>     hook seems quite straightforward.  What am I missing?
>
>
>>     So my proposal is:
>>       * To add new intrinsics for minimum, maximum and absolute
>>     value. These would have signed int/unsigned int/float variants
>>     and be valid across all numeric types.
>>       * To add a pass fairly early in the pipeline to idiom recognize
>>     and create intrinsics. This would be controllable per-backend -
>>     if a backend doesn't have efficient lowering for these
>>     operations, perhaps it's best not to do the idiom recognition.
>     I am strongly opposed to this part of the proposal.  I have no
>     problem* adding such intrinsics and matching them late (i.e.
>     CodeGenPrep), but I am deeply concerned about the negative impacts
>     of matching early.  Unless you are volunteering to add support for
>     these intrinsics to *every* pass, I believe doing this is a
>     non-starter.  As a good example, consider what happened recently
>     with the x.with.overflow intrinsics where we were missing
>     important simplifications on induction variable dependent checks
>     due to early canonicalization to a form that the rest of the
>     optimizer didn't understand.
>
>     More generally, I'm not even sure matching these early would be
>     the right answer even if you were volunteering to update the
>     entire optimizer.  Being able to fold the condition (CSE, etc..)
>     independently of the select and then being able to exploit a
>     dominating branch is extremely powerful at eliminating the min/max
>     operation entirely.  I would be deeply concerned about giving up
>     that power without an incredible compelling reason.
>
>     * By "no problem", I really mean that I have no opinion here.  I
>     am neither endorsing nor opposing.
>>
>>     The cost model would then fall out in the wash, because we
>>     already have a cost model for intrinsics, it would be as simple
>>     as adding new instructions.
>
>>     Because we idiom recognize at the IR stage instead of the SDAG
>>     stage, we also wouldn't have to rely on the min/max idioms being
>>     in canonical "select" form; we could match a branch sequence also.
>     Er, not sure I get your point here.  Not having to match two
>     distinct families of representation is an advantage, not a
>     disadvantage.  The branch form should be getting converted into
>     the select form much earlier in the optimizer.  Which cases are
>     you worried about here?
>>
>>     What do you think? Is this an acceptable proposal?
>>
>>     Cheers,
>>
>>     James
>>
>>
>>     _______________________________________________
>>     llvm-commits mailing list
>>     llvm-commits at cs.uiuc.edu  <mailto:llvm-commits at cs.uiuc.edu>
>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150426/6b2e54a2/attachment.html>