RFC: min/max/abs IR intrinsics

Wed Apr 29 10:33:43 PDT 2015

Hi Chandler,

I also like David's suggestion, and took it as a given that it was a good
one, regardless of where/if we match into intrinsics.

Owen, it looks like 2 to 1 so far; can we get any nearer to consensus? I'm
annoyingly on the fence.

Cheers,

James

On Tue, 28 Apr 2015 at 18:46 Chandler Carruth <chandlerc at gmail.com> wrote:

> FWIW, I agree that indexed, strided, and masked loads are a pretty wildly
> different set of problems from this one.
>
> I'm very sympathetic to the position that we need a canonical
> representation of min and max idioms. That seems really good and important
> to me. Programmers write these and the canonicalization should be working
> to *expose* them to the rest of the optimizer, not hide them.
>
> But I feel like the original select was a fine canonical form for a min or
> a max. The problem is that we're then "fixing" it in instcombine by
> changing its type and hoisting all manner of other nonsense into it.
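
To make the idiom concrete, here is a rough sketch of what matching a select-based min/max looks like. It is toy Python, not LLVM code: the tuple encoding and the name `match_min_max` are made up for illustration, and the "trunc" wrapper is only a stand-in for whatever rewrite changes the select's operands.

```python
# Toy expression nodes as tuples; this is NOT LLVM's API.
#   ("icmp", pred, lhs, rhs)            compare lhs and rhs with predicate pred
#   ("select", cond, true_val, false_val)

def match_min_max(node):
    """Return ("smin" | "smax", a, b) if node is a signed min/max select, else None."""
    if not (isinstance(node, tuple) and len(node) == 4 and node[0] == "select"):
        return None
    _, cond, tval, fval = node
    if not (isinstance(cond, tuple) and len(cond) == 4 and cond[0] == "icmp"):
        return None
    _, pred, lhs, rhs = cond

    # The select must choose between exactly the two compared operands.
    if (tval, fval) == (lhs, rhs):
        swapped = False
    elif (tval, fval) == (rhs, lhs):
        swapped = True
    else:
        return None

    if pred in ("slt", "sle"):
        kind = "smin"
    elif pred in ("sgt", "sge"):
        kind = "smax"
    else:
        return None

    # Swapping the select arms turns a min into a max and vice versa.
    if swapped:
        kind = "smax" if kind == "smin" else "smin"
    return (kind, lhs, rhs)
```

With this encoding, `match_min_max(("select", ("icmp", "slt", "a", "b"), "a", "b"))` returns `("smin", "a", "b")`. If the arms are rewritten first, for example to `("trunc", "a")` and `("trunc", "b")`, the same call returns `None`, which is the kind of hidden idiom being complained about here.
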
>
> I actually really like David Majnemer's suggestion to change how
> instcombine canonicalizes selects (in this case floating point selects, but
> potentially others) to better match the needs of the rest of the optimizer
> and the code generator. But I've not seen any real discussion of that on
> the thread...
>
> -Chandler
> On Tue, Apr 28, 2015 at 10:25 AM Owen Anderson <resistor at mac.com> wrote:
>
>> Hi Renato,
>>
>> I actually think the discussion about indexed/strided/masked loads is
>> completely different from this, in that indexed/strided/masked loads are
>> fundamentally a low-level, hardware-influenced representation, and as such
>> any discussion about it is fundamentally a lowering discussion.  What we
>> are discussing here WRT min/max/neg/abs is about constructs that are
>> present from the user’s source in many programming languages, and LLVM is
>> today discarding that information in a way that inhibits the direct
>> matching of those user constructs to their direct hardware implementations.
>>
>> This seems like a completely straightforward canonicalization situation
>> to me, and fits directly into the “ascending/descending” model of
>> canonicalization and lowering: during the early stages of compilation we
>> unify constructs in the IR towards an abstracted, minimally redundant
>> form.  Then, critically, at some point (traditionally when we enter
>> SelectionDAG, though it has been moving earlier for the last couple of
>> years) we have reached “peak” canonicalization and begin breaking the
>> canonical form in favor of target-optimized or target-specific constructs.
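
As a rough illustration of the two phases described above, here is a toy Python sketch. Everything in it is hypothetical: the tuple encoding, the function names, the `target_has_min` flag, and the "MIN" / "CMP_LT" / "SELECT" op names are invented for the example and do not correspond to LLVM APIs or real target instructions.

```python
# Hypothetical toy model of the two phases; none of these names are LLVM APIs.
#   ("select", pred, x, y, t, f) means: (x pred y) ? t : f
#   ("min", a, b) is the single canonical form for signed min.

def canonicalize(expr):
    """Ascending phase: map equivalent spellings of signed min to ("min", a, b)."""
    if expr[0] != "select":
        return expr
    _, pred, x, y, t, f = expr
    if pred == "lt" and (t, f) == (x, y):   # x < y ? x : y
        return ("min", x, y)
    if pred == "gt" and (t, f) == (y, x):   # x > y ? y : x
        return ("min", x, y)
    return expr  # not a recognized min; leave untouched


def lower(expr, target_has_min):
    """Descending phase: break the canonical form into target-specific ops."""
    if expr[0] != "min":
        return [expr]
    _, a, b = expr
    if target_has_min:
        return [("MIN", a, b)]
    # No native min on this (made-up) target: compare, then select.
    return [("CMP_LT", a, b), ("SELECT", a, b)]
```

Both `("select", "lt", "a", "b", "a", "b")` and `("select", "gt", "a", "b", "b", "a")` canonicalize to `("min", "a", "b")`, so later passes only need to know one form; `lower` is the only place that cares about the target.
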
>>
>> —Owen
>>
>> > On Apr 28, 2015, at 9:31 AM, Renato Golin <renato.golin at linaro.org> wrote:
>> >
>> > On 28 April 2015 at 16:53, James Molloy <james at jamesmolloy.co.uk> wrote:
>> >>  * Philip Reames favours late matching, where we create intrinsics late in
>> >> the optimization pipeline (CodeGenPrepare) and use "select" as the canonical
>> >> form up till that point.
>> >>  * Owen Anderson favours early matching, using min/max intrinsics as the
>> >> canonical form through most of the compiler.
>> >>
>> >> Consensus hasn't yet been reached. Thoughts?
>> >
>> > Hi James,
>> >
>> > A similar discussion spawned regarding indexed / strided / masked
>> > memory access and the risks are the same:
>> >
>> > * Early matching hardens the IR, stopping a lot of optimisations working
>> > * Late matching allows for scrambled IR (due to unaware
>> > optimisations), and destroys patterns
>> >
>> > Each one is horrible in its own right, but I'll side with Philip in
>> > this one, in the same way I think Chandler was right about doing more
>> > to match complex memory accesses in pure IR, even if the patterns do
>> > get more complex. My reasons are twofold:
>> >
>> > 1. I'll repeat Philip's words: Where do we stop? How many intrinsics
>> > are we going to add to the IR until every optimisation pass becomes a
>> > huge switch with all possible variations? This was the original design
>> > decision behind not implementing every NEON intrinsic as a builtin
>> > node, and I still believe Bob Wilson was right back then. It did
>> > generate better code.
>> >
>> > 2. It's easier to fix the passes that destroy data, even if there are
>> > many of them, than to add all builtins to all passes in order to
>> > understand IR. I agree, doing so doesn't scale well, especially if you
>> > move to a dynamic execution of passes (if the pass manager ever
>> > supports that), but the alternative doesn't scale at all. It's
>> > polynomial vs. exponential. Both are bad, but exponential is worse.
>> >
>> > In the end, for the strided loads, Hao decided to try out plain IR,
>> > shuffles and loads/stores. Elena will try too, for masked and indexed
>> > loads, and only as a last resort, we'll add those intrinsics. There
>> > were some added, and if possible, we should remove them if we succeed
>> > in matching enough patterns with just IR.
>> >
>> > I think we should do the same in this case.
>> >
>> > cheers,
>> > --renato
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>