RFC: min/max/abs IR intrinsics

Wed May 6 03:24:19 PDT 2015

Hi,

Sorry for the late followup, I've been on vacation. David's patch certainly
makes matching mins and maxs easier. We should apply it no matter what we
do next.

For next steps, we could either apply Gerolf's patch for AArch64 and
replicate it for other architectures if that makes sense (as well as making
it handle VSELECTs of floating point types), or we could introduce
[SU][MIN|MAX] nodes at the target-independent DAG level as a TLI hook
opt-in, and have them produced by the SDAG builder.

I'd probably push for the latter, as it sounds like the nicer all-round
solution and can work for floating point too. What do others think?

Cheers,

James
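
For anyone following along, the idiom under discussion can be sketched in C
as below. This is only an illustration and is not taken from David's or
Gerolf's patch; the helper names smax, smin, umax and umin are hypothetical
and simply mirror the [SU][MIN|MAX] naming above. Each function is written
as a compare followed by a select, which is the form the thread treats as
the candidate canonical representation for integer min/max.

```c
#include <stdint.h>

/* Signed max/min as compare + select (hypothetical helper names). */
int32_t smax(int32_t a, int32_t b) { return a > b ? a : b; }
int32_t smin(int32_t a, int32_t b) { return a < b ? a : b; }

/* Unsigned variants: same shape, but the comparison is unsigned,
 * so a value such as 0xFFFFFFFF compares as large rather than as -1. */
uint32_t umax(uint32_t a, uint32_t b) { return a > b ? a : b; }
uint32_t umin(uint32_t a, uint32_t b) { return a < b ? a : b; }
```

The signed and unsigned versions differ only in how the comparison is
interpreted, which is presumably why separate S and U nodes are proposed.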

On Thu, 30 Apr 2015 at 17:21 James Molloy <james at jamesmolloy.co.uk> wrote:

> Hi,
>
> I still need to do a bit more followup, running all my testcases through a
> compiler with David's patch. I'll see if there still exists an example that
> shows that matching min/max sequences is any harder than instruction
> selection in general. Hopefully there won't be, in which case all this RFC
> folds down into is "apply David's patch and improve ISel patterns a bit".
> I'd be really happy if that happens :)
>
> Cheers,
>
> James
>
> On Wed, 29 Apr 2015 at 22:47 Owen Anderson <resistor at mac.com> wrote:
>
>> I actually don’t object to David’s suggestion, but in that case I don’t
>> see the need for min/max intrinsics.  All of my objections were predicated
>> on those existing.
>>
>> —Owen
>>
>>
>>
>> On Apr 29, 2015, at 10:33 AM, James Molloy <james at jamesmolloy.co.uk>
>> wrote:
>>
>> Hi Chandler,
>>
>> I also like David's suggestion, and took it as a given that it was a good
>> one, regardless of where/if we match into intrinsics.
>>
>> Owen, it looks like 2 to 1 so far, can we get any nearer consensus? I'm
>> annoyingly on the fence.
>>
>> Cheers,
>>
>> James
>>
>> On Tue, 28 Apr 2015 at 18:46 Chandler Carruth <chandlerc at gmail.com>
>> wrote:
>>
>>> FWIW, I agree that indexed, strided, and masked loads are a pretty
>>> wildly different set of problems from this one.
>>>
>>> I'm very sympathetic to the position that we need a canonical
>>> representation of min and max idioms. That seems really good and important
>>> to me. Programmers write these and the canonicalization should be working
>>> to *expose* them to the rest of the optimizer, not hide them.
>>>
>>> But I feel like the original select was a fine canonical form for a min
>>> or a max. The problem is that we're then "fixing" it in instcombine by
>>> changing its type and hoisting all manner of other nonsense into it.
>>>
>>> I actually really like David Majnemer's suggestion to change how
>>> instcombine canonicalizes selects (in this case floating point selects, but
>>> potentially others) to better match the needs of the rest of the optimizer
>>> and the code generator. But I've not seen any real discussion of that on
>>> the thread...
>>>
>>> -Chandler
>>> On Tue, Apr 28, 2015 at 10:25 AM Owen Anderson <resistor at mac.com> wrote:
>>>
>>>> Hi Renato,
>>>>
>>>> I actually think the discussion about indexed/strided/masked loads is
>>>> completely different from this, in that indexed/strided/masked loads are
>>>> fundamentally a low-level, hardware-influenced representation, and as such
>>>> any discussion about it is fundamentally a lowering discussion.  What we
>>>> are discussing here WRT min/max/neg/abs is about constructs that are
>>>> present from the user’s source in many programming languages, and LLVM is
>>>> today discarding that information in a way that inhibits the direct
>>>> matching of those user constructs to their direct hardware implementations.
>>>>
>>>> This seems like a completely straightforward canonicalization
>>>> situation to me, and fits directly into the “ascending/descending” model of
>>>> canonicalization and lowering: during the early stages of compilation we
>>>> unify constructs in the IR towards an abstracted, minimally redundant
>>>> form.  Then, critically, at some point (traditionally when we enter
>>>> SelectionDAG, though it has been moving earlier for the last couple of
>>>> years) we have reached “peak” canonicalization and begin breaking the
>>>> canonical form in favor of target-optimized or target-specific constructs.
>>>>
>>>> —Owen
>>>>
>>>> > On Apr 28, 2015, at 9:31 AM, Renato Golin <renato.golin at linaro.org> wrote:
>>>> >
>>>> > On 28 April 2015 at 16:53, James Molloy <james at jamesmolloy.co.uk> wrote:
>>>> >>  * Philip Reames favours late matching, where we create intrinsics late in
>>>> >> the optimization pipeline (CodeGenPrepare) and use "select" as the canonical
>>>> >> form up till that point.
>>>> >>  * Owen Anderson favours early matching, using min/max intrinsics as the
>>>> >> canonical form through most of the compiler.
>>>> >>
>>>> >> Consensus hasn't yet been reached. Thoughts?
>>>> >
>>>> > Hi James,
>>>> >
>>>> > A similar discussion spawned regarding indexed / strided / masked
>>>> > memory access and the risks are the same:
>>>> >
>>>> > * Early matching hardens the IR, stopping a lot of optimisations from working
>>>> > * Late matching allows for scrambled IR (due to unaware
>>>> > optimisations), and destroys patterns
>>>> >
>>>> > Each one is horrible in its own right, but I'll side with Philip in
>>>> > this one, in the same way I think Chandler was right about doing more
>>>> > to match complex memory accesses in pure IR, even if the patterns do
>>>> > get more complex. My reasons are twofold:
>>>> >
>>>> > 1. I'll repeat Philip's words: Where do we stop? How many intrinsics
>>>> > are we going to add to the IR until every optimisation pass becomes a
>>>> > huge switch with all possible variations? This was the original design
>>>> > decision behind not implementing every NEON intrinsic as a builtin
>>>> > node, and I still believe Bob Wilson was right back then. It did
>>>> > generate better code.
>>>> >
>>>> > 2. It's easier to fix the passes that destroy data, even if there are
>>>> > many of them, than to add all builtins to all passes in order to
>>>> > understand IR. I agree, doing so doesn't scale well, especially if you
>>>> > move to a dynamic execution of passes (if the pass manager ever
>>>> > supports that), but the alternative doesn't scale at all. It's
>>>> > polynomial vs. exponential. Both are bad, but exponential is worse.
>>>> >
>>>> > In the end, for the strided loads, Hao decided to try out plain IR,
>>>> > shuffles and loads/stores. Elena will try too, for masked and indexed
>>>> > loads, and only as a last resort, we'll add those intrinsics. There
>>>> > were some added, and if possible, we should remove them if we succeed
>>>> > in matching enough patterns with just IR.
>>>> >
>>>> > I think we should do the same in this case.
>>>> >
>>>> > cheers,
>>>> > --renato
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>
>>