[PATCH/RFC] New TLI option for fast selects

Tue May 5 12:03:03 PDT 2015

On Tue, May 5, 2015 at 11:37 AM, escha <escha at apple.com> wrote:

>
> On May 5, 2015, at 11:02 AM, Artem Belevich <tra at google.com> wrote:
>
>
>
> On Tue, May 5, 2015 at 10:38 AM, Eric Christopher <echristo at gmail.com>
> wrote:
>
>>
>>
>>> >
>>> > c) Got an in-tree user where this would be useful?
>>>
>>> I was kinda hoping someone from R600 would know, since I think I recall
>>> R600 having a select instruction? I figure it’d be useful to have some
>>> feedback from another architecture to see what they’d find useful here,
>>> since I’m not big on the idea of shoving in something solely based on an
>>> OOT arch’s needs (plus, I probably haven’t even fully thought through its
>>> possible benefits either).
>>>
>>
>> Yeah. Maybe poke them and the nvptx guys?
>>
>>
> NVIDIA's PTX supports predicated execution of almost all instructions.
> It is, generally speaking, preferred over branches.
> http://docs.nvidia.com/cuda/parallel-thread-execution/#predicated-execution
>
> It's really easy to kill GPU performance with branches, and by 'kill' I
> mean a difference of a couple of orders of magnitude. :-/
> For small fragments of code, predicated execution is likely to be a win.
>
>
> This isn’t about predication/unconditional code versus branches; this is
> about select instructions versus unconditional expansions such as the
> 3-instruction integer absolute value expansion. It was written with an
> out-of-tree GPU in mind that has a very powerful select() instruction.
>

Wouldn't flexible predicate calculation plus the ability to predicate an
arbitrary instruction be equivalent to a 'powerful select() instruction',
and more?
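
For concreteness, here is a minimal C sketch of the two integer abs
lowerings being compared. It assumes the "3-instruction expansion" means the
usual shift/add/xor sequence, which the thread does not spell out, and the
function names are made up for illustration. On a predicated target the
first form could presumably become a compare plus a predicated negate.

```c
#include <stdint.h>

/* Select form: one compare feeding a select between x and -x.
 * The negate is done in unsigned arithmetic to avoid signed overflow. */
static int32_t abs_select(int32_t x) {
    return x < 0 ? (int32_t)(0u - (uint32_t)x) : x;
}

/* Unconditional expansion (assumed shift/add/xor form):
 *   mask = x >> 31 (arithmetic shift: all ones if x < 0, else zero)
 *   abs  = (x + mask) ^ mask
 * The mask is built from a logical shift plus a negate here so the C
 * stays portable; hardware would use a single arithmetic shift. */
static int32_t abs_expanded(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t mask = 0u - (ux >> 31);
    return (int32_t)((ux + mask) ^ mask);
}
```

Both functions agree on every input except possibly INT32_MIN, whose
absolute value does not fit in int32_t.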

-- 
--Artem Belevich