[PATCH/RFC] New TLI option for fast selects

Artem Belevich tra at google.com
Tue May 5 11:02:34 PDT 2015


On Tue, May 5, 2015 at 10:38 AM, Eric Christopher <echristo at gmail.com>
wrote:

>
>
>> >
>> > c) Got an in-tree user where this would be useful?
>>
>> I was kinda hoping someone from R600 would know, since I think I recall
>> R600 having a select instruction? I figure it’d be useful to have some
>> feedback from another architecture to see what they’d find useful here,
>> since I’m not big on the idea of shoving in something solely based on an
>> OOT arch’s needs (plus, I probably haven’t even fully thought through its
>> possible benefits either).
>>
>
> Yeah. Maybe poke them and the nvptx guys?
>
>
NVIDIA's PTX supports predicated execution of almost all instructions.
It is, generally speaking, preferred over branches.
http://docs.nvidia.com/cuda/parallel-thread-execution/#predicated-execution

It's really easy to kill GPU performance with branches, and by 'kill' I mean
a difference of a couple of orders of magnitude. :-/
For small fragments of code, predicated execution is likely to be a win.
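
To make "small fragments" a bit more concrete, here is a rough source-level
sketch in C++. The function names are made up for illustration and are not
from the patch. The first form branches on the condition; the second
computes a condition value and picks a result, which is the shape that maps
naturally onto a select or a predicated instruction. Whether a given backend
actually emits a branch, a select, or predicated code for either form is up
to the compiler and the target, so treat this as an illustration of the idea
only.

```cpp
#include <cstdint>

// Branchy form: control flow depends on the condition.
// Hypothetical example function, not part of the patch.
int32_t clamp_negative_branch(int32_t x) {
    if (x < 0) {
        return 0;
    }
    return x;
}

// Select form: the condition is computed once and only chooses
// between two already-available values, with no control flow needed.
// Hypothetical example function, not part of the patch.
int32_t clamp_negative_select(int32_t x) {
    const bool is_negative = x < 0;  // plays the role of a predicate
    return is_negative ? 0 : x;
}
```

Both functions return the same value for every input; the only difference
is how the choice is expressed.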

-- 
--Artem Belevich


More information about the llvm-commits mailing list