[PATCH/RFC] New TLI option for fast selects

Tue May 5 13:00:51 PDT 2015

> On May 5, 2015, at 12:03 PM, Artem Belevich <tra at google.com> wrote:
> 
> On Tue, May 5, 2015 at 11:37 AM, escha <escha at apple.com> wrote:
> 
>> On May 5, 2015, at 11:02 AM, Artem Belevich <tra at google.com> wrote:
>> 
>> 
>> 
>> On Tue, May 5, 2015 at 10:38 AM, Eric Christopher <echristo at gmail.com> wrote:
>>  
>> >
>> > c) Got an in-tree user where this would be useful?
>> 
>> I was kinda hoping someone from R600 would know, since I think I recall R600 having a select instruction? I figure it’d be useful to have some feedback from another architecture to see what they’d find useful here, since I’m not big on the idea of shoving in something solely based on an OOT arch’s needs (plus, I probably haven’t even fully thought through its possible benefits either).
>> 
>> Yeah. Maybe poke them and the nvptx guys?
>> 
>> 
>> NVIDIA's PTX supports predicated execution of almost all instructions. It is, generally speaking, preferred over branches.
>> http://docs.nvidia.com/cuda/parallel-thread-execution/#predicated-execution
>> 
>> It's really easy to kill GPU performance with branches, and by 'kill' I mean a difference of 'a couple of orders of magnitude'. :-/
>> For small fragments of code, predicated execution is likely to be a win.
> 
> This isn’t about predication/unconditional code versus branches; this is about select instructions versus unconditional expansions such as the 3-instruction integer absolute value expansion. It was written with an out-of-tree GPU in mind that has a very powerful select() instruction.
> 
> Wouldn't flexible predicate calculation + ability to predicate an arbitrary instruction be an equivalent of 'powerful select() instruction' and more?

I think in some cases? My thought was that the concept of “select is fast” here maps to “any SELECT_CC is one instruction”, while predicate calculation may only give you 2 instructions for a given SELECT_CC, I think? But this feels kind of beside the point; the idea of this patch is to cover the case where a target believes that a SELECT_CC is better than a more-than-one-node expansion, regardless of the real reason, right?
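
To make the comparison concrete, here is a minimal C++ sketch of the two shapes being discussed for integer absolute value: a three-operation branch-free expansion (arithmetic shift, add, xor) and a compare-and-select form. The function names abs_expanded and abs_select are hypothetical and only for illustration; this is not the patch's code, and the exact expansion a given backend emits may differ. The sketch assumes that right-shifting a negative int32_t is an arithmetic shift (guaranteed since C++20, and the usual behavior before that).

```cpp
#include <cstdint>

// Branch-free expansion: three operations (sra, add, xor).
// Assumes x >> 31 is an arithmetic shift, so sign is all ones
// for negative x and zero otherwise.
int32_t abs_expanded(int32_t x) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t sign = static_cast<uint32_t>(x >> 31);
  // Unsigned arithmetic avoids signed-overflow undefined behavior.
  return static_cast<int32_t>((ux + sign) ^ sign);
}

// Compare-and-select form: conceptually select_cc(x, 0, -x, x, lt).
// A target with a fast select might prefer to keep this shape.
int32_t abs_select(int32_t x) {
  uint32_t ux = static_cast<uint32_t>(x);
  return x < 0 ? static_cast<int32_t>(0u - ux) : x;
}
```

Both functions compute the same result; the question in this thread is only which shape the target would rather see after DAG combining.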

— escha
