<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 5, 2015 at 10:38 AM, Eric Christopher <span dir="ltr"><<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">

><br>

> c) Got an in-tree user where this would be useful?<br>

<br></span>

I was kinda hoping someone from R600 would know, since I think I recall R600 having a select instruction? I figure it’d be useful to have some feedback from another architecture to see what they’d find useful here, since I’m not big on the idea of shoving in something solely based on an OOT arch’s needs (plus, I probably haven’t even fully thought through its possible benefits either).<br></blockquote><div><br></div><div>Yeah. Maybe poke them and the nvptx guys?</div><div><br></div></div></div></blockquote><div><br></div><div>NVIDIA's PTX supports predicated execution of almost all instructions. It is, generally speaking, preferred over branches.</div><div><a href="http://docs.nvidia.com/cuda/parallel-thread-execution/#predicated-execution">http://docs.nvidia.com/cuda/parallel-thread-execution/#predicated-execution</a><br></div><div><br></div><div>It's really easy to kill GPU performance with branches, and by 'kill' I mean a difference of a couple of orders of magnitude. :-/</div><div>For small fragments of code, predicated execution is likely to be a win.</div><div><br></div></div>-- <br><div class="gmail_signature"><div dir="ltr">--Artem Belevich</div></div>

</div></div>