<div dir="ltr"><div>Have you tried running the SLP vectorizer pass (-vectorize-slp)?<br></div><div><br></div><div>Eugene</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Aug 19, 2013 at 9:04 PM, Matt Arsenault <span dir="ltr"><<a href="mailto:arsenm2@gmail.com" target="_blank">arsenm2@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
I've found a case that I would expect to optimize easily, but it doesn't. A simple implementation of vector select:<br>
<br>
float4 simple_select(float4 a, float4 b, int4 c)<br>
{<br>
float4 result;<br>
<br>
result.x = c.x ? a.x : b.x;<br>
result.y = c.y ? a.y : b.y;<br>
result.z = c.z ? a.z : b.z;<br>
result.w = c.w ? a.w : b.w;<br>
<br>
return result;<br>
}<br>
<br>
I would expect this to be optimized to:<br>
<br>
%bool = icmp ne <4 x i32> %c, zeroinitializer<br>
%result = select <4 x i1> %bool, <4 x float> %a, <4 x float> %b<br>
ret <4 x float> %result<br>
<br>
However, it actually ends up as four separate extractelement/icmp/select sequences.<br>
<br>
Where would be the best place to fix this? Should InstCombine take care of this, or the vectorizer?<br>
<br>
<br>
Thanks<br>
</blockquote></div><br></div>
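<div>For reference, the difference between the per-lane form and the whole-vector form can be sketched in plain C. This is only an illustration: it assumes the GCC/Clang <code>vector_size</code> extension as a stand-in for the OpenCL float4/int4 types, and the names select_scalar, select_vector and selects_agree are hypothetical, not from the original message. The vector version does one compare followed by a bitwise blend, which roughly corresponds to the single icmp + select pair the quoted message expects.</div>

```c
#include <assert.h>

/* Assumption: GCC/Clang vector extensions stand in for OpenCL float4/int4. */
typedef float float4 __attribute__((vector_size(16)));
typedef int   int4   __attribute__((vector_size(16)));

/* Per-lane form, mirroring simple_select from the quoted message. */
static float4 select_scalar(float4 a, float4 b, int4 c)
{
    float4 result;
    for (int i = 0; i < 4; ++i)
        result[i] = c[i] ? a[i] : b[i];
    return result;
}

/* Whole-vector form: one compare on all lanes, then a bitwise blend. */
static float4 select_vector(float4 a, float4 b, int4 c)
{
    int4 zero = {0, 0, 0, 0};
    int4 mask = (c != zero);   /* each lane is -1 (nonzero c) or 0 */
    int4 ai = (int4)a;         /* reinterprets the bits, no value conversion */
    int4 bi = (int4)b;
    return (float4)((mask & ai) | (~mask & bi));
}

/* Returns 1 if both forms produce the same lanes for a sample input. */
int selects_agree(void)
{
    float4 a = {1.0f, 2.0f, 3.0f, 4.0f};
    float4 b = {-1.0f, -2.0f, -3.0f, -4.0f};
    int4   c = {0, 7, 0, -1};

    float4 s = select_scalar(a, b, c);
    float4 v = select_vector(a, b, c);

    /* Lanes with nonzero c take a; lanes with zero c take b. */
    assert(v[0] == -1.0f && v[1] == 2.0f && v[2] == -3.0f && v[3] == 4.0f);

    for (int i = 0; i < 4; ++i)
        if (s[i] != v[i])
            return 0;
    return 1;
}
```

<div>Note that the condition is "lane is nonzero", so the compare in the vector form is a not-equal against zero. Whether a given compiler turns the per-lane loop into the vector form on its own depends on the passes that are run.</div>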