<div dir="ltr"><div>Have you tried running the SLP vectorizer pass (-vectorize-slp)?<br></div><div><br></div><div>Eugene</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Aug 19, 2013 at 9:04 PM, Matt Arsenault <span dir="ltr"><<a href="mailto:arsenm2@gmail.com" target="_blank">arsenm2@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

I've found a case that I would expect to optimize easily, but it doesn't. A simple implementation of vector select:<br>

<br>

float4 simple_select(float4 a, float4 b, int4 c)<br>

{<br>

    float4 result;<br>

<br>

    result.x = c.x ? a.x : b.x;<br>

    result.y = c.y ? a.y : b.y;<br>

    result.z = c.z ? a.z : b.z;<br>

    result.w = c.w ? a.w : b.w;<br>

<br>

    return result;<br>

}<br>

<br>

I would expect this to be optimized to<br>

<br>

%bool = icmp ne <4 x i32> %c, zeroinitializer<br>

%result = select <4 x i1> %bool, <4 x float> %a, <4 x float> %b<br>

ret <4 x float> %result<br>

<br>

However, it actually ends up as four separate extractelement/icmp/select sequences.<br>

<br>

Where would be the best place to fix this? Should InstCombine take care of this, or the vectorizer?<br>

<br>

<br>

Thanks<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

</blockquote></div><br></div>
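<div>For reference, the equivalence the quoted message is asking the optimizer to find can be sketched in C. This is only an illustration under stated assumptions: it relies on the GCC/Clang vector extensions, the float4 and int4 typedefs and both function names are made up here rather than taken from the thread, and int is assumed to be 32 bits wide. The first function selects lane by lane as in the quoted source; the second does one whole-vector compare against zero followed by one bitwise blend, which corresponds to a vector icmp ne plus a vector select.</div>

```c
/* Assumption: GCC/Clang vector extensions; float4/int4 are local,
   hypothetical typedefs standing in for the types in the quoted code. */
typedef float float4 __attribute__((vector_size(16)));
typedef int   int4   __attribute__((vector_size(16)));

/* Lane-by-lane form, mirroring the function in the quoted message. */
float4 select_scalar(float4 a, float4 b, int4 c)
{
    float4 result;
    for (int i = 0; i < 4; ++i)
        result[i] = c[i] ? a[i] : b[i];
    return result;
}

/* Whole-vector form: one compare, then one blend. */
float4 select_vector(float4 a, float4 b, int4 c)
{
    const int4 zero = {0, 0, 0, 0};
    int4 mask = (c != zero);  /* all bits set in lanes where c is nonzero */
    /* Reinterpret the float lanes as integers so they can be blended bitwise. */
    int4 bits = (mask & (int4)a) | (~mask & (int4)b);
    return (float4)bits;
}
```

<div>Note that a lane takes its value from a when the matching lane of c is nonzero, so the vector compare has to be "not equal to zero" for the two forms to agree.</div>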