<div dir="ltr">Thanks Sanjay. Interestingly, for me -disable-llvm-optzns did not change how the shift was handled. Does the initial IR generated for you show this difference when the option is passed?<div><br></div><div>Best regards</div><div>Saurabh<br><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 17 February 2017 at 19:03, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>I think this is caused by a front-end change (cc'ing clang-dev) because the IR with "-Xclang -disable-llvm-optzns" shows the difference.<br><br>But independently of that, there's a missing IR canonicalization - instcombine doesn't currently do anything with either version.<br><br></div>And the version where we trunc later survives through the backend and produces worse code even for x86 with AVX2:<br>before:<br> vmovd %edi, %xmm1<br> vpmovzxwq %xmm1, %xmm1<br> vpsraw %xmm1, %xmm0, %xmm0<br> retq<br><br>after:<br> vmovd %edi, %xmm1<br> vpbroadcastd %xmm1, %ymm1<br> vmovdqa LCPI1_0(%rip), %ymm2<br> vpshufb %ymm2, %ymm1, %ymm1<br> vpermq $232, %ymm1, %ymm1<br> vpmovzxwd %xmm1, %ymm1<br> vpmovsxwd %xmm0, %ymm0<br> vpsravd %ymm1, %ymm0, %ymm0<br> vpshufb %ymm2, %ymm0, %ymm0<br> vpermq $232, %ymm0, %ymm0<br> vzeroupper<br><br><br></div>So this example may have won the bug lottery by exposing front-, middle-, and back-end bugs.
:)<br><div><br><br></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Fri, Feb 17, 2017 at 9:38 AM, Saurabh Verma via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr">Correction in the C snippet:<div><div><br></div><div><div>typedef signed short v8i16_t __attribute__((ext_vector_type<wbr>(8)));</div><div><br></div><div><div><div>v8i16_t foo (v8i16_t a, int n)</div><div>{</div><div> return a >> n;<br></div><div>}<br></div></div></div></div><div><br></div><div class="gmail_extra">Best regards</div><span class="m_-2695995442535747064m_-1598343854329839733HOEnZb"><font color="#888888"><div class="gmail_extra">Saurabh</div></font></span><div><div class="m_-2695995442535747064m_-1598343854329839733h5"><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br><div class="gmail_quote">On 17 February 2017 at 16:21, Saurabh Verma <span dir="ltr"><<a href="mailto:saurabh.verma@movidius.com" target="_blank">saurabh.verma@movidius.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello,<div><br></div><div>We are investigating a difference in code generation for vector splat instructions between llvm-3.9 and llvm-4.0, which could lead to a performance regression for our target. 
Here is the C snippet</div><div><br></div><div>typedef signed v8i16_t __attribute__((ext_vector_type<wbr>(8)))<br></div><div><div><br></div><div>v8i16_t foo (v8i16 a, int n)</div><div>{</div><div> return result = a >> n;<br></div><div>}<br></div></div><div><br></div><div>With llvm-3.9, the generated sequence does a trunc followed by splat, but with llvm-4.0 it is reversed to a splat to a bigger vector followed by a v8i32->v8i16 trunc. Is this by design? The earlier code sequence is definitely better for our target, but are there known scenarios where the new sequence would lead to better code?</div><div><br></div><div>Here are the instruction sequences generated in the two cases:</div><div><br></div><div>With llvm 3.9:</div><div><br></div><div><div>define <8 x i16> @foo(<8 x i16>, i32) #0 {</div><div> %3 = trunc i32 %1 to i16</div><div> %4 = insertelement <8 x i16> undef, i16 %3, i32 0</div><div> %5 = shufflevector <8 x i16> %4, <8 x i16> undef, <8 x i32> zeroinitializer</div><div> %6 = ashr <8 x i16> %0, %5</div><div> ret <8 x i16> %6</div><div>}</div></div><div><br></div><div><br></div><div>With llvm 4.0:</div><div><br></div><div><div>define <8 x i16> @foo(<8 x i16>, i32) #0 {</div><div> %3 = insertelement <8 x i32> undef, i32 %1, i32 0</div><div> %4 = shufflevector <8 x i32> %3, <8 x i32> undef, <8 x i32> zeroinitializer</div><div> %5 = trunc <8 x i32> %4 to <8 x i16></div><div> %6 = ashr <8 x i16> %0, %5</div><div> ret <8 x i16> %6</div><div>}</div></div><div><br></div><div>Best regards</div><span class="m_-2695995442535747064m_-1598343854329839733m_6843872365919815996gmail-HOEnZb"><font color="#888888"><div>Saurabh Verma</div></font></span></div>
</blockquote></div><br></div></div></div></div></div>
<br></div></div>
<br></blockquote></div><br></div></div>
</blockquote></div><br></div>
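<div dir="ltr">For readers following the thread, here is a rough scalar model of the two IR sequences quoted above. It is only a sketch: it uses plain C arrays rather than vector types, and the names trunc_then_splat, splat_then_trunc and same_shift_counts are made up for illustration, not taken from clang or LLVM. It assumes the int-to-short conversion wraps, as it does on mainstream compilers. Both orders give the same per-lane shift count, which is why a canonicalization from one form to the other looks plausible.</div>
<pre>
```c
/* Scalar model of the shift-count computation for "a >> n" on 8 x i16 lanes. */
enum { LANES = 8 };

/* llvm 3.9 order: truncate the scalar count once, then splat it. */
void trunc_then_splat(int n, short out[LANES]) {
    short t = (short)n;              /* trunc i32 -> i16 (assumed to wrap) */
    for (int i = 0; i < LANES; i++)  /* insertelement + shufflevector splat */
        out[i] = t;
}

/* llvm 4.0 order: splat the 32-bit count, then truncate every lane. */
void splat_then_trunc(int n, short out[LANES]) {
    int wide[LANES];
    for (int i = 0; i < LANES; i++)  /* splat into the wider 8 x i32 vector */
        wide[i] = n;
    for (int i = 0; i < LANES; i++)  /* trunc 8 x i32 -> 8 x i16, lane by lane */
        out[i] = (short)wide[i];
}

/* Returns 1 when both orders yield identical per-lane shift counts. */
int same_shift_counts(int n) {
    short early[LANES], late[LANES];
    trunc_then_splat(n, early);
    splat_then_trunc(n, late);
    for (int i = 0; i < LANES; i++)
        if (early[i] != late[i])
            return 0;
    return 1;
}
```
</pre>
<div dir="ltr">The model does one narrowing conversion in the first order and eight in the second, which mirrors the cost difference: the later trunc operates on a vector twice as wide as the result.</div>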