<html><head></head><body bgcolor="#FFFFFF"><div>Makes perfect sense, thanks!<br><br><div>-Chris</div></div><div><br>On Feb 25, 2012, at 10:51 AM, Peter Cooper <<a href="mailto:peter_cooper@apple.com">peter_cooper@apple.com</a>> wrote:<br><br></div><div><span></span></div><blockquote type="cite"><div><div><span class="Apple-style-span" style="-webkit-tap-highlight-color: rgba(26, 26, 26, 0.296875); -webkit-composition-fill-color: rgba(175, 192, 227, 0.230469); -webkit-composition-frame-color: rgba(77, 128, 180, 0.230469); ">On Feb 25, 2012, at 10:32 AM, Chris Lattner <<a href="mailto:clattner@apple.com">clattner@apple.com</a>> wrote:</span></div><div><br></div><div></div><blockquote type="cite"><div><br><div><div>On Feb 25, 2012, at 3:17 AM, Carlo Alberto Ferraris wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">

  
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  
  <div bgcolor="#FFFFFF" text="#000000">

    Prompted by an SO post

    (<a class="moz-txt-link-freetext" href="http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363">http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363</a>)

    I checked and found that LLVM yields the same (seemingly) suboptimal

    code as MSVC.<br>

    Consider the following, simplified, C snippet:<br></div></blockquote><div><br></div></div><div><blockquote type="cite"><div bgcolor="#FFFFFF" text="#000000"><tt>extern void bar(int*);<br>

      <br>

      void foo(int a)<br>

      {<br>

          int ar[100] = {a}; <br>

          if (a)<br>

              return;<br>

          bar(ar);<br>

      }</tt><br>

    <br>

    Ideally, the array initialization should be sunk below the early return,

    but in Clang/LLVM 3.0 this doesn't happen:<br></div></blockquote><div><br></div><div>This is a straightforward form of code motion we don't implement, which would be built on partially dead store analysis.  Our dead store analysis in general isn't very powerful, and cannot see across basic blocks.  It turns out that cross-block analysis is pretty expensive and doesn't often lead to big performance wins.  That said, it is certainly an area that should be improved.</div><div><br></div><div>I'll note that the original example from SO is more complex.  Instead of a single store, it is a whole loop that initializes the array.  Handling this case requires moving the entire loop, which requires fairly heroic compiler analysis.  The saving grace is that this case is equivalent to a memcpy, so we may be able to handle *that* someday.</div><div><br></div><blockquote type="cite"><div bgcolor="#FFFFFF" text="#000000"><pre><span>  %ar = <span class="llvm_keyword">alloca</span> [100 x <span class="llvm_type">i32</span>], <span class="llvm_keyword">align</span> 16

  %1 = <span class="llvm_keyword">bitcast</span> [100 x <span class="llvm_type">i32</span>]* %ar <span class="llvm_keyword">to</span> <span class="llvm_type">i8</span>*

  <span class="llvm_keyword">call</span> <span class="llvm_type">void</span> @llvm.memset.p0i8.<span class="llvm_type">i64</span>(<span class="llvm_type">i8</span>* %1, <span class="llvm_type">i8</span> 0, <span class="llvm_type">i64</span> 400, <span class="llvm_type">i32</span> 16, <span class="llvm_type">i1</span> <span class="llvm_keyword">false</span>)

  %2 = <span class="llvm_keyword">getelementptr</span> inbounds [100 x <span class="llvm_type">i32</span>]* %ar, <span class="llvm_type">i64</span> 0, <span class="llvm_type">i64</span> 0

  <span class="llvm_keyword">store</span> <span class="llvm_type">i32</span> %a, <span class="llvm_type">i32</span>* %2, <span class="llvm_keyword">align</span> 16, !tbaa !0

</span></pre></div></blockquote></div><div>I'm surprised that we're not shortening the memset to skip setting the dead element.  That *is* something that we should be able to handle.  Pete, didn't you implement this a while ago?</div></div></blockquote>Yeah. I think my implementation only trims a memset when the overwriting store covers its end, whereas here the store covers its start. I'll take a look at improving that. We'll probably only want to trim the start of a memset when doing so doesn't leave it beginning at a horribly unaligned position, but that's OK here.<div><br></div><div>Pete<br><blockquote type="cite"><div><div><br></div><div>-Chris</div><br></div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>LLVM Developers mailing list</span><br><span><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a></span><br><span><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a></span><br></div></blockquote></div></div></blockquote></body></html>