<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Prompted by a Stack Overflow post
    (<a class="moz-txt-link-freetext" href="http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363">http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363</a>),
    I checked and found that LLVM generates the same seemingly suboptimal
    code as MSVC.<br>
    Consider the following simplified C snippet:<br>
    <tt><br>
      extern void bar(int*);<br>
      <br>
      void foo(int a)<br>
      {<br>
          int ar[100] = {a}; <br>
          if (a)<br>
              return;<br>
          bar(ar);<br>
      }</tt><br>
    <br>
    Ideally, the array initialization would be sunk below the early return,
    so that it only runs on the path that calls <tt>bar</tt>, but in
    Clang/LLVM 3.0 this doesn't happen:<br>
    <pre><span>; ModuleID = '/tmp/webcompile/_11079_0.bc'
<span class="llvm_keyword">target</span> datalayout = "e-p:64:64:64-<span class="llvm_type">i1</span>:8:8-<span class="llvm_type">i8</span>:8:8-<span class="llvm_type">i16</span>:16:16-<span class="llvm_type">i32</span>:32:32-<span class="llvm_type">i64</span>:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
<span class="llvm_keyword">target</span> triple = "x86_64-unknown-linux-gnu"

<span class="llvm_keyword">define</span> <span class="llvm_type">void</span> @_Z3fooi(<span class="llvm_type">i32</span> %a) uwtable {
  %ar = <span class="llvm_keyword">alloca</span> [100 x <span class="llvm_type">i32</span>], <span class="llvm_keyword">align</span> 16
  %1 = <span class="llvm_keyword">bitcast</span> [100 x <span class="llvm_type">i32</span>]* %ar <span class="llvm_keyword">to</span> <span class="llvm_type">i8</span>*
  <span class="llvm_keyword">call</span> <span class="llvm_type">void</span> @llvm.memset.p0i8.<span class="llvm_type">i64</span>(<span class="llvm_type">i8</span>* %1, <span class="llvm_type">i8</span> 0, <span class="llvm_type">i64</span> 400, <span class="llvm_type">i32</span> 16, <span class="llvm_type">i1</span> <span class="llvm_keyword">false</span>)
  %2 = <span class="llvm_keyword">getelementptr</span> inbounds [100 x <span class="llvm_type">i32</span>]* %ar, <span class="llvm_type">i64</span> 0, <span class="llvm_type">i64</span> 0
  <span class="llvm_keyword">store</span> <span class="llvm_type">i32</span> %a, <span class="llvm_type">i32</span>* %2, <span class="llvm_keyword">align</span> 16, !tbaa !0
  %3 = <span class="llvm_keyword">icmp</span> <span class="llvm_keyword">eq</span> <span class="llvm_type">i32</span> %a, 0
  <span class="llvm_keyword">br</span> <span class="llvm_type">i1</span> %3, <span class="llvm_type">label</span> %4, <span class="llvm_type">label</span> %5

; &lt;<span class="llvm_type">label</span>&gt;:4                                       ; preds = %0
  <span class="llvm_keyword">call</span> <span class="llvm_type">void</span> @_Z3barPi(<span class="llvm_type">i32</span>* %2)
  <span class="llvm_keyword">br</span> <span class="llvm_type">label</span> %5

; &lt;<span class="llvm_type">label</span>&gt;:5                                       ; preds = %4, %0
  <span class="llvm_keyword">ret</span> <span class="llvm_type">void</span>
}

<span class="llvm_keyword">declare</span> <span class="llvm_type">void</span> @llvm.memset.p0i8.<span class="llvm_type">i64</span>(<span class="llvm_type">i8</span>* <span class="llvm_keyword">nocapture</span>, <span class="llvm_type">i8</span>, <span class="llvm_type">i64</span>, <span class="llvm_type">i32</span>, <span class="llvm_type">i1</span>) <span class="llvm_keyword">nounwind</span>

<span class="llvm_keyword">declare</span> <span class="llvm_type">void</span> @_Z3barPi(<span class="llvm_type">i32</span>*)

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", <span class="llvm_keyword">null</span>}</span></pre>
    This is then emitted as follows (x86-64 shown; the 32-bit x86 output is similar):<br>
    <pre><span># BB#0:
        pushq   %rbx
.Ltmp3:
        .cfi_def_cfa_offset 16
        subq    $400, %rsp              # imm = 0x190
.Ltmp4:
        .cfi_def_cfa_offset 416
.Ltmp5:
        .cfi_offset %rbx, -16
        movl    %edi, %ebx
        leaq    (%rsp), %rdi
        xorl    %esi, %esi
        movl    $400, %edx              # imm = 0x190
        callq   memset
        movl    %ebx, (%rsp)
        testl   %ebx, %ebx
        jne     .LBB0_2
# BB#1:
        leaq    (%rsp), %rdi
        callq   _Z3barPi
.LBB0_2:
        addq    $400, %rsp              # imm = 0x190
        popq    %rbx
        ret</span></pre>
    I don't have a top-of-tree (ToT) build at hand, so I don't know whether
    this is still the case. Any idea why this happens?<br>
    <br>
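    For concreteness, here is a hand-written sketch of what the sunk version
    might look like at the source level. The name <tt>foo_sunk</tt> and the
    stub body of <tt>bar</tt> (with its two counters) are made up for
    illustration so that the example is self-contained; in the snippet above
    <tt>bar</tt> is an opaque external function. Since <tt>a</tt> is known to
    be zero past the early return, <tt>{a}</tt> reduces to <tt>{0}</tt>:<br>
    <pre>
```c
/* Hypothetical stand-in for the external bar(): counts its calls and
   sums the 100 elements it receives, so the effect can be observed. */
static int bar_calls = 0;
static int bar_sum = 0;

void bar(int *p)
{
    int i;
    bar_calls++;
    for (i = 0; i != 100; i++)
        bar_sum += p[i];
}

/* Hand-sunk variant of foo (hypothetical name): the array is only
   initialized on the path that reaches bar(). */
void foo_sunk(int a)
{
    if (a)
        return;            /* early exit: the array is never touched */

    int ar[100] = {0};     /* a is 0 on this path, so {a} equals {0} */
    bar(ar);
}
```
</pre>
    With this shape, the zeroing of the array is only paid for on the path
    that actually calls <tt>bar</tt>. Whether the optimizer can legally and
    profitably get there on its own is exactly the question above.<br>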
  </body>
</html>