<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Prompted by a Stack Overflow post
(<a class="moz-txt-link-freetext" href="http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363">http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363</a>),
I checked and found that LLVM generates the same (seemingly) suboptimal
code as MSVC.<br>
Consider the following simplified C snippet (compiled here as C++, hence the mangled names below):<br>
<tt><br>
extern void bar(int*);<br>
<br>
void foo(int a)<br>
{<br>
int ar[100] = {a}; <br>
if (a)<br>
return;<br>
bar(ar);<br>
}</tt><br>
<br>
Ideally, the array initialization (the memset plus the store of <tt>a</tt>) should be sunk
below the early return, since <tt>ar</tt> is never read on that path, but Clang/LLVM 3.0 does not do this:<br>
<pre><span>; ModuleID = '/tmp/webcompile/_11079_0.bc'
<span class="llvm_keyword">target</span> datalayout = "e-p:64:64:64-<span class="llvm_type">i1</span>:8:8-<span class="llvm_type">i8</span>:8:8-<span class="llvm_type">i16</span>:16:16-<span class="llvm_type">i32</span>:32:32-<span class="llvm_type">i64</span>:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
<span class="llvm_keyword">target</span> triple = "x86_64-unknown-linux-gnu"
<span class="llvm_keyword">define</span> <span class="llvm_type">void</span> @_Z3fooi(<span class="llvm_type">i32</span> %a) uwtable {
%ar = <span class="llvm_keyword">alloca</span> [100 x <span class="llvm_type">i32</span>], <span class="llvm_keyword">align</span> 16
%1 = <span class="llvm_keyword">bitcast</span> [100 x <span class="llvm_type">i32</span>]* %ar <span class="llvm_keyword">to</span> <span class="llvm_type">i8</span>*
<span class="llvm_keyword">call</span> <span class="llvm_type">void</span> @llvm.memset.p0i8.<span class="llvm_type">i64</span>(<span class="llvm_type">i8</span>* %1, <span class="llvm_type">i8</span> 0, <span class="llvm_type">i64</span> 400, <span class="llvm_type">i32</span> 16, <span class="llvm_type">i1</span> <span class="llvm_keyword">false</span>)
%2 = <span class="llvm_keyword">getelementptr</span> inbounds [100 x <span class="llvm_type">i32</span>]* %ar, <span class="llvm_type">i64</span> 0, <span class="llvm_type">i64</span> 0
<span class="llvm_keyword">store</span> <span class="llvm_type">i32</span> %a, <span class="llvm_type">i32</span>* %2, <span class="llvm_keyword">align</span> 16, !tbaa !0
%3 = <span class="llvm_keyword">icmp</span> <span class="llvm_keyword">eq</span> <span class="llvm_type">i32</span> %a, 0
<span class="llvm_keyword">br</span> <span class="llvm_type">i1</span> %3, <span class="llvm_type">label</span> %4, <span class="llvm_type">label</span> %5
; <<span class="llvm_type">label</span>>:4 ; preds = %0
<span class="llvm_keyword">call</span> <span class="llvm_type">void</span> @_Z3barPi(<span class="llvm_type">i32</span>* %2)
<span class="llvm_keyword">br</span> <span class="llvm_type">label</span> %5
; <<span class="llvm_type">label</span>>:5 ; preds = %4, %0
<span class="llvm_keyword">ret</span> <span class="llvm_type">void</span>
}
<span class="llvm_keyword">declare</span> <span class="llvm_type">void</span> @llvm.memset.p0i8.<span class="llvm_type">i64</span>(<span class="llvm_type">i8</span>* <span class="llvm_keyword">nocapture</span>, <span class="llvm_type">i8</span>, <span class="llvm_type">i64</span>, <span class="llvm_type">i32</span>, <span class="llvm_type">i1</span>) <span class="llvm_keyword">nounwind</span>
<span class="llvm_keyword">declare</span> <span class="llvm_type">void</span> @_Z3barPi(<span class="llvm_type">i32</span>*)
!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", <span class="llvm_keyword">null</span>}</span></pre>
and this is compiled to the following x86-64 code (32-bit x86 is similar):<br>
<pre><span># BB#0:
pushq %rbx
.Ltmp3:
.cfi_def_cfa_offset 16
subq $400, %rsp # imm = 0x190
.Ltmp4:
.cfi_def_cfa_offset 416
.Ltmp5:
.cfi_offset %rbx, -16
movl %edi, %ebx
leaq (%rsp), %rdi
xorl %esi, %esi
movl $400, %edx # imm = 0x190
callq memset
movl %ebx, (%rsp)
testl %ebx, %ebx
jne .LBB0_2
# BB#1:
leaq (%rsp), %rdi
callq _Z3barPi
.LBB0_2:
addq $400, %rsp # imm = 0x190
popq %rbx
ret</span></pre>
I don't have a top-of-tree (ToT) build at hand, so I don't know whether this is still the case there.
Any idea why this might be happening?<br>
<br>
</body>
</html>