[LLVMdev] Testing LLVM on OS X
Chris Lattner
sabre at nondot.org
Tue May 4 21:30:02 PDT 2004
On Tue, 4 May 2004, Chris Lattner wrote:
> I suspect that a large reason that LLVM does worse than a native C
> compiler with the CBE+GCC is that LLVM generates very low-level C code,
> and I'm not convinced that GCC is doing a very good job (i.e., without
> syntactic loops).
Yup, this is EXACTLY what is going on.
I took this very simple C function:
int Array[1000];
void test(int X) {
  int i;
  for (i = 0; i < 1000; ++i)
    Array[i] += X;
}
Compiling this with GCC at -O3 on OS X gave me this:
_test:
        mflr r5
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r2
        mtlr r5
        addis r4,r2,ha16(L_Array$non_lazy_ptr-"L00000000001$pb")
        li r2,0
        lwz r9,lo16(L_Array$non_lazy_ptr-"L00000000001$pb")(r4)
        li r4,1000
        mtctr r4
L9:
        lwzx r7,r2,r9 ; load
        add r6,r7,r3 ; add
        stwx r6,r2,r9 ; store
        addi r2,r2,4 ; Increment pointer
        bdnz L9 ; Decrement count register, branch while not zero
        blr
This is nice code, good GCC. :)
Okay, LLVM currently generates this code from the CBE:
void test(int l7_X) {
  unsigned l8_indvar;
  unsigned l8_indvar__PHI_TEMPORARY;
  int *l14_tmp_2E_5;
  int l7_tmp_2E_9;
  unsigned l8_indvar_2E_next;
  l8_indvar__PHI_TEMPORARY = 0u; /* for PHI node */
l13_no_exit:
  l8_indvar = l8_indvar__PHI_TEMPORARY;
  l14_tmp_2E_5 = &Array[l8_indvar];
  l7_tmp_2E_9 = *l14_tmp_2E_5;
  *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
  l8_indvar_2E_next = l8_indvar + 1u;
  if (!(l8_indvar_2E_next == 1000u)) {
    l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next; /* for PHI node */
    goto l13_no_exit;
  }
  return;
}
This has exactly the same operations in the loop, so GCC should produce
the same code, right? Wrong:
_test:
        mflr r4
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r2
        mtlr r4
        li r11,0
        addis r10,r2,ha16(_Array-"L00000000001$pb")
L2:
        slwi r2,r11,2 ; Shift left "i" by 2
        la r5,lo16(_Array-"L00000000001$pb")(r10)
        cmpwi cr0,r11,999 ; compare i to the trip count
        lwzx r7,r2,r5 ; Load from array
        addi r11,r11,1 ; increment "i"
        add r6,r7,r3 ; Add value to array value
        stwx r6,r2,r5 ; store into array
        bne+ cr0,L2 ; Loop until done
        blr
Hrm, basically GCC is not doing ANY loop optimization (e.g. strength
reduction or "do-loop" optimization) whatsoever: the loop recomputes the
array offset with a shift on every iteration, and it uses an explicit
compare-and-branch instead of the count register. I'm sure that the X86
GCC is suffering from the same problems; it's just that X86 doesn't depend
on strength reduction and do-loop optimization as much, so it's not so
pronounced.
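For anyone unfamiliar with the two optimizations named here, this is roughly
what they look like when applied by hand in C. It is only a sketch: the names
add_indexed and add_reduced and the explicit length parameter are made up for
illustration and are not part of the CBE or of GCC.

```c
#include <assert.h>

/* Indexed form: the element address is derived from i on every trip
   (a shift plus an add on PowerPC), and the exit test compares i
   against the trip count. */
void add_indexed(int *a, unsigned n, int x) {
  unsigned i;
  for (i = 0; i != n; ++i)
    a[i] += x;
}

/* Hand-optimized form (hypothetical name).
   Strength reduction: the per-iteration shift is replaced by bumping
   a pointer.
   Do-loop form: a counter runs down to zero, which is the shape that
   maps onto a decrement-and-branch instruction such as bdnz. */
void add_reduced(int *a, unsigned n, int x) {
  int *p = a;
  unsigned count = n;
  assert(n > 0); /* a do-while body always runs at least once */
  do {
    *p += x;
    ++p;
  } while (--count != 0);
}
```

Both functions should leave the array in the same state for any n > 0; the
second one just spells out in the source what the optimizer did for the
original for loop.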
Interestingly, if I tweak the .cbe code to be this:
  do {
    l8_indvar = l8_indvar__PHI_TEMPORARY;
    l14_tmp_2E_5 = &Array[l8_indvar];
    l7_tmp_2E_9 = *l14_tmp_2E_5;
    *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
    l8_indvar_2E_next = l8_indvar + 1u;
    l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next; /* for PHI node */
  } while (!(l8_indvar_2E_next == 1000u));
GCC generates the nice code again, virtually identical to the code from
the original source. AAAH! :)
Maybe this is a good argument for making the CBE generate syntactic loops
in simple cases. I may have some time to try implementing this on the
weekend. That is, if no one beats me to it. :)
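To make the comparison easy to reproduce, here is a small self-contained
sketch with the goto form and the do-while form side by side. The variable
names are simplified from the CBE output above, and the two separate arrays
and the check_forms_agree helper are assumptions added only so the two
versions can be compared in one file.

```c
#include <assert.h>
#include <string.h>

#define N 1000

int ArrayGoto[N];
int ArrayLoop[N];

/* Shape of the current CBE output: the back edge is a goto. */
void test_goto(int X) {
  unsigned indvar;
  unsigned indvar_phi = 0u; /* for PHI node */
  unsigned indvar_next;
no_exit:
  indvar = indvar_phi;
  ArrayGoto[indvar] += X;
  indvar_next = indvar + 1u;
  if (!(indvar_next == (unsigned)N)) {
    indvar_phi = indvar_next; /* for PHI node */
    goto no_exit;
  }
}

/* Same body wrapped in a syntactic do-while loop. */
void test_loop(int X) {
  unsigned indvar;
  unsigned indvar_phi = 0u; /* for PHI node */
  unsigned indvar_next;
  do {
    indvar = indvar_phi;
    ArrayLoop[indvar] += X;
    indvar_next = indvar + 1u;
    indvar_phi = indvar_next; /* for PHI node */
  } while (!(indvar_next == (unsigned)N));
}

/* Hypothetical helper: runs both forms once and aborts if the
   resulting array contents differ. */
void check_forms_agree(int X) {
  test_goto(X);
  test_loop(X);
  assert(memcmp(ArrayGoto, ArrayLoop, sizeof ArrayGoto) == 0);
}
```

The two functions do the same work, so any difference in the assembly for
them (e.g. from gcc -O3 -S) comes purely from how the loop is spelled.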
-Chris
--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/