[LLVMdev] Thumb-2 code generation error in Apple LLVM at all optimization levels
Don Quixote de la Mancha
quixote at dulcineatech.com
Fri Nov 11 16:14:45 PST 2011
This would be best reported to Apple's Radar bug database at
http://bugreport.apple.com/ but its whole website has been down for a
while.
I have a 100% reproducible Thumb-2 code generation error that occurs
at all of the levels of optimization available in the Xcode 4.2 for
Snow Leopard build settings GUI: -O0, -O1, -O2, -O3 and -Os.
However the bad machine code only occurs in Release builds, never in
Debug builds! I tried the Debug builds at all levels of optimization
as well.
$ xcodebuild -version
Xcode 4.2
Build version 4C199
$ /Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/clang --version
Apple clang version 3.0 (tags/Apple/clang-211.9) (based on LLVM 3.0svn)
Target: i386-apple-darwin10.8.0
Thread model: posix
I'm not real clear where to find the part of the toolchain that emits
the Thumb-2 assembly, so I can't tell you that tool's precise version.
$ uname -a
Darwin frylock.local 10.8.0 Darwin Kernel Version 10.8.0:
Tue Jun 7 16:33:36 PDT 2011;
root:xnu-1504.15.3~1/RELEASE_I386 i386
Xcode's iPhone and iPad Simulators run iOS Apps that, on my 32-bit
MacBook Pro, are built as i386 code. The iOS frameworks (shared
libraries, sort of) that simulated Apps link to are actually shims
that interface to Mac OS X's frameworks.
The i386 code for my simulated App is generated correctly at -Os for
both Release and Debug builds. That suggests that the problem is in
the Thumb-2 code generation back-end, and not in the LLVM IR.
I've seen lots of reports that the Thumb code that the Apple LLVM
compiler generates for ARMv6 is quite buggy, so that one must disable
Thumb code generation for ARMv6 targets. However, my first-generation
iPad has a Cortex-A8 CPU, which is ARMv7, as does my iPhone 4.
It's quite possible that disabling Thumb code generation for at least
this one source file will correct the bad machine code, but Google has
not blessed me with the insight as to how to do that. It's not done
the same way for LLVM as for GCC. Have any of you this insight to
spare?
It's going to take me a little while to cook up a minimal test case as
I was up all night <strike>trolling the Internet</strike> working on
my iOS App, so I'm pretty beat. But when I have more details for you,
I will post a more detailed report as well as a minimal test case that
builds as a complete iOS App at what is now just a placeholder page:
Apple Xcode 4.2 LLVM Compiler Bug Reports
http://www.dulcineatech.com/bug-reports/xcode/4.2/llvm/
My App Warp Life is so named because it goes very, very fast, with
many more optimizations coming soon. The UI has a speed control
slider whose value is scaled, then passed to the usleep() iOS system
call. usleep() suspends the process for the given number of
microseconds.
I realized just recently that calling usleep with delays that
themselves are insignificant might actually slow my App down quite a
bit, because there is all manner of overhead to making and returning
from even the most trivial system calls. After measuring my game's
frame rate at the best optimizations I could find, for various kinds
of test data, I set a threshold of 1/250th of a second. I never call
usleep() if the configured delay setting is less than that.
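A rough way to check the overhead claim is to time a batch of very short sleeps. The sketch below is not the measurement I actually made; it assumes gettimeofday() is precise enough for the purpose, and the names now_usecs and average_usleep_cost are made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* glibc: expose usleep() under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Current wall-clock time in microseconds, via gettimeofday(). */
static double now_usecs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec * 1e6 + (double)tv.tv_usec;
}

/* Average wall-clock cost, in microseconds, of one usleep(request) call,
 * measured over `calls` back-to-back calls. (Hypothetical helper.) */
double average_usleep_cost(useconds_t request, int calls)
{
    assert(calls > 0);
    double start = now_usecs();
    for (int i = 0; i < calls; ++i) {
        usleep(request);
    }
    return (now_usecs() - start) / (double)calls;
}
```

If average_usleep_cost(1, 1000) comes back far larger than the one microsecond requested, then skipping the call for small delays is worth it.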
The full source of the entire method, and the Release and Debug build
assembly codes are at the end of this mail. For clarity I show only
the pertinent lines of code right here:
    useconds_t usecs = (useconds_t)( self.delay * (float)500000 );
    if ( usecs >= 4000 ){ // ~ 1/250 sec
        usleep( usecs ); // usecs is ZERO!!!!
    }
self.delay is an Objective-C 2.0 property that holds the current value
of the speed slider. When set to maximum speed, usecs will always be
zero. Even so, the branch is ALWAYS taken, despite the source code
ensuring that the branch is only taken when usecs is greater than or
equal to four thousand.
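To make the intended behavior concrete, here is the same arithmetic as plain C, with no Objective-C property involved. It is only a sketch: I am assuming the slider value lies between 0.0 and 1.0, uint32_t stands in for useconds_t (which is a 32-bit unsigned type on iOS), and the names delay_to_usecs and should_sleep are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define DELAY_SCALE      500000.0f  /* slider value -> microseconds */
#define MIN_SLEEP_USECS  4000u      /* ~ 1/250 sec */

/* Scale a slider value to microseconds.
 * Assumption: 0.0 <= delay <= 1.0, so the product fits in 32 bits. */
uint32_t delay_to_usecs(float delay)
{
    assert(delay >= 0.0f && delay <= 1.0f);
    return (uint32_t)(delay * DELAY_SCALE);
}

/* Nonzero when the loop is supposed to call usleep() for this delay. */
int should_sleep(float delay)
{
    return delay_to_usecs(delay) >= MIN_SLEEP_USECS;
}
```

With the slider at maximum speed (delay of 0.0), delay_to_usecs returns 0 and should_sleep returns 0, so usleep() should never be reached.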
Here is the Thumb-2 assembly for the Release build.
I think the (float)500000 delay scaling factor is meant to be held in
floating point register d8. I thought at first it might not be
initialized at all, but upon closer examination I think it may
actually be initialized from a program counter-relative 32-bit .long
constant immediately following my method's code.
.loc 1 388 3
ldr r0, [r5]
ldr r1, [r4, r0]
adds r1, #1
str r1, [r4, r0]
.loc 1 390 64
mov r0, r4
ldr r1, [r6]
blx _objc_msgSend
vmov s0, r0
vmul.f32 d0, d0, d8
vcvt.u32.f32 d0, d0
vmov r0, s0
Ltmp272:
.loc 1 392 9
cmp.w r0, #4000
Ltmp273:
.loc 1 393 13
it hs
blxhs _usleep
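The guess about d8 can be checked by hand. In the full Release listing at the end of this mail, s16 (the low half of d8) is loaded with vldr.32 from LCPI24_0, which holds .long 1223959552. A small C sketch, assuming IEEE-754 single precision and using a helper name I made up (float_from_bits), reinterprets that bit pattern as a float:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
 * memcpy avoids the aliasing problems of a pointer cast. */
float float_from_bits(uint32_t bits)
{
    float value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

1223959552 is 0x48F42400, and float_from_bits(1223959552u) compares equal to 500000.0f, so the scaling factor does appear to be initialized correctly.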
cmp.w *looks* like it might take a 16-bit immediate constant, but the
.w suffix only selects the wide, 32-bit Thumb-2 encoding of the
instruction; the constant itself is packed into twelve bits. The ARM
and Thumb instruction sets quite severely restrict the allowed
immediate values, because the instruction words have few bits left
over for them once everything else has been encoded. 4000 is 0xFA0,
an eight-bit value (0xFA) shifted left by four bits, so it does fit.
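For anyone who wants to check whether a given constant can be encoded directly, here is a sketch of the Thumb-2 "modified immediate" rule as I understand it from the ARM architecture documentation: the twelve bits encode either a byte replicated in one of a few fixed patterns, or an eight-bit value with its top bit set, rotated right by 8 to 31 bits. The function names are my own, and this covers only the Thumb-2 data-processing form, not ARM mode.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits, 0 < n < 32. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/* Returns 1 if v fits the Thumb-2 modified-immediate form, else 0.
 * (Hypothetical helper, written from the architecture description.) */
int thumb2_modified_imm_ok(uint32_t v)
{
    uint32_t lo = v & 0xFFu;         /* candidate byte for 00XY00XY / XYXYXYXY */
    uint32_t hi = (v >> 8) & 0xFFu;  /* candidate byte for XY00XY00 */

    if (v <= 0xFFu) return 1;                         /* 0x000000XY */
    if (lo != 0 && v == lo * 0x00010001u) return 1;   /* 0x00XY00XY */
    if (hi != 0 && v == hi * 0x01000100u) return 1;   /* 0xXY00XY00 */
    if (lo != 0 && v == lo * 0x01010101u) return 1;   /* 0xXYXYXYXY */

    /* Otherwise: an 8-bit value with bit 7 set, rotated right by 8..31.
     * Undo each rotation and see whether such a byte comes out. */
    for (unsigned r = 8; r <= 31; ++r) {
        uint32_t unrotated = rotl32(v, r);
        if (unrotated >= 0x80u && unrotated <= 0xFFu) return 1;
    }
    return 0;
}
```

thumb2_modified_imm_ok(4000) returns 1, while thumb2_modified_imm_ok(4001) returns 0, which would explain why the compiler can compare against #4000 in a single instruction.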
I don't know what the "it hs" instruction does. I suspect that's
where the problem lies, but "it" is a very common word, and "hs" is
quite common as well, as it is a frequent misspelling for "has".
Perhaps someone who knows Thumb-2 assembly better than I do could
comment.
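For whoever picks this up: going by the architecture documentation rather than my own experience, IT is "If-Then", which makes the following instruction conditional, and HS is "unsigned higher or same", meaning the carry flag is set after the compare. Under that reading, which is an assumption on my part, the cmp.w / it hs / blxhs sequence should behave like the C model below. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Carry flag after "cmp rn, #imm". CMP computes rn + ~imm + 1 and keeps
 * the carry out of bit 31; HS ("higher or same") means that carry is set. */
int carry_after_cmp(uint32_t rn, uint32_t imm)
{
    uint64_t wide = (uint64_t)rn + (uint64_t)(uint32_t)~imm + 1u;
    int carry = (int)(wide >> 32);
    assert(carry == (rn >= imm));  /* carry set <=> no unsigned borrow */
    return carry;
}

/* Model of:   cmp.w r0, #4000
 *             it    hs
 *             blxhs _usleep
 * Returns 1 if the guarded call would execute, 0 if it would be skipped. */
int usleep_would_run(uint32_t r0)
{
    return carry_after_cmp(r0, 4000u) ? 1 : 0;
}
```

If that model is right, usleep_would_run(0) is 0, so a call to usleep() with r0 equal to zero would have to come from somewhere other than the condition itself.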
The assembly for my Debug build is quite unlike that for the Release
build, for every single one of the available optimization levels.
There are quite a few instructions separating the load of the #4000
immediate into r0 and the call to usleep().
I have not yet ensured that there aren't build configuration
differences between my Debug and Release builds, but I don't recall
setting any. My guess is that the totally different machine code in
Debug is there to make source code debugging work better.
Here is my method's full Objective-C source:
- (void) cycleContinuously
{
    startDate = [[NSDate alloc] init];
    generation = 0;
    while ( mRunning ){
        [self cycle];
        ++generation;
        useconds_t usecs = (useconds_t)( self.delay * (float)500000 );
        if ( usecs >= 4000 ){ // ~ 1/250 sec
            usleep( usecs );
        }
    }
    NSDate *endDate = [[NSDate alloc] init];
    NSTimeInterval elapsed = [endDate timeIntervalSinceDate: startDate];
    [startDate release];
    [endDate release];
    printf( "Speed: %f gen/sec\n", ( (float)generation ) / elapsed );
    return;
}
The assembly for the problem area of my code is completely identical
for each available optimization setting for Release builds. I haven't
made such detailed comparisons for the Debug builds yet.
Here is the Release assembly at -Os:
.align 2
.code 16
.thumb_func "-[LifeGrid cycleContinuously]"
"-[LifeGrid cycleContinuously]":
Ltmp265:
Lfunc_begin24:
.loc 1 380 0
.loc 1 380 1 prologue_end
push {r4, r5, r6, r7, lr}
add r7, sp, #12
push.w {r8, r10, r11}
vpush {d8}
sub sp, #4
.loc 1 382 2
Ltmp266:
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_0+4))
Ltmp267:
mov r4, r0
Ltmp268:
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_0+4))
movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_1+4))
movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_1+4))
LPC24_0:
add r1, pc
LPC24_1:
add r0, pc
ldr r1, [r1]
ldr r0, [r0]
blx _objc_msgSend
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_2+4))
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_2+4))
LPC24_2:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
movw r11, :lower16:(_OBJC_IVAR_$_LifeGrid.startDate-(LPC24_3+4))
movt r11, :upper16:(_OBJC_IVAR_$_LifeGrid.startDate-(LPC24_3+4))
LPC24_3:
add r11, pc
ldr.w r1, [r11]
.loc 1 383 2
movw r5, :lower16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_4+4))
movt r5, :upper16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_4+4))
LPC24_4:
add r5, pc
.loc 1 382 2
str r0, [r4, r1]
movs r1, #0
.loc 1 383 2
ldr r0, [r5]
.loc 1 385 2
movw r8, :lower16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_5+4))
movt r8, :upper16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_5+4))
LPC24_5:
add r8, pc
.loc 1 383 2
str r1, [r4, r0]
.loc 1 385 2
ldr.w r0, [r8]
ldrb r0, [r4, r0]
cbz r0, LBB24_3
Ltmp269:
.loc 1 386 3
movw r10, :lower16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_6+4))
vldr.32 s16, LCPI24_0
movt r10, :upper16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_6+4))
.loc 1 390 64
movw r6, :lower16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_7+4))
movt r6, :upper16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_7+4))
.loc 1 386 3
LPC24_6:
add r10, pc
.loc 1 390 64
LPC24_7:
add r6, pc
LBB24_2:
Ltmp270:
.loc 1 386 3
ldr.w r1, [r10]
Ltmp271:
mov r0, r4
blx _objc_msgSend
.loc 1 388 3
ldr r0, [r5]
ldr r1, [r4, r0]
adds r1, #1
str r1, [r4, r0]
.loc 1 390 64
mov r0, r4
ldr r1, [r6]
blx _objc_msgSend
vmov s0, r0
vmul.f32 d0, d0, d8
vcvt.u32.f32 d0, d0
vmov r0, s0
Ltmp272:
.loc 1 392 9
cmp.w r0, #4000
Ltmp273:
.loc 1 393 13
it hs
blxhs _usleep
Ltmp274:
.loc 1 385 2
ldr.w r0, [r8]
ldrb r0, [r4, r0]
cmp r0, #0
bne LBB24_2
LBB24_3:
Ltmp275:
.loc 1 382 2
movw r0, :lower16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_8+4))
movt r0, :upper16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_8+4))
LPC24_8:
add r0, pc
.loc 1 397 41
ldr r1, [r0]
Ltmp276:
.loc 1 382 2
movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_9+4))
movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_9+4))
LPC24_9:
add r0, pc
.loc 1 397 41
ldr r0, [r0]
blx _objc_msgSend
.loc 1 382 2
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_10+4))
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_10+4))
LPC24_10:
add r1, pc
.loc 1 397 41
ldr r1, [r1]
blx _objc_msgSend
.loc 1 399 69
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_66-(LPC24_11+4))
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_66-(LPC24_11+4))
.loc 1 397 41
mov r6, r0
.loc 1 399 69
ldr.w r0, [r11]
LPC24_11:
add r1, pc
ldr r1, [r1]
ldr r2, [r4, r0]
mov r0, r6
blx _objc_msgSend
str r0, [sp]
.loc 1 401 2
movw r8, :lower16:(L_OBJC_SELECTOR_REFERENCES_68-(LPC24_12+4))
movt r8, :upper16:(L_OBJC_SELECTOR_REFERENCES_68-(LPC24_12+4))
ldr.w r0, [r11]
LPC24_12:
add r8, pc
.loc 1 399 69
mov r10, r1
.loc 1 401 2
ldr.w r1, [r8]
ldr r0, [r4, r0]
blx _objc_msgSend
.loc 1 402 2
ldr.w r1, [r8]
mov r0, r6
blx _objc_msgSend
.loc 1 404 2
ldr r0, [r5]
add r0, r4
vldr.32 s0, [r0]
vcvt.f32.s32 d0, d0
.loc 1 399 69
ldr r0, [sp]
vmov d17, r0, r10
Ltmp277:
.loc 1 404 2
movw r0, :lower16:(L_.str69-(LPC24_13+4))
movt r0, :upper16:(L_.str69-(LPC24_13+4))
vcvt.f64.f32 d16, s0
LPC24_13:
add r0, pc
vdiv.f64 d16, d16, d17
vmov r1, r2, d16
blx _printf
Ltmp278:
.loc 1 407 1
add sp, #4
vpop {d8}
pop.w {r8, r10, r11}
pop {r4, r5, r6, r7, pc}
Ltmp279:
.align 2
LCPI24_0:
.long 1223959552
Ltmp280:
Lfunc_end24:
Ltmp281:
Leh_func_end24:
Here is the Debug assembly at -Os:
.align 2
.code 16
.thumb_func "-[LifeGrid cycleContinuously]"
"-[LifeGrid cycleContinuously]":
Ltmp112:
Lfunc_begin24:
.loc 1 380 0
push {r4, r7, lr}
add r7, sp, #4
sub sp, #44
mov r4, sp
bic r4, r4, #7
mov sp, r4
movs r2, #0
movt r2, #0
str r0, [sp, #40]
str r1, [sp, #36]
.loc 1 382 2 prologue_end
Ltmp113:
ldr.n r0, LCPI24_4
LPC24_4:
add r0, pc
ldr r0, [r0]
ldr.n r1, LCPI24_3
LPC24_3:
add r1, pc
ldr r1, [r1]
str r2, [sp, #12]
blx _objc_msgSend
ldr.n r1, LCPI24_2
LPC24_2:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
ldr r1, [sp, #40]
ldr.n r2, LCPI24_1
LPC24_1:
add r2, pc
ldr r2, [r2]
add r1, r2
str r0, [r1]
.loc 1 383 2
ldr r0, [sp, #40]
ldr.n r1, LCPI24_0
LPC24_0:
add r1, pc
ldr r1, [r1]
add r0, r1
ldr r1, [sp, #12]
str r1, [r0]
LBB24_1:
.loc 1 385 2
ldr r0, [sp, #40]
movw r1, :lower16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_14+4))
movt r1, :upper16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_14+4))
LPC24_14:
add r1, pc
ldr r1, [r1]
ldrb r0, [r0, r1]
movs r1, #0
cmp r0, #0
it ne
movne r1, #1
tst.w r1, #1
beq LBB24_5
movw r0, #4000
movt r0, #0
.loc 1 386 3
Ltmp114:
ldr r1, [sp, #40]
movw r2, :lower16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_15+4))
movt r2, :upper16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_15+4))
LPC24_15:
add r2, pc
ldr r2, [r2]
str r0, [sp, #8]
mov r0, r1
mov r1, r2
blx _objc_msgSend
.loc 1 388 3
ldr r0, [sp, #40]
movw r1, :lower16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_16+4))
movt r1, :upper16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_16+4))
LPC24_16:
add r1, pc
ldr r1, [r1]
mov r2, r1
ldr r2, [r0, r2]
adds r2, #1
str r2, [r0, r1]
.loc 1 390 64
ldr r0, [sp, #40]
Ltmp115:
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_17+4))
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_17+4))
LPC24_17:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
vmov s0, r0
vmov.f64 d1, d16
vldr.32 s1, LCPI24_14
vmov.f64 d2, d1
vmov.f32 s4, s1
vmov.f64 d3, d1
vmov.f32 s6, s0
vmul.f32 d16, d3, d2
vmov.f64 d2, d16
vmov.f32 s0, s4
vmov.f32 s2, s0
vcvt.u32.f32 d16, d1
vmov.f64 d1, d16
vmov.f32 s0, s2
vmov r0, s0
str r0, [sp, #32]
.loc 1 392 9
ldr r0, [sp, #32]
ldr r1, [sp, #8]
cmp r0, r1
blo LBB24_4
.loc 1 393 13
Ltmp116:
ldr r0, [sp, #32]
bl _usleep
str r0, [sp, #4]
Ltmp117:
LBB24_4:
.loc 1 395 2
b LBB24_1
Ltmp118:
LBB24_5:
.loc 1 397 41
ldr.n r0, LCPI24_13
LPC24_13:
add r0, pc
ldr r0, [r0]
ldr.n r1, LCPI24_12
LPC24_12:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
ldr.n r1, LCPI24_11
LPC24_11:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
str r0, [sp, #28]
.loc 1 399 69
ldr r0, [sp, #28]
ldr r1, [sp, #40]
ldr.n r2, LCPI24_10
LPC24_10:
add r2, pc
ldr r2, [r2]
add r1, r2
ldr r2, [r1]
ldr.n r1, LCPI24_9
LPC24_9:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
vmov d16, r0, r1
vstr.64 d16, [sp, #16]
.loc 1 401 2
ldr r0, [sp, #40]
ldr.n r1, LCPI24_8
LPC24_8:
add r1, pc
ldr r1, [r1]
add r0, r1
ldr r0, [r0]
ldr.n r1, LCPI24_7
LPC24_7:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
.loc 1 402 2
ldr r0, [sp, #28]
ldr.n r1, LCPI24_6
LPC24_6:
add r1, pc
ldr r1, [r1]
blx _objc_msgSend
.loc 1 404 2
ldr r0, [sp, #40]
ldr.n r1, LCPI24_5
LPC24_5:
add r1, pc
ldr r1, [r1]
add r0, r1
ldr r0, [r0]
vmov s0, r0
vcvt.f32.s32 s0, s0
vcvt.f64.f32 d16, s0
vldr.64 d17, [sp, #16]
vdiv.f64 d16, d16, d17
vmov r1, r2, d16
movw r0, :lower16:(L_.str69-(LPC24_18+4))
movt r0, :upper16:(L_.str69-(LPC24_18+4))
LPC24_18:
add r0, pc
blx _printf
.loc 1 407 1
str r0, [sp]
subs r4, r7, #4
mov sp, r4
pop {r4, r7, pc}
.align 2
LCPI24_0:
.long _OBJC_IVAR_$_LifeGrid.generation-(LPC24_0+4)
.align 2
LCPI24_1:
.long _OBJC_IVAR_$_LifeGrid.startDate-(LPC24_1+4)
.align 2
LCPI24_2:
.long L_OBJC_SELECTOR_REFERENCES_-(LPC24_2+4)
.align 2
LCPI24_3:
.long L_OBJC_SELECTOR_REFERENCES_7-(LPC24_3+4)
.align 2
LCPI24_4:
.long L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_4+4)
.align 2
LCPI24_5:
.long _OBJC_IVAR_$_LifeGrid.generation-(LPC24_5+4)
.align 2
LCPI24_6:
.long L_OBJC_SELECTOR_REFERENCES_68-(LPC24_6+4)
.align 2
LCPI24_7:
.long L_OBJC_SELECTOR_REFERENCES_68-(LPC24_7+4)
.align 2
LCPI24_8:
.long _OBJC_IVAR_$_LifeGrid.startDate-(LPC24_8+4)
.align 2
LCPI24_9:
.long L_OBJC_SELECTOR_REFERENCES_66-(LPC24_9+4)
.align 2
LCPI24_10:
.long _OBJC_IVAR_$_LifeGrid.startDate-(LPC24_10+4)
.align 2
LCPI24_11:
.long L_OBJC_SELECTOR_REFERENCES_-(LPC24_11+4)
.align 2
LCPI24_12:
.long L_OBJC_SELECTOR_REFERENCES_7-(LPC24_12+4)
.align 2
LCPI24_13:
.long L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_13+4)
.align 2
LCPI24_14:
.long 1223959552
Ltmp119:
Lfunc_end24:
Ltmp120:
Leh_func_end24:
Man I gotta catch some ZZZs, I'm totally thrashed. I'll do my best
just to take a little nap, but the chances are pretty good I won't get
outta bed until Monday!
--
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com