[PATCH] D29485: [Builtin][ARM] Implement addsf3/__aeabi_fadd for Thumb1
Weiming Zhao via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 3 23:45:03 PST 2017
weimingz added a comment.
My previous comment about "136 bytes of stack usage" was inaccurate. I just found that the default build doesn't pass any -O flag. Is it a problem that there is no default -O flag?
Internally, we use -Os.
With -Os, it uses 24 bytes of stack for spilling.
Regarding constants, it uses the following (some of them I can figure out easily):
238: 7fffffff .word 0x7fffffff ==> this mask is used to get the Abs value. In the asm, we do lsl and lsr instead. Since Abs is needed twice, the compiler may think the mask is better, but the load is more expensive.
23c: 007fffff .word 0x007fffff ==> similar mask, used twice
240: 55555555 .word 0x55555555 ==> no idea
244: 33333333 .word 0x33333333 ==> no idea
248: 0f0f0f0f .word 0x0f0f0f0f ==> no idea
24c: 01010101 .word 0x01010101 ==> no idea
250: 7fc00000 .word 0x7fc00000 ==> Similar mask
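
For reference, the four words marked "no idea" above match the constants of the well-known bit-parallel population count. One plausible explanation, not confirmed in this thread, is that the compiler lowers the count-leading-zeros step of the C version (used when normalizing the significand) to a popcount-based sequence, since Thumb1 has no CLZ instruction. The sketch below shows that technique in C, next to the mask and shift-pair forms of Abs discussed for 0x7fffffff. The function names are made up for illustration and are not taken from compiler-rt.

```c
#include <stdint.h>

/* Abs on the raw float bits: clear the sign bit with the 0x7fffffff mask.
   This form needs the constant loaded from a literal pool on Thumb1. */
static uint32_t abs_mask(uint32_t rep) {
    return rep & 0x7fffffffu;
}

/* Same result with a shift pair (lsl #1 then lsr #1), no constant needed. */
static uint32_t abs_shift(uint32_t rep) {
    return (rep << 1) >> 1;
}

/* Bit-parallel population count. It uses exactly the constants
   0x55555555, 0x33333333, 0x0f0f0f0f and 0x01010101. */
static uint32_t popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums  */
    return (x * 0x01010101u) >> 24;                   /* add 4 bytes */
}

/* Count leading zeros without a CLZ instruction: smear the highest set
   bit into every lower position, then count the zero bits that remain. */
static uint32_t clz32(uint32_t x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return popcount32(~x);
}
```

With these definitions, clz32(0x007fffff) returns 9 and clz32(0) returns 32, and abs_mask and abs_shift agree on every input. If this explanation is right, four of the seven literal-pool words exist only to emulate CLZ, which the hand-written asm can avoid.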
On code size, addsf3.c.o (built with -Os) is 604 bytes vs. 312 bytes for addsf3.S.o.
For performance, on an ARMv7 Android device, the asm version is 1.4x faster in my own micro benchmark. I don't have numbers for a Cortex-M0. But on a real project, this asm version let us close the 25% gap against gcc.