[LLVMdev] Possible missed optimization on function calling?
Borja Ferrer
borja.ferav at gmail.com
Tue Sep 21 13:21:46 PDT 2010
Hello, I noticed that the following code could be improved a little
further. If this optimization is too tricky for the compiler, or the current
output is intentional, forgive me; in any case I just wanted to point it out.
Consider the following C code:
extern int mcos(int a);
extern int msin(int a);
extern int mdiv(int a, int b);

int foo(int a, int b)
{
    int a4 = mdiv(mcos(a), msin(b));
    return a4;
}
I noticed this while testing the backend I'm currently developing, but the
same code is produced for other targets as well:
-march=msp430:
push.w r11
push.w r10
push.w r9
push.w r8
mov.w r14, r11
mov.w r15, r10 ; store a
mov.w r13, r15
mov.w r12, r14 ; pass b
call #msin
mov.w r15, r9
mov.w r14, r8 ; store msin(b)
mov.w r10, r15
mov.w r11, r14 ; pass a
call #mcos
mov.w r9, r13 ; pass msin(b)
mov.w r8, r12
call #mdiv
pop.w r8
pop.w r9
pop.w r10
pop.w r11
ret
-march=thumb:
push {r4, r5, lr}
mov r4, r0
mov r0, r1
bl msin
mov r5, r0
mov r0, r4
bl mcos
mov r1, r5
bl mdiv
pop {r4, r5, pc}
Using the MSP430 example above, it could have produced:
push.w r11
push.w r10
mov.w r14, r11
mov.w r15, r10 ; store a
mov.w r13, r15
mov.w r12, r14 ; pass b
call #msin
; SWAP MSIN(B) AND ARGUMENT "a" USING R13:R12
mov.w r15, r13
mov.w r14, r12 ; store msin(b)
mov.w r11, r14
mov.w r10, r15 ; pass a
mov.w r13, r11
mov.w r12, r10 ; save msin(b) into callee saved regs
call #mcos
mov.w r11, r13 ; pass msin(b)
mov.w r10, r12
call #mdiv
pop.w r10
pop.w r11
ret
The basic idea is that r13:r12 can be used as scratch registers after msin()
returns, to swap the result of msin(b) with the argument a. This avoids
pushing and popping r9 and r8, at the cost of two extra moves: 2 fewer
instructions in total, and 4 fewer memory accesses.
For my backend, which targets an 8-bit architecture but supports 16-bit
moves, it avoids pushing and popping four 8-bit registers: 6 fewer
instructions, or 8 fewer memory accesses. In terms of speed it saves 14
cycles (2 cycles per push/pop, i.e. 16 cycles, minus the two extra moves).
As a side note, GCC produces the same code.
Thanks.