[LLVMdev] Tail call optimization deeds
Arnold Schwaighofer
arnold.schwaighofer at gmail.com
Sat Aug 11 13:25:56 PDT 2007
Okay so i implemented an(other :) initial version for X86-32 backend,
this time based on TOT:
It is not very generic at the moment. Only functions with
callingconv::fastcc and the tail call attribute will be optimized.
Maybe the next step should be to integrate the code into the other
calling convention lowering. Here is what i have at the moment:
If callingconv::fastcc is used the function will be custom lowered.
(LowerX86_32FastCCCallTo and LowerX86_32FastCCArguments, the code is
based on the std calling convention minus ecx as inreg)
The lowering code decides whether the function call really is
eligible for tail call optimization tco (caller-callee agree, tail
call opt is enabled, tail position, TODO: check for PIC code).
If it is okay to do tco it calculates the stack adjustment.
Repositions the return address,
Puts the arguments on the correct position relative to the (virtual)
frame pointer.
Moves the function address to register ECX if the tailcallee is in a
register. (TODO: set ECX as live out?)
TODO: check what to do with pic code
Adds register arguments (EAX, EDX) as live in if they are used for
arguments.
Create a TAILCALL node (a pseudo instruction) holding the Callee
(Register or TargetGlobalAddress/ExternalSymbol), the stack
adjustment as a constant, and the live-in registers. The
TargetLowering class has a IsLastCallTailCall state field which is
set to true such that the LowerRET function nows it has to produce
different code. The TAILCALL node is then returned.
If it is not okay the code produces a function call like the std
calling convention (callee pops arguments on return).
There is one difference though: since tail call functions might
arbitrarily change stack alignment (since they adjust the stack by
the difference of the argument stack size) the stack adjustment (by
that i mean the number of bytes a function requires as its stack
size) i when i calculate that stack size/ stack adjustment i take
care that the stack sizes are always multiples of the targets stack
alignment. (this code at the moment can be turned off by an option
eg.:if it's known that no dynamically linked functions are called).
I observed the behavior of dynamically linked functions requiring 16
byte stack alignment on mac os x. This might also render it difficult
to do general tail call optimization (for normal std call, c call
functions it would only bem possible if the stack size difference is
a multiple of the stack alignment). But maybe i am misinterpreting
the situation. (TODO: look into that:)
The lowering of arguments is pretty similar to std call.
The LowerRET function checks for the IsLastCallTailCall field and if
its true does the following otherwise it does its usual duty.
It looks for the TAILCALL node and uses the operand holding the
callee and the stack adjustment to build and return a TC_RETURN node
with preceeding operands as its own operands.
The TAILCALL node is converted to a noop machine instruction.
Ditto the TC_RETURN node. (it declares the iscall=1, isbarrier=1,
isret=1)
The machine instructions needed to adjust the stack and jump to the
callee is added in the epilogue generating function in
X86RegisterInfo. It uses the two operands of the TC_RETURN node for
the required information.
So modifications are:
a LowerX86_32FastCCCallTo function
a LowerX86_32FastCCArguments function
change to LowerRET to generate the TC_RETURN node
some code at the end of X86RegisterInfo to do stack adjustment/
jumping to tailcallee (like the EH_RETURN code, thanks Anton for the
pointers!)
small modifications to X86InstrInfo.td
(a yes and at the moment i remove generating TAILCALL nodes in the
other functions - LowerCCCall and the like)
Another TODO: look a floating point code to see whether there are any
kinks. Look at other calling conventions to see what is possible
there. (see Anton's remarks)
I am away now for 2 weeks (isle of cres - camping). but after that i
will be looking into further improving the code (and also look at the
other architectures).
Or maybe i have answers telling me that i am going into a totally
wrong direction , and i should let the experts do this stuff and stop
nagging people :).
regards arnold
> need to formulate conditions of the call to be "suitable for tail
> call". On x86 it's the following:
> 1. The calling conventions of b and c are "compatible". There are two
> scenarios: caller cleans the stack (C calling convention), callee
> cleans
> the stack (stdcall calling convention). The call is not suitable for
> tail call, if b and c cleans the stack differently. For example, let b
> has C CC and c - stdcall CC. In this case making tail call results to
> double cleaning (on in a and one in c), which is incorrect.
> 2. The same situtation is for struct return functions: the caller
> pushes
> extra pointer to return value, but callee is required to clean this
> pointer out of the stack. So, call b->c is "suitable for tail call"
> iff
> both b and c are struct return.
> 3. Various tricky cases with FP. I don't remeber correctly, but there
> can be some problems with functions returning FP values.
> 4. PIC. %ebx should be live in order to make a call via PLT.
> Unfortunately, %ebx is callee-saved, that's why we can assemble tail
> calls only to functions within the same module (PLT is not required).
> 5. %ecx is livein for regparm(3) functions.
> ... (maybe something I forget)
>
> --
> WBR, Anton Korobeynikov
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
More information about the llvm-dev
mailing list