[LLVMdev] Tail call optimization deeds

Sat Aug 11 13:25:56 PDT 2007

Okay so i implemented an(other :) initial version for X86-32 backend,  
this time based on TOT:

It is not very generic at the moment. Only functions with  
callingconv::fastcc and the tail call attribute will be optimized.  
Maybe the next step should be to integrate the code into the other  
calling convention lowering. Here is what i have at the moment:

If callingconv::fastcc is used the function will be custom lowered.  
(LowerX86_32FastCCCallTo and LowerX86_32FastCCArguments, the code is  
based on the std calling convention minus ecx as inreg)
The lowering code decides whether the function call really is  
eligible for tail call optimization tco (caller-callee agree, tail  
call opt is enabled, tail position, TODO: check for PIC code).
If it is okay to do tco it calculates the stack adjustment.
Repositions the return address,
Puts the arguments on the correct position relative to the (virtual)  
frame pointer.
Moves the function address to register ECX if the tailcallee is in a  
register. (TODO: set ECX as live out?)
TODO: check what to do with pic code
Adds register arguments (EAX, EDX) as live in if they are used for  
arguments.
Create a TAILCALL node (a pseudo instruction) holding the Callee  
(Register or TargetGlobalAddress/ExternalSymbol), the stack  
adjustment as a constant, and the live-in registers. The  
TargetLowering class has a IsLastCallTailCall state field which is  
set to true such that the LowerRET function nows it has to produce  
different code. The TAILCALL node is then returned.

If it is not okay the code produces a function call like the std  
calling convention (callee pops arguments on return).
There is one difference though: since tail call functions might  
arbitrarily change stack alignment (since they adjust the stack by  
the difference of the argument stack size) the stack adjustment (by  
that i mean the number of bytes a function requires as its stack  
size) i when i calculate that stack size/ stack adjustment i take  
care that the stack sizes are always multiples of the targets stack  
alignment. (this code at the moment can be turned off by an option  
eg.:if it's known that no dynamically linked functions are called).
I observed the behavior of dynamically linked functions requiring 16  
byte stack alignment on mac os x. This might also render it difficult  
to do general tail call optimization (for normal std call, c call  
functions it would only bem possible if the stack size difference is  
a multiple of the stack alignment). But maybe i am misinterpreting  
the situation. (TODO: look into that:)

The lowering of arguments is pretty similar to  std call.

The LowerRET function checks for the IsLastCallTailCall field and if  
its true does the following otherwise it does its usual duty.
It looks for the TAILCALL node and uses the operand holding the  
callee and the stack adjustment to build and return a TC_RETURN node  
with preceeding operands as its own operands.

The TAILCALL node is converted to a noop machine instruction.
Ditto the TC_RETURN node. (it declares the iscall=1, isbarrier=1,  
isret=1)

The machine instructions needed to adjust the stack and jump to the  
callee is added in the epilogue generating function in  
X86RegisterInfo. It uses the two operands of the TC_RETURN node for  
the required information.

So modifications are:
a LowerX86_32FastCCCallTo function
a LowerX86_32FastCCArguments function
change to LowerRET to generate the TC_RETURN node
some code at the end of X86RegisterInfo to do stack adjustment/ 
jumping to tailcallee (like the EH_RETURN code, thanks Anton for the  
pointers!)
small modifications to X86InstrInfo.td
(a yes and at the moment i remove generating TAILCALL nodes in the  
other functions - LowerCCCall and the like)

Another TODO: look a floating point code to see whether there are any  
kinks. Look at other calling conventions to see what is possible  
there. (see Anton's remarks)

I am away now for 2 weeks (isle of cres - camping). but after that i  
will be looking into further improving the code (and also look at the  
other architectures).
Or maybe i have answers telling me that i am going into a totally  
wrong direction , and i should let the experts do this stuff and stop  
nagging people :).

regards arnold

>  need to formulate conditions of the call to be "suitable for tail
> call". On x86 it's the following:
> 1. The calling conventions of b and c are "compatible". There are two
> scenarios: caller cleans the stack (C calling convention), callee  
> cleans
> the stack (stdcall calling convention). The call is not suitable for
> tail call, if b and c cleans the stack differently. For example, let b
> has C CC and c - stdcall CC. In this case making tail call results to
> double cleaning (on in a and one in c), which is incorrect.
> 2. The same situtation is for struct return functions: the caller  
> pushes
> extra pointer to return value, but callee is required to clean this
> pointer out of the stack. So, call b->c is "suitable for tail call"  
> iff
> both b and c are struct return.
> 3. Various tricky cases with FP. I don't remeber correctly, but there
> can be some problems with functions returning FP values.
> 4. PIC. %ebx should be live in order to make a call via PLT.
> Unfortunately, %ebx is callee-saved, that's why we can assemble tail
> calls only to functions within the same module (PLT is not required).
> 5. %ecx is livein for regparm(3) functions.
> ... (maybe something I forget)
>
> --
> WBR, Anton Korobeynikov
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev