[LLVMdev] Tail call optimization deeds

Wed Sep 5 15:57:10 PDT 2007

Hi,

I finally had a little time to talk to Chris about tail call. But what  
I have is nothing concrete and also very vague.

We'd like to see tail call optimization to be similar to the target  
independent lowering of ISD::CALL nodes. These are auto-generated  
from ???CallingConv.td files. Some target specific details such as  
function address register (ECX in your example) should be coded in  
these files. Ditto for stack offsets, etc. These should be part of the  
fastcc calling convention. The tail call lowering will use these  
information and target hooks to perform tail call optimization on  
fastcc calls that are really tail calls.

I think it's reasonable to start by doing a (clean) target specific  
implementation for X86. Just understand we'd like to move to a target  
independent mechanism so the earlier implementation may not survive.

As for what you described here. I am having a hard time following  
it. :-) Please send patch.

Thanks,

Evan

On Aug 11, 2007, at 1:25 PM, Arnold Schwaighofer wrote:

> Okay so i implemented an(other :) initial version for X86-32 backend,
> this time based on TOT:
>
> It is not very generic at the moment. Only functions with
> callingconv::fastcc and the tail call attribute will be optimized.
> Maybe the next step should be to integrate the code into the other
> calling convention lowering. Here is what i have at the moment:
>
> If callingconv::fastcc is used the function will be custom lowered.
> (LowerX86_32FastCCCallTo and LowerX86_32FastCCArguments, the code is
> based on the std calling convention minus ecx as inreg)
> The lowering code decides whether the function call really is
> eligible for tail call optimization tco (caller-callee agree, tail
> call opt is enabled, tail position, TODO: check for PIC code).
> If it is okay to do tco it calculates the stack adjustment.
> Repositions the return address,
> Puts the arguments on the correct position relative to the (virtual)
> frame pointer.
> Moves the function address to register ECX if the tailcallee is in a
> register. (TODO: set ECX as live out?)
> TODO: check what to do with pic code
> Adds register arguments (EAX, EDX) as live in if they are used for
> arguments.
> Create a TAILCALL node (a pseudo instruction) holding the Callee
> (Register or TargetGlobalAddress/ExternalSymbol), the stack
> adjustment as a constant, and the live-in registers. The
> TargetLowering class has a IsLastCallTailCall state field which is
> set to true such that the LowerRET function nows it has to produce
> different code. The TAILCALL node is then returned.
>
> If it is not okay the code produces a function call like the std
> calling convention (callee pops arguments on return).
> There is one difference though: since tail call functions might
> arbitrarily change stack alignment (since they adjust the stack by
> the difference of the argument stack size) the stack adjustment (by
> that i mean the number of bytes a function requires as its stack
> size) i when i calculate that stack size/ stack adjustment i take
> care that the stack sizes are always multiples of the targets stack
> alignment. (this code at the moment can be turned off by an option
> eg.:if it's known that no dynamically linked functions are called).
> I observed the behavior of dynamically linked functions requiring 16
> byte stack alignment on mac os x. This might also render it difficult
> to do general tail call optimization (for normal std call, c call
> functions it would only bem possible if the stack size difference is
> a multiple of the stack alignment). But maybe i am misinterpreting
> the situation. (TODO: look into that:)
>
> The lowering of arguments is pretty similar to  std call.
>
> The LowerRET function checks for the IsLastCallTailCall field and if
> its true does the following otherwise it does its usual duty.
> It looks for the TAILCALL node and uses the operand holding the
> callee and the stack adjustment to build and return a TC_RETURN node
> with preceeding operands as its own operands.
>
> The TAILCALL node is converted to a noop machine instruction.
> Ditto the TC_RETURN node. (it declares the iscall=1, isbarrier=1,
> isret=1)
>
> The machine instructions needed to adjust the stack and jump to the
> callee is added in the epilogue generating function in
> X86RegisterInfo. It uses the two operands of the TC_RETURN node for
> the required information.
>
>
> So modifications are:
> a LowerX86_32FastCCCallTo function
> a LowerX86_32FastCCArguments function
> change to LowerRET to generate the TC_RETURN node
> some code at the end of X86RegisterInfo to do stack adjustment/
> jumping to tailcallee (like the EH_RETURN code, thanks Anton for the
> pointers!)
> small modifications to X86InstrInfo.td
> (a yes and at the moment i remove generating TAILCALL nodes in the
> other functions - LowerCCCall and the like)
>
> Another TODO: look a floating point code to see whether there are any
> kinks. Look at other calling conventions to see what is possible
> there. (see Anton's remarks)
>
> I am away now for 2 weeks (isle of cres - camping). but after that i
> will be looking into further improving the code (and also look at the
> other architectures).
> Or maybe i have answers telling me that i am going into a totally
> wrong direction , and i should let the experts do this stuff and stop
> nagging people :).
>
> regards arnold
>
>
>> need to formulate conditions of the call to be "suitable for tail
>> call". On x86 it's the following:
>> 1. The calling conventions of b and c are "compatible". There are two
>> scenarios: caller cleans the stack (C calling convention), callee
>> cleans
>> the stack (stdcall calling convention). The call is not suitable for
>> tail call, if b and c cleans the stack differently. For example,  
>> let b
>> has C CC and c - stdcall CC. In this case making tail call results to
>> double cleaning (on in a and one in c), which is incorrect.
>> 2. The same situtation is for struct return functions: the caller
>> pushes
>> extra pointer to return value, but callee is required to clean this
>> pointer out of the stack. So, call b->c is "suitable for tail call"
>> iff
>> both b and c are struct return.
>> 3. Various tricky cases with FP. I don't remeber correctly, but there
>> can be some problems with functions returning FP values.
>> 4. PIC. %ebx should be live in order to make a call via PLT.
>> Unfortunately, %ebx is callee-saved, that's why we can assemble tail
>> calls only to functions within the same module (PLT is not required).
>> 5. %ecx is livein for regparm(3) functions.
>> ... (maybe something I forget)
>>
>> --
>> WBR, Anton Korobeynikov
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev