[LLVMdev] Tail call optimization deeds

Mon Aug 13 18:33:17 PDT 2007

On 2007-08-13, at 17:12, Evan Cheng wrote:

> Hi Arnold and Anton,
>
> Sorry I have been ignoring your emails on this topic. It's an
> important task

ocaml agrees. :)

> and I really need sometime to think about it (and talk
> to Chris about it!) But this has been an especially hectic week. I am
> also going to vacation soon so I am not sure when I would get around
> to it.
>
> If Chris has time, I am sure he has lots to say on this topic. :-)
> Otherwise, please be patient. :-)
>
> Evan
>
> On Aug 11, 2007, at 1:25 PM, Arnold Schwaighofer wrote:
>
>> Okay so i implemented an(other :) initial version for X86-32 backend,
>> this time based on TOT:
>>
>> It is not very generic at the moment. Only functions with
>> callingconv::fastcc and the tail call attribute will be optimized.
>> Maybe the next step should be to integrate the code into the other
>> calling convention lowering. Here is what i have at the moment:
>>
>> If callingconv::fastcc is used the function will be custom lowered.
>> (LowerX86_32FastCCCallTo and LowerX86_32FastCCArguments, the code is
>> based on the std calling convention minus ecx as inreg)
>> The lowering code decides whether the function call really is
>> eligible for tail call optimization tco (caller-callee agree, tail
>> call opt is enabled, tail position, TODO: check for PIC code).
>> If it is okay to do tco it calculates the stack adjustment.
>> Repositions the return address,
>> Puts the arguments on the correct position relative to the (virtual)
>> frame pointer.
>> Moves the function address to register ECX if the tailcallee is in a
>> register. (TODO: set ECX as live out?)
>> TODO: check what to do with pic code
>> Adds register arguments (EAX, EDX) as live in if they are used for
>> arguments.
>> Create a TAILCALL node (a pseudo instruction) holding the Callee
>> (Register or TargetGlobalAddress/ExternalSymbol), the stack
>> adjustment as a constant, and the live-in registers. The
>> TargetLowering class has a IsLastCallTailCall state field which is
>> set to true such that the LowerRET function nows it has to produce
>> different code. The TAILCALL node is then returned.
>>
>> If it is not okay the code produces a function call like the std
>> calling convention (callee pops arguments on return).
>> There is one difference though: since tail call functions might
>> arbitrarily change stack alignment (since they adjust the stack by
>> the difference of the argument stack size) the stack adjustment (by
>> that i mean the number of bytes a function requires as its stack
>> size) i when i calculate that stack size/ stack adjustment i take
>> care that the stack sizes are always multiples of the targets stack
>> alignment. (this code at the moment can be turned off by an option
>> eg.:if it's known that no dynamically linked functions are called).
>> I observed the behavior of dynamically linked functions requiring 16
>> byte stack alignment on mac os x. This might also render it difficult
>> to do general tail call optimization (for normal std call, c call
>> functions it would only bem possible if the stack size difference is
>> a multiple of the stack alignment). But maybe i am misinterpreting
>> the situation. (TODO: look into that:)
>>
>> The lowering of arguments is pretty similar to  std call.
>>
>> The LowerRET function checks for the IsLastCallTailCall field and if
>> its true does the following otherwise it does its usual duty.
>> It looks for the TAILCALL node and uses the operand holding the
>> callee and the stack adjustment to build and return a TC_RETURN node
>> with preceeding operands as its own operands.
>>
>> The TAILCALL node is converted to a noop machine instruction.
>> Ditto the TC_RETURN node. (it declares the iscall=1, isbarrier=1,
>> isret=1)
>>
>> The machine instructions needed to adjust the stack and jump to the
>> callee is added in the epilogue generating function in
>> X86RegisterInfo. It uses the two operands of the TC_RETURN node for
>> the required information.
>>
>>
>> So modifications are:
>> a LowerX86_32FastCCCallTo function
>> a LowerX86_32FastCCArguments function
>> change to LowerRET to generate the TC_RETURN node
>> some code at the end of X86RegisterInfo to do stack adjustment/
>> jumping to tailcallee (like the EH_RETURN code, thanks Anton for the
>> pointers!)
>> small modifications to X86InstrInfo.td
>> (a yes and at the moment i remove generating TAILCALL nodes in the
>> other functions - LowerCCCall and the like)
>>
>> Another TODO: look a floating point code to see whether there are any
>> kinks. Look at other calling conventions to see what is possible
>> there. (see Anton's remarks)
>>
>> I am away now for 2 weeks (isle of cres - camping). but after that i
>> will be looking into further improving the code (and also look at the
>> other architectures).
>> Or maybe i have answers telling me that i am going into a totally
>> wrong direction , and i should let the experts do this stuff and stop
>> nagging people :).
>>
>> regards arnold
>>
>>
>>> need to formulate conditions of the call to be "suitable for tail
>>> call". On x86 it's the following:
>>> 1. The calling conventions of b and c are "compatible". There are  
>>> two
>>> scenarios: caller cleans the stack (C calling convention), callee
>>> cleans
>>> the stack (stdcall calling convention). The call is not suitable for
>>> tail call, if b and c cleans the stack differently. For example,
>>> let b
>>> has C CC and c - stdcall CC. In this case making tail call  
>>> results to
>>> double cleaning (on in a and one in c), which is incorrect.
>>> 2. The same situtation is for struct return functions: the caller
>>> pushes
>>> extra pointer to return value, but callee is required to clean this
>>> pointer out of the stack. So, call b->c is "suitable for tail call"
>>> iff
>>> both b and c are struct return.
>>> 3. Various tricky cases with FP. I don't remeber correctly, but  
>>> there
>>> can be some problems with functions returning FP values.
>>> 4. PIC. %ebx should be live in order to make a call via PLT.
>>> Unfortunately, %ebx is callee-saved, that's why we can assemble tail
>>> calls only to functions within the same module (PLT is not  
>>> required).
>>> 5. %ecx is livein for regparm(3) functions.
>>> ... (maybe something I forget)
>>>
>>> --
>>> WBR, Anton Korobeynikov
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

— Gordon

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070813/0ad00976/attachment.html>