[LLVMdev] RFC: Tail call optimization X86

Tue Sep 11 11:21:53 PDT 2007

Begin forwarded message:

> From: Evan Cheng <evan.cheng at apple.com>
> Date: 11 September 2007 19:26:39 GMT+02:00
> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Subject: Re: [LLVMdev] RFC: Tail call optimization X86
> Reply-To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>
> Hi Arnold,
>
> Thanks for the patch. Some questions and commons:
>
> 1. Have you test it against the llvm test suite?
No not yet.

> Does it work if fp
> elimination optimization is turned off?
For my test cases - yes.
> 2. Please follow llvm coding convention and make sure every line fits
> in 80 columns.
Okay dokey. i was looking through the files manually - must have  
missed some - sorry.
> 3.
> enum NameDecorationStyle {
>     None,
>     StdCall,
> -  FastCall
> +  FastCall,
> +  FastCC // the normal fastcc calling convention
> };
>
> Why is FastCC necessary? Can't you just use FastCall?

FastCall is used to indicate a function has x86_FastCall semantics if  
i am not mistaken.
I needed to differentiate between a normal fastcc and the  
x86_fastcall semantics in an older version
of my code. I no longer depend on that so it can be removed as you  
suggest. sorry for the code corpse :)
> 4.
> def X86tailcall: SDNode<"X86ISD::TAILCALL",     SDT_X86Call,
>                           [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
> +def X86truetailcall: SDNode<"X86ISD::TRUETAILCALL",     SDT_X86Call,
> +                        [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
> +
>
> Please use X86tailcall. It's not currently used so feel free to
> change its patterns, etc.
>

Okay dokey.

> 5.
> +// the following two instructions are used to adjust the stack  
> pointer
> +// in the case where the callee has more arguments than the caller
> +// an area is created where the return addr can be safely moved to
> +let isConvertibleToThreeAddress = 1 in {   // Can transform into LEA.
> +def TCADD32ri  : Ii32<0x81, MRM0r, (outs GR32:$dst), (ins GR32:
> $src1, i32imm:$src2),
> +                    "add{l}\t{$src2, $dst|$dst, $src2}",
> +                    []>;
> +}
> +
> +def TCSUB32ri  : Ii32<0x81, MRM5r, (outs GR32:$dst), (ins GR32:
> $src1, i32imm:$src2),
> +                    "sub{l}\t{$src2, $dst|$dst, $src2}",
> +                    []>;
> +
In an intermediate implementation (that did not work anyway) i need  
to differentiate
  between the normal ADD32ri/SUB32ri that were in the code and the stack
adjustments i added. I  no longer need them so i will remove them.  
again a code
corpse sorry for that.
> +//let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
> +//def TCRETURNmi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:
> $offset),
> +//                 "#TC_RETURN $dst $offset",
> +//                 []>;
> +

This node holds the stack adjustment delta and the jmp destination of  
the call.
The information is used in epilogue generation.
> Why are these needed? They don't look any different from normal add
> and subtraction instructions. Why not just use ADJCALLSTACKDOWN and
> ADJCALLSTACKUP?
>

> 6.
> +  } else if (RetOpcode ==  X86::TCRETURNdi) {
> +    // a tailcall adjust the stack
> ...
> +  } else if (RetOpcode == X86::TCRETURNri) {
> +    MBBI = prior(MBB.end());
>
> Seems like there are quite a bit of duplicate code between the 2
> cases. Please refactor.

Okay dokey
> 7.
> X86TargetLowering::X86TargetLowering(TargetMachine &TM)
>     : TargetLowering(TM) {
> +  IsLastCallTailCall = false;
>
> This is probably not a good idea. You are assuming nodes are lowered
> in certain order, that is dangerous. It's probably better to check
> whether the last call is a tail call on the fly as you are processing
> the return node.

Will do so.
> 8.
> -
> -  SDOperand Chain = Op.getOperand(0);
> -  SDOperand Flag;
> -
> -  // Copy the result values into the output registers.
> -  if (RVLocs.size() != 1 || !RVLocs[0].isRegLoc() ||
> -      RVLocs[0].getLocReg() != X86::ST0) {
> -    for (unsigned i = 0; i != RVLocs.size(); ++i) {
> -      CCValAssign &VA = RVLocs[i];
> -      assert(VA.isRegLoc() && "Can only return in registers!");
> -      Chain = DAG.getCopyToReg(Chain, VA.getLocReg(), Op.getOperand
> (i*2+1),
> -                               Flag);
> -      Flag = Chain.getValue(1);
> -    }
> +  if (IsLastCallTailCall) {
> +    IsLastCallTailCall = false;
> +    SDOperand TailCall = GetTailCall(Op);
> +    SDOperand TargetAddress = TailCall.getOperand(1);
> +    SDOperand StackAdjustment = TailCall.getOperand(2);
> +    assert ( ((TargetAddress.getOpcode() == ISD::Register &&
> +               cast<RegisterSDNode>(TargetAddress)->getReg() ==
> X86::ECX) ||
> +              TargetAddress.getOpcode() ==  
> ISD::TargetExternalSymbol ||
> +              TargetAddress.getOpcode() ==  
> ISD::TargetGlobalAddress) &&
> +             "Expecting an global address, external symbol, or
> register");
> +    assert( StackAdjustment.getOpcode() == ISD::Constant &&
> "Expecting a const value");
> +    // TODO: should we add flag from tail call as last operand?
> +    return DAG.getNode(X86ISD::TC_RETURN, MVT::Other, Op.getOperand
> (0), TargetAddress, StackAdjustment);
>     } else {
>
> Since if (IsLastCallTailCall) { ... } always ends with an early
> exist, you can just place the block before if (RVLocs.size() ...) and
> there is no need to change indentation for the rest of the function.
>

Okay
> 9.
> +// Check to see whether the next instruction following the call is a
> return
> +// A function is eligable if caller/callee calling conventions match
> and the
> +// function CALL is immediatly followed by a RET
> +bool X86TargetLowering::IsEligibleForTailCallElimination(SDOperand
> Call, SelectionDAG& DAG, unsigned CalleeCC, SDOperand Callee) {
> +  bool IsEligible = false;
> +  SDNode * CallNode = Call.Val;
> ...
>
> +SDOperand X86TargetLowering::LowerX86_32FastCCCallTo(SDOperand Op,
> +                                                     SelectionDAG  
> &DAG,
> +                                                     unsigned CC) {
> +  DOUT << "LowerX86_32FastCCCallTo\n";
> +  SDOperand Chain     = Op.getOperand(0);
> +  bool isVarArg       = cast<ConstantSDNode>(Op.getOperand(2))-
>> getValue() != 0;
> +  bool isTailCall     = cast<ConstantSDNode>(Op.getOperand(3))-
>> getValue() != 0;
> +  SDOperand Callee    = Op.getOperand(4);
> +  //unsigned NumOps     = (Op.getNumOperands() - 5) / 2;
> +
> +  // Analyze operands of the call, assigning locations to each  
> operand.
> +  SmallVector<CCValAssign, 16> ArgLocs;
> +  CCState CCInfo(CC, isVarArg, getTargetMachine(), ArgLocs);
> +  CCInfo.AnalyzeCallOperands(Op.Val, CC_X86_32_TailCall);
> +  if (isTailCall &&
> +      IsEligibleForTailCallElimination(Op, DAG,CC, Callee) &&
> +      PerformTailCallOpt) {
>
>
> IsEligibleForTailCallElimination() should be a target hook. This way
> TargetLowering::LowerCallTo() can determine whether a call is
> eligible for tail call elimination and insert the current ISD::CALL
> node.
Okay

> X86TargetLowering::LowerX86_32FastCCCallTo() will not have to
> handle non-tail calls.
>
I will then add code to LowerCCCCallTo() to check whether to use the  
normal
calling convention CC_X86_32_C (3 registers free for argument  
passing) or
the tail call (fastcc) calling convention CC_X86_32_TailCall (2  
register free).

Expect to see a new patch after i have done the testing with the  
whole test suite.

regards arnold