[LLVMdev] RFC: Tail call optimization X86

Fri Oct 5 02:42:44 PDT 2007

Hi Evan,
I incorporated the changes you requested, but I have a question about the
following:

> Also, moving the option
> there will allow us to change fastcc ABI (callee popping arguments)
> only when this option is on. See Chris' email:

I am not too sure about that, because it would make modules compiled
with the flag on incompatible with ones compiled with the flag off,
as the stack behaviour would mismatch.
It would be no problem to make the behaviour dependent on the
-tailcallopt flag, but I am not sure that this is a good idea.

pseudocode:

module 1: with -tailcallopt enabled
fastcc int callee(int arg1, int arg2) {
...
-> on return: pops 8 bytes
}

module 2: no -tailcallopt
fastcc int caller() {
	int x = call fastcc callee(1, 2);
	//!! caller pops the arguments => stack mismatch
}

The callee pops the arguments, but the caller also wants to pop the
arguments off the stack.
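
To make the mismatch concrete, here is a minimal sketch that only tracks the
stack pointer offset across one call. It is a toy model, not LLVM code: the
names Cleanup and stackDriftAfterCall are made up for illustration, and it
assumes the arguments are the only thing pushed for the call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code): who removes the arguments.
enum class Cleanup { CallerPops, CalleePops };

// Returns the net stack pointer drift in bytes after one call, given what the
// callee was compiled to do and what the caller assumes about it.
// 0 means the stack is balanced; any other value is the mismatch.
int64_t stackDriftAfterCall(Cleanup calleeDoes, Cleanup callerAssumes,
                            int64_t argBytes) {
  assert(argBytes >= 0 && "argument area cannot be negative");
  int64_t sp = 0;
  sp -= argBytes;                        // caller pushes the arguments
  if (calleeDoes == Cleanup::CalleePops)
    sp += argBytes;                      // callee removes them on return
  if (callerAssumes == Cleanup::CallerPops)
    sp += argBytes;                      // caller also adjusts the stack
  return sp;
}
```

With 8 bytes of arguments, a callee that pops combined with a caller that
also pops leaves the stack pointer 8 bytes too high, which is the case in the
pseudocode above. The opposite combination leaves the arguments on the stack.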

Apparently I forgot to send my answer to Chris' response. Sorry
for that.
>
>> Hmmm. Ok. So this is due to X86CallingConv.td changes? Unfortunately
>> that's not controlled by options. Ok then.
>>
>
> Sure it can be, you can set up custom predicates, for example the
> X86CallingConv.td file has:
>
> class CCIfSubtarget<string F, CCAction A>
>   : CCIf<!strconcat("State.getTarget().getSubtarget<X86Subtarget>().", F), A>;
>
> It would be straight-forward to have a CCIf defined to check some command
> line argument.  I think enabling this as llcbeta for a few nights makes
> sense before turning it on by default.

No, not directly. The code related to "caller/callee cleans arguments
off the stack" is not controlled by the .td file. It is controlled in code
by the operands of CALLSEQ_END.

For example, in SDOperand X86TargetLowering::LowerCCCCallTo:
...
   if (CC == CallingConv::X86_StdCall || CC == CallingConv::Fast) {
     if (isVarArg)
       NumBytesForCalleeToPush = isSRet ? 4 : 0;
     else
       NumBytesForCalleeToPush = NumBytes;
     assert(!(isVarArg && CC==CallingConv::Fast) &&
             "CallingConv::Fast does not support varargs.");
   } else {
     // If this is a call to a struct-return function, the callee
     // pops the hidden struct pointer, so we have to push it back.
     // This is common for Darwin/X86, Linux & Mingw32 targets.
     NumBytesForCalleeToPush = isSRet ? 4 : 0;
   }

   NodeTys = DAG.getVTList(MVT::Other, MVT::Flag);
   Ops.clear();
   Ops.push_back(Chain);
   Ops.push_back(DAG.getConstant(NumBytes, getPointerTy()));
   Ops.push_back(DAG.getConstant(NumBytesForCalleeToPush, getPointerTy()));
   Ops.push_back(InFlag);
   Chain = DAG.getNode(ISD::CALLSEQ_END, NodeTys, &Ops[0], Ops.size());
   InFlag = Chain.getValue(1);

The third operand is the number of bytes the callee pops off the
stack on return (on x86). This gets lowered to an ADJCALLSTACKUP
pseudo machine instruction.
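
The decision in the excerpt above can be restated as a small standalone
function. This is only a sketch of the logic, not the LLVM code itself: the
CallConv enum and the name bytesCalleePops are stand-ins I made up, and the
4-byte struct-return pointer assumes 32-bit x86 as in the excerpt.

```cpp
#include <cassert>

// Illustrative stand-ins for the LLVM calling convention IDs.
enum class CallConv { C, Fast, X86_StdCall };

// Number of bytes the callee removes from the stack on return, following the
// same rules as the excerpt: stdcall and fastcc callees pop all argument
// bytes unless the function is vararg; otherwise only the hidden
// struct-return pointer (4 bytes) is popped, if there is one.
unsigned bytesCalleePops(CallConv cc, bool isVarArg, bool isSRet,
                         unsigned numBytes) {
  assert(!(isVarArg && cc == CallConv::Fast) &&
         "fastcc does not support varargs");
  bool calleeCleans = (cc == CallConv::X86_StdCall || cc == CallConv::Fast);
  if (calleeCleans && !isVarArg)
    return numBytes;
  return isSRet ? 4 : 0;
}
```

Making the result depend on a command line flag would mean adding one more
input to this decision, and that input is not visible to a caller in another
module.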

Later, when X86RegisterInfo::eliminateCallFramePseudoInstr is called
and frame pointer elimination is enabled, the following code gets executed:
...
   else if (I->getOpcode() == X86::ADJCALLSTACKUP) {
     // If we are performing frame pointer elimination and if the callee pops
     // something off the stack pointer, add it back.  We do this until we have
     // more advanced stack pointer tracking ability.
     if (uint64_t CalleeAmt = I->getOperand(1).getImm()) {
       unsigned Opc = (CalleeAmt < 128) ?
         (Is64Bit ? X86::SUB64ri8 : X86::SUB32ri8) :
         (Is64Bit ? X86::SUB64ri32 : X86::SUB32ri);
       MachineInstr *New =
         BuildMI(TII.get(Opc), StackPtr).addReg(StackPtr).addImm(CalleeAmt);
       MBB.insert(I, New);
     }
   }
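
The opcode choice in that excerpt depends only on the size of the adjustment
and on the target width. A minimal sketch of just that selection follows. It
returns the opcode name as a string so it can be run without LLVM, and the
function name subOpcodeFor is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Picks the SUB variant used to undo the callee's stack adjustment, mirroring
// the excerpt: amounts below 128 use the 8-bit immediate form, larger amounts
// use the 32-bit immediate form.
std::string subOpcodeFor(uint64_t calleeAmt, bool is64Bit) {
  assert(calleeAmt != 0 && "no adjustment is emitted for a zero amount");
  if (calleeAmt < 128)
    return is64Bit ? "SUB64ri8" : "SUB32ri8";
  return is64Bit ? "SUB64ri32" : "SUB32ri";
}
```

For the 8 bytes popped in the earlier pseudocode on 32-bit x86, this selects
SUB32ri8.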

I am not sure about a command line switch toggling the stack
adjustment behaviour of a function. If the function is called
from a different module that was compiled with the opposite setting of
the switch, all hell would break loose (because the caller assumes the
callee pops the arguments when it does not, or vice versa).