[LLVMdev] RFC: Tail call optimization X86

Evan Cheng evan.cheng at apple.com
Thu Oct 4 16:29:22 PDT 2007


Comments:

CheckDAGForTailCallsAndFixThem -
1.
   for (SelectionDAG::allnodes_iterator BE = DAG.allnodes_begin(),
+         BI = prior(DAG.allnodes_end()); BI != BE; BI--) {
Please use pre-decrement instead of post-decrement.

2. The function is slower than it should be. You are scanning all the  
nodes in the DAG twice. You should just examine DAG.getRoot() to make  
determine whether it's a return or not.

@@ -4618,6 +4662,12 @@
    // Lower the terminator after the copies are emitted.
    SDL.visit(*LLVMBB->getTerminator());

+  // check whether calls in this block are real tail calls fix up  
CALL nodes
+  // with correct tailcall attribute so that the target can rely on  
the tailcall
+  // attribute indicating whether the call is really eligible for  
tail call
+  // optimization
+  CheckDAGForTailCallsAndFixThem(DAG, TLI);
+

check -> Check to start a sentence. :-)


+    // Skip the RETADDR move area
+    X86MachineFunctionInfo *X86FI =  
MF.getInfo<X86MachineFunctionInfo>();
+    int32_t TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();

Why using int32_t instead of int in some of the places? Nothing  
"wrong" with it, just inconsistent.



+      // If there is an SUB32ri of ESP immediately before this  
instruction,
+      // merge the two. This can be the case when tail call  
elimination is
+      // enabled and the callee has more arguments then the caller
+      if (MBBI != MBB.begin()) {
+        MachineBasicBlock::iterator PI = prior(MBBI);
+        unsigned Opc = PI->getOpcode();
+        if ((Opc == X86::SUB64ri32 || Opc == X86::SUB64ri8 ||
+                    Opc == X86::SUB32ri || Opc == X86::SUB32ri8) &&

Weird indentation?

+/// CheckAndDeletePreceedingADD - looks at instruction before the  
passed
+/// instruction if it is an ADD instruction it is deleted and the  
size is
+/// returned
+static int32_t  
CheckAndDeletePreceedingADD(MachineBasicBlock::iterator MBBI,
+                                           MachineBasicBlock &MBB,
+                                           unsigned StackPtr) {
+  int32_t Offset = 0;
+  if (MBBI != MBB.begin()) {
+    MachineBasicBlock::iterator PI = prior(MBBI);
+    unsigned Opc = PI->getOpcode();
+    if ((Opc == X86::ADD64ri32 || Opc == X86::ADD64ri8 ||
+         Opc == X86::ADD32ri || Opc == X86::ADD32ri8) &&
+        PI->getOperand(0).getReg() == StackPtr){
+      Offset += PI->getOperand(2).getImm();
+      MBB.erase(PI);
+    }
+  }
+  return Offset;
+}
+

Seems like the previous chunk can be changed to use this function with  
some refactoring?

+
+static cl::opt<bool>
+    PerformTailCallOpt("tail-call-opt", cl::Hidden,
+               cl::desc("Turn on tail call optimization"));

Perhaps it's better to move this out of x86. Eventually we want to do  
tail call optimization for more targets. See include/llvm/Target/ 
TargetOptions.h, lib/Target/TargetMachine.cpp  Also, moving the option  
there will allow us to change fastcc ABI (callee popping arguments)  
only when this option is on. See Chris' email:

 > Sure it can be, you can set up custom predicates, for example the
 > X86CallingConv.td file has:
 >
 > class CCIfSubtarget<string F, CCAction A>
 >  : CCIf<! 
strconcat("State.getTarget().getSubtarget<X86Subtarget>().", F), A>;
 >
 > It would be straight-forward to have a CCIf defined to check some  
command
 > line argument.

+/// IsEligibleForTailCallElimination - Check to see whether the next  
instruction
+// following the call is a return. A function is eligible if caller/ 
callee
+// calling conventions match, currently only fastcc supports tail  
calls, and the
+// function CALL is immediatly followed by a RET
+bool X86TargetLowering::IsEligibleForTailCallOptimization(SDOperand  
Call,

/// vs. //

+  // copy from stack slots to stack slot of a tail called function.  
this needs
+  // to be done because if we would lower the arguments directly to  
their real
+  // stack slot we might end up overwriting each other
+  // TODO: to make this more efficient (sometimes saving a store/ 
load) we could
+  // analyse the arguments and emit this store/load/store sequence  
only for
+  // arguments which would be overwritten otherwise

Please capitalize "copy", "this", etc. Also please end sentences with  
proper punctuation. :-)

Overall, a very good patch. Please also incorporate the minor changes  
mentions in other emails (LLCBETAOPTION, -tail-call-opt-align-stack).  
I think it'll be ready for svn comit after one more iteration. :-)

Evan

On Oct 2, 2007, at 2:27 AM, Arnold Schwaighofer wrote:

> Hi all,
>
> I changed the code that checks whether a tail call is really  
> eligible for optimization so that it performs the check/fix in  
> SelectionDAGISel.cpp:BuildSelectionDAG() as suggest by Evan. Also  
> eliminated an error that caused the remaining failing test cases in  
> the test-suite.
>
> The results look very nice (on darwin x86, r42486).
> The same number (46) of failing test cases on patched version and  
> vanilla version. LCCBETA was enabled on both. I changed the LCCBETA  
> option in Makefile.programs in the patched version to include the  
> tail-call-opt flags so that the optimization code gets exercised.
>
> ifeq ($(ARCH),x86)
> LLCBETAOPTION := -regalloc=local -fast -tail-call-opt -tail-call-opt- 
> align-stack
>
> On 26 Sep 2007, at 02:26, Chris Lattner wrote:
>> I think enabling this as llcbeta for a few nights makes
>> sense before turning it on by default.
>>
>> -Chris
>
> What does turning on by default mean? Does that mean it is shown in  
> llc -help or that it is performed every time (without requesting via  
> the command line) when compiling?
>
> I would not do the latter since tail call optimization clears the  
> stack frame of the caller (sequence), possibly causing confusion  
> when debugging also resulting code is probably not faster (since  
> arguments are lowered to regular stack slot -where they would be if  
> the call was a normal call - first and then moved to the real stack  
> slot - onto the callers parameters. As noted in the source file this  
> might be optimized in many cases by looking whether the actual  
> parameters come from the caller function parameters and would be  
> overwritten if they are not moved. In cases where they would not be  
> overwritten they could be directly lowered to the real stack slot).
>
> regards arnold
>
>
> <tailcall-r42525- 
> src.patch>_______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list