[llvm-commits] PATCH: stronger tail recursion elimination with nocapture

Tue Oct 16 11:33:19 PDT 2012

Hi Nick,

On 16/10/12 00:55, Nick Lewycky wrote:
> Ping! Surely somebody on llvm-commits remembers how tail recursion elimination
> works, right?

well, I'll give it a go!

> --- lib/Transforms/Scalar/TailRecursionElimination.cpp	(revision 165437)
> +++ lib/Transforms/Scalar/TailRecursionElimination.cpp	(working copy)
> @@ -69,6 +69,7 @@
>  #include "llvm/Support/CFG.h"
>  #include "llvm/Support/Debug.h"
>  #include "llvm/Support/raw_ostream.h"
> +#include "llvm/Support/ValueHandle.h"
>  #include "llvm/ADT/Statistic.h"
>  #include "llvm/ADT/STLExtras.h"
>  using namespace llvm;
> @@ -117,35 +118,52 @@
>    return new TailCallElim();
>  }
>
> -/// AllocaMightEscapeToCalls - Return true if this alloca may be accessed by
> -/// callees of this function.  We only do very simple analysis right now, this
> -/// could be expanded in the future to use mod/ref information for particular
> -/// call sites if desired.
> -static bool AllocaMightEscapeToCalls(AllocaInst *AI) {
> -  // FIXME: do simple 'address taken' analysis.
> -  return true;
> -}
> -
> -/// CheckForEscapingAllocas - Scan the specified basic block for alloca
> -/// instructions.  If it contains any that might be accessed by calls, return
> -/// true.
> -static bool CheckForEscapingAllocas(BasicBlock *BB,
> -                                    bool &CannotTCETailMarkedCall) {
> -  bool RetVal = false;
> +/// CanTRE - Scan the specified basic block for alloca instructions.
> +/// If it contains any that are variable-sized or not in the entry block,
> +/// returns false.
> +static bool CanTRE(BasicBlock *BB) {
> +  // Because of PR962, we don't TRE allocas outside the entry block.
>    for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>      if (AllocaInst *AI = dyn_cast<AllocaInst>(I)) {
> -      RetVal |= AllocaMightEscapeToCalls(AI);
> -
>        // If this alloca is in the body of the function, or if it is a variable
>        // sized allocation, we cannot tail call eliminate calls marked 'tail'
>        // with this mechanism.
>        if (BB != &BB->getParent()->getEntryBlock() ||
>            !isa<ConstantInt>(AI->getArraySize()))
> -        CannotTCETailMarkedCall = true;
> +        return false;
>      }
> -  return RetVal;
> +  return true;
>  }
>
> +// FIXME: This could be dominator based. We can only not mark calls tail if
> +// they are dominated by a capture.

While I understand what you have in mind, this is not literally correct since
you there must not be any path from the entry block to here that sees a capture,
which is not the same as being dominated by a capture.

> +struct AllocaCaptureTracker : public CaptureTracker {
> +  AllocaCaptureTracker() : Captured(false) {}
> +
> +  void tooManyUses() { Captured = true; }
> +
> +  bool shouldExplore(Use *U) {
> +    Value *V = U->get();
> +    for (Value::use_iterator I = V->use_begin(), E = V->use_end(); I != E; ++I)
> +      if (isa<CallInst>(*I) || isa<InvokeInst>(*I))
> +        UsesAlloca.push_back(*I);

Since shouldExplore will be called for all uses, why iterate over uses?  I mean,
can't you just do

     if (isa<CallInst>(U->getUser()) || isa<InvokeInst>(U->getUser()))
       UsesAlloca.push_back(U->getUser());

or something like that?  As it is the behaviour seems quadratic (though capped,
because capture tracking won't call you if there are too many uses), and also
won't you potentially push the same instructions into UsesAlloca many times?

> +
> +    return true;
> +  }
> +
> +  bool captured(Use *U) {
> +    if (isa<ReturnInst>(U->getUser()))
> +      return false;
> +
> +    Captured = true;
> +    return true;
> +  }
> +
> +  SmallVector<WeakVH, 64> UsesAlloca;
> +
> +  bool Captured;
> +};
> +
>  bool TailCallElim::runOnFunction(Function &F) {
>    // If this function is a varargs function, we won't be able to PHI the args
>    // right, so don't even try to convert it...
> @@ -157,38 +175,41 @@
>    bool MadeChange = false;
>    bool FunctionContainsEscapingAllocas = false;
>
> -  // CannotTCETailMarkedCall - If true, we cannot perform TCE on tail calls
> +  // Find calls that can be marked tail.
> +  AllocaCaptureTracker ACT;
> +  for (Function::iterator BB = F.begin(), EE = F.end(); BB != EE; ++BB) {
> +    for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {
> +      if (isa<AllocaInst>(I)) {

Why not move the CanTRE test here?  It just loops over basic blocks, looking
for allocas, and bails out if not in the entry block (or a variable sized
alloca).  Well, here you are already looping over basic blocks, you just found
an alloca, so you apply that test here.  Right now you do capture tracking for
alloca's outside the entry block even though you are going do bail out later
when the CanTRE test notices that such allocas exist.

> +        PointerMayBeCaptured(I, &ACT);
> +        if (ACT.Captured)
> +          return false;
> +      }
> +    }
> +  }
> +
...

Otherwise LGTM.

Ciao, Duncan.