[LLVMdev] [PATCH] Split init.trampoline into init.trampoline & adjust.trampoline

Wed Aug 31 08:56:31 PDT 2011

Hi Sanjoy, the first and last patches look good (except that you didn't add any
tests for the auto-upgrade functionality).  Comments on the other two below.

> Attached patches split init.trampoline into adjust.trampoline and
> init.trampoline, like in gcc.
>
> As mentioned in the previous mail, I've not made a documentation
> patch, since I'm not sure about what the documented semantics of
> llvm.adjust.trampoline should be.

Here's what the GCC docs say:

@hook TARGET_TRAMPOLINE_ADJUST_ADDRESS
This hook should perform any machine-specific adjustment in
the address of the trampoline.  Its argument contains the address of the
memory block that was passed to @code{TARGET_TRAMPOLINE_INIT}.  In case
the address to be used for a function call should be different from the
address at which the template was stored, the different address should
be returned; otherwise @var{addr} should be returned unchanged.
If this hook is not defined, @var{addr} will be used for function calls.
@end deftypefn

Perhaps you can come up with something based on that.  When I initially
implemented the trampoline stuff I probably documented adjust_trampoline
(which was later removed), so maybe you could also dig that up.  By the
way an example of adjust_trampoline is ARM, which or's a 1 into the address
of the trampoline.  When the pointer is called the processor sees the 1 and
puts itself into thumb mode.

> --- a/lib/Transforms/InstCombine/InstCombineCalls.cpp
> +++ b/lib/Transforms/InstCombine/InstCombineCalls.cpp
> @@ -12,7 +12,6 @@
>  //===----------------------------------------------------------------------===//
>
>  #include "InstCombine.h"
> -#include "llvm/IntrinsicInst.h"
>  #include "llvm/Support/CallSite.h"
>  #include "llvm/Target/TargetData.h"
>  #include "llvm/Analysis/MemoryBuiltins.h"
> @@ -821,6 +820,93 @@ Instruction *InstCombiner::tryOptimizeCall(CallInst *CI, const TargetData *TD) {
>    return Simplifier.NewInstruction;
>  }
>
> +// Given a call to llvm.adjust.trampoline, find the corresponding call to
> +// llvm.init.trampoline.

I think you should also say that it only returns a non-null value if the
transform is safe to do.  It's not just a question of finding the init.tramp
call.

> +//
> +static IntrinsicInst *FindInitTrampoline(Value *Callee) {
> +  Callee = Callee->stripPointerCasts();
> +
> +  // This technique does not work (directly) if the trampoline has been
> +  // bitcasted even once.

I find this obscure.  Especially as the trampoline is almost always bitcast
before being passed as a pointer to init.trampoline and adjust.trampoline.
For example like it is in the testcase for this feature.

   However, using -instcombine, -gvn, -instcombine
> +  // (hence with -O2 and above) usually works, even with multiple bitcasts.  It
> +  // probably is not worthwhile to-reduplicate the work here in the unlikely
> +  // case of the code being compiled with only the InstCombine pass enabled.

I don't think this comment (including the bit before) should exist.  Passes rely
on other passes have put things in decent shape all the time, and don't bother
to say so.

> +
> +  // First check if this instruction is actually an llvm.adjust.trampoline
> +  IntrinsicInst *AdjustTramp = dyn_cast<IntrinsicInst>(Callee);
> +  if (!AdjustTramp ||
> +      AdjustTramp->getIntrinsicID() != Intrinsic::adjust_trampoline)
> +    return 0;
> +
> +  // The first argument passed to adjust_trampoline is the trampoline argument.
> +  // Get a hold of this.
> +  Value *TrampMem = AdjustTramp->getOperand(0);
> +
> +  // The first thing to check is if there is a direct path from a
> +  // init.trampoline instruction to this adjust.trampoline instruction (both
> +  // operating on the same trampoline).  Of course, this is useless if there are
> +  // writes to the trampoline between the two instructions.
> +
> +  // We look at all the instructions before the adjust.trampoline in its basic
> +  // block.
> +
> +  BasicBlock::iterator InstIterator = AdjustTramp,
> +    InstBegin = AdjustTramp->getParent()->begin();
> +  IntrinsicInst *In = NULL;

Why is In defined here and not locally inside the loop below?

> +
> +  InstIterator--;

What if AdjustTramp was the first instruction in the basic block?  I think you
should first compare InstIterator with InstBegin and only after decrement the
iterator.

> +
> +  while(InstIterator != InstBegin) {
> +    // Check if we got lucky.
> +    In = dyn_cast<IntrinsicInst>(InstIterator);
> +    if (In && In->getIntrinsicID() == Intrinsic::init_trampoline &&
> +        In->getOperand(0) == TrampMem)
> +      return In;
> +
> +    // If there is a write to the trampoline, then the sifting through previous
> +    // instructions is useless.
> +    if (InstIterator->mayWriteToMemory()) {

Here you should just always give up.  After all, the trampoline memory might be
a global variable, or a pointer to it might have been squirrelled away somewhere
so any call might modify the memory without using TrampMem explicitly.  Not to
mention that TrampMem is probably a bitcast of the actual memory (eg: alloca),
and instructions might be writing to the alloca directly rather than via this
bitcast.  And so on.  You can't do better without making use of alias analysis,
and you're not allowed to do that in instcombine.

> +      for (unsigned i = 0; i < InstIterator->getNumOperands(); i++)
> +        if (InstIterator->getOperand(i) == TrampMem)
> +          break;
> +    }
> +
> +    InstIterator--;
> +  }
> +
> +  // We can also check for the case where the trampoline memory is straight out
> +  // of an alloca.  Since we now know the origin of the memory, we can simply
> +  // count its uses.
> +
> +  GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(TrampMem);

I think you should also handle bitcast here.  You can do it like this:
   Value *Underlying = TrampMem->stripPointerCasts();
   if (!Underlying->hasOneUse() || *Underlying.use_begin() != TrampMem)
     return 0;

> +  if (GEPI && GEPI->hasOneUse()) {
> +    AllocaInst *AI = dyn_cast<AllocaInst>(GEPI->getOperand(0));
> +    if (AI == NULL || !AI->hasOneUse())
> +      return 0;
> +  } else {
> +    return 0;
> +  }

You should do early returns if possible:
   if (!AI)
     return 0;

> +
> +  if (!TrampMem->hasNUses(2))
> +    return 0;

You should do this test earlier, because if it fails you didn't need to bother
with the GEP/bitcast/alloca logic.

Also, I'm worried that the case of multiple adjust.trampoline calls won't be
handled properly.  If adjust.trampoline was declared IntrNoMem then probably
GVN would unify them (depends on dominance whether this is possible).  But it
isn't declared like that (but maybe it should be) which makes things harder
for GVN.  So I reckon you should maybe allow TrampMem to have lots of uses,
and check that they are all calls to adjust.trampoline or init.trampoline
(at most one of these).  [Having lots of adjust.trampoline calls in the
original bitcode is the common case].

> +
> +  // At this point, there are be two possibilities: either init.trampoline was

are be -> are

> +  // called before adjust.trampoline or after it.

This reads oddly because you don't even know that there is an init.trampoline
call yet.

   The first case can be
> +  // correctly optimized.  The second case has undefined behavior, so we
> +  // optimize it anyway.
> +
> +  Value::use_iterator I = TrampMem->use_begin();
> +  if (*I == AdjustTramp) // We don't need *this* use.
> +    ++I;
> +
> +  In = dyn_cast<IntrinsicInst>(*I);
> +  if (In && In->getIntrinsicID() == Intrinsic::init_trampoline)
> +    return In;

You also have to check that TrampMem is used as the correct argument to
init.trampoline, since a priori some evil person may be using it as the
second or third argument, rather than the first!

Finally I think the alloca case should come before the "hunt back in the
basic block" logic, because it is the most common.

> +
> +  // Could not find the init.trampoline call.
> +  return 0;
> +}
> +
>  // visitCallSite - Improvements for call and invoke instructions.
>  //
>  Instruction *InstCombiner::visitCallSite(CallSite CS) {

> --- a/lib/VMCore/AutoUpgrade.cpp
> +++ b/lib/VMCore/AutoUpgrade.cpp
> @@ -43,6 +43,32 @@ static bool UpgradeIntrinsicFunction1(Function *F, Function *&NewFn) {
>
>    switch (Name[0]) {
>    default: break;
> +  case 'i':
> +    //  This upgrades the old llvm.init.trampoline to the new
> +    //  llvm.init.trampoline and llvm.adjust.trampoline pair.
> +    if (Name == "init.trampoline") {
> +      Type *ReturnTy = FTy->getReturnType();
> +
> +      // The new llvm.init.trampoline returns nothing.
> +      if (ReturnTy->getTypeID() == Type::VoidTyID)

Simpler:

   if (ReturnTy->isVoidTy())

> +        break;
> +
> +      assert(ReturnTy->isPointerTy () &&
> +             "old init.trampoline returns a pointer");

Is this worth checking?  If not (I think not) you can get rid of the variable
ReturnTy altogether.

> +      assert(FTy->getNumParams() == 3 && "old init.trampoline takes 3 args!");
> +
> +      std::string NameTmp = F->getName();
> +      F->setName("");
> +
> +      //  Like in llvm.prefetch below, change the name of the old intrinsic so
> +      //  that we can play with its type.

I don't think you should mention llvm.prefetch.  It's like using a "goto": the
documentation is jumping to some place far away then back!  If someone changes
the llvm.prefetch logic one day your comment will get stale.  I suggest you just
remove "Like in llvm.prefetch below, ".

> +      NewFn = cast<Function>(M->getOrInsertFunction(
> +                               NameTmp,
> +                               Type::getVoidTy(M->getContext()),
> +                               FTy->getParamType(0), FTy->getParamType(1),
> +                               FTy->getParamType(2), (Type *)0));
> +      return true;
> +    }
>    case 'p':
>      //  This upgrades the llvm.prefetch intrinsic to accept one more parameter,
>      //  which is a instruction / data cache identifier. The old version only
> @@ -216,6 +242,34 @@ void llvm::UpgradeIntrinsicCall(CallInst *CI, Function *NewFn) {
>      CI->eraseFromParent();
>      break;
>    }
> +  case Intrinsic::init_trampoline: {
> +
> +    //  Transform
> +    //    %tramp = call i8* llvm.init.trampoline (i8* x, i8* y, i8* z)
> +    //  to
> +    //    call void llvm.init.trampoline (i8* %x, i8* %y, i8* %z)
> +    //    %tramp = call i8* llvm.adjust.trampoline (i8* %x)
> +
> +    Type *PointerTy = NewFn->getFunctionType()->getParamType(0);
> +    Function *AdjustTrampolineFn =
> +      cast<Function>(F->getParent()->getOrInsertFunction(
> +                       "llvm.adjust.trampoline", PointerTy, PointerTy,
> +                       (Type *)0));

I think you should use Intrinsic::getDeclaration to get AdjustTrampolineFn.

> +
> +    IRBuilder<> Builder(C);
> +    Builder.SetInsertPoint(CI->getParent(), CI);

You can just use: Builder.SetInsertPoint(CI);

> +
> +    Builder.CreateCall3(NewFn, CI->getArgOperand(0), CI->getArgOperand(1),
> +                        CI->getArgOperand(2));
> +
> +    CallInst *AdjustCall = Builder.CreateCall(AdjustTrampolineFn,
> +                                              CI->getArgOperand(0),
> +                                              CI->getName());
> +    if (!CI->use_empty())
> +      CI->replaceAllUsesWith(AdjustCall);
> +    CI->eraseFromParent();
> +    break;
> +  }
>    }
>  }

Please don't forget to add an auto-upgrade testcase.

Ciao, Duncan.