[LLVMdev] Issues with the llvm.stackrestore intrinsic - now LoopRotation handling of alloca

Fri Feb 3 01:32:20 PST 2012

Hi,

I've tracked the first problem (mentioned in my previous email, quoted 
below) down further, ending up in the handling of alloca in 
LoopRotation.cpp (from trunk):

      // If the instruction's operands are invariant and it doesn't read
or write
      // memory, then it is safe to hoist.  Doing this doesn't change the
order of
      // execution in the preheader, but does prevent the instruction from
      // executing in each iteration of the loop.  This means it is safe
to hoist
      // something that might trap, but isn't safe to hoist something
that reads
      // memory (without proving that the loop doesn't write).
      if (L->hasLoopInvariantOperands(Inst) &&
          !Inst->mayReadFromMemory() && !Inst->mayWriteToMemory() &&
          !isa<TerminatorInst>(Inst) && !isa<DbgInfoIntrinsic>(Inst)) {
        Inst->moveBefore(LoopEntryBranch);
        continue;
      }

The above code happily moves an alloca instruction out of the loop, to
the new loop header. Shouldn't we also check on

   !isa<AllocaInst>(Inst)

before allowing to move it?

The code that breaks because of the moved alloca looks like

start:
   [...]
   br loopstart

loopbody:
   [use alloc'd mem]
   call stackrestore(%oldsp)
   br loopstart

loopstart:
    %oldsp = call stacksave()
   alloca
   [...]
   brcond loopbody, end

end:
   [use alloc'd mem]
   call stackrestore(%oldsp)

Then LoopRotation clones the stacksave from bb3 to bb1 and _moves_ the
alloca from bb3 to bb1. So the alloca is moved outside the loop instead
of having it execute each lap. But there still is a stackrestore inside
the loop that will "deallocate" the memory, so all but the first lap of
the loop will access deallocated memory.

So, shouldn't alloca instructions be cloned rather than moved? And maybe
there are other instructions with side effects that also should be cloned.

Patrik Hägglund wrote 2012-02-01 12:11:
> Hi,
>
> I have two problems regarding the llvm.stackrestore intrinsic. I'm
> running on 3.0, but a quick test on trunk also showed the same behavior.
>
> First problem:
> ---------------
> I have code like:
>
>     tmp1 = call llvm.stacksave()
>     tmp2 = alloca
>     [do some stuff with tmp2]
>     call llvm.stackrestore(tmp1)
>
>     [some other stuff]
>
>     tmp3 = call llvm.stacksave()
>     tmp4 = alloca
>     [do some stuff with tmp4]
>     call llvm.stackrestore(tmp3)
>
> Then some transformation rewrites this to
>
>     tmp1 = call llvm.stacksave()
>     tmp2 = alloca
>     [do some stuff with tmp2]
>     call llvm.stackrestore(tmp1)
>
>     [some other stuff]
>
>     tmp3 = call llvm.stacksave()
>     tmp4 = tmp2<----- Ops!!!
>     [do some stuff with tmp4]
>     call llvm.stackrestore(tmp3)
>
> Unfortunately the tmp2 pointer isn't valid after the first stackrestore,
> since the memory it's pointing at has in fact been deallocated by the
> intrinsic, so the uses of it through the variable tmp4 are wrong.
>
> Maybe some dependencies between alloca and the stackrestore instrinsic
> are missing or how should this work?
>
> In Intrinsics.td it says
>
> // Note: we treat stacksave/stackrestore as writemem because we don't
> otherwise
> // model their dependencies on allocas.
> def int_stacksave     : Intrinsic<[llvm_ptr_ty]>,
>                            GCCBuiltin<"__builtin_stack_save">;
> def int_stackrestore  : Intrinsic<[], [llvm_ptr_ty]>,
>                            GCCBuiltin<"__builtin_stack_restore">;
>
> Does "GCCBuiltin" imply "writemem", or how does the comment and the code
> correspond?
>
>
> Second problem:
> ---------------
> It seems that calls to stackrestore are removed if they are close to a
> ret instruction. I suppose the idea is that the function epilogue will
> restore the stack pointer anyway, so the llvm.stackrestore can be safely
> removed.
>
> However, in my target, the stack restoration in the epilogue is
> currently done by adding the used stacksize to the stackpointer, so I do
> indeed need the call to the stackrestore intrinsic to be left.
>
> Am I breaking some design rule by reading the stackpointer in the
> epilogue or how is this supposed to work?