[PATCH] InlineFunction doesn't update InlineFunctionInfo with allocas created for byval arguments

Julien Lerouge jlerouge at apple.com
Thu Apr 10 14:22:46 PDT 2014


On Tue, Apr 08, 2014 at 09:35:40PM -0700, Julien Lerouge wrote:
> On Tue, Apr 08, 2014 at 05:40:58PM -0700, Manman Ren wrote:
> > On Tue, Apr 8, 2014 at 3:18 PM, Julien Lerouge <jlerouge at apple.com> wrote:
> > 
> > > On Tue, Apr 08, 2014 at 02:26:52PM -0700, Julien Lerouge wrote:
> > > > On Tue, Apr 08, 2014 at 10:30:44AM -0700, Manman Ren wrote:
> > > > > On Fri, Apr 4, 2014 at 5:15 PM, Julien Lerouge <julien.lerouge at m4x.org
> > > >wrote:
> > > > >
> > > > > > When llvm::InlineFunction inlines a function that has byval
> > > arguments,
> > > > > > it creates allocas in the caller. Those allocas aren't inserted in
> > > the
> > > > > > InlineFunctionInfo data structure. So after inlining, if the client
> > > code
> > > > > > wants to know where are the allocas that were created, it will miss
> > > > > > those.
> > > > > >
> > > > > > Should InlineFunctionInfo contain these allocas, or is the omission
> > > > > > deliberate ?
> > > > > >
> > > > >
> > > > > Hi Julien,
> > > > >
> > > > > I don't think the omission is deliberate, but I may be wrong.
> > > > > Do you have a testing case?
> > > > >
> > > > > Thanks,
> > > > > Manman
> > > > >
> > > >
> > > > Yeah, sorry, I'll add one and resend the patch.
> > > >
> > > > Thanks,
> > > > Julien
> > > >
> > >
> > > Attached below. The test just inlines a function with byval and makes
> > > sure the lifetime marker gets created. Without the fix, no lifetime
> > > marker is created.
> > >
> > 
> > Hi Julien,
> > 
> > Thanks for the testing case. However I don't think it is the right fix.
> > llvm.lifetime intrinsics are optimization hints. With this patch, we get
> > define i32 @main(i32 %argc, i8** %argv) {
> > entry:
> >   %gFoo = alloca %struct.foo, align 8
> >   %tmp = bitcast %struct.foo* %gFoo to i8*
> >   %tmp1 = bitcast %struct.foo* @gFoo to i8*
> >   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp, i8* %tmp1, i64 ptrtoint
> > (%struct.foo* getelementptr (%struct.foo* null, i32 1) to i64), i32 1, i1
> > false)
> >   %0 = bitcast %struct.foo* %gFoo to i8*
> >   call void @llvm.lifetime.start(i64 -1, i8* %0)
> >   %a1.i = getelementptr inbounds %struct.foo* %gFoo, i32 0, i32 1
> >   %arrayidx.i = getelementptr inbounds [16 x i32]* %a1.i, i32 0, i32 %argc
> >   %tmp2.i = load i32* %arrayidx.i, align 1
> >   %1 = bitcast %struct.foo* %gFoo to i8*
> >   call void @llvm.lifetime.end(i64 -1, i8* %1)
> >   ret i32 %tmp2.i
> > }
> > 
> > lifetime.start after memcpy seems to be wrong because %gFoo is already live
> > at memcpy.
> > Do you see a performance issue or you happen to notice this when browsing
> > the code?
> 
> Ah, good catch. Thanks. I don't have any performance issue no. I just
> have some code that uses the InlineInfo and noticed the byval allocas
> weren't in there.
> 
> When I looked into writing a test case, the lifetime markers looked like
> one of the only real users of the InlineInfo, so that's why the test
> case is checking that. Unfortunately, I didn't verify the markers were
> at the right place.
> 
> I'll check if it's an easy fix, and if not, I'll just change my code to
> not assume the byval allocas are accounted for.
> 

Alright, attached a new version. It involves splitting the byval
handling so that we inject the memcopy after the code has been cloned,
in the first new block. By doing so, the lifetime markers insertion
doesn't need to change.

Hope this helps,
Julien

-- 
Julien Lerouge
PGP Key Id: 0xB1964A62
PGP Fingerprint: 392D 4BAD DB8B CE7F 4E5F FA3C 62DB 4AA7 B196 4A62
PGP Public Key from: keyserver.pgp.com
-------------- next part --------------
commit 681e61c7ca24b7095c4d8cbba5ce8a97fdc32921
Author: Julien Lerouge <jlerouge at apple.com>
Date:   Wed Apr 2 17:07:18 2014 -0700

    Keep byval allocas.

diff --git a/lib/Transforms/Utils/InlineFunction.cpp b/lib/Transforms/Utils/InlineFunction.cpp
index 86def3e..9c38088 100644
--- a/lib/Transforms/Utils/InlineFunction.cpp
+++ b/lib/Transforms/Utils/InlineFunction.cpp
@@ -322,6 +322,36 @@ static void UpdateCallGraphAfterInlining(CallSite CS,
   CallerNode->removeCallEdgeFor(CS);
 }
 
+static void HandleByValArgumentInit(Value *Dst, Value *Src, Module *M,
+                                    BasicBlock *InsertBlock,
+                                    InlineFunctionInfo &IFI) {
+  LLVMContext &Context = Src->getContext();
+  Type *VoidPtrTy = Type::getInt8PtrTy(Context);
+  Type *AggTy = cast<PointerType>(Src->getType())->getElementType();
+  Type *Tys[3] = { VoidPtrTy, VoidPtrTy, Type::getInt64Ty(Context) };
+  Function *MemCpyFn = Intrinsic::getDeclaration(M, Intrinsic::memcpy, Tys);
+  IRBuilder<> builder(InsertBlock->begin());
+  Value *DstCast = builder.CreateBitCast(Dst, VoidPtrTy, "tmp");
+  Value *SrcCast = builder.CreateBitCast(Src, VoidPtrTy, "tmp");
+
+  Value *Size;
+  if (IFI.DL == 0)
+    Size = ConstantExpr::getSizeOf(AggTy);
+  else
+    Size = ConstantInt::get(Type::getInt64Ty(Context),
+                            IFI.DL->getTypeStoreSize(AggTy));
+
+  // Always generate a memcpy of alignment 1 here because we don't know
+  // the alignment of the src pointer.  Other optimizations can infer
+  // better alignment.
+  Value *CallArgs[] = {
+    DstCast, SrcCast, Size,
+    ConstantInt::get(Type::getInt32Ty(Context), 1),
+    ConstantInt::getFalse(Context) // isVolatile
+  };
+  builder.CreateCall(MemCpyFn, CallArgs);
+}
+
 /// HandleByValArgument - When inlining a call site that has a byval argument,
 /// we have to make the implicit memcpy explicit by adding it.
 static Value *HandleByValArgument(Value *Arg, Instruction *TheCall,
@@ -349,11 +379,7 @@ static Value *HandleByValArgument(Value *Arg, Instruction *TheCall,
     // Otherwise, we have to make a memcpy to get a safe alignment.  This is bad
     // for code quality, but rarely happens and is required for correctness.
   }
-  
-  LLVMContext &Context = Arg->getContext();
 
-  Type *VoidPtrTy = Type::getInt8PtrTy(Context);
-  
   // Create the alloca.  If we have DataLayout, use nice alignment.
   unsigned Align = 1;
   if (IFI.DL)
@@ -366,33 +392,10 @@ static Value *HandleByValArgument(Value *Arg, Instruction *TheCall,
   
   Function *Caller = TheCall->getParent()->getParent(); 
   
-  Value *NewAlloca = new AllocaInst(AggTy, 0, Align, Arg->getName(), 
+  AllocaInst *NewAlloca = new AllocaInst(AggTy, 0, Align, Arg->getName(),
                                     &*Caller->begin()->begin());
-  // Emit a memcpy.
-  Type *Tys[3] = {VoidPtrTy, VoidPtrTy, Type::getInt64Ty(Context)};
-  Function *MemCpyFn = Intrinsic::getDeclaration(Caller->getParent(),
-                                                 Intrinsic::memcpy, 
-                                                 Tys);
-  Value *DestCast = new BitCastInst(NewAlloca, VoidPtrTy, "tmp", TheCall);
-  Value *SrcCast = new BitCastInst(Arg, VoidPtrTy, "tmp", TheCall);
-  
-  Value *Size;
-  if (IFI.DL == 0)
-    Size = ConstantExpr::getSizeOf(AggTy);
-  else
-    Size = ConstantInt::get(Type::getInt64Ty(Context),
-                            IFI.DL->getTypeStoreSize(AggTy));
-  
-  // Always generate a memcpy of alignment 1 here because we don't know
-  // the alignment of the src pointer.  Other optimizations can infer
-  // better alignment.
-  Value *CallArgs[] = {
-    DestCast, SrcCast, Size,
-    ConstantInt::get(Type::getInt32Ty(Context), 1),
-    ConstantInt::getFalse(Context) // isVolatile
-  };
-  IRBuilder<>(TheCall).CreateCall(MemCpyFn, CallArgs);
-  
+  IFI.StaticAllocas.push_back(NewAlloca);
+
   // Uses of the argument in the function should use our new alloca
   // instead.
   return NewAlloca;
@@ -562,6 +565,8 @@ bool llvm::InlineFunction(CallSite CS, InlineFunctionInfo &IFI,
 
   { // Scope to destroy VMap after cloning.
     ValueToValueMapTy VMap;
+    // Keep a list of pair (dst, src) to emit byval initializations.
+    SmallVector<std::pair<Value*, Value*>, 4> ByValInit;
 
     assert(CalledFunc->arg_size() == CS.arg_size() &&
            "No varargs calls can be inlined!");
@@ -585,7 +590,11 @@ bool llvm::InlineFunction(CallSite CS, InlineFunctionInfo &IFI,
         // Calls that we inline may use the new alloca, so we need to clear
         // their 'tail' flags if HandleByValArgument introduced a new alloca and
         // the callee has calls.
-        MustClearTailCallFlags |= ActualArg != *AI;
+        if (ActualArg != *AI) {
+          MustClearTailCallFlags = true;
+          ByValInit.push_back(std::make_pair(ActualArg, (Value*) *AI));
+        }
+
       }
 
       VMap[I] = ActualArg;
@@ -602,6 +611,13 @@ bool llvm::InlineFunction(CallSite CS, InlineFunctionInfo &IFI,
     // Remember the first block that is newly cloned over.
     FirstNewBlock = LastBlock; ++FirstNewBlock;
 
+    // The lifetime start markers will be added at the beginning of the first
+    // new block, so that's where we also inject the byval arguments
+    // initialization.
+    for (std::pair<Value*, Value*> &Init : ByValInit)
+      HandleByValArgumentInit(Init.first, Init.second, Caller->getParent(),
+                              FirstNewBlock, IFI);
+
     // Update the callgraph if requested.
     if (IFI.CG)
       UpdateCallGraphAfterInlining(CS, FirstNewBlock, VMap, IFI);
diff --git a/test/Transforms/Inline/2010-05-31-ByvalTailcall.ll b/test/Transforms/Inline/2010-05-31-ByvalTailcall.ll
index b37b9f2..07ea5fc 100644
--- a/test/Transforms/Inline/2010-05-31-ByvalTailcall.ll
+++ b/test/Transforms/Inline/2010-05-31-ByvalTailcall.ll
@@ -18,7 +18,8 @@ define void @bar(i32* byval %x) {
 
 define void @foo(i32* %x) {
 ; CHECK-LABEL: define void @foo(
-; CHECK: store i32 %1, i32* %x
+; CHECK: llvm.lifetime.start
+; CHECK: store i32 %2, i32* %x
   call void @bar(i32* byval %x)
   ret void
 }
diff --git a/test/Transforms/Inline/byval_lifetime.ll b/test/Transforms/Inline/byval_lifetime.ll
new file mode 100644
index 0000000..e8dff2a
--- /dev/null
+++ b/test/Transforms/Inline/byval_lifetime.ll
@@ -0,0 +1,26 @@
+; RUN: opt -S -inline < %s | FileCheck %s
+; END.
+
+; By inlining foo, an alloca is created in main to hold the byval argument, so
+; a lifetime marker should be generated as well by default.
+
+%struct.foo = type { i32, [16 x i32] }
+
+ at gFoo = global %struct.foo zeroinitializer, align 8
+
+define i32 @foo(%struct.foo* byval align 8 %f, i32 %a) {
+entry:
+  %a1 = getelementptr inbounds %struct.foo* %f, i32 0, i32 1
+  %arrayidx = getelementptr inbounds [16 x i32]* %a1, i32 0, i32 %a
+  %tmp2 = load i32* %arrayidx, align 1
+  ret i32 %tmp2
+}
+
+define i32 @main(i32 %argc, i8** %argv) {
+; CHECK-LABEL: @main
+; CHECK: llvm.lifetime.start
+; CHECK: memcpy
+entry:
+  %call = call i32 @foo(%struct.foo* byval align 8 @gFoo, i32 %argc)
+  ret i32 %call
+}


More information about the llvm-commits mailing list