[llvm] r193356 - Inliner: Handle readonly attribute per argument when adding memcpy

Mon Nov 4 10:19:27 PST 2013

Disregard this patch.  Vincent pointed out to me on IRC that it will
still fail this case:

typedef struct {
    int field0;
    int field1;
} Struct;

void f(const Struct *S0, Struct *S1) {
    S1->field0 = S0->field1;
    S1->field1 = S0->field0;
}

int main(void) {
    Struct S;
    S.field0 = 0;
    S.field1 = 1;
    f(&S, &S);
    return 0;
}

I think we will need to use some kind of alias analysis for this.
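
To make the failure concrete, here is a minimal standalone C sketch. It is
not part of the patch; it only models what happens to the case above when
the first argument is treated as a readonly byval argument. The helper names
(swap_fields_with_copy, swap_fields_no_copy, show_aliasing_case) are
hypothetical. The first helper keeps the copy that byval implies, and the
second models the inlined body after the copy has been elided.

```c
#include <assert.h>
#include <stdio.h>

struct S {
    int field0;
    int field1;
};

/* Models byval semantics: the callee reads from a private copy of *src,
 * so writes through dst cannot change what it reads. */
static void swap_fields_with_copy(const struct S *src, struct S *dst) {
    struct S tmp = *src; /* the copy that byval implies */
    dst->field0 = tmp.field1;
    dst->field1 = tmp.field0;
}

/* Models the inlined body after the copy is elided: reads go straight
 * through src, which may alias dst. */
static void swap_fields_no_copy(const struct S *src, struct S *dst) {
    dst->field0 = src->field1;
    dst->field1 = src->field0; /* sees the store above when src == dst */
}

/* Runs both variants with the same object passed for both arguments. */
static void show_aliasing_case(void) {
    struct S a = {0, 1};
    struct S b = {0, 1};

    swap_fields_with_copy(&a, &a);
    swap_fields_no_copy(&b, &b);

    assert(a.field0 == 1 && a.field1 == 0); /* fields swapped as intended */
    assert(b.field0 == 1 && b.field1 == 1); /* field0 was clobbered first */

    printf("with copy:    %d %d\n", a.field0, a.field1);
    printf("without copy: %d %d\n", b.field0, b.field1);
}
```

Calling show_aliasing_case() from main shows the two variants disagree even
though nothing here is a global, so a check on the underlying object being a
GlobalValue does not cover this case.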

-Tom

On Mon, Nov 04, 2013 at 04:23:20PM +0000, Stellard, Thomas wrote:
> Hi David,
> 
> How does this updated patch look?
> 
> -Tom
> ________________________________
> From: David Majnemer [david.majnemer at gmail.com]
> Sent: Sunday, November 03, 2013 4:38 AM
> To: Stellard, Thomas
> Cc: LLVM Commits
> Subject: Re: [llvm] r193356 - Inliner: Handle readonly attribute per argument when adding memcpy
> 
> This was reverted in r193955, it caused PR17781.
> 
> This introduces a regression if the caller byval argument was accessible through other means, such as being global.
> 
> If we stored against the global and then loaded from the argument, we *should* get the original value, not the modified one.
> 
> Simply checking that the argument is const is insufficient to ensure that eliding the copy is safe.
> 
> A reduced test case for this specific example is part of r193955 as well.
> 
> 
> 
> On Thu, Oct 24, 2013 at 9:38 AM, Tom Stellard <thomas.stellard at amd.com> wrote:
> Author: tstellar
> Date: Thu Oct 24 11:38:33 2013
> New Revision: 193356
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=193356&view=rev
> Log:
> Inliner: Handle readonly attribute per argument when adding memcpy
> 
> Patch by: Vincent Lejeune
> 
> Modified:
>     llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp
>     llvm/trunk/test/Transforms/Inline/byval.ll
> 
> Modified: llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp?rev=193356&r1=193355&r2=193356&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp (original)
> +++ llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp Thu Oct 24 11:38:33 2013
> @@ -337,33 +337,35 @@ static void UpdateCallGraphAfterInlining
> 
>  /// HandleByValArgument - When inlining a call site that has a byval argument,
>  /// we have to make the implicit memcpy explicit by adding it.
> -static Value *HandleByValArgument(Value *Arg, Instruction *TheCall,
> +static Value *HandleByValArgument(Value *PassedValue,
> +                                  const Argument *ArgumentSignature,
> +                                  Instruction *TheCall,
>                                    const Function *CalledFunc,
>                                    InlineFunctionInfo &IFI,
>                                    unsigned ByValAlignment) {
> -  Type *AggTy = cast<PointerType>(Arg->getType())->getElementType();
> +  Type *AggTy = cast<PointerType>(PassedValue->getType())->getElementType();
> 
>    // If the called function is readonly, then it could not mutate the caller's
>    // copy of the byval'd memory.  In this case, it is safe to elide the copy and
>    // temporary.
> -  if (CalledFunc->onlyReadsMemory()) {
> +  if (CalledFunc->onlyReadsMemory() || ArgumentSignature->onlyReadsMemory()) {
>      // If the byval argument has a specified alignment that is greater than the
>      // passed in pointer, then we either have to round up the input pointer or
>      // give up on this transformation.
>      if (ByValAlignment <= 1)  // 0 = unspecified, 1 = no particular alignment.
> -      return Arg;
> +      return PassedValue;
> 
>      // If the pointer is already known to be sufficiently aligned, or if we can
>      // round it up to a larger alignment, then we don't need a temporary.
> -    if (getOrEnforceKnownAlignment(Arg, ByValAlignment,
> +    if (getOrEnforceKnownAlignment(PassedValue, ByValAlignment,
>                                     IFI.TD) >= ByValAlignment)
> -      return Arg;
> +      return PassedValue;
> 
>      // Otherwise, we have to make a memcpy to get a safe alignment.  This is bad
>      // for code quality, but rarely happens and is required for correctness.
>    }
> 
> -  LLVMContext &Context = Arg->getContext();
> +  LLVMContext &Context = PassedValue->getContext();
> 
>    Type *VoidPtrTy = Type::getInt8PtrTy(Context);
> 
> @@ -379,7 +381,7 @@ static Value *HandleByValArgument(Value
> 
>    Function *Caller = TheCall->getParent()->getParent();
> 
> -  Value *NewAlloca = new AllocaInst(AggTy, 0, Align, Arg->getName(),
> +  Value *NewAlloca = new AllocaInst(AggTy, 0, Align, PassedValue->getName(),
>                                      &*Caller->begin()->begin());
>    // Emit a memcpy.
>    Type *Tys[3] = {VoidPtrTy, VoidPtrTy, Type::getInt64Ty(Context)};
> @@ -387,7 +389,7 @@ static Value *HandleByValArgument(Value
>                                                   Intrinsic::memcpy,
>                                                   Tys);
>    Value *DestCast = new BitCastInst(NewAlloca, VoidPtrTy, "tmp", TheCall);
> -  Value *SrcCast = new BitCastInst(Arg, VoidPtrTy, "tmp", TheCall);
> +  Value *SrcCast = new BitCastInst(PassedValue, VoidPtrTy, "tmp", TheCall);
> 
>    Value *Size;
>    if (IFI.TD == 0)
> @@ -588,13 +590,14 @@ bool llvm::InlineFunction(CallSite CS, I
>      for (Function::const_arg_iterator I = CalledFunc->arg_begin(),
>           E = CalledFunc->arg_end(); I != E; ++I, ++AI, ++ArgNo) {
>        Value *ActualArg = *AI;
> +      const Argument *Arg = I;
> 
>        // When byval arguments actually inlined, we need to make the copy implied
>        // by them explicit.  However, we don't do this if the callee is readonly
>        // or readnone, because the copy would be unneeded: the callee doesn't
>        // modify the struct.
>        if (CS.isByValArgument(ArgNo)) {
> -        ActualArg = HandleByValArgument(ActualArg, TheCall, CalledFunc, IFI,
> +        ActualArg = HandleByValArgument(ActualArg, Arg, TheCall, CalledFunc, IFI,
>                                          CalledFunc->getParamAlignment(ArgNo+1));
> 
>          // Calls that we inline may use the new alloca, so we need to clear
> 
> Modified: llvm/trunk/test/Transforms/Inline/byval.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/byval.ll?rev=193356&r1=193355&r2=193356&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/Inline/byval.ll (original)
> +++ llvm/trunk/test/Transforms/Inline/byval.ll Thu Oct 24 11:38:33 2013
> @@ -25,7 +25,7 @@ entry:
>         store i64 2, i64* %tmp4, align 4
>         call void @f( %struct.ss* byval  %S ) nounwind
>         ret i32 0
> -; CHECK: @test1()
> +; CHECK-LABEL: @test1()
>  ; CHECK: %S1 = alloca %struct.ss
>  ; CHECK: %S = alloca %struct.ss
>  ; CHECK: call void @llvm.memcpy
> @@ -52,7 +52,7 @@ entry:
>         store i64 2, i64* %tmp4, align 4
>         %X = call i32 @f2( %struct.ss* byval  %S ) nounwind
>         ret i32 %X
> -; CHECK: @test2()
> +; CHECK-LABEL: @test2()
>  ; CHECK: %S = alloca %struct.ss
>  ; CHECK-NOT: call void @llvm.memcpy
>  ; CHECK: ret i32
> @@ -74,7 +74,7 @@ entry:
>         %S = alloca %struct.ss, align 1  ;; May not be aligned.
>         call void @f3( %struct.ss* byval align 64 %S) nounwind
>         ret void
> -; CHECK: @test3()
> +; CHECK-LABEL: @test3()
>  ; CHECK: %S1 = alloca %struct.ss, align 64
>  ; CHECK: %S = alloca %struct.ss
>  ; CHECK: call void @llvm.memcpy
> @@ -97,10 +97,35 @@ entry:
>         %S = alloca %struct.ss, align 2         ; <%struct.ss*> [#uses=4]
>         %X = call i32 @f4( %struct.ss* byval align 64 %S ) nounwind
>         ret i32 %X
> -; CHECK: @test4()
> +; CHECK-LABEL: @test4()
>  ; CHECK: %S = alloca %struct.ss, align 64
>  ; CHECK-NOT: call void @llvm.memcpy
>  ; CHECK: call void @g3
>  ; CHECK: ret i32 4
>  }
> 
> +; Inlining a byval struct should NOT cause an explicit copy
> +; into an alloca if the parameter is readonly
> +
> +define internal i32 @f5(%struct.ss* byval readonly %b) nounwind {
> +entry:
> +       %tmp = getelementptr %struct.ss* %b, i32 0, i32 0               ; <i32*> [#uses=2]
> +       %tmp1 = load i32* %tmp, align 4         ; <i32> [#uses=1]
> +       %tmp2 = add i32 %tmp1, 1                ; <i32> [#uses=1]
> +       ret i32 %tmp2
> +}
> +
> +define i32 @test5() nounwind  {
> +entry:
> +       %S = alloca %struct.ss          ; <%struct.ss*> [#uses=4]
> +       %tmp1 = getelementptr %struct.ss* %S, i32 0, i32 0              ; <i32*> [#uses=1]
> +       store i32 1, i32* %tmp1, align 8
> +       %tmp4 = getelementptr %struct.ss* %S, i32 0, i32 1              ; <i64*> [#uses=1]
> +       store i64 2, i64* %tmp4, align 4
> +       %X = call i32 @f5( %struct.ss* byval  %S ) nounwind
> +       ret i32 %X
> +; CHECK-LABEL: @test5()
> +; CHECK: %S = alloca %struct.ss
> +; CHECK-NOT: call void @llvm.memcpy
> +; CHECK: ret i32
> +}
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 

> From fce72dd4983c8fbcb3a031e854a624e3de3981cd Mon Sep 17 00:00:00 2001
> From: Tom Stellard <thomas.stellard at amd.com>
> Date: Thu, 24 Oct 2013 16:38:33 +0000
> Subject: [PATCH] Inliner: Handle readonly attribute per argument when adding
>  memcpy v2
> 
> Patch by: Vincent Lejeune
> 
> v2: Tom Stellard
>   - Add check for global values that are passed to the function as
>     a read-only argument and separately modified by the function.
> ---
>  lib/Transforms/Utils/InlineFunction.cpp | 35 +++++++++++++------
>  test/Transforms/Inline/byval.ll         | 61 ++++++++++++++++++++++++++++++---
>  2 files changed, 82 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/Transforms/Utils/InlineFunction.cpp b/lib/Transforms/Utils/InlineFunction.cpp
> index d021bce..4d9d486 100644
> --- a/lib/Transforms/Utils/InlineFunction.cpp
> +++ b/lib/Transforms/Utils/InlineFunction.cpp
> @@ -17,6 +17,7 @@
>  #include "llvm/ADT/StringExtras.h"
>  #include "llvm/Analysis/CallGraph.h"
>  #include "llvm/Analysis/InstructionSimplify.h"
> +#include "llvm/Analysis/ValueTracking.h"
>  #include "llvm/DebugInfo.h"
>  #include "llvm/IR/Attributes.h"
>  #include "llvm/IR/Constants.h"
> @@ -338,33 +339,46 @@ static void UpdateCallGraphAfterInlining(CallSite CS,
>  
>  /// HandleByValArgument - When inlining a call site that has a byval argument,
>  /// we have to make the implicit memcpy explicit by adding it.
> -static Value *HandleByValArgument(Value *Arg, Instruction *TheCall,
> +static Value *HandleByValArgument(Value *PassedValue,
> +                                  const Argument *ArgumentSignature,
> +                                  Instruction *TheCall,
>                                    const Function *CalledFunc,
>                                    InlineFunctionInfo &IFI,
>                                    unsigned ByValAlignment) {
> -  Type *AggTy = cast<PointerType>(Arg->getType())->getElementType();
> +  Type *AggTy = cast<PointerType>(PassedValue->getType())->getElementType();
>  
>    // If the called function is readonly, then it could not mutate the caller's
>    // copy of the byval'd memory.  In this case, it is safe to elide the copy and
>    // temporary.
> -  if (CalledFunc->onlyReadsMemory()) {
> +  // If the argument is readonly, then we don't need the copy either as long as
> +  // the value passed to the function is not global.  If a global value is passed
> +  // as a byval readonly function argument and that same global value is written
> +  // by the function, then we still need the copy, because the argument value
> +  // should be the original global value before it is modified.
> +  Value *UnderlyingPassedValue = GetUnderlyingObject(PassedValue, NULL);
> +  bool FoundUnderlyingObject =
> +    (UnderlyingPassedValue == GetUnderlyingObject(UnderlyingPassedValue, NULL));
> +  if (CalledFunc->onlyReadsMemory() ||
> +      (ArgumentSignature->onlyReadsMemory() &&
> +       !dyn_cast<GlobalValue>(UnderlyingPassedValue) &&
> +       FoundUnderlyingObject)) {
>      // If the byval argument has a specified alignment that is greater than the
>      // passed in pointer, then we either have to round up the input pointer or
>      // give up on this transformation.
>      if (ByValAlignment <= 1)  // 0 = unspecified, 1 = no particular alignment.
> -      return Arg;
> +      return PassedValue;
>  
>      // If the pointer is already known to be sufficiently aligned, or if we can
>      // round it up to a larger alignment, then we don't need a temporary.
> -    if (getOrEnforceKnownAlignment(Arg, ByValAlignment,
> +    if (getOrEnforceKnownAlignment(PassedValue, ByValAlignment,
>                                     IFI.TD) >= ByValAlignment)
> -      return Arg;
> +      return PassedValue;
>      
>      // Otherwise, we have to make a memcpy to get a safe alignment.  This is bad
>      // for code quality, but rarely happens and is required for correctness.
>    }
>    
> -  LLVMContext &Context = Arg->getContext();
> +  LLVMContext &Context = PassedValue->getContext();
>  
>    Type *VoidPtrTy = Type::getInt8PtrTy(Context);
>    
> @@ -380,7 +394,7 @@ static Value *HandleByValArgument(Value *Arg, Instruction *TheCall,
>    
>    Function *Caller = TheCall->getParent()->getParent(); 
>    
> -  Value *NewAlloca = new AllocaInst(AggTy, 0, Align, Arg->getName(), 
> +  Value *NewAlloca = new AllocaInst(AggTy, 0, Align, PassedValue->getName(),
>                                      &*Caller->begin()->begin());
>    // Emit a memcpy.
>    Type *Tys[3] = {VoidPtrTy, VoidPtrTy, Type::getInt64Ty(Context)};
> @@ -388,7 +402,7 @@ static Value *HandleByValArgument(Value *Arg, Instruction *TheCall,
>                                                   Intrinsic::memcpy, 
>                                                   Tys);
>    Value *DestCast = new BitCastInst(NewAlloca, VoidPtrTy, "tmp", TheCall);
> -  Value *SrcCast = new BitCastInst(Arg, VoidPtrTy, "tmp", TheCall);
> +  Value *SrcCast = new BitCastInst(PassedValue, VoidPtrTy, "tmp", TheCall);
>    
>    Value *Size;
>    if (IFI.TD == 0)
> @@ -589,13 +603,14 @@ bool llvm::InlineFunction(CallSite CS, InlineFunctionInfo &IFI,
>      for (Function::const_arg_iterator I = CalledFunc->arg_begin(),
>           E = CalledFunc->arg_end(); I != E; ++I, ++AI, ++ArgNo) {
>        Value *ActualArg = *AI;
> +      const Argument *Arg = I;
>  
>        // When byval arguments actually inlined, we need to make the copy implied
>        // by them explicit.  However, we don't do this if the callee is readonly
>        // or readnone, because the copy would be unneeded: the callee doesn't
>        // modify the struct.
>        if (CS.isByValArgument(ArgNo)) {
> -        ActualArg = HandleByValArgument(ActualArg, TheCall, CalledFunc, IFI,
> +        ActualArg = HandleByValArgument(ActualArg, Arg, TheCall, CalledFunc, IFI,
>                                          CalledFunc->getParamAlignment(ArgNo+1));
>   
>          // Calls that we inline may use the new alloca, so we need to clear
> diff --git a/test/Transforms/Inline/byval.ll b/test/Transforms/Inline/byval.ll
> index d7597ad..35efbb9 100644
> --- a/test/Transforms/Inline/byval.ll
> +++ b/test/Transforms/Inline/byval.ll
> @@ -25,7 +25,7 @@ entry:
>  	store i64 2, i64* %tmp4, align 4
>  	call void @f( %struct.ss* byval  %S ) nounwind 
>  	ret i32 0
> -; CHECK: @test1()
> +; CHECK-LABEL: @test1()
>  ; CHECK: %S1 = alloca %struct.ss
>  ; CHECK: %S = alloca %struct.ss
>  ; CHECK: call void @llvm.memcpy
> @@ -52,7 +52,7 @@ entry:
>  	store i64 2, i64* %tmp4, align 4
>  	%X = call i32 @f2( %struct.ss* byval  %S ) nounwind 
>  	ret i32 %X
> -; CHECK: @test2()
> +; CHECK-LABEL: @test2()
>  ; CHECK: %S = alloca %struct.ss
>  ; CHECK-NOT: call void @llvm.memcpy
>  ; CHECK: ret i32
> @@ -74,7 +74,7 @@ entry:
>  	%S = alloca %struct.ss, align 1  ;; May not be aligned.
>  	call void @f3( %struct.ss* byval align 64 %S) nounwind 
>  	ret void
> -; CHECK: @test3()
> +; CHECK-LABEL: @test3()
>  ; CHECK: %S1 = alloca %struct.ss, align 64
>  ; CHECK: %S = alloca %struct.ss
>  ; CHECK: call void @llvm.memcpy
> @@ -97,13 +97,17 @@ entry:
>  	%S = alloca %struct.ss, align 2		; <%struct.ss*> [#uses=4]
>  	%X = call i32 @f4( %struct.ss* byval align 64 %S ) nounwind 
>  	ret i32 %X
> -; CHECK: @test4()
> +; CHECK-LABEL: @test4()
>  ; CHECK: %S = alloca %struct.ss, align 64
>  ; CHECK-NOT: call void @llvm.memcpy
>  ; CHECK: call void @g3
>  ; CHECK: ret i32 4
>  }
>  
> +; If a global value is passed to a function as a readonly byval parameter
> +; and that same global value is written by the function before the argument
> +; is used, then the implicit memcpy cannot be optimized away since the value
> +; of the argument needs to be the original unmodified global value.
>  %struct.S0 = type { i32 }
>  
>  @b = global %struct.S0 { i32 1 }, align 4
> @@ -123,7 +127,56 @@ entry:
>  	tail call void @f5(%struct.S0* byval align 4 @b)
>  	%0 = load i32* @a, align 4
>  	ret i32 %0
> +}
>  ; CHECK: @test5()
>  ; CHECK: store i32 0, i32* getelementptr inbounds (%struct.S0* @b, i64 0, i32 0), align 4
>  ; CHECK-NOT: load i32* getelementptr inbounds (%struct.S0* @b, i64 0, i32 0), align 4
> +
> +; This is the same as test5, except that the global value has been bitcast to
> +; a different type before being passed to the function.
> +define internal void @f6(i8* byval nocapture readonly align 4 %p) {
> +entry:
> +	store i32 0, i32* getelementptr inbounds (%struct.S0* @b, i64 0, i32 0), align 4
> +	%f2 = getelementptr inbounds i8* %p, i64 0
> +	%0 = load i8* %f2, align 4
> +	%1 = sext i8 %0 to i32
> +	store i32 %1, i32* @a, align 4
> +	ret void
> +}
> +
> +define i32 @test6() {
> +entry:
> +	%0 = bitcast %struct.S0* @b to i8*
> +	tail call void @f6(i8* byval align 4 %0)
> +	%1 = load i32* @a, align 4
> +	ret i32 %1
> +}
> +; CHECK-LABEL: @test6
> +; CHECK:  call void @llvm.memcpy
> +; CHECK: store i32 0, i32* getelementptr inbounds (%struct.S0* @b, i64 0, i32 0), align 4
> +
> +; Inlining a byval struct should NOT cause an explicit copy
> +; into an alloca if the parameter is readonly
> +
> +define internal i32 @f7(%struct.ss* byval readonly %b) nounwind {
> +entry:
> +	%tmp = getelementptr %struct.ss* %b, i32 0, i32 0		; <i32*> [#uses=2]
> +	%tmp1 = load i32* %tmp, align 4		; <i32> [#uses=1]
> +	%tmp2 = add i32 %tmp1, 1		; <i32> [#uses=1]
> +	ret i32 %tmp2
> +}
> +
> +define i32 @test7() nounwind  {
> +entry:
> +	%S = alloca %struct.ss		; <%struct.ss*> [#uses=4]
> +	%tmp1 = getelementptr %struct.ss* %S, i32 0, i32 0		; <i32*> [#uses=1]
> +	store i32 1, i32* %tmp1, align 8
> +	%tmp4 = getelementptr %struct.ss* %S, i32 0, i32 1		; <i64*> [#uses=1]
> +	store i64 2, i64* %tmp4, align 4
> +	%X = call i32 @f7( %struct.ss* byval  %S ) nounwind
> +	ret i32 %X
> +; CHECK-LABEL: @test7()
> +; CHECK: %S = alloca %struct.ss
> +; CHECK-NOT: call void @llvm.memcpy
> +; CHECK: ret i32
>  }
> -- 
> 1.8.1.5
> 
