[llvm] r348645 - [MemCpyOpt] memset->memcpy forwarding with undef tail

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 7 15:05:39 PST 2018


Not a comment on the patch per se, just thinking about implications and 
follow-ons.

If we know that a prefix or suffix of a memcpy can be reduced to a 
memset, would we consider inserting the memset and narrowing the memcpy 
to the remaining width? I.e., produce something like this when M > N:

     memset(a, byte, N);
     memset(b, byte, N);
     memcpy(b+N, a+N, M-N);

This could definitely be a pessimization in some cases, but if N is 
small (so the memset can be inlined) or very large, this could be 
quite profitable.
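
For instance (hypothetical concrete values, mirroring the test case in 
the patch below: byte = 0, N = 12, M = 16), the narrowed form would be:

     memset(a, 0, 12);
     memset(b, 0, 12);
     memcpy(b+12, a+12, 4);  /* only the 4-byte tail is still copied */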

Alternatively, should we extend the MemCpyOpt code to try to identify a 
set of regions which together cover the region the memcpy copies?  In 
this case, we could have framed the result as:

     memset(a, byte, N);
     memset(b, byte, N);
     memset(b+N, undef, M-N);

Is it worth generalizing this to other cases where we can discover 
non-overlapping regions, such as:

     memset(a, byte, N);
     memset(a+N, byte2, M-N);
     memcpy(b, a, M);
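
To make this concrete, here is a minimal compilable sketch (plain C++; 
the byte values and function names are hypothetical illustrations, not 
what MemCpyOpt produces today) of the input pattern and the memcpy-free 
form it could be rewritten to:

     #include <cstring>
     #include <cstddef>

     // Input pattern: two adjacent memsets fully cover the memcpy source.
     void before(char *a, char *b, std::size_t N, std::size_t M) {
       std::memset(a, 0x00, N);          // a[0..N) = byte
       std::memset(a + N, 0xFF, M - N);  // a[N..M) = byte2
       std::memcpy(b, a, M);             // reads only bytes the memsets wrote
     }

     // Rewritten form: each source region is forwarded to the destination
     // as its own memset, so the memcpy disappears; if `a` is otherwise
     // dead, its two memsets can then be dropped as well.
     void after(char *a, char *b, std::size_t N, std::size_t M) {
       std::memset(a, 0x00, N);
       std::memset(a + N, 0xFF, M - N);
       std::memset(b, 0x00, N);
       std::memset(b + N, 0xFF, M - N);
     }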

After all, we do something like this in DSE to detect redundant stores 
covered by a set of partial clobbers... though on the other hand, we 
don't do the same for loads in GVN.

Thoughts?

Philip


On 12/7/18 1:16 PM, Nikita Popov via llvm-commits wrote:
> Author: nikic
> Date: Fri Dec  7 13:16:58 2018
> New Revision: 348645
>
> URL: http://llvm.org/viewvc/llvm-project?rev=348645&view=rev
> Log:
> [MemCpyOpt] memset->memcpy forwarding with undef tail
>
> Currently memcpyopt optimizes cases like
>
>      memset(a, byte, N);
>      memcpy(b, a, M);
>
> to
>
>      memset(a, byte, N);
>      memset(b, byte, M);
>
> if M <= N. Often this allows further simplifications down the line,
> which drop the first memset entirely.
>
> This patch extends this optimization for the case where M > N, but we
> know that the bytes a[N..M] are undef due to alloca/lifetime.start.
>
> This situation arises relatively often for Rust code, because Rust does
> not initialize trailing structure padding and loves to insert redundant
> memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.
>
> For the implementation, I'm reusing a bit of code for a similar existing
> optimization (direct memcpy of undef). I've also added memset support to
> MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
> used, but it seems to make more sense to add this to GetLocation and thus
> make the computation cacheable.
>
> Differential Revision: https://reviews.llvm.org/D55120
>
> Modified:
>      llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp
>      llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>      llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
>
> Modified: llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp?rev=348645&r1=348644&r2=348645&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp (original)
> +++ llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp Fri Dec  7 13:16:58 2018
> @@ -154,6 +154,12 @@ static ModRefInfo GetLocation(const Inst
>       return ModRefInfo::Mod;
>     }
>   
> +  if (const MemSetInst *MI = dyn_cast<MemSetInst>(Inst)) {
> +    Loc = MemoryLocation::getForDest(MI);
> +    // Conservatively assume ModRef for volatile memset.
> +    return MI->isVolatile() ? ModRefInfo::ModRef : ModRefInfo::Mod;
> +  }
> +
>     if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {
>       switch (II->getIntrinsicID()) {
>       case Intrinsic::lifetime_start:
>
> Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=348645&r1=348644&r2=348645&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp (original)
> +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri Dec  7 13:16:58 2018
> @@ -1144,6 +1144,21 @@ bool MemCpyOptPass::processMemSetMemCpyD
>     return true;
>   }
>   
> +/// Determine whether the instruction has undefined content for the given Size,
> +/// either because it was freshly alloca'd or started its lifetime.
> +static bool hasUndefContents(Instruction *I, ConstantInt *Size) {
> +  if (isa<AllocaInst>(I))
> +    return true;
> +
> +  if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I))
> +    if (II->getIntrinsicID() == Intrinsic::lifetime_start)
> +      if (ConstantInt *LTSize = dyn_cast<ConstantInt>(II->getArgOperand(0)))
> +        if (LTSize->getZExtValue() >= Size->getZExtValue())
> +          return true;
> +
> +  return false;
> +}
> +
>   /// Transform memcpy to memset when its source was just memset.
>   /// In other words, turn:
>   /// \code
> @@ -1167,12 +1182,23 @@ bool MemCpyOptPass::performMemCpyToMemSe
>     if (!AA.isMustAlias(MemSet->getRawDest(), MemCpy->getRawSource()))
>       return false;
>   
> -  ConstantInt *CopySize = cast<ConstantInt>(MemCpy->getLength());
> +  // A known memset size is required.
>     ConstantInt *MemSetSize = dyn_cast<ConstantInt>(MemSet->getLength());
> +  if (!MemSetSize)
> +    return false;
> +
>     // Make sure the memcpy doesn't read any more than what the memset wrote.
>     // Don't worry about sizes larger than i64.
> -  if (!MemSetSize || CopySize->getZExtValue() > MemSetSize->getZExtValue())
> -    return false;
> +  ConstantInt *CopySize = cast<ConstantInt>(MemCpy->getLength());
> +  if (CopySize->getZExtValue() > MemSetSize->getZExtValue()) {
> +    // If the memcpy is larger than the memset, but the memory was undef prior
> +    // to the memset, we can just ignore the tail.
> +    MemDepResult DepInfo = MD->getDependency(MemSet);
> +    if (DepInfo.isDef() && hasUndefContents(DepInfo.getInst(), CopySize))
> +      CopySize = MemSetSize;
> +    else
> +      return false;
> +  }
>   
>     IRBuilder<> Builder(MemCpy);
>     Builder.CreateMemSet(MemCpy->getRawDest(), MemSet->getOperand(1),
> @@ -1252,19 +1278,7 @@ bool MemCpyOptPass::processMemCpy(MemCpy
>       if (MemCpyInst *MDep = dyn_cast<MemCpyInst>(SrcDepInfo.getInst()))
>         return processMemCpyMemCpyDependence(M, MDep);
>     } else if (SrcDepInfo.isDef()) {
> -    Instruction *I = SrcDepInfo.getInst();
> -    bool hasUndefContents = false;
> -
> -    if (isa<AllocaInst>(I)) {
> -      hasUndefContents = true;
> -    } else if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
> -      if (II->getIntrinsicID() == Intrinsic::lifetime_start)
> -        if (ConstantInt *LTSize = dyn_cast<ConstantInt>(II->getArgOperand(0)))
> -          if (LTSize->getZExtValue() >= CopySize->getZExtValue())
> -            hasUndefContents = true;
> -    }
> -
> -    if (hasUndefContents) {
> +    if (hasUndefContents(SrcDepInfo.getInst(), CopySize)) {
>         MD->removeInstruction(M);
>         M->eraseFromParent();
>         ++NumMemCpyInstr;
>
> Modified: llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll?rev=348645&r1=348644&r2=348645&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll (original)
> +++ llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll Fri Dec  7 13:16:58 2018
> @@ -12,7 +12,7 @@ define void @test_alloca(i8* %result) {
>   ; CHECK-NEXT:    [[A:%.*]] = alloca [[T:%.*]], align 8
>   ; CHECK-NEXT:    [[B:%.*]] = bitcast %T* [[A]] to i8*
>   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[B]], i8 0, i64 12, i1 false)
> -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[RESULT:%.*]], i8* align 8 [[B]], i64 16, i1 false)
> +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8 0, i64 12, i1 false)
>   ; CHECK-NEXT:    ret void
>   ;
>     %a = alloca %T, align 8
> @@ -28,7 +28,7 @@ define void @test_alloca_with_lifetimes(
>   ; CHECK-NEXT:    [[B:%.*]] = bitcast %T* [[A]] to i8*
>   ; CHECK-NEXT:    call void @llvm.lifetime.start.p0i8(i64 16, i8* [[B]])
>   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[B]], i8 0, i64 12, i1 false)
> -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[RESULT:%.*]], i8* align 8 [[B]], i64 16, i1 false)
> +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8 0, i64 12, i1 false)
>   ; CHECK-NEXT:    call void @llvm.lifetime.end.p0i8(i64 16, i8* [[B]])
>   ; CHECK-NEXT:    ret void
>   ;
> @@ -46,7 +46,7 @@ define void @test_malloc_with_lifetimes(
>   ; CHECK-NEXT:    [[A:%.*]] = call i8* @malloc(i64 16)
>   ; CHECK-NEXT:    call void @llvm.lifetime.start.p0i8(i64 16, i8* [[A]])
>   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[A]], i8 0, i64 12, i1 false)
> -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[RESULT:%.*]], i8* align 8 [[A]], i64 16, i1 false)
> +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8 0, i64 12, i1 false)
>   ; CHECK-NEXT:    call void @llvm.lifetime.end.p0i8(i64 16, i8* [[A]])
>   ; CHECK-NEXT:    call void @free(i8* [[A]])
>   ; CHECK-NEXT:    ret void
> @@ -98,7 +98,7 @@ define void @test_volatile_memset(i8* %r
>   ; CHECK-NEXT:    [[A:%.*]] = alloca [[T:%.*]], align 8
>   ; CHECK-NEXT:    [[B:%.*]] = bitcast %T* [[A]] to i8*
>   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[B]], i8 0, i64 12, i1 true)
> -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[RESULT:%.*]], i8* align 8 [[B]], i64 16, i1 false)
> +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8 0, i64 12, i1 false)
>   ; CHECK-NEXT:    ret void
>   ;
>     %a = alloca %T, align 8
>
>