[llvm] r348645 - [MemCpyOpt] memset->memcpy forwarding with undef tail

Nikita Popov via llvm-commits llvm-commits at lists.llvm.org
Sat Dec 8 05:09:55 PST 2018


On Sat, Dec 8, 2018 at 12:05 AM Philip Reames <listmail at philipreames.com>
wrote:

> Not a comment on the patch per se, just thinking about implications and
> follow-ons.
>
> If we know that a prefix or suffix of a memcpy can be reduced to a
> memset, would we consider inserting the memset and narrowing the memcpy
> to the remaining width? i.e. produce something like this when M>N:
>
>      memset(a, byte, N);
>      memset(b, byte, N);
>      memcpy(b+N, a+N, M-N);
>
> This could definitely be a pessimization in some cases, but if N is
> small and can be inlined, or very large, then this could be quite
> profitable.
>
> Alternatively, should we extend the MemCpyOpt code to try to identify a
> set of regions which can be used to form the memcpy region?  In this
> case, we could have framed the result as:
>
>      memset(a, byte, N);
>      memset(b, byte, N);
>      memset(b+N, undef, M-N);
>
> Is it worth generalizing this to other cases where we can discover
> non-overlapping regions? Such as maybe:
>
>      memset(a, byte, N);
>      memset(a+N, byte2, M-N);
>      memcpy(b, a, M);
>
> After all, we do something like this in DSE to detect redundant stores
> covered by a set of partial clobbers... though we don't currently do the
> same for loads in GVN...
>
> Thoughts?
>
> Philip
>

As you already mentioned, my main concern with these transformations would
be that they may not always be wins, so it would be necessary to model when
it is profitable to split up a memcpy into multiple parts. I believe that
right now memcpyopt generally only performs optimizations that do not
increase the number of instructions.

There are still a couple of cases that should be obvious wins but are not
handled. In particular, there is currently a requirement that the memcpy
source and the memset destination be the same pointer, so cases where the
copy reads the memset region starting at some offset are not handled:

    memset(a, 0, N)
    memcpy(b, a+Offset, N-Offset)
    // is not optimized to
    memset(a, 0, N)
    memset(b, 0, N-Offset)
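
For illustration, a minimal IR sketch of this unhandled case could look as
follows (function and value names are made up, and the sizes are arbitrary):

    declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1)
    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1)

    ; The memset zeroes a[0..16) and the memcpy only reads a[8..16), which is
    ; therefore known to be zero. Because the memcpy source is a GEP rather
    ; than the memset destination itself, the must-alias check rejects the
    ; case and the memcpy is left alone.
    define void @copy_suffix_of_memset(i8* %a, i8* %b) {
      call void @llvm.memset.p0i8.i64(i8* %a, i8 0, i64 16, i1 false)
      %src = getelementptr inbounds i8, i8* %a, i64 8
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %src, i64 8, i1 false)
      ret void
    }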

To provide some background on this patch, the original problem (which is
very common for Rust code due to frontend deficiencies) is code of the form

    x = alloca N
    stores/memsets to x
    memcpy(y, x, N)
    x not used here

One common variation on this is

    x = alloca N
    memset(x, 0, M)
    memcpy(y, x, N)
    x not used here

where usually M=N, but sometimes M<N due to trailing padding. The M=N case
was already handled previously; this patch covers M<N. The transformation to

    x = alloca N
    memset(x, 0, M)
    memset(y, 0, M)
    x not used here

in this case is only a means to an end, because it breaks the x->y
dependency and allows x to be eliminated entirely. While converting memcpy
to memset is of course a profitable transformation in its own right, the
dependency breaking is really the more important effect here.
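
In IR terms this is essentially the first test case from
memset-memcpy-oversized.ll in the diff below; a lightly simplified sketch
(assuming the same 16-byte struct shape with 12 bytes of data and 4 bytes of
trailing padding):

    %T = type { i64, i32 }  ; 16 bytes, of which the last 4 are padding

    declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1)
    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1)

    define void @test_alloca(i8* %result) {
      %a = alloca %T, align 8
      %b = bitcast %T* %a to i8*
      call void @llvm.memset.p0i8.i64(i8* align 8 %b, i8 0, i64 12, i1 false)
      ; The copy reads 4 bytes of undef padding past the memset. With this
      ; patch it becomes memset(%result, 0, 12), and %a can later go away.
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %result, i8* align 8 %b, i64 16, i1 false)
      ret void
    }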

However, this approach is limited in that it only works with memsets, not
with more complex initialization patterns. What I would like to
implement is an optimization that does something like this:

    x = alloca N
    stores/memsets to x
    memcpy(y, x, N)
    x not used here
    // hoist memcpy past stores, adjusting store dest
    x = alloca N
    memcpy(y, x, N)
    stores/memsets to y
    x not used here
    // eliminate memcpy of undef
    x = alloca N
    stores/memsets to y
    x not used here

This is basically what the call slot optimization does, but for groups of
stores rather than calls. I'm not yet sure how best to integrate this into
the existing code, but it would fully address my original motivation (at
least for the single-BB case).
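
Purely to illustrate the intended end result (and ignoring the aliasing and
dereferenceability conditions a real implementation would have to check,
just like the existing call slot optimization does), the IR-level effect
would be roughly:

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1)

    ; Before: the initializing stores go through the temporary %x, which is
    ; then copied into %y and never used again.
    define void @before(i8* %y) {
      %x = alloca [16 x i8]
      %p = getelementptr inbounds [16 x i8], [16 x i8]* %x, i64 0, i64 0
      store i8 1, i8* %p
      %q = getelementptr inbounds i8, i8* %p, i64 8
      store i8 2, i8* %q
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %y, i8* %p, i64 16, i1 false)
      ret void
    }

    ; After (hypothetical): the stores are redirected to %y, the memcpy of the
    ; remaining undef bytes is dropped, and %x becomes trivially dead.
    define void @after(i8* %y) {
      store i8 1, i8* %y
      %q = getelementptr inbounds i8, i8* %y, i64 8
      store i8 2, i8* %q
      ret void
    }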

Regards,
Nikita

> On 12/7/18 1:16 PM, Nikita Popov via llvm-commits wrote:
> > Author: nikic
> > Date: Fri Dec  7 13:16:58 2018
> > New Revision: 348645
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=348645&view=rev
> > Log:
> > [MemCpyOpt] memset->memcpy forwarding with undef tail
> >
> > Currently memcpyopt optimizes cases like
> >
> >      memset(a, byte, N);
> >      memcpy(b, a, M);
> >
> > to
> >
> >      memset(a, byte, N);
> >      memset(b, byte, M);
> >
> > if M <= N. Often this allows further simplifications down the line,
> > which drop the first memset entirely.
> >
> > This patch extends this optimization for the case where M > N, but we
> > know that the bytes a[N..M] are undef due to alloca/lifetime.start.
> >
> > This situation arises relatively often for Rust code, because Rust does
> > not initialize trailing structure padding and loves to insert redundant
> > memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.
> >
> > For the implementation, I'm reusing a bit of code for a similar existing
> > optimization (direct memcpy of undef). I've also added memset support to
> > MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
> > used, but it seems to make more sense to add this to GetLocation and thus
> > make the computation cacheable.
> >
> > Differential Revision: https://reviews.llvm.org/D55120
> >
> > Modified:
> >      llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp
> >      llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> >      llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
> >
> > Modified: llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp?rev=348645&r1=348644&r2=348645&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp (original)
> > +++ llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp Fri Dec  7
> 13:16:58 2018
> > @@ -154,6 +154,12 @@ static ModRefInfo GetLocation(const Inst
> >       return ModRefInfo::Mod;
> >     }
> >
> > +  if (const MemSetInst *MI = dyn_cast<MemSetInst>(Inst)) {
> > +    Loc = MemoryLocation::getForDest(MI);
> > +    // Conservatively assume ModRef for volatile memset.
> > +    return MI->isVolatile() ? ModRefInfo::ModRef : ModRefInfo::Mod;
> > +  }
> > +
> >     if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {
> >       switch (II->getIntrinsicID()) {
> >       case Intrinsic::lifetime_start:
> >
> > Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=348645&r1=348644&r2=348645&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp (original)
> > +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri Dec  7
> 13:16:58 2018
> > @@ -1144,6 +1144,21 @@ bool MemCpyOptPass::processMemSetMemCpyD
> >     return true;
> >   }
> >
> > +/// Determine whether the instruction has undefined content for the
> given Size,
> > +/// either because it was freshly alloca'd or started its lifetime.
> > +static bool hasUndefContents(Instruction *I, ConstantInt *Size) {
> > +  if (isa<AllocaInst>(I))
> > +    return true;
> > +
> > +  if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I))
> > +    if (II->getIntrinsicID() == Intrinsic::lifetime_start)
> > +      if (ConstantInt *LTSize =
> dyn_cast<ConstantInt>(II->getArgOperand(0)))
> > +        if (LTSize->getZExtValue() >= Size->getZExtValue())
> > +          return true;
> > +
> > +  return false;
> > +}
> > +
> >   /// Transform memcpy to memset when its source was just memset.
> >   /// In other words, turn:
> >   /// \code
> > @@ -1167,12 +1182,23 @@ bool MemCpyOptPass::performMemCpyToMemSe
> >     if (!AA.isMustAlias(MemSet->getRawDest(), MemCpy->getRawSource()))
> >       return false;
> >
> > -  ConstantInt *CopySize = cast<ConstantInt>(MemCpy->getLength());
> > +  // A known memset size is required.
> >     ConstantInt *MemSetSize = dyn_cast<ConstantInt>(MemSet->getLength());
> > +  if (!MemSetSize)
> > +    return false;
> > +
> >     // Make sure the memcpy doesn't read any more than what the memset
> wrote.
> >     // Don't worry about sizes larger than i64.
> > -  if (!MemSetSize || CopySize->getZExtValue() >
> MemSetSize->getZExtValue())
> > -    return false;
> > +  ConstantInt *CopySize = cast<ConstantInt>(MemCpy->getLength());
> > +  if (CopySize->getZExtValue() > MemSetSize->getZExtValue()) {
> > +    // If the memcpy is larger than the memset, but the memory was
> undef prior
> > +    // to the memset, we can just ignore the tail.
> > +    MemDepResult DepInfo = MD->getDependency(MemSet);
> > +    if (DepInfo.isDef() && hasUndefContents(DepInfo.getInst(),
> CopySize))
> > +      CopySize = MemSetSize;
> > +    else
> > +      return false;
> > +  }
> >
> >     IRBuilder<> Builder(MemCpy);
> >     Builder.CreateMemSet(MemCpy->getRawDest(), MemSet->getOperand(1),
> > @@ -1252,19 +1278,7 @@ bool MemCpyOptPass::processMemCpy(MemCpy
> >       if (MemCpyInst *MDep = dyn_cast<MemCpyInst>(SrcDepInfo.getInst()))
> >         return processMemCpyMemCpyDependence(M, MDep);
> >     } else if (SrcDepInfo.isDef()) {
> > -    Instruction *I = SrcDepInfo.getInst();
> > -    bool hasUndefContents = false;
> > -
> > -    if (isa<AllocaInst>(I)) {
> > -      hasUndefContents = true;
> > -    } else if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
> > -      if (II->getIntrinsicID() == Intrinsic::lifetime_start)
> > -        if (ConstantInt *LTSize =
> dyn_cast<ConstantInt>(II->getArgOperand(0)))
> > -          if (LTSize->getZExtValue() >= CopySize->getZExtValue())
> > -            hasUndefContents = true;
> > -    }
> > -
> > -    if (hasUndefContents) {
> > +    if (hasUndefContents(SrcDepInfo.getInst(), CopySize)) {
> >         MD->removeInstruction(M);
> >         M->eraseFromParent();
> >         ++NumMemCpyInstr;
> >
> > Modified: llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll?rev=348645&r1=348644&r2=348645&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll
> (original)
> > +++ llvm/trunk/test/Transforms/MemCpyOpt/memset-memcpy-oversized.ll Fri
> Dec  7 13:16:58 2018
> > @@ -12,7 +12,7 @@ define void @test_alloca(i8* %result) {
> >   ; CHECK-NEXT:    [[A:%.*]] = alloca [[T:%.*]], align 8
> >   ; CHECK-NEXT:    [[B:%.*]] = bitcast %T* [[A]] to i8*
> >   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[B]], i8
> 0, i64 12, i1 false)
> > -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8*
> [[RESULT:%.*]], i8* align 8 [[B]], i64 16, i1 false)
> > +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8
> 0, i64 12, i1 false)
> >   ; CHECK-NEXT:    ret void
> >   ;
> >     %a = alloca %T, align 8
> > @@ -28,7 +28,7 @@ define void @test_alloca_with_lifetimes(
> >   ; CHECK-NEXT:    [[B:%.*]] = bitcast %T* [[A]] to i8*
> >   ; CHECK-NEXT:    call void @llvm.lifetime.start.p0i8(i64 16, i8* [[B]])
> >   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[B]], i8
> 0, i64 12, i1 false)
> > -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8*
> [[RESULT:%.*]], i8* align 8 [[B]], i64 16, i1 false)
> > +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8
> 0, i64 12, i1 false)
> >   ; CHECK-NEXT:    call void @llvm.lifetime.end.p0i8(i64 16, i8* [[B]])
> >   ; CHECK-NEXT:    ret void
> >   ;
> > @@ -46,7 +46,7 @@ define void @test_malloc_with_lifetimes(
> >   ; CHECK-NEXT:    [[A:%.*]] = call i8* @malloc(i64 16)
> >   ; CHECK-NEXT:    call void @llvm.lifetime.start.p0i8(i64 16, i8* [[A]])
> >   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[A]], i8
> 0, i64 12, i1 false)
> > -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8*
> [[RESULT:%.*]], i8* align 8 [[A]], i64 16, i1 false)
> > +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8
> 0, i64 12, i1 false)
> >   ; CHECK-NEXT:    call void @llvm.lifetime.end.p0i8(i64 16, i8* [[A]])
> >   ; CHECK-NEXT:    call void @free(i8* [[A]])
> >   ; CHECK-NEXT:    ret void
> > @@ -98,7 +98,7 @@ define void @test_volatile_memset(i8* %r
> >   ; CHECK-NEXT:    [[A:%.*]] = alloca [[T:%.*]], align 8
> >   ; CHECK-NEXT:    [[B:%.*]] = bitcast %T* [[A]] to i8*
> >   ; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* align 8 [[B]], i8
> 0, i64 12, i1 true)
> > -; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8*
> [[RESULT:%.*]], i8* align 8 [[B]], i64 16, i1 false)
> > +; CHECK-NEXT:    call void @llvm.memset.p0i8.i64(i8* [[RESULT:%.*]], i8
> 0, i64 12, i1 false)
> >   ; CHECK-NEXT:    ret void
> >   ;
> >     %a = alloca %T, align 8
> >
> >