[llvm-commits] [llvm] r166874 - in /llvm/trunk: lib/Transforms/Scalar/LoopIdiomRecognize.cpp test/Transforms/LoopIdiom/multi-dimensional.ll test/Transforms/LoopIdiom/sideeffect.ll

Benjamin Kramer benny.kra at gmail.com
Tue Oct 30 12:09:35 PDT 2012


On 30.10.2012, at 18:40, Preston Briggs <preston.briggs at gmail.com> wrote:

> 
> 
> On Tue, Oct 30, 2012 at 9:17 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
> >
> > On 29.10.2012, at 19:10, Preston Briggs <preston.briggs at gmail.com> wrote:
> >
> >> I don't understand all this code, but I have a few comments inline.
> >
> > Thanks for the review :) I'm still working out how to make good use of all the details you DA provides.
> >
> > One important thing about this pass (and all other LoopPasses) is that nested
> > loops are visited inside-out, starting with the most deeply nested loop.
> 
> Right, makes good sense.
> 
> 
> >>> Author: d0k
> >>> Date: Sat Oct 27 09:25:44 2012
> >>> New Revision: 166874
> >>>
> >>> URL: http://llvm.org/viewvc/llvm-project?rev=166874&view=rev
> >>> Log:
> >>> LoopIdiom: Replace custom dependence analysis with DependenceAnalysis.
> >>
> >> :-)
> >>
> >>> Requires a lot less code and complexity on loop-idiom's side and the more
> >>> precise analysis can catch more cases, like the one I included as a test case.
> >>> This also fixes the edge-case miscompilation from PR9481.
> >>>
> >>> Compile time performance seems to be slightly worse, but this is mostly due
> >>> to an extra LCSSA run scheduled by the PassManager and should be fixed there.
> >>>
> >>> Added:
> >>>    llvm/trunk/test/Transforms/LoopIdiom/multi-dimensional.ll
> >>>    llvm/trunk/test/Transforms/LoopIdiom/sideeffect.ll
> >>> Modified:
> >>>    llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> >>>
> >>> Modified: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp?rev=166874&r1=166873&r2=166874&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (original)
> >>> +++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Sat Oct 27 09:25:44 2012
> >>> @@ -48,6 +48,7 @@
> >>> #include "llvm/Module.h"
> >>> #include "llvm/ADT/Statistic.h"
> >>> #include "llvm/Analysis/AliasAnalysis.h"
> >>> +#include "llvm/Analysis/DependenceAnalysis.h"
> >>> #include "llvm/Analysis/LoopPass.h"
> >>> #include "llvm/Analysis/ScalarEvolutionExpander.h"
> >>> #include "llvm/Analysis/ScalarEvolutionExpressions.h"
> >>> @@ -106,6 +107,8 @@
> >>>       AU.addPreserved<AliasAnalysis>();
> >>>       AU.addRequired<ScalarEvolution>();
> >>>       AU.addPreserved<ScalarEvolution>();
> >>> +      AU.addRequired<DependenceAnalysis>();
> >>> +      AU.addPreserved<DependenceAnalysis>();
> >>>       AU.addPreserved<DominatorTree>();
> >>>       AU.addRequired<DominatorTree>();
> >>>       AU.addRequired<TargetLibraryInfo>();
> >>> @@ -122,6 +125,7 @@
> >>> INITIALIZE_PASS_DEPENDENCY(LCSSA)
> >>> INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
> >>> INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfo)
> >>> +INITIALIZE_PASS_DEPENDENCY(DependenceAnalysis)
> >>> INITIALIZE_AG_DEPENDENCY(AliasAnalysis)
> >>> INITIALIZE_PASS_END(LoopIdiomRecognize, "loop-idiom", "Recognize loop idioms",
> >>>                     false, false)
> >>> @@ -163,15 +167,6 @@
> >>>   } while (!NowDeadInsts.empty());
> >>> }
> >>>
> >>> -/// deleteIfDeadInstruction - If the specified value is a dead instruction,
> >>> -/// delete it and any recursively used instructions.
> >>> -static void deleteIfDeadInstruction(Value *V, ScalarEvolution &SE,
> >>> -                                    const TargetLibraryInfo *TLI) {
> >>> -  if (Instruction *I = dyn_cast<Instruction>(V))
> >>> -    if (isInstructionTriviallyDead(I, TLI))
> >>> -      deleteDeadInstruction(I, SE, TLI);
> >>> -}
> >>> -
> >>> bool LoopIdiomRecognize::runOnLoop(Loop *L, LPPassManager &LPM) {
> >>>   CurLoop = L;
> >>>
> >>> @@ -368,40 +363,6 @@
> >>>                                  MSI, Ev, BECount);
> >>> }
> >>>
> >>> -
> >>> -/// mayLoopAccessLocation - Return true if the specified loop might access the
> >>> -/// specified pointer location, which is a loop-strided access.  The 'Access'
> >>> -/// argument specifies what the verboten forms of access are (read or write).
> >>> -static bool mayLoopAccessLocation(Value *Ptr,AliasAnalysis::ModRefResult Access,
> >>> -                                  Loop *L, const SCEV *BECount,
> >>> -                                  unsigned StoreSize, AliasAnalysis &AA,
> >>> -                                  Instruction *IgnoredStore) {
> >>> -  // Get the location that may be stored across the loop.  Since the access is
> >>> -  // strided positively through memory, we say that the modified location starts
> >>> -  // at the pointer and has infinite size.
> >>> -  uint64_t AccessSize = AliasAnalysis::UnknownSize;
> >>> -
> >>> -  // If the loop iterates a fixed number of times, we can refine the access size
> >>> -  // to be exactly the size of the memset, which is (BECount+1)*StoreSize
> >>> -  if (const SCEVConstant *BECst = dyn_cast<SCEVConstant>(BECount))
> >>> -    AccessSize = (BECst->getValue()->getZExtValue()+1)*StoreSize;
> >>> -
> >>> -  // TODO: For this to be really effective, we have to dive into the pointer
> >>> -  // operand in the store.  Store to &A[i] of 100 will always return may alias
> >>> -  // with store of &A[100], we need to StoreLoc to be "A" with size of 100,
> >>> -  // which will then no-alias a store to &A[100].
> >>> -  AliasAnalysis::Location StoreLoc(Ptr, AccessSize);
> >>> -
> >>> -  for (Loop::block_iterator BI = L->block_begin(), E = L->block_end(); BI != E;
> >>> -       ++BI)
> >>> -    for (BasicBlock::iterator I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I)
> >>> -      if (&*I != IgnoredStore &&
> >>> -          (AA.getModRefInfo(I, StoreLoc) & Access))
> >>> -        return true;
> >>> -
> >>> -  return false;
> >>> -}
> >>> -
> >>> /// getMemSetPatternValue - If a strided store of the specified value is safe to
> >>> /// turn into a memset_pattern16, return a ConstantArray of 16 bytes that should
> >>> /// be passed in.  Otherwise, return null.
> >>> @@ -474,6 +435,18 @@
> >>>     return false;
> >>>   }
> >>>
> >>> +  // Make sure the store has no dependencies (i.e. other loads and stores) in
> >>> +  // the loop.
> >>> +  DependenceAnalysis &DA = getAnalysis<DependenceAnalysis>();
> >>> +  for (Loop::block_iterator BI = CurLoop->block_begin(),
> >>> +       BE = CurLoop->block_end(); BI != BE; ++BI)
> >>> +    for (BasicBlock::iterator I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I)
> >>> +      if (&*I != TheStore && I->mayReadOrWriteMemory()) {
> >>> +        OwningPtr<Dependence> D(DA.depends(TheStore, I, true));
> >>> +        if (D)
> >>> +          return false;
> >>
> >> This seems pessimistic. If the candidate loop is nested, we could
> >> still find some idioms in the presence of dependences. We should check
> >> to see if there's a loop-independent dependence or if there's a
> >> dependence at the innermost level.
> >
> > The code was adapted from the previous custom analysis so there is
> > a high possibility that it can be made a lot more aggressive. However,
> > if we can turn the inner loop into a function call, the pass would've done
> > that when visiting the inner loop. Am I missing something?
> 
> Loops are numbered from 1 (outermost) to n (innermost).
> Instructions not contained in a loop are said to be at level 0.
> So if we have code like this
> 
> while (...) {
>    ...
>    for (i = 0; i < n; i++)
>       A[i] = A[i + n];
>    ...
> }
> 
> there will be dependences between the various references to A
> (in your code above, D won't be NULL). D->getDirection(1) will
> be ALL (and D->isScalar(1) will be true), but D->getDirection(2)
> should be NONE, so you ought to be able to replace the inner loop
> with a call to memmove.

This is a cute example that we can handle now thanks to DependenceAnalysis. It can be turned into a memcpy because A[i] and A[i+n] cannot alias :)

> >>> +      }
> >>> +
> >>>   // The trip count of the loop and the base pointer of the addrec SCEV is
> >>>   // guaranteed to be loop invariant, which means that it should dominate the
> >>>   // header.  This allows us to insert code for it in the preheader.
> >>> @@ -484,8 +457,7 @@
> >>>   // Okay, we have a strided store "p[i]" of a splattable value.  We can turn
> >>>   // this into a memset in the loop preheader now if we want.  However, this
> >>>   // would be unsafe to do if there is anything else in the loop that may read
> >>> -  // or write to the aliased location.  Check for any overlap by generating the
> >>> -  // base pointer and checking the region.
> >>> +  // or write to the aliased location.
> >>>   assert(DestPtr->getType()->isPointerTy()
> >>>       && "Must be a pointer type.");
> >>>   unsigned AddrSpace = DestPtr->getType()->getPointerAddressSpace();
> >>> @@ -494,15 +466,6 @@
> >>>                            Preheader->getTerminator());
> >>>
> >>>
> >>> -  if (mayLoopAccessLocation(BasePtr, AliasAnalysis::ModRef,
> >>> -                            CurLoop, BECount,
> >>> -                            StoreSize, getAnalysis<AliasAnalysis>(), TheStore)){
> >>> -    Expander.clear();
> >>> -    // If we generated new code for the base pointer, clean up.
> >>> -    deleteIfDeadInstruction(BasePtr, *SE, TLI);
> >>> -    return false;
> >>> -  }
> >>> -
> >>>   // Okay, everything looks good, insert the memset.
> >>>
> >>>   // The # stored bytes is (BECount+1)*Size.  Expand the trip count out to
> >>> @@ -565,6 +528,33 @@
> >>>
> >>>   LoadInst *LI = cast<LoadInst>(SI->getValueOperand());
> >>>
> >>> +  // Make sure the load and the store have no dependencies (i.e. other loads and
> >>> +  // stores) in the loop. We ignore the direct dependency between SI and LI here
> >>> +  // and check it later.
> >>> +  DependenceAnalysis &DA = getAnalysis<DependenceAnalysis>();
> >>> +  for (Loop::block_iterator BI = CurLoop->block_begin(),
> >>> +       BE = CurLoop->block_end(); BI != BE; ++BI)
> >>> +    for (BasicBlock::iterator I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I)
> >>> +      if (&*I != SI && &*I != LI && I->mayReadOrWriteMemory()) {
> >>> +        // First, check if there is a dependence of the store.
> >>> +        OwningPtr<Dependence> DS(DA.depends(SI, I, true));
> >>> +        if (DS)
> >>> +          return false;
> >>> +        // If the scanned instructon may modify memory then we also have to
> >>> +        // check for dependencys on the load.
> >>> +        if (I->mayWriteToMemory()) {
> >>> +          OwningPtr<Dependence> DL(DA.depends(I, LI, true));
> >>> +          if (DL)
> >>> +            return false;
> >>> +        }
> >>> +      }
> >>> +
> >>> +  // Now check the dependency between SI and LI. If there is no dependency we
> >>> +  // can safely emit a memcpy.
> >>> +  OwningPtr<Dependence> Dep(DA.depends(SI, LI, true));
> >>> +  if (Dep)
> >>> +    return false;
> >>> +
> >>>   // The trip count of the loop and the base pointer of the addrec SCEV is
> >>>   // guaranteed to be loop invariant, which means that it should dominate the
> >>>   // header.  This allows us to insert code for it in the preheader.
> >>> @@ -573,41 +563,16 @@
> >>>   SCEVExpander Expander(*SE, "loop-idiom");
> >>>
> >>>   // Okay, we have a strided store "p[i]" of a loaded value.  We can turn
> >>> -  // this into a memcpy in the loop preheader now if we want.  However, this
> >>> -  // would be unsafe to do if there is anything else in the loop that may read
> >>> -  // or write the memory region we're storing to.  This includes the load that
> >>> -  // feeds the stores.  Check for an alias by generating the base address and
> >>> -  // checking everything.
> >>> +  // this into a memcpy in the loop preheader now if we want.
> >>>   Value *StoreBasePtr =
> >>>     Expander.expandCodeFor(StoreEv->getStart(),
> >>>                            Builder.getInt8PtrTy(SI->getPointerAddressSpace()),
> >>>                            Preheader->getTerminator());
> >>> -
> >>> -  if (mayLoopAccessLocation(StoreBasePtr, AliasAnalysis::ModRef,
> >>> -                            CurLoop, BECount, StoreSize,
> >>> -                            getAnalysis<AliasAnalysis>(), SI)) {
> >>> -    Expander.clear();
> >>> -    // If we generated new code for the base pointer, clean up.
> >>> -    deleteIfDeadInstruction(StoreBasePtr, *SE, TLI);
> >>> -    return false;
> >>> -  }
> >>> -
> >>> -  // For a memcpy, we have to make sure that the input array is not being
> >>> -  // mutated by the loop.
> >>>   Value *LoadBasePtr =
> >>>     Expander.expandCodeFor(LoadEv->getStart(),
> >>>                            Builder.getInt8PtrTy(LI->getPointerAddressSpace()),
> >>>                            Preheader->getTerminator());
> >>>
> >>> -  if (mayLoopAccessLocation(LoadBasePtr, AliasAnalysis::Mod, CurLoop, BECount,
> >>> -                            StoreSize, getAnalysis<AliasAnalysis>(), SI)) {
> >>> -    Expander.clear();
> >>> -    // If we generated new code for the base pointer, clean up.
> >>> -    deleteIfDeadInstruction(LoadBasePtr, *SE, TLI);
> >>> -    deleteIfDeadInstruction(StoreBasePtr, *SE, TLI);
> >>> -    return false;
> >>> -  }
> >>> -
> >>>   // Okay, everything is safe, we can transform this!
> >>>
> >>>
> >>>
> >>> Added: llvm/trunk/test/Transforms/LoopIdiom/multi-dimensional.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/multi-dimensional.ll?rev=166874&view=auto
> >>> ==============================================================================
> >>> --- llvm/trunk/test/Transforms/LoopIdiom/multi-dimensional.ll (added)
> >>> +++ llvm/trunk/test/Transforms/LoopIdiom/multi-dimensional.ll Sat Oct 27 09:25:44 2012
> >>> @@ -0,0 +1,49 @@
> >>> +; RUN: opt -basicaa -loop-idiom < %s -S | FileCheck %s
> >>> +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
> >>> +target triple = "x86_64-apple-darwin10.0.0"
> >>> +
> >>> +%struct.ham = type { [2 x [2 x [2 x [16 x [8 x i32]]]]], i32, %struct.zot }
> >>> +%struct.zot = type { i32, i16, i16, [2 x [1152 x i32]] }
> >>> +
> >>> +define void @test1(%struct.ham* nocapture %arg) nounwind {
> >>> +bb:
> >>> +  br label %bb1
> >>> +
> >>> +bb1:                                              ; preds = %bb11, %bb
> >>> +  %tmp = phi i64 [ 0, %bb ], [ %tmp12, %bb11 ]
> >>> +  br label %bb2
> >>> +
> >>> +bb2:                                              ; preds = %bb2, %bb1
> >>> +  %tmp3 = phi i64 [ 0, %bb1 ], [ %tmp8, %bb2 ]
> >>> +  %tmp4 = getelementptr inbounds %struct.ham* %arg, i64 0, i32 0, i64 0, i64 1, i64 1, i64 %tmp, i64 %tmp3
> >>> +  store i32 0, i32* %tmp4, align 4
> >>> +  %tmp5 = getelementptr inbounds %struct.ham* %arg, i64 0, i32 0, i64 0, i64 1, i64 0, i64 %tmp, i64 %tmp3
> >>> +  store i32 0, i32* %tmp5, align 4
> >>> +  %tmp6 = getelementptr inbounds %struct.ham* %arg, i64 0, i32 0, i64 0, i64 0, i64 1, i64 %tmp, i64 %tmp3
> >>> +  store i32 0, i32* %tmp6, align 4
> >>> +  %tmp7 = getelementptr inbounds %struct.ham* %arg, i64 0, i32 0, i64 0, i64 0, i64 0, i64 %tmp, i64 %tmp3
> >>> +  store i32 0, i32* %tmp7, align 4
> >>> +  %tmp8 = add i64 %tmp3, 1
> >>> +  %tmp9 = trunc i64 %tmp8 to i32
> >>> +  %tmp10 = icmp eq i32 %tmp9, 8
> >>> +  br i1 %tmp10, label %bb11, label %bb2
> >>> +
> >>> +bb11:                                             ; preds = %bb2
> >>> +  %tmp12 = add i64 %tmp, 1
> >>> +  %tmp13 = trunc i64 %tmp12 to i32
> >>> +  %tmp14 = icmp eq i32 %tmp13, 16
> >>> +  br i1 %tmp14, label %bb15, label %bb1
> >>> +
> >>> +bb15:                                             ; preds = %bb11
> >>> +  ret void
> >>> +
> >>> +; CHECK: @test1
> >>> +; CHECK: bb1:
> >>> +; CHECK-NOT: store
> >>> +; CHECK: call void @llvm.memset.p0i8.i64
> >>> +; CHECK-NEXT: call void @llvm.memset.p0i8.i64
> >>> +; CHECK-NEXT: call void @llvm.memset.p0i8.i64
> >>> +; CHECK-NEXT: call void @llvm.memset.p0i8.i64
> >>> +; CHECK-NOT: store
> >>> +; CHECK: br
> >>> +}
> >>>
> >>> Added: llvm/trunk/test/Transforms/LoopIdiom/sideeffect.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/sideeffect.ll?rev=166874&view=auto
> >>> ==============================================================================
> >>> --- llvm/trunk/test/Transforms/LoopIdiom/sideeffect.ll (added)
> >>> +++ llvm/trunk/test/Transforms/LoopIdiom/sideeffect.ll Sat Oct 27 09:25:44 2012
> >>> @@ -0,0 +1,53 @@
> >>> +; RUN: opt -basicaa -loop-idiom < %s -S | FileCheck %s
> >>> +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
> >>> +target triple = "x86_64-apple-darwin10.0.0"
> >>> +
> >>> +; PR9481
> >>> +define i32 @test1() nounwind uwtable ssp {
> >>> +entry:
> >>> +  %a = alloca [10 x i8], align 1
> >>> +  br label %for.body
> >>> +
> >>> +for.cond1.preheader:                              ; preds = %for.body
> >>> +  %arrayidx5.phi.trans.insert = getelementptr inbounds [10 x i8]* %a, i64 0, i64 0
> >>> +  %.pre = load i8* %arrayidx5.phi.trans.insert, align 1
> >>> +  br label %for.body3
> >>> +
> >>> +for.body:                                         ; preds = %for.body, %entry
> >>> +  %indvars.iv29 = phi i64 [ 0, %entry ], [ %indvars.iv.next30, %for.body ]
> >>> +  call void (...)* @bar() nounwind
> >>> +  %arrayidx = getelementptr inbounds [10 x i8]* %a, i64 0, i64 %indvars.iv29
> >>> +  store i8 23, i8* %arrayidx, align 1
> >>> +  %indvars.iv.next30 = add i64 %indvars.iv29, 1
> >>> +  %lftr.wideiv31 = trunc i64 %indvars.iv.next30 to i32
> >>> +  %exitcond32 = icmp eq i32 %lftr.wideiv31, 1000000
> >>> +  br i1 %exitcond32, label %for.cond1.preheader, label %for.body
> >>> +
> >>> +for.body3:                                        ; preds = %for.body3, %for.cond1.preheader
> >>> +  %0 = phi i8 [ %.pre, %for.cond1.preheader ], [ %add, %for.body3 ]
> >>> +  %indvars.iv = phi i64 [ 1, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
> >>> +  call void (...)* @bar() nounwind
> >>> +  %arrayidx7 = getelementptr inbounds [10 x i8]* %a, i64 0, i64 %indvars.iv
> >>> +  %1 = load i8* %arrayidx7, align 1
> >>> +  %add = add i8 %1, %0
> >>> +  store i8 %add, i8* %arrayidx7, align 1
> >>> +  %indvars.iv.next = add i64 %indvars.iv, 1
> >>> +  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
> >>> +  %exitcond = icmp eq i32 %lftr.wideiv, 1000000
> >>> +  br i1 %exitcond, label %for.end12, label %for.body3
> >>> +
> >>> +for.end12:                                        ; preds = %for.body3
> >>> +  %arrayidx13 = getelementptr inbounds [10 x i8]* %a, i64 0, i64 2
> >>> +  %2 = load i8* %arrayidx13, align 1
> >>> +  %conv14 = sext i8 %2 to i32
> >>> +  %arrayidx15 = getelementptr inbounds [10 x i8]* %a, i64 0, i64 6
> >>> +  %3 = load i8* %arrayidx15, align 1
> >>> +  %conv16 = sext i8 %3 to i32
> >>> +  %add17 = add nsw i32 %conv16, %conv14
> >>> +  ret i32 %add17
> >>> +
> >>> +; CHECK: @test1
> >>> +; CHECK-NOT: @llvm.memset
> >>> +}
> >>> +
> >>> +declare void @bar(...)
> >>>
> >>>
> >>>
> >>>
> >>> ------------------------------
> >>>
> >>> Message: 19
> >>> Date: Sat, 27 Oct 2012 14:25:51 -0000
> >>> From: Benjamin Kramer <benny.kra at googlemail.com>
> >>> To: llvm-commits at cs.uiuc.edu
> >>> Subject: [llvm-commits] [llvm] r166875 - in /llvm/trunk:
> >>>        lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> >>>        test/Transforms/LoopIdiom/basic.ll
> >>> Message-ID: <20121027142551.C47DF2A6C065 at llvm.org>
> >>> Content-Type: text/plain; charset="utf-8"
> >>>
> >>> Author: d0k
> >>> Date: Sat Oct 27 09:25:51 2012
> >>> New Revision: 166875
> >>>
> >>> URL: http://llvm.org/viewvc/llvm-project?rev=166875&view=rev
> >>> Log:
> >>> LoopIdiom: Recognize memmove loops.
> >>>
> >>> This turns loops like
> >>>  for (unsigned i = 0; i != n; ++i)
> >>>    p[i] = p[i+1];
> >>> into memmove, which has a highly optimized implementation in most libcs.
> >>>
> >>> This was really easy with the new DependenceAnalysis :)
> >>>
> >>> Modified:
> >>>    llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> >>>    llvm/trunk/test/Transforms/LoopIdiom/basic.ll
> >>>
> >>> Modified: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp?rev=166875&r1=166874&r2=166875&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (original)
> >>> +++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Sat Oct 27 09:25:51 2012
> >>> @@ -16,7 +16,7 @@
> >>> // TODO List:
> >>> //
> >>> // Future loop memory idioms to recognize:
> >>> -//   memcmp, memmove, strlen, etc.
> >>> +//   memcmp, strlen, etc.
> >>> // Future floating point idioms to recognize in -ffast-math mode:
> >>> //   fpowi
> >>> // Future integer operation idioms to recognize:
> >>> @@ -60,8 +60,9 @@
> >>> #include "llvm/Transforms/Utils/Local.h"
> >>> using namespace llvm;
> >>>
> >>> -STATISTIC(NumMemSet, "Number of memset's formed from loop stores");
> >>> -STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");
> >>> +STATISTIC(NumMemSet, "Number of memsets formed from loop stores");
> >>> +STATISTIC(NumMemCpy, "Number of memcpys formed from loop load+stores");
> >>> +STATISTIC(NumMemMove, "Number of memmoves formed from loop load+stores");
> >>>
> >>> namespace {
> >>>   class LoopIdiomRecognize : public LoopPass {
> >>> @@ -532,6 +533,7 @@
> >>>   // stores) in the loop. We ignore the direct dependency between SI and LI here
> >>>   // and check it later.
> >>>   DependenceAnalysis &DA = getAnalysis<DependenceAnalysis>();
> >>> +  bool isMemcpySafe = true;
> >>>   for (Loop::block_iterator BI = CurLoop->block_begin(),
> >>>        BE = CurLoop->block_end(); BI != BE; ++BI)
> >>>     for (BasicBlock::iterator I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I)
> >>> @@ -552,8 +554,14 @@
> >>>   // Now check the dependency between SI and LI. If there is no dependency we
> >>>   // can safely emit a memcpy.
> >>>   OwningPtr<Dependence> Dep(DA.depends(SI, LI, true));
> >>> -  if (Dep)
> >>> -    return false;
> >>> +  if (Dep) {
> >>> +    // If there is a dependence but the direction is positive we can still
> >>> +    // safely turn this into memmove.
> >>> +    if (Dep->getLevels() != 1 ||
> >>> +        Dep->getDirection(1) != Dependence::DVEntry::GT)
> >>> +      return false;
> >>
> >> Why the restriction to an un-nested loop? Couldn't this work for a
> >> deeply-nested loop?
> >
> > LI and SI are always in the same loop, so the level cannot be != 1 here, right?
> 
> The level of a dependence is the number of enclosing loops that
> the two instructions have in common. If LI and SI are always in the
> same loop, then the level is exactly the number of loops enclosing them.
> For a more complex example, see DependenceAnalysis.h, near line 480.

Ah, I misunderstood how the levels work, and we miss the memmove transform on any loop enclosed in another. Are you suggesting something like "Dep->getDirection(Dep->getLevels())" and then check that it's either GT or NONE?

- Ben



More information about the llvm-commits mailing list