<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Excellent! :) <div><br><div><div>On Jun 23, 2013, at 8:55 PM, Arnold Schwaighofer &lt;<a href="mailto:aschwaighofer@apple.com">aschwaighofer@apple.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">Author: arnolds<br>Date: Sun Jun 23 22:55:48 2013<br>New Revision: 184685<br><br>URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project?rev=184685&view=rev">http://llvm.org/viewvc/llvm-project?rev=184685&view=rev</a><br>Log:<br>LoopVectorize: Use the dependence test utility class<br><br>We now no longer need alias analysis - the cases that alias analysis would<br>handle are now handled as accesses with a large dependence distance.<br><br>We can now vectorize loops with simple constant dependence distances.<br><br> for (i = 8; i < 256; ++i) {<br> a[i] = a[i+4] * a[i+8];<br> }<br><br> for (i = 8; i < 256; ++i) {<br> a[i] = a[i-4] * a[i-8];<br> }<br><br>About 200 more loops in the test suite are now legal to vectorize (although in<br>many cases the cost model instructs us not to). Results on x86-64 are a wash.<br><br>I have seen one degradation in ammp. Interestingly, the function in which we<br>now vectorize a loop is never executed, so we are probably seeing instruction<br>cache effects. There is a 2% improvement in h264ref. 
A few TSVC loop<br>kernels speed up as well.<br><br><a href="radar://13681598">radar://13681598</a><br><br>Added:<br> llvm/trunk/test/Transforms/LoopVectorize/memdep.ll<br>Modified:<br> llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp<br> llvm/trunk/test/Transforms/LoopVectorize/12-12-11-if-conv.ll<br> llvm/trunk/test/Transforms/LoopVectorize/runtime-check.ll<br><br>Modified: llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp<br>URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp?rev=184685&r1=184684&r2=184685&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp?rev=184685&r1=184684&r2=184685&view=diff</a><br>==============================================================================<br>--- llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp (original)<br>+++ llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp Sun Jun 23 22:55:48 2013<br>@@ -54,7 +54,6 @@<br>#include "llvm/ADT/SmallVector.h"<br>#include "llvm/ADT/StringExtras.h"<br>#include "llvm/Analysis/AliasAnalysis.h"<br>-#include "llvm/Analysis/AliasSetTracker.h"<br>#include "llvm/Analysis/Dominators.h"<br>#include "llvm/Analysis/LoopInfo.h"<br>#include "llvm/Analysis/LoopIterator.h"<br>@@ -409,11 +408,10 @@ bool LoadHoisting::canHoistAllLoads() {<br>class LoopVectorizationLegality {<br>public:<br> LoopVectorizationLegality(Loop *L, ScalarEvolution *SE, DataLayout *DL,<br>- DominatorTree *DT, TargetTransformInfo* TTI,<br>- AliasAnalysis *AA, TargetLibraryInfo *TLI)<br>- : TheLoop(L), SE(SE), DL(DL), DT(DT), TTI(TTI), AA(AA), TLI(TLI),<br>+ DominatorTree *DT, TargetLibraryInfo *TLI)<br>+ : TheLoop(L), SE(SE), DL(DL), DT(DT), TLI(TLI),<br> Induction(0), WidestIndTy(0), HasFunNoNaNAttr(false),<br>- LoadSpeculation(L, DT) {}<br>+ MaxSafeDepDistBytes(-1U), LoadSpeculation(L, DT) {}<br><br> /// This enum represents the kinds of reductions that 
we support.<br> enum ReductionKind {<br>@@ -500,7 +498,8 @@ public:<br> }<br><br> /// Insert a pointer and calculate the start and end SCEVs.<br>- void insert(ScalarEvolution *SE, Loop *Lp, Value *Ptr, bool WritePtr);<br>+ void insert(ScalarEvolution *SE, Loop *Lp, Value *Ptr, bool WritePtr,<br>+ unsigned DepSetId);<br><br> /// This flag indicates if we need to add the runtime check.<br> bool Need;<br>@@ -512,6 +511,9 @@ public:<br> SmallVector<const SCEV*, 2> Ends;<br> /// Holds the information if this pointer is used for writing to memory.<br> SmallVector<bool, 2> IsWritePtr;<br>+ /// Holds the id of the set of pointers that could be dependent because of a<br>+ /// shared underlying object.<br>+ SmallVector<unsigned, 2> DependencySetId;<br> };<br><br> /// A POD for saving information about induction variables.<br>@@ -532,11 +534,6 @@ public:<br> /// induction descriptor.<br> typedef MapVector<PHINode*, InductionInfo> InductionList;<br><br>- /// Alias(Multi)Map stores the values (GEPs or underlying objects and their<br>- /// respective Store/Load instruction(s) to calculate aliasing.<br>- typedef MapVector<Value*, Instruction* > AliasMap;<br>- typedef DenseMap<Value*, std::vector<Instruction*> > AliasMultiMap;<br>-<br> /// Returns true if it is legal to vectorize this loop.<br> /// This does not mean that it is profitable to vectorize this<br> /// loop, only that it is legal to do so.<br>@@ -583,6 +580,9 @@ public:<br> /// This function returns the identity element (or neutral element) for<br> /// the operation K.<br> static Constant *getReductionIdentity(ReductionKind K, Type *Tp);<br>+<br>+ unsigned getMaxSafeDepDistBytes() { return MaxSafeDepDistBytes; }<br>+<br>private:<br> /// Check if a single basic block loop is vectorizable.<br> /// At this point we know that this is a loop with a constant trip count<br>@@ -623,16 +623,6 @@ private:<br> /// Returns the induction kind of Phi. 
This function may return NoInduction<br> /// if the PHI is not an induction variable.<br> InductionKind isInductionVariable(PHINode *Phi);<br>- /// Return true if can compute the address bounds of Ptr within the loop.<br>- bool hasComputableBounds(Value *Ptr);<br>- /// Return true if there is the chance of write reorder.<br>- bool hasPossibleGlobalWriteReorder(Value *Object,<br>- Instruction *Inst,<br>- AliasMultiMap &WriteObjects,<br>- unsigned MaxByteWidth);<br>- /// Return the AA location for a load or a store.<br>- AliasAnalysis::Location getLoadStoreLocation(Instruction *Inst);<br>-<br><br> /// The loop that we evaluate.<br> Loop *TheLoop;<br>@@ -642,10 +632,6 @@ private:<br> DataLayout *DL;<br> /// Dominators.<br> DominatorTree *DT;<br>- /// Target Info.<br>- TargetTransformInfo *TTI;<br>- /// Alias Analysis.<br>- AliasAnalysis *AA;<br> /// Target Library Info.<br> TargetLibraryInfo *TLI;<br><br>@@ -675,6 +661,8 @@ private:<br> /// Can we assume the absence of NaNs.<br> bool HasFunNoNaNAttr;<br><br>+ unsigned MaxSafeDepDistBytes;<br>+<br> /// Utility to determine whether loads can be speculated.<br> LoadHoisting LoadSpeculation;<br>};<br>@@ -903,7 +891,6 @@ struct LoopVectorize : public LoopPass {<br> LoopInfo *LI;<br> TargetTransformInfo *TTI;<br> DominatorTree *DT;<br>- AliasAnalysis *AA;<br> TargetLibraryInfo *TLI;<br><br> virtual bool runOnLoop(Loop *L, LPPassManager &LPM) {<br>@@ -916,7 +903,6 @@ struct LoopVectorize : public LoopPass {<br> LI = &getAnalysis<LoopInfo>();<br> TTI = &getAnalysis<TargetTransformInfo>();<br> DT = &getAnalysis<DominatorTree>();<br>- AA = getAnalysisIfAvailable<AliasAnalysis>();<br> TLI = getAnalysisIfAvailable<TargetLibraryInfo>();<br><br> if (DL == NULL) {<br>@@ -935,7 +921,7 @@ struct LoopVectorize : public LoopPass {<br> }<br><br> // Check if it is legal to vectorize the loop.<br>- LoopVectorizationLegality LVL(L, SE, DL, DT, TTI, AA, TLI);<br>+ LoopVectorizationLegality LVL(L, SE, DL, DT, TLI);<br> if 
(!LVL.canVectorize()) {<br> DEBUG(dbgs() << "LV: Not vectorizing.\n");<br> return false;<br>@@ -1010,7 +996,8 @@ struct LoopVectorize : public LoopPass {<br>void<br>LoopVectorizationLegality::RuntimePointerCheck::insert(ScalarEvolution *SE,<br> Loop *Lp, Value *Ptr,<br>- bool WritePtr) {<br>+ bool WritePtr,<br>+ unsigned DepSetId) {<br> const SCEV *Sc = SE->getSCEV(Ptr);<br> const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);<br> assert(AR && "Invalid addrec expression");<br>@@ -1020,6 +1007,7 @@ LoopVectorizationLegality::RuntimePointe<br> Starts.push_back(AR->getStart());<br> Ends.push_back(ScEnd);<br> IsWritePtr.push_back(WritePtr);<br>+ DependencySetId.push_back(DepSetId);<br>}<br><br>Value *InnerLoopVectorizer::getBroadcastInstrs(Value *V) {<br>@@ -1357,10 +1345,9 @@ InnerLoopVectorizer::addRuntimeCheck(Loo<br> if (!PtrRtCheck->Need)<br> return NULL;<br><br>- Instruction *MemoryRuntimeCheck = 0;<br> unsigned NumPointers = PtrRtCheck->Pointers.size();<br>- SmallVector<Value* , 2> Starts;<br>- SmallVector<Value* , 2> Ends;<br>+ SmallVector<TrackingVH<Value> , 2> Starts;<br>+ SmallVector<TrackingVH<Value> , 2> Ends;<br><br> SCEVExpander Exp(*SE, "induction");<br><br>@@ -1387,13 +1374,18 @@ InnerLoopVectorizer::addRuntimeCheck(Loo<br> }<br><br> IRBuilder<> ChkBuilder(Loc);<br>-<br>+ // Our instructions might fold to a constant.<br>+ Value *MemoryRuntimeCheck = 0;<br> for (unsigned i = 0; i < NumPointers; ++i) {<br> for (unsigned j = i+1; j < NumPointers; ++j) {<br> // No need to check if two readonly pointers intersect.<br> if (!PtrRtCheck->IsWritePtr[i] && !PtrRtCheck->IsWritePtr[j])<br> continue;<br><br>+ // Only need to check pointers between two different dependency sets.<br>+ if (PtrRtCheck->DependencySetId[i] == PtrRtCheck->DependencySetId[j])<br>+ continue;<br>+<br> Value *Start0 = ChkBuilder.CreateBitCast(Starts[i], PtrArithTy, "bc");<br> Value *Start1 = ChkBuilder.CreateBitCast(Starts[j], PtrArithTy, "bc");<br> Value *End0 = 
ChkBuilder.CreateBitCast(Ends[i], PtrArithTy, "bc");<br>@@ -1405,12 +1397,18 @@ InnerLoopVectorizer::addRuntimeCheck(Loo<br> if (MemoryRuntimeCheck)<br> IsConflict = ChkBuilder.CreateOr(MemoryRuntimeCheck, IsConflict,<br> "conflict.rdx");<br>-<br>- MemoryRuntimeCheck = cast<Instruction>(IsConflict);<br>+ MemoryRuntimeCheck = IsConflict;<br> }<br> }<br><br>- return MemoryRuntimeCheck;<br>+ // We have to do this trickery because the IRBuilder might fold the check to a<br>+ // constant expression in which case there is no Instruction anchored in a<br>+ // the block.<br>+ LLVMContext &Ctx = Loc->getContext();<br>+ Instruction * Check = BinaryOperator::CreateAnd(MemoryRuntimeCheck,<br>+ ConstantInt::getTrue(Ctx));<br>+ ChkBuilder.Insert(Check, "memcheck.conflict");<br>+ return Check;<br>}<br><br>void<br>@@ -2981,7 +2979,7 @@ bool AccessAnalysis::canCheckPtrAtRT(<br> // Each access has its own dependence set.<br> DepId = RunningDepId++;<br><br>- //RtCheck.insert(SE, TheLoop, Ptr, IsWrite, DepId);<br>+ RtCheck.insert(SE, TheLoop, Ptr, IsWrite, DepId);<br><br> DEBUG(dbgs() << "LV: Found a runtime check ptr:" << *Ptr <<"\n");<br> } else {<br>@@ -3463,53 +3461,29 @@ MemoryDepChecker::areDepsSafe(AccessAnal<br> return true;<br>}<br><br>-AliasAnalysis::Location<br>-LoopVectorizationLegality::getLoadStoreLocation(Instruction *Inst) {<br>- if (StoreInst *Store = dyn_cast<StoreInst>(Inst))<br>- return AA->getLocation(Store);<br>- else if (LoadInst *Load = dyn_cast<LoadInst>(Inst))<br>- return AA->getLocation(Load);<br>-<br>- llvm_unreachable("Should be either load or store instruction");<br>-}<br>-<br>-bool<br>-LoopVectorizationLegality::hasPossibleGlobalWriteReorder(<br>- Value *Object,<br>- Instruction *Inst,<br>- AliasMultiMap& WriteObjects,<br>- unsigned MaxByteWidth) {<br>-<br>- AliasAnalysis::Location ThisLoc = getLoadStoreLocation(Inst);<br>-<br>- std::vector<Instruction*>::iterator<br>- it = WriteObjects[Object].begin(),<br>- end = WriteObjects[Object].end();<br>-<br>- 
for (; it != end; ++it) {<br>- Instruction* I = *it;<br>- if (I == Inst)<br>- continue;<br>-<br>- AliasAnalysis::Location ThatLoc = getLoadStoreLocation(I);<br>- if (AA->alias(ThisLoc.getWithNewSize(MaxByteWidth),<br>- ThatLoc.getWithNewSize(MaxByteWidth)))<br>- return true;<br>- }<br>- return false;<br>-}<br>-<br>bool LoopVectorizationLegality::canVectorizeMemory() {<br><br> typedef SmallVector<Value*, 16> ValueVector;<br> typedef SmallPtrSet<Value*, 16> ValueSet;<br>+<br>+ // Stores a pair of memory access location and whether the access is a store<br>+ // (true) or a load (false).<br>+ typedef std::pair<Value*, char> MemAccessInfo;<br>+ typedef DenseSet<MemAccessInfo> PtrAccessSet;<br>+<br> // Holds the Load and Store *instructions*.<br> ValueVector Loads;<br> ValueVector Stores;<br>+<br>+ // Holds all the different accesses in the loop.<br>+ unsigned NumReads = 0;<br>+ unsigned NumReadWrites = 0;<br>+<br> PtrRtCheck.Pointers.clear();<br> PtrRtCheck.Need = false;<br><br> const bool IsAnnotatedParallel = TheLoop->isAnnotatedParallel();<br>+ MemoryDepChecker DepChecker(SE, DL, TheLoop);<br><br> // For each block.<br> for (Loop::block_iterator bb = TheLoop->block_begin(),<br>@@ -3530,6 +3504,7 @@ bool LoopVectorizationLegality::canVecto<br> return false;<br> }<br> Loads.push_back(Ld);<br>+ DepChecker.addAccess(Ld);<br> continue;<br> }<br><br>@@ -3542,6 +3517,7 @@ bool LoopVectorizationLegality::canVecto<br> return false;<br> }<br> Stores.push_back(St);<br>+ DepChecker.addAccess(St);<br> }<br> } // next instr.<br> } // next block.<br>@@ -3556,10 +3532,8 @@ bool LoopVectorizationLegality::canVecto<br> return true;<br> }<br><br>- // Holds the read and read-write *pointers* that we find. These maps hold<br>- // unique values for pointers (so no need for multi-map).<br>- AliasMap Reads;<br>- AliasMap ReadWrites;<br>+ AccessAnalysis::DepCandidates DependentAccesses;<br>+ AccessAnalysis Accesses(DL, DependentAccesses);<br><br> // Holds the analyzed pointers. 
We don't want to call GetUnderlyingObjects<br> // multiple times on the same object. If the ptr is accessed twice, once<br>@@ -3578,10 +3552,12 @@ bool LoopVectorizationLegality::canVecto<br> return false;<br> }<br><br>- // If we did *not* see this pointer before, insert it to<br>- // the read-write list. At this phase it is only a 'write' list.<br>- if (Seen.insert(Ptr))<br>- ReadWrites.insert(std::make_pair(Ptr, ST));<br>+ // If we did *not* see this pointer before, insert it to the read-write<br>+ // list. At this phase it is only a 'write' list.<br>+ if (Seen.insert(Ptr)) {<br>+ ++NumReadWrites;<br>+ Accesses.addStore(Ptr);<br>+ }<br> }<br><br> if (IsAnnotatedParallel) {<br>@@ -3591,6 +3567,7 @@ bool LoopVectorizationLegality::canVecto<br> return true;<br> }<br><br>+ SmallPtrSet<Value *, 16> ReadOnlyPtr;<br> for (I = Loads.begin(), IE = Loads.end(); I != IE; ++I) {<br> LoadInst *LD = cast<LoadInst>(*I);<br> Value* Ptr = LD->getPointerOperand();<br>@@ -3602,51 +3579,44 @@ bool LoopVectorizationLegality::canVecto<br> // If the address of i is unknown (for example A[B[i]]) then we may<br> // read a few words, modify, and write a few words, and some of the<br> // words may be written to the same address.<br>- if (Seen.insert(Ptr) || 0 == isConsecutivePtr(Ptr))<br>- Reads.insert(std::make_pair(Ptr, LD));<br>+ bool IsReadOnlyPtr = false;<br>+ if (Seen.insert(Ptr) || !isStridedPtr(SE, DL, Ptr, TheLoop)) {<br>+ ++NumReads;<br>+ IsReadOnlyPtr = true;<br>+ }<br>+ Accesses.addLoad(Ptr, IsReadOnlyPtr);<br> }<br><br> // If we write (or read-write) to a single destination and there are no<br> // other reads in this loop then is it safe to vectorize.<br>- if (ReadWrites.size() == 1 && Reads.size() == 0) {<br>+ if (NumReadWrites == 1 && NumReads == 0) {<br> DEBUG(dbgs() << "LV: Found a write-only loop!\n");<br> return true;<br> }<br><br>- unsigned NumReadPtrs = 0;<br>- unsigned NumWritePtrs = 0;<br>+ // Build dependence sets and check whether we need a runtime pointer 
bounds<br>+ // check.<br>+ Accesses.buildDependenceSets();<br>+ bool NeedRTCheck = Accesses.isRTCheckNeeded();<br><br> // Find pointers with computable bounds. We are going to use this information<br> // to place a runtime bound check.<br>- bool CanDoRT = true;<br>- AliasMap::iterator MI, ME;<br>- for (MI = ReadWrites.begin(), ME = ReadWrites.end(); MI != ME; ++MI) {<br>- Value *V = (*MI).first;<br>- if (hasComputableBounds(V)) {<br>- PtrRtCheck.insert(SE, TheLoop, V, true);<br>- NumWritePtrs++;<br>- DEBUG(dbgs() << "LV: Found a runtime check ptr:" << *V <<"\n");<br>- } else {<br>- CanDoRT = false;<br>- break;<br>- }<br>- }<br>- for (MI = Reads.begin(), ME = Reads.end(); MI != ME; ++MI) {<br>- Value *V = (*MI).first;<br>- if (hasComputableBounds(V)) {<br>- PtrRtCheck.insert(SE, TheLoop, V, false);<br>- NumReadPtrs++;<br>- DEBUG(dbgs() << "LV: Found a runtime check ptr:" << *V <<"\n");<br>- } else {<br>- CanDoRT = false;<br>- break;<br>- }<br>- }<br>+ unsigned NumComparisons = 0;<br>+ bool CanDoRT = false;<br>+ if (NeedRTCheck)<br>+ CanDoRT = Accesses.canCheckPtrAtRT(PtrRtCheck, NumComparisons, SE, TheLoop);<br>+<br>+<br>+ DEBUG(dbgs() << "LV: We need to do " << NumComparisons <<<br>+ " pointer comparisons.\n");<br>+<br>+ // If we only have one set of dependences to check pointers among we don't<br>+ // need a runtime check.<br>+ if (NumComparisons == 0 && NeedRTCheck)<br>+ NeedRTCheck = false;<br><br>- // Check that we did not collect too many pointers or found a<br>- // unsizeable pointer.<br>- unsigned NumComparisons = (NumWritePtrs * (NumReadPtrs + NumWritePtrs - 1));<br>- DEBUG(dbgs() << "LV: We need to compare " << NumComparisons << " ptrs.\n");<br>+ // Check that we did not collect too many pointers or found a unsizeable<br>+ // pointer.<br> if (!CanDoRT || NumComparisons > RuntimeMemoryCheckThreshold) {<br> PtrRtCheck.reset();<br> CanDoRT = false;<br>@@ -3656,113 +3626,6 @@ bool LoopVectorizationLegality::canVecto<br> DEBUG(dbgs() << "LV: We can perform a 
memory runtime check if needed.\n");<br> }<br><br>- bool NeedRTCheck = false;<br>-<br>- // Biggest vectorized access possible, vector width * unroll factor.<br>- // TODO: We're being very pessimistic here, find a way to know the<br>- // real access width before getting here.<br>- unsigned MaxByteWidth = (TTI->getRegisterBitWidth(true) / 8) *<br>- TTI->getMaximumUnrollFactor();<br>- // Now that the pointers are in two lists (Reads and ReadWrites), we<br>- // can check that there are no conflicts between each of the writes and<br>- // between the writes to the reads.<br>- // Note that WriteObjects duplicates the stores (indexed now by underlying<br>- // objects) to avoid pointing to elements inside ReadWrites.<br>- // TODO: Maybe create a new type where they can interact without duplication.<br>- AliasMultiMap WriteObjects;<br>- ValueVector TempObjects;<br>-<br>- // Check that the read-writes do not conflict with other read-write<br>- // pointers.<br>- bool AllWritesIdentified = true;<br>- for (MI = ReadWrites.begin(), ME = ReadWrites.end(); MI != ME; ++MI) {<br>- Value *Val = (*MI).first;<br>- Instruction *Inst = (*MI).second;<br>-<br>- GetUnderlyingObjects(Val, TempObjects, DL);<br>- for (ValueVector::iterator UI=TempObjects.begin(), UE=TempObjects.end();<br>- UI != UE; ++UI) {<br>- if (!isIdentifiedObject(*UI)) {<br>- DEBUG(dbgs() << "LV: Found an unidentified write ptr:"<< **UI <<"\n");<br>- NeedRTCheck = true;<br>- AllWritesIdentified = false;<br>- }<br>-<br>- // Never seen it before, can't alias.<br>- if (WriteObjects[*UI].empty()) {<br>- DEBUG(dbgs() << "LV: Adding Underlying value:" << **UI <<"\n");<br>- WriteObjects[*UI].push_back(Inst);<br>- continue;<br>- }<br>- // Direct alias found.<br>- if (!AA || dyn_cast<GlobalValue>(*UI) == NULL) {<br>- DEBUG(dbgs() << "LV: Found a possible write-write reorder:"<br>- << **UI <<"\n");<br>- return false;<br>- }<br>- DEBUG(dbgs() << "LV: Found a conflicting global value:"<br>- << **UI <<"\n");<br>- DEBUG(dbgs() << "LV: 
While examining store:" << *Inst <<"\n");<br>- DEBUG(dbgs() << "LV: On value:" << *Val <<"\n");<br>-<br>- // If global alias, make sure they do alias.<br>- if (hasPossibleGlobalWriteReorder(*UI,<br>- Inst,<br>- WriteObjects,<br>- MaxByteWidth)) {<br>- DEBUG(dbgs() << "LV: Found a possible write-write reorder:" << **UI<br>- << "\n");<br>- return false;<br>- }<br>-<br>- // Didn't alias, insert into map for further reference.<br>- WriteObjects[*UI].push_back(Inst);<br>- }<br>- TempObjects.clear();<br>- }<br>-<br>- /// Check that the reads don't conflict with the read-writes.<br>- for (MI = Reads.begin(), ME = Reads.end(); MI != ME; ++MI) {<br>- Value *Val = (*MI).first;<br>- GetUnderlyingObjects(Val, TempObjects, DL);<br>- for (ValueVector::iterator UI=TempObjects.begin(), UE=TempObjects.end();<br>- UI != UE; ++UI) {<br>- // If all of the writes are identified then we don't care if the read<br>- // pointer is identified or not.<br>- if (!AllWritesIdentified && !isIdentifiedObject(*UI)) {<br>- DEBUG(dbgs() << "LV: Found an unidentified read ptr:"<< **UI <<"\n");<br>- NeedRTCheck = true;<br>- }<br>-<br>- // Never seen it before, can't alias.<br>- if (WriteObjects[*UI].empty())<br>- continue;<br>- // Direct alias found.<br>- if (!AA || dyn_cast<GlobalValue>(*UI) == NULL) {<br>- DEBUG(dbgs() << "LV: Found a possible write-write reorder:"<br>- << **UI <<"\n");<br>- return false;<br>- }<br>- DEBUG(dbgs() << "LV: Found a global value: "<br>- << **UI <<"\n");<br>- Instruction *Inst = (*MI).second;<br>- DEBUG(dbgs() << "LV: While examining load:" << *Inst <<"\n");<br>- DEBUG(dbgs() << "LV: On value:" << *Val <<"\n");<br>-<br>- // If global alias, make sure they do alias.<br>- if (hasPossibleGlobalWriteReorder(*UI,<br>- Inst,<br>- WriteObjects,<br>- MaxByteWidth)) {<br>- DEBUG(dbgs() << "LV: Found a possible read-write reorder:" << **UI<br>- << "\n");<br>- return false;<br>- }<br>- }<br>- TempObjects.clear();<br>- }<br>-<br>- PtrRtCheck.Need = NeedRTCheck;<br> if (NeedRTCheck 
&& !CanDoRT) {<br> DEBUG(dbgs() << "LV: We can't vectorize because we can't find " <<<br> "the array bounds.\n");<br>@@ -3770,9 +3633,20 @@ bool LoopVectorizationLegality::canVecto<br> return false;<br> }<br><br>+ PtrRtCheck.Need = NeedRTCheck;<br>+<br>+ bool CanVecMem = true;<br>+ if (Accesses.isDependencyCheckNeeded()) {<br>+ DEBUG(dbgs() << "LV: Checking memory dependencies\n");<br>+ CanVecMem = DepChecker.areDepsSafe(DependentAccesses,<br>+ Accesses.getDependenciesToCheck());<br>+ MaxSafeDepDistBytes = DepChecker.getMaxSafeDepDistBytes();<br>+ }<br>+<br> DEBUG(dbgs() << "LV: We "<< (NeedRTCheck ? "" : "don't") <<<br> " need a runtime memory check.\n");<br>- return true;<br>+<br>+ return CanVecMem;<br>}<br><br>static bool hasMultipleUsesOf(Instruction *I,<br>@@ -4125,15 +3999,6 @@ bool LoopVectorizationLegality::blockCan<br> return true;<br>}<br><br>-bool LoopVectorizationLegality::hasComputableBounds(Value *Ptr) {<br>- const SCEV *PhiScev = SE->getSCEV(Ptr);<br>- const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PhiScev);<br>- if (!AR)<br>- return false;<br>-<br>- return AR->isAffine();<br>-}<br>-<br>LoopVectorizationCostModel::VectorizationFactor<br>LoopVectorizationCostModel::selectVectorizationFactor(bool OptForSize,<br> unsigned UserVF) {<br>@@ -4150,6 +4015,10 @@ LoopVectorizationCostModel::selectVector<br><br> unsigned WidestType = getWidestType();<br> unsigned WidestRegister = TTI.getRegisterBitWidth(true);<br>+ unsigned MaxSafeDepDist = -1U;<br>+ if (Legal->getMaxSafeDepDistBytes() != -1U)<br>+ MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8;<br>+ WidestRegister = WidestRegister < MaxSafeDepDist ? 
WidestRegister : MaxSafeDepDist;<br> unsigned MaxVectorSize = WidestRegister / WidestType;<br> DEBUG(dbgs() << "LV: The Widest type: " << WidestType << " bits.\n");<br> DEBUG(dbgs() << "LV: The Widest register is:" << WidestRegister << "bits.\n");<br>@@ -4283,6 +4152,10 @@ LoopVectorizationCostModel::selectUnroll<br> if (OptForSize)<br> return 1;<br><br>+ // We used the distance for the unroll factor.<br>+ if (Legal->getMaxSafeDepDistBytes() != -1U)<br>+ return 1;<br>+<br> // Do not unroll loops with a relatively small trip count.<br> unsigned TC = SE->getSmallConstantTripCount(TheLoop,<br> TheLoop->getLoopLatch());<br>@@ -4679,7 +4552,6 @@ Type* LoopVectorizationCostModel::ToVect<br>char LoopVectorize::ID = 0;<br>static const char lv_name[] = "Loop Vectorization";<br>INITIALIZE_PASS_BEGIN(LoopVectorize, LV_NAME, lv_name, false, false)<br>-INITIALIZE_AG_DEPENDENCY(AliasAnalysis)<br>INITIALIZE_AG_DEPENDENCY(TargetTransformInfo)<br>INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)<br>INITIALIZE_PASS_DEPENDENCY(LoopSimplify)<br><br>Modified: llvm/trunk/test/Transforms/LoopVectorize/12-12-11-if-conv.ll<br>URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/12-12-11-if-conv.ll?rev=184685&r1=184684&r2=184685&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/12-12-11-if-conv.ll?rev=184685&r1=184684&r2=184685&view=diff</a><br>==============================================================================<br>--- llvm/trunk/test/Transforms/LoopVectorize/12-12-11-if-conv.ll (original)<br>+++ llvm/trunk/test/Transforms/LoopVectorize/12-12-11-if-conv.ll Sun Jun 23 22:55:48 2013<br>@@ -30,7 +30,7 @@ if.then:<br>if.end: ; preds = %for.body, %if.then<br> %z.0 = phi i32 [ %add1, %if.then ], [ 9, %for.body ]<br> store i32 %z.0, i32* %arrayidx, align 4<br>- %indvars.iv.next = add i64 %indvars.iv, 1<br>+ %indvars.iv.next = add nsw i64 %indvars.iv, 1<br> %lftr.wideiv = trunc 
i64 %indvars.iv.next to i32<br> %exitcond = icmp eq i32 %lftr.wideiv, %x<br> br i1 %exitcond, label %for.end, label %for.body<br><br>Added: llvm/trunk/test/Transforms/LoopVectorize/memdep.ll<br>URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/memdep.ll?rev=184685&view=auto">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/memdep.ll?rev=184685&view=auto</a><br>==============================================================================<br>--- llvm/trunk/test/Transforms/LoopVectorize/memdep.ll (added)<br>+++ llvm/trunk/test/Transforms/LoopVectorize/memdep.ll Sun Jun 23 22:55:48 2013<br>@@ -0,0 +1,222 @@<br>+; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-unroll=1 -S | FileCheck %s<br>+; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-unroll=1 -S | FileCheck %s -check-prefix=WIDTH<br>+<br>+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"<br>+<br>+; Vectorization with dependence checks.<br>+<br>+; No plausible dependence - can be vectorized.<br>+; for (i = 0; i < 1024; ++i)<br>+; A[i] = A[i + 1] + 1;<br>+<br>+; CHECK: f1_vec<br>+; CHECK: <2 x i32><br>+<br>+define void @f1_vec(i32* %A) {<br>+entry:<br>+ br label %for.body<br>+<br>+for.body:<br>+ %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %for.body ]<br>+ %indvars.iv.next = add i32 %indvars.iv, 1<br>+ %arrayidx = getelementptr inbounds i32* %A, i32 %indvars.iv.next<br>+ %0 = load i32* %arrayidx, align 4<br>+ %add1 = add nsw i32 %0, 1<br>+ %arrayidx3 = getelementptr inbounds i32* %A, i32 %indvars.iv<br>+ store i32 %add1, i32* %arrayidx3, align 4<br>+ %exitcond = icmp ne i32 %indvars.iv.next, 1024<br>+ br i1 %exitcond, label %for.body, label %for.end<br>+<br>+for.end:<br>+ ret void<br>+}<br>+<br>+; Plausible dependence of distance 1 - can't 
be vectorized.<br>+; for (i = 0; i < 1024; ++i)<br>+; A[i+1] = A[i] + 1;<br>+<br>+; CHECK: f2_novec<br>+; CHECK-NOT: <2 x i32><br>+<br>+define void @f2_novec(i32* %A) {<br>+entry:<br>+ br label %for.body<br>+<br>+for.body:<br>+ %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %for.body ]<br>+ %arrayidx = getelementptr inbounds i32* %A, i32 %indvars.iv<br>+ %0 = load i32* %arrayidx, align 4<br>+ %add = add nsw i32 %0, 1<br>+ %indvars.iv.next = add i32 %indvars.iv, 1<br>+ %arrayidx3 = getelementptr inbounds i32* %A, i32 %indvars.iv.next<br>+ store i32 %add, i32* %arrayidx3, align 4<br>+ %exitcond = icmp ne i32 %indvars.iv.next, 1024<br>+ br i1 %exitcond, label %for.body, label %for.end<br>+<br>+for.end:<br>+ ret void<br>+}<br>+<br>+; Plausible dependence of distance 2 - can be vectorized with a width of 2.<br>+; for (i = 0; i < 1024; ++i)<br>+; A[i+2] = A[i] + 1;<br>+<br>+; CHECK: f3_vec_len<br>+; CHECK: <2 x i32><br>+<br>+; WIDTH: f3_vec_len<br>+; WIDTH-NOT: <4 x i32><br>+<br>+define void @f3_vec_len(i32* %A) {<br>+entry:<br>+ br label %for.body<br>+<br>+for.body:<br>+ %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]<br>+ %idxprom = sext i32 %i.01 to i64<br>+ %arrayidx = getelementptr inbounds i32* %A, i64 %idxprom<br>+ %0 = load i32* %arrayidx, align 4<br>+ %add = add nsw i32 %0, 1<br>+ %add1 = add nsw i32 %i.01, 2<br>+ %idxprom2 = sext i32 %add1 to i64<br>+ %arrayidx3 = getelementptr inbounds i32* %A, i64 %idxprom2<br>+ store i32 %add, i32* %arrayidx3, align 4<br>+ %inc = add nsw i32 %i.01, 1<br>+ %cmp = icmp slt i32 %inc, 1024<br>+ br i1 %cmp, label %for.body, label %for.end<br>+<br>+for.end:<br>+ ret void<br>+}<br>+<br>+; Plausible dependence of distance 1 - cannot be vectorized (without reordering<br>+; accesses).<br>+; for (i = 0; i < 1024; ++i) {<br>+; B[i] = A[i];<br>+; A[i] = B[i + 1];<br>+; }<br>+<br>+; CHECK: f5<br>+; CHECK-NOT: <2 x i32><br>+<br>+define void @f5(i32* %A, i32* %B) {<br>+entry:<br>+ br label %for.body<br>+<br>+for.body:<br>+ 
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]<br>+ %arrayidx = getelementptr inbounds i32* %A, i64 %indvars.iv<br>+ %0 = load i32* %arrayidx, align 4<br>+ %arrayidx2 = getelementptr inbounds i32* %B, i64 %indvars.iv<br>+ store i32 %0, i32* %arrayidx2, align 4<br>+ %indvars.iv.next = add nsw i64 %indvars.iv, 1<br>+ %arrayidx4 = getelementptr inbounds i32* %B, i64 %indvars.iv.next<br>+ %1 = load i32* %arrayidx4, align 4<br>+ store i32 %1, i32* %arrayidx, align 4<br>+ %lftr.wideiv = trunc i64 %indvars.iv.next to i32<br>+ %exitcond = icmp ne i32 %lftr.wideiv, 1024<br>+ br i1 %exitcond, label %for.body, label %for.end<br>+<br>+for.end:<br>+ ret void<br>+}<br>+<br>+; Dependence through a phi node - must not vectorize.<br>+; for (i = 0; i < 1024; ++i) {<br>+; a[i+1] = tmp;<br>+; tmp = a[i];<br>+; }<br>+<br>+; CHECK: f6<br>+; CHECK-NOT: <2 x i32><br>+<br>+define i32 @f6(i32* %a, i32 %tmp) {<br>+entry:<br>+ br label %for.body<br>+<br>+for.body:<br>+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]<br>+ %tmp.addr.08 = phi i32 [ %tmp, %entry ], [ %0, %for.body ]<br>+ %indvars.iv.next = add nsw i64 %indvars.iv, 1<br>+ %arrayidx = getelementptr inbounds i32* %a, i64 %indvars.iv.next<br>+ store i32 %tmp.addr.08, i32* %arrayidx, align 4<br>+ %arrayidx3 = getelementptr inbounds i32* %a, i64 %indvars.iv<br>+ %0 = load i32* %arrayidx3, align 4<br>+ %lftr.wideiv = trunc i64 %indvars.iv.next to i32<br>+ %exitcond = icmp ne i32 %lftr.wideiv, 1024<br>+ br i1 %exitcond, label %for.body, label %for.end<br>+<br>+for.end:<br>+ ret i32 undef<br>+}<br>+<br>+; Don't vectorize true loop carried dependencies that are not a multiple of the<br>+; vector width.<br>+; Example:<br>+; for (int i = ...; ++i) {<br>+; a[i] = a[i-3] + ...;<br>+; It is a bad idea to vectorize this loop because store-load forwarding will not<br>+; happen.<br>+;<br>+<br>+; CHECK: @nostoreloadforward<br>+; CHECK-NOT: <2 x i32><br>+<br>+define void @nostoreloadforward(i32* %A) 
{<br>+entry:<br>+ br label %for.body<br>+<br>+for.body:<br>+ %indvars.iv = phi i64 [ 16, %entry ], [ %indvars.iv.next, %for.body ]<br>+ %0 = add nsw i64 %indvars.iv, -3<br>+ %arrayidx = getelementptr inbounds i32* %A, i64 %0<br>+ %1 = load i32* %arrayidx, align 4<br>+ %2 = add nsw i64 %indvars.iv, 4<br>+ %arrayidx2 = getelementptr inbounds i32* %A, i64 %2<br>+ %3 = load i32* %arrayidx2, align 4<br>+ %add3 = add nsw i32 %3, %1<br>+ %arrayidx5 = getelementptr inbounds i32* %A, i64 %indvars.iv<br>+ store i32 %add3, i32* %arrayidx5, align 4<br>+ %indvars.iv.next = add i64 %indvars.iv, 1<br>+ %lftr.wideiv = trunc i64 %indvars.iv.next to i32<br>+ %exitcond = icmp ne i32 %lftr.wideiv, 128<br>+ br i1 %exitcond, label %for.body, label %for.end<br>+<br>+for.end:<br>+ ret void<br>+}<br>+<br>+; Example:<br>+; for (int i = ...; ++i) {<br>+; a[i] = b[i];<br>+; c[i] = a[i-3] + ...;<br>+; It is a bad idea to vectorize this loop because store-load forwarding will not<br>+; happen.<br>+;<br>+<br>+; CHECK: @nostoreloadforward2<br>+; CHECK-NOT: <2 x i32><br>+<br>+define void @nostoreloadforward2(i32* noalias %A, i32* noalias %B, i32* noalias %C) {<br>+entry:<br>+ br label %for.body<br>+<br>+for.body:<br>+ %indvars.iv = phi i64 [ 16, %entry ], [ %indvars.iv.next, %for.body ]<br>+ %arrayidx = getelementptr inbounds i32* %B, i64 %indvars.iv<br>+ %0 = load i32* %arrayidx, align 4<br>+ %arrayidx2 = getelementptr inbounds i32* %A, i64 %indvars.iv<br>+ store i32 %0, i32* %arrayidx2, align 4<br>+ %1 = add nsw i64 %indvars.iv, -3<br>+ %arrayidx4 = getelementptr inbounds i32* %A, i64 %1<br>+ %2 = load i32* %arrayidx4, align 4<br>+ %arrayidx6 = getelementptr inbounds i32* %C, i64 %indvars.iv<br>+ store i32 %2, i32* %arrayidx6, align 4<br>+ %indvars.iv.next = add i64 %indvars.iv, 1<br>+ %lftr.wideiv = trunc i64 %indvars.iv.next to i32<br>+ %exitcond = icmp ne i32 %lftr.wideiv, 128<br>+ br i1 %exitcond, label %for.body, label %for.end<br>+<br>+for.end:<br>+ ret void<br>+}<br><br>Modified: 
llvm/trunk/test/Transforms/LoopVectorize/runtime-check.ll<br>URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/runtime-check.ll?rev=184685&r1=184684&r2=184685&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/runtime-check.ll?rev=184685&r1=184684&r2=184685&view=diff</a><br>==============================================================================<br>--- llvm/trunk/test/Transforms/LoopVectorize/runtime-check.ll (original)<br>+++ llvm/trunk/test/Transforms/LoopVectorize/runtime-check.ll Sun Jun 23 22:55:48 2013<br>@@ -12,7 +12,7 @@ target triple = "x86_64-apple-macosx10.9<br>;CHECK: for.body.preheader:<br>;CHECK: br i1 %cmp.zero, label %middle.block, label %vector.memcheck<br>;CHECK: vector.memcheck:<br>-;CHECK: br i1 %found.conflict, label %middle.block, label %vector.ph<br>+;CHECK: br i1 %memcheck.conflict, label %middle.block, label %vector.ph<br>;CHECK: load <4 x float><br>define i32 @foo(float* nocapture %a, float* nocapture %b, i32 %n) nounwind uwtable ssp {<br>entry:<br><br><br>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a></div></blockquote></div><br></div></body></html>