Reverted in r258703.

I am trying to produce a test case.

Cheers,
-Quentin

On Jan 25, 2016, at 10:43 AM, Quentin Colombet via llvm-commits <llvm-commits@lists.llvm.org> wrote:

Hi Haicheng,

David suggested that this commit may cause:
https://llvm.org/bugs/show_bug.cgi?id=26293

Could you have a look please?

In the meantime, I am going to revert it to check whether this is actually the problem.

Thanks,
-Quentin

On Jan 22, 2016, at 10:52 PM, Haicheng Wu via llvm-commits <llvm-commits@lists.llvm.org> wrote:

Author: haicheng
Date: Sat Jan 23 00:52:41 2016
New Revision: 258620

URL: http://llvm.org/viewvc/llvm-project?rev=258620&view=rev
Log:
[LIR] Add support for structs and hand-unrolled loops

Now LIR can turn code like the following into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(int *f, unsigned n) {
  for (unsigned i = 0; i < 2 * n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}
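For reference, a C-level sketch of what the first loop effectively becomes; this is an illustration only (bar_lowered is a hypothetical name), since the pass actually emits an llvm.memset intrinsic in the loop preheader, as the new tests below check:

#include <cstring>

struct foo_t { int a, b; }; // same layout as the foo_t above

// Hypothetical lowered form, not the pass output itself: both field
// stores collapse into one memset of sizeof(foo_t) * n bytes.
void bar_lowered(foo_t *f, unsigned n) {
  if (n != 0)
    std::memset(f, 0, sizeof(foo_t) * n);
}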
<br class="">+bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,<br class="">+ ScalarEvolution &SE, bool CheckType = true);<br class="">+<br class="">/// \brief This analysis provides dependence information for the memory accesses<br class="">/// of a loop.<br class="">///<br class=""><br class="">Modified: llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp<br class="">URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp?rev=258620&r1=258619&r2=258620&view=diff<br class="">==============================================================================<br class="">--- llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp (original)<br class="">+++ llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp Sat Jan 23 00:52:41 2016<br class="">@@ -901,6 +901,78 @@ int llvm::isStridedPtr(PredicatedScalarE<br class=""> return Stride;<br class="">}<br class=""><br class="">+/// Take the pointer operand from the Load/Store instruction.<br class="">+/// Returns NULL if this is not a valid Load/Store instruction.<br class="">+static Value *getPointerOperand(Value *I) {<br class="">+ if (LoadInst *LI = dyn_cast<LoadInst>(I))<br class="">+ return LI->getPointerOperand();<br class="">+ if (StoreInst *SI = dyn_cast<StoreInst>(I))<br class="">+ return SI->getPointerOperand();<br class="">+ return nullptr;<br class="">+}<br class="">+<br class="">+/// Take the address space operand from the Load/Store instruction.<br class="">+/// Returns -1 if this is not a valid Load/Store instruction.<br class="">+static unsigned getAddressSpaceOperand(Value *I) {<br class="">+ if (LoadInst *L = dyn_cast<LoadInst>(I))<br class="">+ return L->getPointerAddressSpace();<br class="">+ if (StoreInst *S = dyn_cast<StoreInst>(I))<br class="">+ return S->getPointerAddressSpace();<br class="">+ return -1;<br class="">+}<br class="">+<br class="">+/// Returns true if the memory operations \p A and \p B are consecutive.<br class="">+bool llvm::isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,<br class="">+ ScalarEvolution &SE, bool CheckType) {<br class="">+ Value *PtrA = getPointerOperand(A);<br class="">+ Value *PtrB = getPointerOperand(B);<br class="">+ unsigned ASA = getAddressSpaceOperand(A);<br class="">+ unsigned ASB = getAddressSpaceOperand(B);<br class="">+<br class="">+ // Check that the address spaces match and that the pointers are valid.<br class="">+ if (!PtrA || !PtrB || (ASA != ASB))<br class="">+ return false;<br class="">+<br class="">+ // Make sure that A and B are different pointers.<br class="">+ if (PtrA == PtrB)<br class="">+ return false;<br class="">+<br class="">+ // Make sure that A and B have the same type if required.<br class="">+ if(CheckType && PtrA->getType() != PtrB->getType())<br class="">+ return false;<br class="">+<br class="">+ unsigned PtrBitWidth = DL.getPointerSizeInBits(ASA);<br class="">+ Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();<br class="">+ APInt Size(PtrBitWidth, DL.getTypeStoreSize(Ty));<br class="">+<br class="">+ APInt OffsetA(PtrBitWidth, 0), OffsetB(PtrBitWidth, 0);<br class="">+ PtrA = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);<br class="">+ PtrB = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);<br class="">+<br class="">+ // OffsetDelta = OffsetB - OffsetA;<br class="">+ const SCEV *OffsetSCEVA = SE.getConstant(OffsetA);<br class="">+ const SCEV *OffsetSCEVB = SE.getConstant(OffsetB);<br class="">+ const SCEV *OffsetDeltaSCEV = SE.getMinusSCEV(OffsetSCEVB, OffsetSCEVA);<br 
class="">+ const SCEVConstant *OffsetDeltaC = dyn_cast<SCEVConstant>(OffsetDeltaSCEV);<br class="">+ const APInt &OffsetDelta = OffsetDeltaC->getAPInt();<br class="">+ // Check if they are based on the same pointer. That makes the offsets<br class="">+ // sufficient.<br class="">+ if (PtrA == PtrB)<br class="">+ return OffsetDelta == Size;<br class="">+<br class="">+ // Compute the necessary base pointer delta to have the necessary final delta<br class="">+ // equal to the size.<br class="">+ // BaseDelta = Size - OffsetDelta;<br class="">+ const SCEV *SizeSCEV = SE.getConstant(Size);<br class="">+ const SCEV *BaseDelta = SE.getMinusSCEV(SizeSCEV, OffsetDeltaSCEV);<br class="">+<br class="">+ // Otherwise compute the distance with SCEV between the base pointers.<br class="">+ const SCEV *PtrSCEVA = SE.getSCEV(PtrA);<br class="">+ const SCEV *PtrSCEVB = SE.getSCEV(PtrB);<br class="">+ const SCEV *X = SE.getAddExpr(PtrSCEVA, BaseDelta);<br class="">+ return X == PtrSCEVB;<br class="">+}<br class="">+<br class="">bool MemoryDepChecker::Dependence::isSafeForVectorization(DepType Type) {<br class=""> switch (Type) {<br class=""> case NoDep:<br class=""><br class="">Modified: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp<br class="">URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp?rev=258620&r1=258619&r2=258620&view=diff<br class="">==============================================================================<br class="">--- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (original)<br class="">+++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Sat Jan 23 00:52:41 2016<br class="">@@ -26,22 +26,20 @@<br class="">// i64 and larger types when i64 is legal and the value has few bits set. It<br class="">// would be good to enhance isel to emit a loop for ctpop in this case.<br class="">//<br class="">-// We should enhance the memset/memcpy recognition to handle multiple stores in<br class="">-// the loop. 
Modified: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp?rev=258620&r1=258619&r2=258620&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (original)
+++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Sat Jan 23 00:52:41 2016
@@ -26,22 +26,20 @@
 // i64 and larger types when i64 is legal and the value has few bits set. It
 // would be good to enhance isel to emit a loop for ctpop in this case.
 //
-// We should enhance the memset/memcpy recognition to handle multiple stores in
-// the loop. This would handle things like:
-//   void foo(_Complex float *P)
-//     for (i) { __real__(*P) = 0;  __imag__(*P) = 0; }
-//
 // This could recognize common matrix multiplies and dot product idioms and
 // replace them with calls to BLAS (if linked in??).
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/Transforms/Scalar.h"
+#include "llvm/ADT/MapVector.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/LoopAccessAnalysis.h"
 #include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
 #include "llvm/Analysis/ScalarEvolutionExpander.h"
 #include "llvm/Analysis/ScalarEvolutionExpressions.h"
@@ -108,7 +106,9 @@ public:

 private:
   typedef SmallVector<StoreInst *, 8> StoreList;
-  StoreList StoreRefsForMemset;
+  typedef MapVector<Value *, StoreList> StoreListMap;
+  StoreListMap StoreRefsForMemset;
+  StoreListMap StoreRefsForMemsetPattern;
   StoreList StoreRefsForMemcpy;
   bool HasMemset;
   bool HasMemsetPattern;
@@ -122,14 +122,18 @@ private:
                       SmallVectorImpl<BasicBlock *> &ExitBlocks);

   void collectStores(BasicBlock *BB);
-  bool isLegalStore(StoreInst *SI, bool &ForMemset, bool &ForMemcpy);
-  bool processLoopStore(StoreInst *SI, const SCEV *BECount);
+  bool isLegalStore(StoreInst *SI, bool &ForMemset, bool &ForMemsetPattern,
+                    bool &ForMemcpy);
+  bool processLoopStores(SmallVectorImpl<StoreInst *> &SL, const SCEV *BECount,
+                         bool ForMemset);
   bool processLoopMemSet(MemSetInst *MSI, const SCEV *BECount);

   bool processLoopStridedStore(Value *DestPtr, unsigned StoreSize,
                                unsigned StoreAlignment, Value *StoredVal,
-                               Instruction *TheStore, const SCEVAddRecExpr *Ev,
-                               const SCEV *BECount, bool NegStride);
+                               Instruction *TheStore,
+                               SmallPtrSetImpl<Instruction *> &Stores,
+                               const SCEVAddRecExpr *Ev, const SCEV *BECount,
+                               bool NegStride);
   bool processLoopStoreOfLoopLoad(StoreInst *SI, const SCEV *BECount);

   /// @}
@@ -305,7 +309,7 @@ static Constant *getMemSetPatternValue(V
 }

 bool LoopIdiomRecognize::isLegalStore(StoreInst *SI, bool &ForMemset,
-                                      bool &ForMemcpy) {
+                                      bool &ForMemsetPattern, bool &ForMemcpy) {
   // Don't touch volatile stores.
   if (!SI->isSimple())
     return false;
@@ -353,7 +357,7 @@ bool LoopIdiomRecognize::isLegalStore(St
       StorePtr->getType()->getPointerAddressSpace() == 0 &&
       (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
     // It looks like we can use PatternValue!
-    ForMemset = true;
+    ForMemsetPattern = true;
     return true;
   }

@@ -393,6 +397,7 @@ bool LoopIdiomRecognize::isLegalStore(St

 void LoopIdiomRecognize::collectStores(BasicBlock *BB) {
   StoreRefsForMemset.clear();
+  StoreRefsForMemsetPattern.clear();
   StoreRefsForMemcpy.clear();
   for (Instruction &I : *BB) {
     StoreInst *SI = dyn_cast<StoreInst>(&I);
@@ -400,15 +405,22 @@ void LoopIdiomRecognize::collectStores(B
       continue;

     bool ForMemset = false;
+    bool ForMemsetPattern = false;
     bool ForMemcpy = false;
     // Make sure this is a strided store with a constant stride.
-    if (!isLegalStore(SI, ForMemset, ForMemcpy))
+    if (!isLegalStore(SI, ForMemset, ForMemsetPattern, ForMemcpy))
       continue;

     // Save the store locations.
-    if (ForMemset)
-      StoreRefsForMemset.push_back(SI);
-    else if (ForMemcpy)
+    if (ForMemset) {
+      // Find the base pointer.
+      Value *Ptr = GetUnderlyingObject(SI->getPointerOperand(), *DL);
+      StoreRefsForMemset[Ptr].push_back(SI);
+    } else if (ForMemsetPattern) {
+      // Find the base pointer.
+      Value *Ptr = GetUnderlyingObject(SI->getPointerOperand(), *DL);
+      StoreRefsForMemsetPattern[Ptr].push_back(SI);
+    } else if (ForMemcpy)
       StoreRefsForMemcpy.push_back(SI);
   }
 }
@@ -430,9 +442,14 @@ bool LoopIdiomRecognize::runOnLoopBlock(
   // Look for store instructions, which may be optimized to memset/memcpy.
   collectStores(BB);

-  // Look for a single store which can be optimized into a memset.
-  for (auto &SI : StoreRefsForMemset)
-    MadeChange |= processLoopStore(SI, BECount);
+  // Look for a single store or sets of stores with a common base, which can be
+  // optimized into a memset (memset_pattern). The latter most commonly happens
+  // with structs and hand-unrolled loops.
+  for (auto &SL : StoreRefsForMemset)
+    MadeChange |= processLoopStores(SL.second, BECount, true);
+
+  for (auto &SL : StoreRefsForMemsetPattern)
+    MadeChange |= processLoopStores(SL.second, BECount, false);

   // Optimize the store into a memcpy, if it feeds a similarly strided load.
   for (auto &SI : StoreRefsForMemcpy)
@@ -458,26 +475,155 @@ bool LoopIdiomRecognize::runOnLoopBlock(
   return MadeChange;
 }

-/// processLoopStore - See if this store can be promoted to a memset.
-bool LoopIdiomRecognize::processLoopStore(StoreInst *SI, const SCEV *BECount) {
-  assert(SI->isSimple() && "Expected only non-volatile stores.");
+/// processLoopStores - See if these stores can be promoted to a memset.
+bool LoopIdiomRecognize::processLoopStores(SmallVectorImpl<StoreInst *> &SL,
+                                           const SCEV *BECount,
+                                           bool ForMemset) {
+  // Try to find consecutive stores that can be transformed into memsets.
+  SetVector<StoreInst *> Heads, Tails;
+  SmallDenseMap<StoreInst *, StoreInst *> ConsecutiveChain;
+
+  // Do a quadratic search on all of the given stores and find
+  // all of the pairs of stores that follow each other.
+  SmallVector<unsigned, 16> IndexQueue;
+  for (unsigned i = 0, e = SL.size(); i < e; ++i) {
+    assert(SL[i]->isSimple() && "Expected only non-volatile stores.");
+
+    Value *FirstStoredVal = SL[i]->getValueOperand();
+    Value *FirstStorePtr = SL[i]->getPointerOperand();
+    const SCEVAddRecExpr *FirstStoreEv =
+        cast<SCEVAddRecExpr>(SE->getSCEV(FirstStorePtr));
+    unsigned FirstStride = getStoreStride(FirstStoreEv);
+    unsigned FirstStoreSize = getStoreSizeInBytes(SL[i], DL);
+
+    // See if we can optimize just this store in isolation.
+    if (FirstStride == FirstStoreSize || FirstStride == -FirstStoreSize) {
+      Heads.insert(SL[i]);
+      continue;
+    }

-  Value *StoredVal = SI->getValueOperand();
-  Value *StorePtr = SI->getPointerOperand();
+    Value *FirstSplatValue = nullptr;
+    Constant *FirstPatternValue = nullptr;

-  // Check to see if the stride matches the size of the store.  If so, then we
-  // know that every byte is touched in the loop.
-  const SCEVAddRecExpr *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
-  unsigned Stride = getStoreStride(StoreEv);
-  unsigned StoreSize = getStoreSizeInBytes(SI, DL);
-  if (StoreSize != Stride && StoreSize != -Stride)
-    return false;
+    if (ForMemset)
+      FirstSplatValue = isBytewiseValue(FirstStoredVal);
+    else
+      FirstPatternValue = getMemSetPatternValue(FirstStoredVal, DL);
+
+    assert((FirstSplatValue || FirstPatternValue) &&
+           "Expected either splat value or pattern value.");
+
+    IndexQueue.clear();
+    // If a store has multiple consecutive store candidates, search the stores
+    // according to the sequence: from i+1 to e, then from i-1 to 0. This is
+    // because pairing with the immediately succeeding or preceding candidate
+    // usually creates the best chance to find a memset opportunity.
+    unsigned j = 0;
+    for (j = i + 1; j < e; ++j)
+      IndexQueue.push_back(j);
+    for (j = i; j > 0; --j)
+      IndexQueue.push_back(j - 1);
+
+    for (auto &k : IndexQueue) {
+      assert(SL[k]->isSimple() && "Expected only non-volatile stores.");
+      Value *SecondStorePtr = SL[k]->getPointerOperand();
+      const SCEVAddRecExpr *SecondStoreEv =
+          cast<SCEVAddRecExpr>(SE->getSCEV(SecondStorePtr));
+      unsigned SecondStride = getStoreStride(SecondStoreEv);

-  bool NegStride = StoreSize == -Stride;
+      if (FirstStride != SecondStride)
+        continue;
+
+      Value *SecondStoredVal = SL[k]->getValueOperand();
+      Value *SecondSplatValue = nullptr;
+      Constant *SecondPatternValue = nullptr;
+
+      if (ForMemset)
+        SecondSplatValue = isBytewiseValue(SecondStoredVal);
+      else
+        SecondPatternValue = getMemSetPatternValue(SecondStoredVal, DL);
+
+      assert((SecondSplatValue || SecondPatternValue) &&
+             "Expected either splat value or pattern value.");
+
+      if (isConsecutiveAccess(SL[i], SL[k], *DL, *SE, false)) {
+        if (ForMemset) {
+          ConstantInt *C1 = dyn_cast<ConstantInt>(FirstSplatValue);
+          ConstantInt *C2 = dyn_cast<ConstantInt>(SecondSplatValue);
+          if (!C1 || !C2 || C1 != C2)
+            continue;
+        } else {
+          Constant *C1 = FirstPatternValue;
+          Constant *C2 = SecondPatternValue;
+
+          if (ConstantArray *CA1 = dyn_cast<ConstantArray>(C1))
+            C1 = CA1->getSplatValue();
+
+          if (ConstantArray *CA2 = dyn_cast<ConstantArray>(C2))
+            C2 = CA2->getSplatValue();
+
+          if (C1 != C2)
+            continue;
+        }
+        Tails.insert(SL[k]);
+        Heads.insert(SL[i]);
+        ConsecutiveChain[SL[i]] = SL[k];
+        break;
+      }
+    }
+  }
+
+  // We may run into multiple chains that merge into a single chain. We mark the
+  // stores that we transformed so that we don't visit the same store twice.
+  SmallPtrSet<Value *, 16> TransformedStores;
+  bool Changed = false;
+
+  // For stores that start but don't end a link in the chain:
+  for (SetVector<StoreInst *>::iterator it = Heads.begin(), e = Heads.end();
+       it != e; ++it) {
+    if (Tails.count(*it))
+      continue;
+
+    // We found a store instr that starts a chain. Now follow the chain and try
+    // to transform it.
+    SmallPtrSet<Instruction *, 8> AdjacentStores;
+    StoreInst *I = *it;
+
+    StoreInst *HeadStore = I;
+    unsigned StoreSize = 0;
+
+    // Collect the chain into a list.
+    while (Tails.count(I) || Heads.count(I)) {
+      if (TransformedStores.count(I))
+        break;
+      AdjacentStores.insert(I);
+
+      StoreSize += getStoreSizeInBytes(I, DL);
+      // Move to the next value in the chain.
+      I = ConsecutiveChain[I];
+    }
+
+    Value *StoredVal = HeadStore->getValueOperand();
+    Value *StorePtr = HeadStore->getPointerOperand();
+    const SCEVAddRecExpr *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
+    unsigned Stride = getStoreStride(StoreEv);
+
+    // Check to see if the stride matches the size of the stores.  If so, then
+    // we know that every byte is touched in the loop.
+    if (StoreSize != Stride && StoreSize != -Stride)
+      continue;
+
+    bool NegStride = StoreSize == -Stride;
+
+    if (processLoopStridedStore(StorePtr, StoreSize, HeadStore->getAlignment(),
+                                StoredVal, HeadStore, AdjacentStores, StoreEv,
+                                BECount, NegStride)) {
+      TransformedStores.insert(AdjacentStores.begin(), AdjacentStores.end());
+      Changed = true;
+    }
+  }

-  // See if we can optimize just this store in isolation.
-  return processLoopStridedStore(StorePtr, StoreSize, SI->getAlignment(),
-                                 StoredVal, SI, StoreEv, BECount, NegStride);
+  return Changed;
 }

 /// processLoopMemSet - See if this memset can be promoted to a large memset.
@@ -520,18 +666,21 @@ bool LoopIdiomRecognize::processLoopMemS
   if (!SplatValue || !CurLoop->isLoopInvariant(SplatValue))
     return false;

+  SmallPtrSet<Instruction *, 1> MSIs;
+  MSIs.insert(MSI);
   return processLoopStridedStore(Pointer, (unsigned)SizeInBytes,
-                                 MSI->getAlignment(), SplatValue, MSI, Ev,
+                                 MSI->getAlignment(), SplatValue, MSI, MSIs, Ev,
                                  BECount, /*NegStride=*/false);
 }

 /// mayLoopAccessLocation - Return true if the specified loop might access the
 /// specified pointer location, which is a loop-strided access.  The 'Access'
 /// argument specifies what the verboten forms of access are (read or write).
-static bool mayLoopAccessLocation(Value *Ptr, ModRefInfo Access, Loop *L,
-                                  const SCEV *BECount, unsigned StoreSize,
-                                  AliasAnalysis &AA,
-                                  Instruction *IgnoredStore) {
+static bool
+mayLoopAccessLocation(Value *Ptr, ModRefInfo Access, Loop *L,
+                      const SCEV *BECount, unsigned StoreSize,
+                      AliasAnalysis &AA,
+                      SmallPtrSetImpl<Instruction *> &IgnoredStores) {
   // Get the location that may be stored across the loop.  Since the access is
   // strided positively through memory, we say that the modified location starts
   // at the pointer and has infinite size.
@@ -551,7 +700,8 @@ static bool mayLoopAccessLocation(Value
   for (Loop::block_iterator BI = L->block_begin(), E = L->block_end(); BI != E;
        ++BI)
     for (BasicBlock::iterator I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I)
-      if (&*I != IgnoredStore && (AA.getModRefInfo(&*I, StoreLoc) & Access))
+      if (IgnoredStores.count(&*I) == 0 &&
+          (AA.getModRefInfo(&*I, StoreLoc) & Access))
         return true;

   return false;
@@ -574,7 +724,8 @@ static const SCEV *getStartForNegStride(
 /// transform this into a memset or memset_pattern in the loop preheader, do so.
 bool LoopIdiomRecognize::processLoopStridedStore(
     Value *DestPtr, unsigned StoreSize, unsigned StoreAlignment,
-    Value *StoredVal, Instruction *TheStore, const SCEVAddRecExpr *Ev,
+    Value *StoredVal, Instruction *TheStore,
+    SmallPtrSetImpl<Instruction *> &Stores, const SCEVAddRecExpr *Ev,
     const SCEV *BECount, bool NegStride) {
   Value *SplatValue = isBytewiseValue(StoredVal);
   Constant *PatternValue = nullptr;
@@ -609,7 +760,7 @@ bool LoopIdiomRecognize::processLoopStri
   Value *BasePtr =
       Expander.expandCodeFor(Start, DestInt8PtrTy, Preheader->getTerminator());
   if (mayLoopAccessLocation(BasePtr, MRI_ModRef, CurLoop, BECount, StoreSize,
-                            *AA, TheStore)) {
+                            *AA, Stores)) {
     Expander.clear();
     // If we generated new code for the base pointer, clean up.
     RecursivelyDeleteTriviallyDeadInstructions(BasePtr, TLI);
@@ -662,7 +813,8 @@ bool LoopIdiomRecognize::processLoopStri

   // Okay, the memset has been formed.  Zap the original store and anything that
   // feeds into it.
-  deleteDeadInstruction(TheStore, TLI);
+  for (auto *I : Stores)
+    deleteDeadInstruction(I, TLI);
   ++NumMemSet;
   return true;
 }
@@ -714,8 +866,10 @@ bool LoopIdiomRecognize::processLoopStor
   Value *StoreBasePtr = Expander.expandCodeFor(
       StrStart, Builder.getInt8PtrTy(StrAS), Preheader->getTerminator());

+  SmallPtrSet<Instruction *, 1> Stores;
+  Stores.insert(SI);
   if (mayLoopAccessLocation(StoreBasePtr, MRI_ModRef, CurLoop, BECount,
-                            StoreSize, *AA, SI)) {
+                            StoreSize, *AA, Stores)) {
     Expander.clear();
     // If we generated new code for the base pointer, clean up.
     RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);
@@ -735,7 +889,7 @@ bool LoopIdiomRecognize::processLoopStor
       LdStart, Builder.getInt8PtrTy(LdAS), Preheader->getTerminator());

   if (mayLoopAccessLocation(LoadBasePtr, MRI_Mod, CurLoop, BECount, StoreSize,
-                            *AA, SI)) {
+                            *AA, Stores)) {
     Expander.clear();
     // If we generated new code for the base pointer, clean up.
     RecursivelyDeleteTriviallyDeadInstructions(LoadBasePtr, TLI);
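As a small aside on the pairing order described in processLoopStores above: for store i, candidate partners are visited from i+1 up to e-1, then from i-1 down to 0. A standalone sketch (indexQueueFor is a hypothetical helper, with the same loop bounds as the patch):

#include <cassert>
#include <vector>

static std::vector<unsigned> indexQueueFor(unsigned i, unsigned e) {
  std::vector<unsigned> IndexQueue;
  // Immediately succeeding stores first...
  for (unsigned j = i + 1; j < e; ++j)
    IndexQueue.push_back(j);
  // ...then immediately preceding stores, walking backwards.
  for (unsigned j = i; j > 0; --j)
    IndexQueue.push_back(j - 1);
  return IndexQueue;
}

int main() {
  // With 5 stores and i == 2, candidates are visited as 3, 4, 1, 0.
  assert(indexQueueFor(2, 5) == std::vector<unsigned>({3, 4, 1, 0}));
  return 0;
}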
Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=258620&r1=258619&r2=258620&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
+++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Sat Jan 23 00:52:41 2016
@@ -26,6 +26,7 @@
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/CodeMetrics.h"
 #include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/LoopAccessAnalysis.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
@@ -401,9 +402,6 @@ public:
     }
   }

-  /// \returns true if the memory operations A and B are consecutive.
-  bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL);
-
   /// \brief Perform LICM and CSE on the newly generated gather sequences.
   void optimizeGatherSequence();

@@ -438,14 +436,6 @@ private:
   /// vectorized, or NULL. They may happen in cycles.
   Value *alreadyVectorized(ArrayRef<Value *> VL) const;

-  /// \brief Take the pointer operand from the Load/Store instruction.
-  /// \returns NULL if this is not a valid Load/Store instruction.
-  static Value *getPointerOperand(Value *I);
-
-  /// \brief Take the address space operand from the Load/Store instruction.
-  /// \returns -1 if this is not a valid Load/Store instruction.
-  static unsigned getAddressSpaceOperand(Value *I);
-
   /// \returns the scalarization cost for this type.  Scalarization in this
   /// context means the creation of vectors from a group of scalars.
   int getGatherCost(Type *Ty);
@@ -1191,8 +1181,8 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
         return;
       }

-      if (!isConsecutiveAccess(VL[i], VL[i + 1], DL)) {
-        if (VL.size() == 2 && isConsecutiveAccess(VL[1], VL[0], DL)) {
+      if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, *SE)) {
+        if (VL.size() == 2 && isConsecutiveAccess(VL[1], VL[0], DL, *SE)) {
           ++NumLoadsWantToChangeOrder;
         }
         BS.cancelScheduling(VL);
@@ -1364,7 +1354,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
       const DataLayout &DL = F->getParent()->getDataLayout();
       // Check if the stores are consecutive or if we need to swizzle them.
      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)
-        if (!isConsecutiveAccess(VL[i], VL[i + 1], DL)) {
+        if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, *SE)) {
           BS.cancelScheduling(VL);
           newTreeEntry(VL, false);
           DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
@@ -1837,63 +1827,6 @@ int BoUpSLP::getGatherCost(ArrayRef<Valu
   return getGatherCost(VecTy);
 }

-Value *BoUpSLP::getPointerOperand(Value *I) {
-  if (LoadInst *LI = dyn_cast<LoadInst>(I))
-    return LI->getPointerOperand();
-  if (StoreInst *SI = dyn_cast<StoreInst>(I))
-    return SI->getPointerOperand();
-  return nullptr;
-}
-
-unsigned BoUpSLP::getAddressSpaceOperand(Value *I) {
-  if (LoadInst *L = dyn_cast<LoadInst>(I))
-    return L->getPointerAddressSpace();
-  if (StoreInst *S = dyn_cast<StoreInst>(I))
-    return S->getPointerAddressSpace();
-  return -1;
-}
-
-bool BoUpSLP::isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL) {
-  Value *PtrA = getPointerOperand(A);
-  Value *PtrB = getPointerOperand(B);
-  unsigned ASA = getAddressSpaceOperand(A);
-  unsigned ASB = getAddressSpaceOperand(B);
-
-  // Check that the address spaces match and that the pointers are valid.
-  if (!PtrA || !PtrB || (ASA != ASB))
-    return false;
-
-  // Make sure that A and B are different pointers of the same type.
-  if (PtrA == PtrB || PtrA->getType() != PtrB->getType())
-    return false;
-
-  unsigned PtrBitWidth = DL.getPointerSizeInBits(ASA);
-  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
-  APInt Size(PtrBitWidth, DL.getTypeStoreSize(Ty));
-
-  APInt OffsetA(PtrBitWidth, 0), OffsetB(PtrBitWidth, 0);
-  PtrA = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
-  PtrB = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
-
-  APInt OffsetDelta = OffsetB - OffsetA;
-
-  // Check if they are based on the same pointer. That makes the offsets
-  // sufficient.
-  if (PtrA == PtrB)
-    return OffsetDelta == Size;
-
-  // Compute the necessary base pointer delta to have the necessary final delta
-  // equal to the size.
-  APInt BaseDelta = Size - OffsetDelta;
-
-  // Otherwise compute the distance with SCEV between the base pointers.
-  const SCEV *PtrSCEVA = SE->getSCEV(PtrA);
-  const SCEV *PtrSCEVB = SE->getSCEV(PtrB);
-  const SCEV *C = SE->getConstant(BaseDelta);
-  const SCEV *X = SE->getAddExpr(PtrSCEVA, C);
-  return X == PtrSCEVB;
-}
-
 // Reorder commutative operations in alternate shuffle if the resulting vectors
 // are consecutive loads. This would allow us to vectorize the tree.
 // If we have something like-
@@ -1921,10 +1854,10 @@ void BoUpSLP::reorderAltShuffleOperands(
         if (LoadInst *L1 = dyn_cast<LoadInst>(Right[j + 1])) {
           Instruction *VL1 = cast<Instruction>(VL[j]);
           Instruction *VL2 = cast<Instruction>(VL[j + 1]);
-          if (isConsecutiveAccess(L, L1, DL) && VL1->isCommutative()) {
+          if (isConsecutiveAccess(L, L1, DL, *SE) && VL1->isCommutative()) {
            std::swap(Left[j], Right[j]);
            continue;
-          } else if (isConsecutiveAccess(L, L1, DL) && VL2->isCommutative()) {
+          } else if (isConsecutiveAccess(L, L1, DL, *SE) && VL2->isCommutative()) {
            std::swap(Left[j + 1], Right[j + 1]);
            continue;
           }
@@ -1935,10 +1868,10 @@ void BoUpSLP::reorderAltShuffleOperands(
         if (LoadInst *L1 = dyn_cast<LoadInst>(Left[j + 1])) {
           Instruction *VL1 = cast<Instruction>(VL[j]);
           Instruction *VL2 = cast<Instruction>(VL[j + 1]);
-          if (isConsecutiveAccess(L, L1, DL) && VL1->isCommutative()) {
+          if (isConsecutiveAccess(L, L1, DL, *SE) && VL1->isCommutative()) {
            std::swap(Left[j], Right[j]);
            continue;
-          } else if (isConsecutiveAccess(L, L1, DL) && VL2->isCommutative()) {
+          } else if (isConsecutiveAccess(L, L1, DL, *SE) && VL2->isCommutative()) {
            std::swap(Left[j + 1], Right[j + 1]);
            continue;
           }
@@ -2088,7 +2021,7 @@ void BoUpSLP::reorderInputsAccordingToOp
   for (unsigned j = 0; j < VL.size() - 1; ++j) {
     if (LoadInst *L = dyn_cast<LoadInst>(Left[j])) {
       if (LoadInst *L1 = dyn_cast<LoadInst>(Right[j + 1])) {
-        if (isConsecutiveAccess(L, L1, DL)) {
+        if (isConsecutiveAccess(L, L1, DL, *SE)) {
          std::swap(Left[j + 1], Right[j + 1]);
          continue;
         }
@@ -2096,7 +2029,7 @@ void BoUpSLP::reorderInputsAccordingToOp
     }
     if (LoadInst *L = dyn_cast<LoadInst>(Right[j])) {
       if (LoadInst *L1 = dyn_cast<LoadInst>(Left[j + 1])) {
-        if (isConsecutiveAccess(L, L1, DL)) {
+        if (isConsecutiveAccess(L, L1, DL, *SE)) {
          std::swap(Left[j + 1], Right[j + 1]);
          continue;
         }
@@ -3461,7 +3394,7 @@ bool SLPVectorizer::vectorizeStores(Arra
       IndexQueue.push_back(j - 1);

     for (auto &k : IndexQueue) {
-      if (R.isConsecutiveAccess(Stores[i], Stores[k], DL)) {
+      if (isConsecutiveAccess(Stores[i], Stores[k], DL, *SE)) {
         Tails.insert(Stores[k]);
         Heads.insert(Stores[i]);
         ConsecutiveChain[Stores[i]] = Stores[k];
Added: llvm/trunk/test/Transforms/LoopIdiom/struct.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/struct.ll?rev=258620&view=auto
==============================================================================
--- llvm/trunk/test/Transforms/LoopIdiom/struct.ll (added)
+++ llvm/trunk/test/Transforms/LoopIdiom/struct.ll Sat Jan 23 00:52:41 2016
@@ -0,0 +1,221 @@
+; RUN: opt -basicaa -loop-idiom < %s -S | FileCheck %s
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
+
+target triple = "x86_64-apple-darwin10.0.0"
+
+%struct.foo = type { i32, i32 }
+%struct.foo1 = type { i32, i32, i32 }
+%struct.foo2 = type { i32, i16, i16 }
+
+;void bar1(foo_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].a = 0;
+;    f[i].b = 0;
+;  }
+;}
+define void @bar1(%struct.foo* %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 0, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 0, i32* %b, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar1(
+; CHECK: call void @llvm.memset
+; CHECK-NOT: store
+}
+
+;void bar2(foo_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].b = 0;
+;    f[i].a = 0;
+;  }
+;}
+define void @bar2(%struct.foo* %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 0, i32* %b, align 4
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 0, i32* %a, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar2(
+; CHECK: call void @llvm.memset
+; CHECK-NOT: store
+}
+
+;void bar3(foo_t *f, unsigned n) {
+;  for (unsigned i = n; i > 0; --i) {
+;    f[i].a = 0;
+;    f[i].b = 0;
+;  }
+;}
+define void @bar3(%struct.foo* nocapture %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  %0 = zext i32 %n to i64
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 0, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 0, i32* %b, align 4
+  %1 = trunc i64 %indvars.iv to i32
+  %dec = add i32 %1, -1
+  %cmp = icmp eq i32 %dec, 0
+  %indvars.iv.next = add nsw i64 %indvars.iv, -1
+  br i1 %cmp, label %for.end.loopexit, label %for.body
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar3(
+; CHECK: call void @llvm.memset
+; CHECK-NOT: store
+}
+
+;void bar4(foo_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].a = 0;
+;    f[i].b = 1;
+;  }
+;}
+define void @bar4(%struct.foo* nocapture %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 0, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 1, i32* %b, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar4(
+; CHECK-NOT: call void @llvm.memset
+}
+
+;void bar5(foo1_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].a = 0;
+;    f[i].b = 0;
+;  }
+;}
+define void @bar5(%struct.foo1* nocapture %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo1, %struct.foo1* %f, i64 %indvars.iv, i32 0
+  store i32 0, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo1, %struct.foo1* %f, i64 %indvars.iv, i32 1
+  store i32 0, i32* %b, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar5(
+; CHECK-NOT: call void @llvm.memset
+}
+
+;void bar6(foo2_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].a = 0;
+;    f[i].b = 0;
+;    f[i].c = 0;
+;  }
+;}
+define void @bar6(%struct.foo2* nocapture %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo2, %struct.foo2* %f, i64 %indvars.iv, i32 0
+  store i32 0, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo2, %struct.foo2* %f, i64 %indvars.iv, i32 1
+  store i16 0, i16* %b, align 4
+  %c = getelementptr inbounds %struct.foo2, %struct.foo2* %f, i64 %indvars.iv, i32 2
+  store i16 0, i16* %c, align 2
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar6(
+; CHECK: call void @llvm.memset
+; CHECK-NOT: store
+}

Added: llvm/trunk/test/Transforms/LoopIdiom/struct_pattern.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/struct_pattern.ll?rev=258620&view=auto
==============================================================================
--- llvm/trunk/test/Transforms/LoopIdiom/struct_pattern.ll (added)
+++ llvm/trunk/test/Transforms/LoopIdiom/struct_pattern.ll Sat Jan 23 00:52:41 2016
@@ -0,0 +1,186 @@
+; RUN: opt -basicaa -loop-idiom < %s -S | FileCheck %s
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
+
+; CHECK: @.memset_pattern = private unnamed_addr constant [4 x i32] [i32 2, i32 2, i32 2, i32 2], align 16
+; CHECK: @.memset_pattern.1 = private unnamed_addr constant [4 x i32] [i32 2, i32 2, i32 2, i32 2], align 16
+; CHECK: @.memset_pattern.2 = private unnamed_addr constant [4 x i32] [i32 2, i32 2, i32 2, i32 2], align 16
+
+target triple = "x86_64-apple-darwin10.0.0"
+
+%struct.foo = type { i32, i32 }
+%struct.foo1 = type { i32, i32, i32 }
+
+;void bar1(foo_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].a = 2;
+;    f[i].b = 2;
+;  }
+;}
+define void @bar1(%struct.foo* %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 2, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 2, i32* %b, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar1(
+; CHECK: call void @memset_pattern16
+; CHECK-NOT: store
+}
+
+;void bar2(foo_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].b = 2;
+;    f[i].a = 2;
+;  }
+;}
+define void @bar2(%struct.foo* %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 2, i32* %b, align 4
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 2, i32* %a, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar2(
+; CHECK: call void @memset_pattern16
+; CHECK-NOT: store
+}
+
+;void bar3(foo_t *f, unsigned n) {
+;  for (unsigned i = n; i > 0; --i) {
+;    f[i].a = 2;
+;    f[i].b = 2;
+;  }
+;}
+define void @bar3(%struct.foo* nocapture %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  %0 = zext i32 %n to i64
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 2, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 2, i32* %b, align 4
+  %1 = trunc i64 %indvars.iv to i32
+  %dec = add i32 %1, -1
+  %cmp = icmp eq i32 %dec, 0
+  %indvars.iv.next = add nsw i64 %indvars.iv, -1
+  br i1 %cmp, label %for.end.loopexit, label %for.body
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar3(
+; CHECK: call void @memset_pattern16
+; CHECK-NOT: store
+}
+
+;void bar4(foo_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].a = 0;
+;    f[i].b = 1;
+;  }
+;}
+define void @bar4(%struct.foo* nocapture %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 0
+  store i32 0, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo, %struct.foo* %f, i64 %indvars.iv, i32 1
+  store i32 1, i32* %b, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar4(
+; CHECK-NOT: call void @memset_pattern16
+}
+
+;void bar5(foo1_t *f, unsigned n) {
+;  for (unsigned i = 0; i < n; ++i) {
+;    f[i].a = 1;
+;    f[i].b = 1;
+;  }
+;}
+define void @bar5(%struct.foo1* nocapture %f, i32 %n) nounwind ssp {
+entry:
+  %cmp1 = icmp eq i32 %n, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %a = getelementptr inbounds %struct.foo1, %struct.foo1* %f, i64 %indvars.iv, i32 0
+  store i32 1, i32* %a, align 4
+  %b = getelementptr inbounds %struct.foo1, %struct.foo1* %f, i64 %indvars.iv, i32 1
+  store i32 1, i32* %b, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp ne i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:                                 ; preds = %for.body
+  br label %for.end
+
+for.end:                                          ; preds = %for.end.loopexit, %entry
+  ret void
+; CHECK-LABEL: @bar5(
+; CHECK-NOT: call void @memset_pattern16
+}

Added: llvm/trunk/test/Transforms/LoopIdiom/unroll.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/unroll.ll?rev=258620&view=auto
==============================================================================
--- llvm/trunk/test/Transforms/LoopIdiom/unroll.ll (added)
+++ llvm/trunk/test/Transforms/LoopIdiom/unroll.ll Sat Jan 23 00:52:41 2016
@@ -0,0 +1,80 @@
+; RUN: opt -basicaa -loop-idiom < %s -S | FileCheck %s
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
+
+; CHECK: @.memset_pattern = private unnamed_addr constant [4 x i32] [i32 2, i32 2, i32 2, i32 2], align 16
+
+target triple = "x86_64-apple-darwin10.0.0"
+
+;void test(int *f, unsigned n) {
+;  for (unsigned i = 0; i < 2 * n; i += 2) {
+;    f[i] = 0;
+;    f[i+1] = 0;
+;  }
+;}
+define void @test(i32* %f, i32 %n) nounwind ssp {
+entry:
+  %mul = shl i32 %n, 1
+  %cmp1 = icmp eq i32 %mul, 0
+  br i1 %cmp1, label %for.end, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry
+  %0 = zext i32 %mul to i64
+  br label %for.body
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, i32* %f, i64 %indvars.iv
+  store i32 0, i32* %arrayidx, align 4
4<br class="">+ %1 = or i64 %indvars.iv, 1<br class="">+ %arrayidx2 = getelementptr inbounds i32, i32* %f, i64 %1<br class="">+ store i32 0, i32* %arrayidx2, align 4<br class="">+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2<br class="">+ %cmp = icmp ult i64 %indvars.iv.next, %0<br class="">+ br i1 %cmp, label %for.body, label %for.end.loopexit<br class="">+<br class="">+for.end.loopexit: ; preds = %for.body<br class="">+ br label %for.end<br class="">+<br class="">+for.end: ; preds = %for.end.loopexit, %entry<br class="">+ ret void<br class="">+; CHECK-LABEL: @test(<br class="">+; CHECK: call void @llvm.memset<br class="">+; CHECK-NOT: store<br class="">+}<br class="">+<br class="">+;void test_pattern(int *f, unsigned n) {<br class="">+; for (unsigned i = 0; i < 2 * n; i += 2) {<br class="">+; f[i] = 2;<br class="">+; f[i+1] = 2;<br class="">+; }<br class="">+;}<br class="">+define void @test_pattern(i32* %f, i32 %n) nounwind ssp {<br class="">+entry:<br class="">+ %mul = shl i32 %n, 1<br class="">+ %cmp1 = icmp eq i32 %mul, 0<br class="">+ br i1 %cmp1, label %for.end, label %for.body.preheader<br class="">+<br class="">+for.body.preheader: ; preds = %entry<br class="">+ %0 = zext i32 %mul to i64<br class="">+ br label %for.body<br class="">+<br class="">+for.body: ; preds = %for.body.preheader, %for.body<br class="">+ %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]<br class="">+ %arrayidx = getelementptr inbounds i32, i32* %f, i64 %indvars.iv<br class="">+ store i32 2, i32* %arrayidx, align 4<br class="">+ %1 = or i64 %indvars.iv, 1<br class="">+ %arrayidx2 = getelementptr inbounds i32, i32* %f, i64 %1<br class="">+ store i32 2, i32* %arrayidx2, align 4<br class="">+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2<br class="">+ %cmp = icmp ult i64 %indvars.iv.next, %0<br class="">+ br i1 %cmp, label %for.body, label %for.end.loopexit<br class="">+<br class="">+for.end.loopexit: ; preds = %for.body<br class="">+ br label %for.end<br class="">+<br class="">+for.end: ; preds = %for.end.loopexit, %entry<br class="">+ ret void<br class="">+; CHECK-LABEL: @test_pattern(<br class="">+; CHECK: call void @memset_pattern16<br class="">+; CHECK-NOT: store<br class="">+}<br class=""><br class=""><br class="">_______________________________________________<br class="">llvm-commits mailing list<br class="">llvm-commits@lists.llvm.org<br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits<br class=""></blockquote><br class="">_______________________________________________<br class="">llvm-commits mailing list<br class="">llvm-commits@lists.llvm.org<br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits<br class=""></div></div></blockquote></div><br class=""></body></html>