<div dir="ltr">FYI, I've reverted the commit in r313758. Before reapplying, please fix the -Wsign-compare warning (r313753).</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 20, 2017 at 2:29 PM, Alexander Kornienko <span dir="ltr"><<a href="mailto:alexfh@google.com" target="_blank">alexfh@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">This patch has broken buildbots: <a href="http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/6694/steps/test/logs/stdio" target="_blank">http://lab.llvm.<wbr>org:8011/builders/clang-x86_<wbr>64-debian-fast/builds/6694/<wbr>steps/test/logs/stdio</a><div><br></div><div>Please fix or revert.</div><div><br><div>...</div><div><div>FAIL: LLVM :: Transforms/SLPVectorizer/<wbr>AArch64/gather-root.ll (32439 of 33934)</div><div>******************** TEST 'LLVM :: Transforms/SLPVectorizer/<wbr>AArch64/gather-root.ll' FAILED ********************</div><div>Script:</div><div>--</div><div>/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt < /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/test/Transforms/<wbr>SLPVectorizer/AArch64/gather-<wbr>root.ll -slp-vectorizer -S | /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/FileCheck /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/test/Transforms/<wbr>SLPVectorizer/AArch64/gather-<wbr>root.ll --check-prefix=DEFAULT</div><div>/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt < /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/test/Transforms/<wbr>SLPVectorizer/AArch64/gather-<wbr>root.ll -slp-schedule-budget=0 -slp-min-tree-size=0 -slp-threshold=-30 -slp-vectorizer -S | /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/FileCheck /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/test/Transforms/<wbr>SLPVectorizer/AArch64/gather-<wbr>root.ll --check-prefix=GATHER</div><div>/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt < /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/test/Transforms/<wbr>SLPVectorizer/AArch64/gather-<wbr>root.ll -slp-schedule-budget=0 -slp-threshold=-30 -slp-vectorizer -S | /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/FileCheck /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/test/Transforms/<wbr>SLPVectorizer/AArch64/gather-<wbr>root.ll --check-prefix=MAX-COST</div><div>--</div><div>Exit Code: 2</div><div><br></div><div>Command Output (stderr):</div><div>--</div><div>opt: /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/lib/Transforms/Vectorize/<wbr>SLPVectorizer.cpp:3292: llvm::Value *llvm::slpvectorizer::BoUpSLP:<wbr>:vectorizeTree(<wbr>ExtraValueToDebugLocsMap &): Assertion `!E->NeedToGather && "Extracting from a gather list"' failed.</div><div>#0 0x0000000001c49c34 PrintStackTraceSignalHandler(<wbr>void*) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1c49c34)</div><div>#1 0x0000000001c49f76 SignalHandler(int) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1c49f76)</div><div>#2 0x00007fc461e8e0c0 __restore_rt (/lib/x86_64-linux-gnu/<wbr>libpthread.so.0+0x110c0)</div><div>#3 0x00007fc460a28fff gsignal 
(/lib/x86_64-linux-gnu/libc.<wbr>so.6+0x32fff)</div><div>#4 0x00007fc460a2a42a abort (/lib/x86_64-linux-gnu/libc.<wbr>so.6+0x3442a)</div><div>#5 0x00007fc460a21e67 (/lib/x86_64-linux-gnu/libc.<wbr>so.6+0x2be67)</div><div>#6 0x00007fc460a21f12 (/lib/x86_64-linux-gnu/libc.<wbr>so.6+0x2bf12)</div><div>#7 0x0000000001d7c5fd llvm::slpvectorizer::BoUpSLP::<wbr>vectorizeTree(llvm::MapVector<<wbr>llvm::Value*, llvm::SmallVector<llvm::<wbr>Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::<wbr>Value*>, llvm::detail::DenseMapPair<<wbr>llvm::Value*, unsigned int> >, std::vector<std::pair<llvm::<wbr>Value*, llvm::SmallVector<llvm::<wbr>Instruction*, 2u> >, std::allocator<std::pair<llvm:<wbr>:Value*, llvm::SmallVector<llvm::<wbr>Instruction*, 2u> > > > >&) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1d7c5fd)</div><div>#8 0x0000000001d87aab llvm::SLPVectorizerPass::<wbr>vectorizeRootInstruction(llvm:<wbr>:PHINode*, llvm::Value*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1d87aab)</div><div>#9 0x0000000001d831e6 llvm::SLPVectorizerPass::<wbr>vectorizeChainsInBlock(llvm::<wbr>BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1d831e6)</div><div>#10 0x0000000001d81c10 llvm::SLPVectorizerPass::<wbr>runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::<wbr>OptimizationRemarkEmitter*) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1d81c10)</div><div>#11 0x0000000001d8e1d6 (anonymous namespace)::SLPVectorizer::<wbr>runOnFunction(llvm::Function&) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1d8e1d6)</div><div>#12 0x000000000177646f llvm::FPPassManager::<wbr>runOnFunction(llvm::Function&) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x177646f)</div><div>#13 0x00000000017766c3 llvm::FPPassManager::<wbr>runOnModule(llvm::Module&) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x17766c3)</div><div>#14 0x0000000001776bc6 llvm::legacy::PassManagerImpl:<wbr>:run(llvm::Module&) (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x1776bc6)</div><div>#15 0x00000000006f6e0f main (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x6f6e0f)</div><div>#16 0x00007fc460a162e1 __libc_start_main (/lib/x86_64-linux-gnu/libc.<wbr>so.6+0x202e1)</div><div>#17 0x00000000006e803a _start (/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt+0x6e803a)</div><div>Stack dump:</div><div>0.<span style="white-space:pre-wrap"> </span>Program arguments: /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/opt -slp-schedule-budget=0 -slp-min-tree-size=0 -slp-threshold=-30 -slp-vectorizer -S </div><div>1.<span style="white-space:pre-wrap"> </span>Running pass 'Function Pass Manager' on module '<stdin>'.</div><div>2.<span style="white-space:pre-wrap"> </span>Running pass 'SLP Vectorizer' on function '@PR28330'</div><div>FileCheck error: '-' is empty.</div><div>FileCheck command line: 
/home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>obj/./bin/FileCheck /home/llvmbb/llvm-build-dir/<wbr>clang-x86_64-debian-fast/llvm.<wbr>src/test/Transforms/<wbr>SLPVectorizer/AArch64/gather-<wbr>root.ll --check-prefix=GATHER</div><div><br></div><div>--</div><div><br></div><div>********************</div></div><div>...</div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 20, 2017 at 10:18 AM, Mohammad Shahid via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: ashahid<br>
Date: Wed Sep 20 01:18:28 2017
New Revision: 313736

URL: http://llvm.org/viewvc/llvm-project?rev=313736&view=rev
Log:
[SLP] Vectorize jumbled memory loads.

Summary:
This patch vectorizes loads of consecutive memory accesses that appear in a
non-consecutive (jumbled) order. An earlier attempt, D26905, was reverted due
to a basic issue with representing the 'use mask' of jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the user tree entry.
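For illustration, the kind of source this patch targets (adapted from the
jumbled-load-shuffle-placement.ll test added below; __restrict used so the
snippet is also valid C++): the loads of A[0..3] feed their users in the
order 0, 1, 3, 2, so instead of gathering four scalar loads the vectorizer
can now emit one wide load of A[0..3] plus a single shufflevector with mask
<0, 1, 3, 2>.

  // The A[0..3] loads are consecutive in memory but used in jumbled order.
  void jumble(int *__restrict A, int *__restrict B) {
    B[0] = A[10] * A[0];
    B[1] = A[11] * A[1];
    B[2] = A[12] * A[3];
    B[3] = A[13] * A[2];
  }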
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh

Reviewed By: Ayal

Subscribers: mzolotukhin

Differential Revision: https://reviews.llvm.org/D36130

Commit after rebase for patch D36130

Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab

Added:
    llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
Modified:
    llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h
    llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp
    llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
    llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll
    llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll
    llvm/trunk/test/Transforms/SLPVectorizer/X86/store-jumbled.ll

Modified: llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h?rev=313736&r1=313735&r2=313736&view=diff
==============================================================================
--- llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h (original)
+++ llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h Wed Sep 20 01:18:28 2017
@@ -667,6 +667,21 @@ int64_t getPtrStride(PredicatedScalarEvo
                      const ValueToValueMap &StridesMap = ValueToValueMap(),
                      bool Assume = false, bool ShouldCheckWrap = true);
 
+/// \brief Attempt to sort the 'loads' in \p VL and return the sorted values in
+/// \p Sorted.
+///
+/// Returns 'false' if sorting is not legal or feasible, otherwise returns
+/// 'true'. If \p Mask is not null, it also returns the \p Mask which is the
+/// shuffle mask for actual memory access order.
+///
+/// For example, for a given VL of memory accesses in program order, a[i+2],
+/// a[i+0], a[i+1] and a[i+3], this function will sort the VL and save the
+/// sorted value in 'Sorted' as a[i+0], a[i+1], a[i+2], a[i+3] and saves the
+/// mask for actual memory accesses in program order in 'Mask' as <2,0,1,3>
+bool sortLoadAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
+                      ScalarEvolution &SE, SmallVectorImpl<Value *> &Sorted,
+                      SmallVectorImpl<unsigned> *Mask = nullptr);
+
 /// \brief Returns true if the memory operations \p A and \p B are consecutive.
 /// This is a simple API that does not depend on the analysis pass.
 bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
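For readers of the header alone, a minimal usage sketch (hypothetical caller;
the real caller is the SLP vectorizer change below, and VL, DL and SE are
assumed to be in scope):

  SmallVector<Value *, 8> Sorted;
  SmallVector<unsigned, 4> Mask;
  if (sortLoadAccesses(VL, DL, SE, Sorted, &Mask)) {
    // Sorted now holds the loads in increasing address order. Applying Mask
    // as a shuffle mask to the wide load of Sorted reproduces the original
    // program order of VL, e.g. Mask == <2,0,1,3> in the example above.
  }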
Modified: llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp?rev=313736&r1=313735&r2=313736&view=diff
==============================================================================
--- llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp (original)
+++ llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp Wed Sep 20 01:18:28 2017
@@ -1107,6 +1107,76 @@ static unsigned getAddressSpaceOperand(V
   return -1;
 }
 
+// TODO: This API can be improved by using the permutation of the given width
+// as the accesses are entered into the map.
+bool llvm::sortLoadAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
+                            ScalarEvolution &SE,
+                            SmallVectorImpl<Value *> &Sorted,
+                            SmallVectorImpl<unsigned> *Mask) {
+  SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
+  OffValPairs.reserve(VL.size());
+  Sorted.reserve(VL.size());
+
+  // Walk over the pointers, and map each of them to an offset relative to
+  // the first pointer in the array.
+  Value *Ptr0 = getPointerOperand(VL[0]);
+  const SCEV *Scev0 = SE.getSCEV(Ptr0);
+  Value *Obj0 = GetUnderlyingObject(Ptr0, DL);
+  PointerType *PtrTy = dyn_cast<PointerType>(Ptr0->getType());
+  uint64_t Size = DL.getTypeAllocSize(PtrTy->getElementType());
+
+  for (auto *Val : VL) {
+    // The only kind of access we care about here is load.
+    if (!isa<LoadInst>(Val))
+      return false;
+
+    Value *Ptr = getPointerOperand(Val);
+    assert(Ptr && "Expected value to have a pointer operand.");
+    // If a pointer refers to a different underlying object, bail - the
+    // pointers are by definition incomparable.
+    Value *CurrObj = GetUnderlyingObject(Ptr, DL);
+    if (CurrObj != Obj0)
+      return false;
+
+    const SCEVConstant *Diff =
+        dyn_cast<SCEVConstant>(SE.getMinusSCEV(SE.getSCEV(Ptr), Scev0));
+    // The pointers may not have a constant offset from each other, or SCEV
+    // may just not be smart enough to figure out they do. Regardless,
+    // there's nothing we can do.
+    if (!Diff || Diff->getAPInt().abs().getSExtValue() > (VL.size() - 1) * Size)
+      return false;
+
+    OffValPairs.emplace_back(Diff->getAPInt().getSExtValue(), Val);
+  }
+  SmallVector<unsigned, 4> UseOrder(VL.size());
+  for (unsigned i = 0; i < VL.size(); i++) {
+    UseOrder[i] = i;
+  }
+
+  // Sort the memory accesses and keep the order of their uses in UseOrder.
+  std::sort(UseOrder.begin(), UseOrder.end(),
+            [&OffValPairs](unsigned Left, unsigned Right) {
+              return OffValPairs[Left].first < OffValPairs[Right].first;
+            });
+
+  for (unsigned i = 0; i < VL.size(); i++)
+    Sorted.emplace_back(OffValPairs[UseOrder[i]].second);
+
+  // Sort UseOrder to compute the Mask.
+  if (Mask) {
+    Mask->reserve(VL.size());
+    for (unsigned i = 0; i < VL.size(); i++)
+      Mask->emplace_back(i);
+    std::sort(Mask->begin(), Mask->end(),
+              [&UseOrder](unsigned Left, unsigned Right) {
+                return UseOrder[Left] < UseOrder[Right];
+              });
+  }
+
+  return true;
+}
+
+
 /// Returns true if the memory operations \p A and \p B are consecutive.
 bool llvm::isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
                                ScalarEvolution &SE, bool CheckType) {
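One subtlety worth spelling out: the second std::sort computes the inverse
permutation of UseOrder, which is exactly the shuffle mask. A standalone
sketch of the two sorts above (plain C++, independent of LLVM):

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  int main() {
    // Offsets of VL in program order: a[i+2], a[i+0], a[i+1], a[i+3].
    std::vector<long> Off = {2, 0, 1, 3};
    // First sort: memory order of the bundle. UseOrder becomes {1, 2, 0, 3}.
    std::vector<unsigned> UseOrder = {0, 1, 2, 3};
    std::sort(UseOrder.begin(), UseOrder.end(),
              [&](unsigned L, unsigned R) { return Off[L] < Off[R]; });
    // Second sort: inverse permutation of UseOrder. Mask becomes {2, 0, 1, 3}.
    std::vector<unsigned> Mask = {0, 1, 2, 3};
    std::sort(Mask.begin(), Mask.end(),
              [&](unsigned L, unsigned R) { return UseOrder[L] < UseOrder[R]; });
    for (unsigned M : Mask)
      std::printf("%u ", M); // prints: 2 0 1 3
  }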
Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=313736&r1=313735&r2=313736&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
+++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Wed Sep 20 01:18:28 2017
@@ -637,17 +637,23 @@ private:
   int getEntryCost(TreeEntry *E);
 
   /// This is the recursive part of buildTree.
-  void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth, int);
+  void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth, int UserIndx = -1,
+                     int OpdNum = 0);
 
   /// \returns True if the ExtractElement/ExtractValue instructions in VL can
   /// be vectorized to use the original vector (or aggregate "bitcast" to a vector).
   bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue) const;
 
-  /// Vectorize a single entry in the tree.
-  Value *vectorizeTree(TreeEntry *E);
-
-  /// Vectorize a single entry in the tree, starting in \p VL.
-  Value *vectorizeTree(ArrayRef<Value *> VL);
+  /// Vectorize a single entry in the tree. \p OpdNum indicates the ordinality
+  /// of the operand corresponding to this tree entry \p E for the user tree
+  /// entry indicated by \p UserIndx.
+  // In other words, "E == TreeEntry[UserIndx].getOperand(OpdNum)".
+  Value *vectorizeTree(TreeEntry *E, int OpdNum = 0, int UserIndx = -1);
+
+  /// Vectorize a single entry in the tree, starting in \p VL. \p OpdNum
+  /// indicates the ordinality of the operand corresponding to the \p VL of
+  /// scalar values for the user, indicated by \p UserIndx, that this \p VL
+  /// feeds into.
+  Value *vectorizeTree(ArrayRef<Value *> VL, int OpdNum = 0, int UserIndx = -1);
 
  /// \returns the pointer to the vectorized value if \p VL is already
  /// vectorized, or NULL. They may happen in cycles.
@@ -685,7 +691,7 @@ private:
                                     SmallVectorImpl<Value *> &Left,
                                     SmallVectorImpl<Value *> &Right);
  struct TreeEntry {
-    TreeEntry(std::vector<TreeEntry> &Container) : Container(Container) {}
+    TreeEntry(std::vector<TreeEntry> &Container) : ShuffleMask(), Container(Container) {}
 
    /// \returns true if the scalars in VL are equal to this entry.
    bool isSame(ArrayRef<Value *> VL) const {
@@ -693,6 +699,16 @@ private:
      return std::equal(VL.begin(), VL.end(), Scalars.begin());
    }
 
+    /// \returns true if the scalars in VL are found in this tree entry.
+    bool isFoundJumbled(ArrayRef<Value *> VL, const DataLayout &DL,
+                        ScalarEvolution &SE) const {
+      assert(VL.size() == Scalars.size() && "Invalid size");
+      SmallVector<Value *, 8> List;
+      if (!sortLoadAccesses(VL, DL, SE, List))
+        return false;
+      return std::equal(List.begin(), List.end(), Scalars.begin());
+    }
+
    /// A vector of scalars.
    ValueList Scalars;
 
@@ -702,6 +718,14 @@ private:
    /// Do we need to gather this sequence ?
    bool NeedToGather = false;
 
+    /// Records optional shuffle mask for the uses of jumbled memory accesses.
+    /// For example, a non-empty ShuffleMask[1] represents the permutation of
+    /// lanes that operand #1 of this vectorized instruction should undergo
+    /// before feeding this vectorized instruction, whereas an empty
+    /// ShuffleMask[0] indicates that the lanes of operand #0 of this vectorized
+    /// instruction need not be permuted at all.
+    SmallVector<unsigned, 4> ShuffleMask[3];
+
    /// Points back to the VectorizableTree.
    ///
    /// Only used for Graphviz right now. Unfortunately GraphTrait::NodeRef has
@@ -717,12 +741,25 @@ private:
 
  /// Create a new VectorizableTree entry.
  TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,
-                          int &UserTreeIdx) {
+                          int &UserTreeIdx, const InstructionsState &S,
+                          ArrayRef<unsigned> ShuffleMask = None,
+                          int OpdNum = 0) {
+    assert((!Vectorized || S.Opcode != 0) &&
+           "Vectorized TreeEntry without opcode");
    VectorizableTree.emplace_back(VectorizableTree);
+
    int idx = VectorizableTree.size() - 1;
    TreeEntry *Last = &VectorizableTree[idx];
    Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
    Last->NeedToGather = !Vectorized;
+
+    TreeEntry *UserEntry = &VectorizableTree[UserTreeIdx];
+    if (!ShuffleMask.empty()) {
+      assert(UserEntry->ShuffleMask[OpdNum].empty() && "Mask already present!");
+      UserEntry->ShuffleMask[OpdNum].insert(
+          UserEntry->ShuffleMask[OpdNum].begin(), ShuffleMask.begin(),
+          ShuffleMask.end());
+    }
    if (Vectorized) {
      for (int i = 0, e = VL.size(); i != e; ++i) {
        assert(!getTreeEntry(VL[i]) && "Scalar already in tree!");
@@ -1373,34 +1410,34 @@ void BoUpSLP::buildTree(ArrayRef<Value *
 }
 
 void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
-                            int UserTreeIdx) {
+                            int UserTreeIdx, int OpdNum) {
   assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");
 
   InstructionsState S = getSameOpcode(VL);
   if (Depth == RecursionMaxDepth) {
     DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
-    newTreeEntry(VL, false, UserTreeIdx);
+    newTreeEntry(VL, false, UserTreeIdx, S);
     return;
   }
 
   // Don't handle vectors.
   if (S.OpValue->getType()->isVectorTy()) {
     DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");
-    newTreeEntry(VL, false, UserTreeIdx);
+    newTreeEntry(VL, false, UserTreeIdx, S);
     return;
   }
 
   if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))
     if (SI->getValueOperand()->getType()->isVectorTy()) {
       DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");
-      newTreeEntry(VL, false, UserTreeIdx);
+      newTreeEntry(VL, false, UserTreeIdx, S);
       return;
     }
 
   // If all of the operands are identical or constant we have a simple solution.
   if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !S.Opcode) {
     DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");
-    newTreeEntry(VL, false, UserTreeIdx);
+    newTreeEntry(VL, false, UserTreeIdx, S);
     return;
   }
 
@@ -1412,7 +1449,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
     if (EphValues.count(VL[i])) {
       DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
             ") is ephemeral.\n");
-      newTreeEntry(VL, false, UserTreeIdx);
+      newTreeEntry(VL, false, UserTreeIdx, S);
       return;
     }
   }
@@ -1423,7 +1460,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
      DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");
      if (E->Scalars[i] != VL[i]) {
        DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");
-        newTreeEntry(VL, false, UserTreeIdx);
+        newTreeEntry(VL, false, UserTreeIdx, S);
        return;
      }
    }
@@ -1442,7 +1479,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
      if (getTreeEntry(I)) {
        DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
              ") is already in tree.\n");
-        newTreeEntry(VL, false, UserTreeIdx);
+        newTreeEntry(VL, false, UserTreeIdx, S);
        return;
      }
    }
@@ -1452,7 +1489,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
  for (unsigned i = 0, e = VL.size(); i != e; ++i) {
    if (MustGather.count(VL[i])) {
      DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");
-      newTreeEntry(VL, false, UserTreeIdx);
+      newTreeEntry(VL, false, UserTreeIdx, S);
      return;
    }
  }
@@ -1466,7 +1503,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
    // Don't go into unreachable blocks. They may contain instructions with
    // dependency cycles which confuse the final scheduling.
    DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");
-    newTreeEntry(VL, false, UserTreeIdx);
+    newTreeEntry(VL, false, UserTreeIdx, S);
    return;
  }
 
@@ -1475,7 +1512,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
    for (unsigned j = i+1; j < e; ++j)
      if (VL[i] == VL[j]) {
        DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");
-        newTreeEntry(VL, false, UserTreeIdx);
+        newTreeEntry(VL, false, UserTreeIdx, S);
        return;
      }
 
@@ -1490,7 +1527,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
    assert((!BS.getScheduleData(VL0) ||
            !BS.getScheduleData(VL0)->isPartOfBundle()) &&
           "tryScheduleBundle should cancelScheduling on failure");
-    newTreeEntry(VL, false, UserTreeIdx);
+    newTreeEntry(VL, false, UserTreeIdx, S);
    return;
  }
  DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");
@@ -1509,12 +1546,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        if (Term) {
          DEBUG(dbgs() << "SLP: Need to swizzle PHINodes (TerminatorInst use).\n");
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          return;
        }
      }
 
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");
 
      for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
@@ -1524,7 +1561,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
          Operands.push_back(cast<PHINode>(j)->getIncomingValueForBlock(
              PH->getIncomingBlock(i)));
 
-        buildTree_rec(Operands, Depth + 1, UserTreeIdx);
+        buildTree_rec(Operands, Depth + 1, UserTreeIdx, i);
      }
      return;
    }
@@ -1536,7 +1573,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
      } else {
        BS.cancelScheduling(VL, VL0);
      }
-      newTreeEntry(VL, Reuse, UserTreeIdx);
+      newTreeEntry(VL, Reuse, UserTreeIdx, S);
      return;
    }
    case Instruction::Load: {
@@ -1552,7 +1589,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
      if (DL->getTypeSizeInBits(ScalarTy) !=
          DL->getTypeAllocSizeInBits(ScalarTy)) {
        BS.cancelScheduling(VL, VL0);
-        newTreeEntry(VL, false, UserTreeIdx);
+        newTreeEntry(VL, false, UserTreeIdx, S);
        DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");
        return;
      }
@@ -1563,15 +1600,13 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        LoadInst *L = cast<LoadInst>(VL[i]);
        if (!L->isSimple()) {
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
          return;
        }
      }
 
      // Check if the loads are consecutive, reversed, or neither.
-      // TODO: What we really want is to sort the loads, but for now, check
-      // the two likely directions.
      bool Consecutive = true;
      bool ReverseConsecutive = true;
      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
@@ -1585,7 +1620,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
 
      if (Consecutive) {
        ++NumLoadsWantToKeepOrder;
-        newTreeEntry(VL, true, UserTreeIdx);
+        newTreeEntry(VL, true, UserTreeIdx, S);
        DEBUG(dbgs() << "SLP: added a vector of loads.\n");
        return;
      }
@@ -1599,15 +1634,41 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
          break;
        }
 
-      BS.cancelScheduling(VL, VL0);
-      newTreeEntry(VL, false, UserTreeIdx);
-
      if (ReverseConsecutive) {
-        ++NumLoadsWantToChangeOrder;
        DEBUG(dbgs() << "SLP: Gathering reversed loads.\n");
-      } else {
-        DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");
+        ++NumLoadsWantToChangeOrder;
+        BS.cancelScheduling(VL, VL0);
+        newTreeEntry(VL, false, UserTreeIdx, S);
+        return;
+      }
+
+      if (VL.size() > 2) {
+        bool ShuffledLoads = true;
+        SmallVector<Value *, 8> Sorted;
+        SmallVector<unsigned, 4> Mask;
+        if (sortLoadAccesses(VL, *DL, *SE, Sorted, &Mask)) {
+          auto NewVL = makeArrayRef(Sorted.begin(), Sorted.end());
+          for (unsigned i = 0, e = NewVL.size() - 1; i < e; ++i) {
+            if (!isConsecutiveAccess(NewVL[i], NewVL[i + 1], *DL, *SE)) {
+              ShuffledLoads = false;
+              break;
+            }
+          }
+          // TODO: Tracking how many loads want an arbitrarily shuffled order
+          // would be useful.
+          if (ShuffledLoads) {
+            DEBUG(dbgs() << "SLP: added a vector of loads which needs "
+                            "permutation of loaded lanes.\n");
+            newTreeEntry(NewVL, true, UserTreeIdx, S,
+                         makeArrayRef(Mask.begin(), Mask.end()), OpdNum);
+            return;
+          }
+        }
      }
+
+      DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");
+      BS.cancelScheduling(VL, VL0);
+      newTreeEntry(VL, false, UserTreeIdx, S);
      return;
    }
    case Instruction::ZExt:
@@ -1627,12 +1688,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        Type *Ty = cast<Instruction>(VL[i])->getOperand(0)->getType();
        if (Ty != SrcTy || !isValidElementType(Ty)) {
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          DEBUG(dbgs() << "SLP: Gathering casts with different src types.\n");
          return;
        }
      }
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: added a vector of casts.\n");
 
      for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
@@ -1641,7 +1702,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        for (Value *j : VL)
          Operands.push_back(cast<Instruction>(j)->getOperand(i));
 
-        buildTree_rec(Operands, Depth + 1, UserTreeIdx);
+        buildTree_rec(Operands, Depth + 1, UserTreeIdx, i);
      }
      return;
    }
@@ -1655,13 +1716,13 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        if (Cmp->getPredicate() != P0 ||
            Cmp->getOperand(0)->getType() != ComparedTy) {
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          DEBUG(dbgs() << "SLP: Gathering cmp with different predicate.\n");
          return;
        }
      }
 
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: added a vector of compares.\n");
 
      for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
@@ -1670,7 +1731,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        for (Value *j : VL)
          Operands.push_back(cast<Instruction>(j)->getOperand(i));
 
-        buildTree_rec(Operands, Depth + 1, UserTreeIdx);
+        buildTree_rec(Operands, Depth + 1, UserTreeIdx, i);
      }
      return;
    }
@@ -1693,7 +1754,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
    case Instruction::And:
    case Instruction::Or:
    case Instruction::Xor:
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: added a vector of bin op.\n");
 
      // Sort operands of the instructions so that each side is more likely to
@@ -1702,7 +1763,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        ValueList Left, Right;
        reorderInputsAccordingToOpcode(S.Opcode, VL, Left, Right);
        buildTree_rec(Left, Depth + 1, UserTreeIdx);
-        buildTree_rec(Right, Depth + 1, UserTreeIdx);
+        buildTree_rec(Right, Depth + 1, UserTreeIdx, 1);
        return;
      }
 
@@ -1712,7 +1773,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        for (Value *j : VL)
          Operands.push_back(cast<Instruction>(j)->getOperand(i));
 
-        buildTree_rec(Operands, Depth + 1, UserTreeIdx);
+        buildTree_rec(Operands, Depth + 1, UserTreeIdx, i);
      }
      return;
 
@@ -1722,7 +1783,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        if (cast<Instruction>(VL[j])->getNumOperands() != 2) {
          DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          return;
        }
      }
@@ -1735,7 +1796,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        if (Ty0 != CurTy) {
          DEBUG(dbgs() << "SLP: not-vectorizable GEP (different types).\n");
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          return;
        }
      }
@@ -1747,12 +1808,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
          DEBUG(
              dbgs() << "SLP: not-vectorizable GEP (non-constant indexes).\n");
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          return;
        }
      }
 
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");
      for (unsigned i = 0, e = 2; i < e; ++i) {
        ValueList Operands;
@@ -1760,7 +1821,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        for (Value *j : VL)
          Operands.push_back(cast<Instruction>(j)->getOperand(i));
 
-        buildTree_rec(Operands, Depth + 1, UserTreeIdx);
+        buildTree_rec(Operands, Depth + 1, UserTreeIdx, i);
      }
      return;
    }
@@ -1769,12 +1830,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)
        if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE)) {
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
          return;
        }
 
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: added a vector of stores.\n");
 
      ValueList Operands;
@@ -1792,7 +1853,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
      Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
      if (!isTriviallyVectorizable(ID)) {
        BS.cancelScheduling(VL, VL0);
-        newTreeEntry(VL, false, UserTreeIdx);
+        newTreeEntry(VL, false, UserTreeIdx, S);
        DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
        return;
      }
@@ -1806,7 +1867,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
            getVectorIntrinsicIDForCall(CI2, TLI) != ID ||
            !CI->hasIdenticalOperandBundleSchema(*CI2)) {
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          DEBUG(dbgs() << "SLP: mismatched calls:" << *CI << "!=" << *VL[i]
                       << "\n");
          return;
@@ -1817,7 +1878,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
          Value *A1J = CI2->getArgOperand(1);
          if (A1I != A1J) {
            BS.cancelScheduling(VL, VL0);
-            newTreeEntry(VL, false, UserTreeIdx);
+            newTreeEntry(VL, false, UserTreeIdx, S);
            DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI
                         << " argument "<< A1I<<"!=" << A1J
                         << "\n");
@@ -1830,14 +1891,14 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
                CI->op_begin() + CI->getBundleOperandsEndIndex(),
                CI2->op_begin() + CI2->getBundleOperandsStartIndex())) {
          BS.cancelScheduling(VL, VL0);
-          newTreeEntry(VL, false, UserTreeIdx);
+          newTreeEntry(VL, false, UserTreeIdx, S);
          DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" << *CI << "!="
                       << *VL[i] << '\n');
          return;
        }
      }
 
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
        ValueList Operands;
        // Prepare the operand vector.
@@ -1845,7 +1906,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
          CallInst *CI2 = dyn_cast<CallInst>(j);
          Operands.push_back(CI2->getArgOperand(i));
        }
-        buildTree_rec(Operands, Depth + 1, UserTreeIdx);
+        buildTree_rec(Operands, Depth + 1, UserTreeIdx, i);
      }
      return;
    }
@@ -1854,11 +1915,11 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
      // then do not vectorize this instruction.
      if (!S.IsAltShuffle) {
        BS.cancelScheduling(VL, VL0);
-        newTreeEntry(VL, false, UserTreeIdx);
+        newTreeEntry(VL, false, UserTreeIdx, S);
        DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");
        return;
      }
-      newTreeEntry(VL, true, UserTreeIdx);
+      newTreeEntry(VL, true, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");
 
      // Reorder operands if reordering would enable vectorization.
@@ -1866,7 +1927,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        ValueList Left, Right;
        reorderAltShuffleOperands(S.Opcode, VL, Left, Right);
        buildTree_rec(Left, Depth + 1, UserTreeIdx);
-        buildTree_rec(Right, Depth + 1, UserTreeIdx);
+        buildTree_rec(Right, Depth + 1, UserTreeIdx, 1);
        return;
      }
 
@@ -1876,13 +1937,13 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
        for (Value *j : VL)
          Operands.push_back(cast<Instruction>(j)->getOperand(i));
 
-        buildTree_rec(Operands, Depth + 1, UserTreeIdx);
+        buildTree_rec(Operands, Depth + 1, UserTreeIdx, i);
      }
      return;
 
    default:
      BS.cancelScheduling(VL, VL0);
-      newTreeEntry(VL, false, UserTreeIdx);
+      newTreeEntry(VL, false, UserTreeIdx, S);
      DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");
      return;
    }
@@ -2720,12 +2781,15 @@ Value *BoUpSLP::alreadyVectorized(ArrayR
   return nullptr;
 }
 
-Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {
+Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL, int OpdNum, int UserIndx) {
   InstructionsState S = getSameOpcode(VL);
   if (S.Opcode) {
     if (TreeEntry *E = getTreeEntry(S.OpValue)) {
-      if (E->isSame(VL))
-        return vectorizeTree(E);
+      TreeEntry *UserTreeEntry = &VectorizableTree[UserIndx];
+      if (E->isSame(VL) ||
+          (UserTreeEntry && !UserTreeEntry->ShuffleMask[OpdNum].empty() &&
+           E->isFoundJumbled(VL, *DL, *SE)))
+        return vectorizeTree(E, OpdNum, UserIndx);
     }
   }
 
@@ -2737,9 +2801,11 @@ Value *BoUpSLP::vectorizeTree(ArrayRef<V
   return Gather(VL, VecTy);
 }
 
-Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
+Value *BoUpSLP::vectorizeTree(TreeEntry *E, int OpdNum, int UserIndx) {
   IRBuilder<>::InsertPointGuard Guard(Builder);
 
+  int CurrIndx = ScalarToTreeEntry[E->Scalars[0]];
+  TreeEntry *UserTreeEntry = nullptr;
   if (E->VectorizedValue) {
    DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");
    return E->VectorizedValue;
@@ -2788,7 +2854,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
 
        Builder.SetInsertPoint(IBB->getTerminator());
        Builder.SetCurrentDebugLocation(PH->getDebugLoc());
-        Value *Vec = vectorizeTree(Operands);
+        Value *Vec = vectorizeTree(Operands, i, CurrIndx);
        NewPhi->addIncoming(Vec, IBB);
      }
 
@@ -2841,7 +2907,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
 
      setInsertPointAfterBundle(E->Scalars, VL0);
 
-      Value *InVec = vectorizeTree(INVL);
+      Value *InVec = vectorizeTree(INVL, 0, CurrIndx);
 
      if (Value *V = alreadyVectorized(E->Scalars, VL0))
        return V;
@@ -2862,8 +2928,8 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
 
      setInsertPointAfterBundle(E->Scalars, VL0);
 
-      Value *L = vectorizeTree(LHSV);
-      Value *R = vectorizeTree(RHSV);
+      Value *L = vectorizeTree(LHSV, 0, CurrIndx);
+      Value *R = vectorizeTree(RHSV, 1, CurrIndx);
 
      if (Value *V = alreadyVectorized(E->Scalars, VL0))
        return V;
@@ -2890,9 +2956,9 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
 
      setInsertPointAfterBundle(E->Scalars, VL0);
 
-      Value *Cond = vectorizeTree(CondVec);
-      Value *True = vectorizeTree(TrueVec);
-      Value *False = vectorizeTree(FalseVec);
+      Value *Cond = vectorizeTree(CondVec, 0, CurrIndx);
+      Value *True = vectorizeTree(TrueVec, 1, CurrIndx);
+      Value *False = vectorizeTree(FalseVec, 2, CurrIndx);
 
      if (Value *V = alreadyVectorized(E->Scalars, VL0))
        return V;
@@ -2933,8 +2999,8 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
 
      setInsertPointAfterBundle(E->Scalars, VL0);
 
-      Value *LHS = vectorizeTree(LHSVL);
-      Value *RHS = vectorizeTree(RHSVL);
+      Value *LHS = vectorizeTree(LHSVL, 0, CurrIndx);
+      Value *RHS = vectorizeTree(RHSVL, 1, CurrIndx);
 
      if (Value *V = alreadyVectorized(E->Scalars, VL0))
        return V;
@@ -2955,7 +3021,17 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
      // sink them all the way down past store instructions.
      setInsertPointAfterBundle(E->Scalars, VL0);
 
-      LoadInst *LI = cast<LoadInst>(VL0);
+      if (UserIndx != -1) {
+        UserTreeEntry = &VectorizableTree[UserIndx];
+      }
+
+      LoadInst *LI = NULL;
+      if (UserTreeEntry && !UserTreeEntry->ShuffleMask[OpdNum].empty()) {
+        LI = cast<LoadInst>(E->Scalars[0]);
+      } else {
+        LI = cast<LoadInst>(VL0);
+      }
+
      Type *ScalarLoadTy = LI->getType();
      unsigned AS = LI->getPointerAddressSpace();
 
@@ -2977,7 +3053,24 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
      LI->setAlignment(Alignment);
      E->VectorizedValue = LI;
      ++NumVectorInstructions;
-      return propagateMetadata(LI, E->Scalars);
+      propagateMetadata(LI, E->Scalars);
+
+      if (UserTreeEntry && !UserTreeEntry->ShuffleMask[OpdNum].empty()) {
+        SmallVector<Constant *, 8> Mask;
+        for (unsigned Lane = 0, LE = UserTreeEntry->ShuffleMask[OpdNum].size();
+             Lane != LE; ++Lane) {
+          Mask.push_back(
+              Builder.getInt32(UserTreeEntry->ShuffleMask[OpdNum][Lane]));
+        }
+        // Generate shuffle for jumbled memory access
+        Value *Undef = UndefValue::get(VecTy);
+        Value *Shuf = Builder.CreateShuffleVector((Value *)LI, Undef,
+                                                  ConstantVector::get(Mask));
+        E->VectorizedValue = Shuf;
+        ++NumVectorInstructions;
+        return Shuf;
+      }
+      return LI;
    }
    case Instruction::Store: {
      StoreInst *SI = cast<StoreInst>(VL0);
@@ -2990,7 +3083,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
 
      setInsertPointAfterBundle(E->Scalars, VL0);
 
-      Value *VecValue = vectorizeTree(ScalarStoreValues);
+      Value *VecValue = vectorizeTree(ScalarStoreValues, 0, CurrIndx);
      Value *ScalarPtr = SI->getPointerOperand();
      Value *VecPtr = Builder.CreateBitCast(ScalarPtr, VecTy->getPointerTo(AS));
      StoreInst *S = Builder.CreateStore(VecValue, VecPtr);
@@ -3016,7 +3109,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
      for (Value *V : E->Scalars)
        Op0VL.push_back(cast<GetElementPtrInst>(V)->getOperand(0));
 
-      Value *Op0 = vectorizeTree(Op0VL);
+      Value *Op0 = vectorizeTree(Op0VL, 0, CurrIndx);
 
      std::vector<Value *> OpVecs;
      for (int j = 1, e = cast<GetElementPtrInst>(VL0)->getNumOperands(); j < e;
@@ -3025,7 +3118,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
        for (Value *V : E->Scalars)
          OpVL.push_back(cast<GetElementPtrInst>(V)->getOperand(j));
 
-        Value *OpVec = vectorizeTree(OpVL);
+        Value *OpVec = vectorizeTree(OpVL, j, CurrIndx);
        OpVecs.push_back(OpVec);
      }
 
@@ -3064,7 +3157,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
          OpVL.push_back(CEI->getArgOperand(j));
        }
 
-        Value *OpVec = vectorizeTree(OpVL);
+        Value *OpVec = vectorizeTree(OpVL, j, CurrIndx);
        DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
        OpVecs.push_back(OpVec);
      }
@@ -3095,8 +3188,8 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
      reorderAltShuffleOperands(S.Opcode, E->Scalars, LHSVL, RHSVL);
      setInsertPointAfterBundle(E->Scalars, VL0);
 
-      Value *LHS = vectorizeTree(LHSVL);
-      Value *RHS = vectorizeTree(RHSVL);
+      Value *LHS = vectorizeTree(LHSVL, 0, CurrIndx);
+      Value *RHS = vectorizeTree(RHSVL, 1, CurrIndx);
 
      if (Value *V = alreadyVectorized(E->Scalars, VL0))
        return V;
@@ -3198,7 +3291,13 @@ BoUpSLP::vectorizeTree(ExtraValueToDebug
    assert(E && "Invalid scalar");
    assert(!E->NeedToGather && "Extracting from a gather list");
 
-    Value *Vec = E->VectorizedValue;
+    Value *Vec = nullptr;
+    if ((Vec = dyn_cast<ShuffleVectorInst>(E->VectorizedValue)) &&
+        dyn_cast<LoadInst>(cast<Instruction>(Vec)->getOperand(0))) {
+      Vec = cast<Instruction>(E->VectorizedValue)->getOperand(0);
+    } else {
+      Vec = E->VectorizedValue;
+    }
    assert(Vec && "Can't find vectorizable value");
 
    Value *Lane = Builder.getInt32(ExternalUse.Lane);
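For readers skimming the diff, the net effect of the vectorizeTree() changes
in one place: when the user entry has a recorded mask for this operand, the
wide load is followed by a single shufflevector that restores the program
order of the lanes. A condensed restatement of the Load case above (same
IRBuilder calls as the patch, with a range-for in place of the indexed loop):

  SmallVector<Constant *, 8> MaskElts;
  for (unsigned Idx : UserTreeEntry->ShuffleMask[OpdNum])
    MaskElts.push_back(Builder.getInt32(Idx));
  Value *Shuf = Builder.CreateShuffleVector(LI, UndefValue::get(VecTy),
                                            ConstantVector::get(MaskElts));
  E->VectorizedValue = Shuf;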
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll?rev=313736&r1=313735&r2=313736&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll (original)
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll Wed Sep 20 01:18:28 2017
@@ -11,20 +11,16 @@
 define i32 @fn1() {
 ; CHECK-LABEL: @fn1(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
-; CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 1), align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 2), align 4
-; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 3), align 4
-; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[TMP1]], i32 0
-; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP2]], i32 1
-; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[TMP3]], i32 2
-; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP0]], i32 3
-; CHECK-NEXT:    [[TMP8:%.*]] = icmp sgt <4 x i32> [[TMP7]], zeroinitializer
-; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <4 x i32> [[TMP4]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 1
-; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 2
-; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 8, i32 3
-; CHECK-NEXT:    [[TMP12:%.*]] = select <4 x i1> [[TMP8]], <4 x i32> [[TMP11]], <4 x i32> <i32 6, i32 0, i32 0, i32 0>
-; CHECK-NEXT:    store <4 x i32> [[TMP12]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
+; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp sgt <4 x i32> [[TMP1]], zeroinitializer
+; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[TMP0]], i32 1
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[TMP3]], i32 0
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 1
+; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 2
+; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 8, i32 3
+; CHECK-NEXT:    [[TMP8:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP7]], <4 x i32> <i32 6, i32 0, i32 0, i32 0>
+; CHECK-NEXT:    store <4 x i32> [[TMP8]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
 ; CHECK-NEXT:    ret i32 0
 ;
 entry:

Added: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll?rev=313736&view=auto
==============================================================================
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll (added)
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll Wed Sep 20 01:18:28 2017
@@ -0,0 +1,68 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -S -mtriple=x86_64-unknown -mattr=+avx -slp-vectorizer | FileCheck %s
+
+
+;void jumble (int * restrict A, int * restrict B) {
+  ; int tmp0 = A[10]*A[0];
+  ; int tmp1 = A[11]*A[1];
+  ; int tmp2 = A[12]*A[3];
+  ; int tmp3 = A[13]*A[2];
+  ; B[0] = tmp0;
+  ; B[1] = tmp1;
+  ; B[2] = tmp2;
+  ; B[3] = tmp3;
+  ;}
+  ; Function Attrs: norecurse nounwind uwtable
+  define void @jumble(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) {
+; CHECK-LABEL: @jumble(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 10
+; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 11
+; CHECK-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1
+; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 12
+; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
+; CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13
+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
+; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*
+; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
+; CHECK-NEXT:    [[TMP5:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP1]]
+; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
+; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
+; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
+; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[B]] to <4 x i32>*
+; CHECK-NEXT:    store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %arrayidx = getelementptr inbounds i32, i32* %A, i64 10
+  %0 = load i32, i32* %arrayidx, align 4
+  %1 = load i32, i32* %A, align 4
+  %mul = mul nsw i32 %1, %0
+  %arrayidx2 = getelementptr inbounds i32, i32* %A, i64 11
+  %2 = load i32, i32* %arrayidx2, align 4
+  %arrayidx3 = getelementptr inbounds i32, i32* %A, i64 1
+  %3 = load i32, i32* %arrayidx3, align 4
+  %mul4 = mul nsw i32 %3, %2
+  %arrayidx5 = getelementptr inbounds i32, i32* %A, i64 12
+  %4 = load i32, i32* %arrayidx5, align 4
+  %arrayidx6 = getelementptr inbounds i32, i32* %A, i64 3
+  %5 = load i32, i32* %arrayidx6, align 4
+  %mul7 = mul nsw i32 %5, %4
+  %arrayidx8 = getelementptr inbounds i32, i32* %A, i64 13
+  %6 = load i32, i32* %arrayidx8, align 4
+  %arrayidx9 = getelementptr inbounds i32, i32* %A, i64 2
+  %7 = load i32, i32* %arrayidx9, align 4
+  %mul10 = mul nsw i32 %7, %6
+  store i32 %mul, i32* %B, align 4
+  %arrayidx12 = getelementptr inbounds i32, i32* %B, i64 1
+  store i32 %mul4, i32* %arrayidx12, align 4
+  %arrayidx13 = getelementptr inbounds i32, i32* %B, i64 2
+  store i32 %mul7, i32* %arrayidx13, align 4
+  %arrayidx14 = getelementptr inbounds i32, i32* %B, i64 3
+  store i32 %mul10, i32* %arrayidx14, align 4
+  ret void
+  }
+

Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll?rev=313736&r1=313735&r2=313736&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll (original)
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll Wed Sep 20 01:18:28 2017
@@ -5,34 +5,27 @@
 
 define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {
 ; CHECK-LABEL: @jumbled-load(
-; CHECK-NEXT:    [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* %in, i64 0
-; CHECK-NEXT:    [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4
+; CHECK-NEXT:    [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
 ; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
-; CHECK-NEXT:    [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4
 ; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
-; CHECK-NEXT:    [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4
 ; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
-; CHECK-NEXT:    [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4
-; CHECK-NEXT:    [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* %inn, i64 0
-; CHECK-NEXT:    [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
+; CHECK-NEXT:    [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
 ; CHECK-NEXT:    [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2
-; CHECK-NEXT:    [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4
 ; CHECK-NEXT:    [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3
-; CHECK-NEXT:    [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4
 ; CHECK-NEXT:    [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1
-; CHECK-NEXT:    [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4
-; CHECK-NEXT:    [[MUL_1:%.*]] = mul i32 [[LOAD_3]], [[LOAD_5]]
-; CHECK-NEXT:    [[MUL_2:%.*]] = mul i32 [[LOAD_2]], [[LOAD_8]]
-; CHECK-NEXT:    [[MUL_3:%.*]] = mul i32 [[LOAD_4]], [[LOAD_7]]
-; CHECK-NEXT:    [[MUL_4:%.*]] = mul i32 [[LOAD_1]], [[LOAD_6]]
-; CHECK-NEXT:    [[GEP_7:%.*]] = getelementptr inbounds i32, i32* %out, i64 0
-; CHECK-NEXT:    store i32 [[MUL_1]], i32* [[GEP_7]], align 4
-; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i32, i32* %out, i64 1
-; CHECK-NEXT:    store i32 [[MUL_2]], i32* [[GEP_8]], align 4
-; CHECK-NEXT:    [[GEP_9:%.*]] = getelementptr inbounds i32, i32* %out, i64 2
-; CHECK-NEXT:    store i32 [[MUL_3]], i32* [[GEP_9]], align 4
-; CHECK-NEXT:    [[GEP_10:%.*]] = getelementptr inbounds i32, i32* %out, i64 3
-; CHECK-NEXT:    store i32 [[MUL_4]], i32* [[GEP_10]], align 4
+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*
+; CHECK-NEXT:    [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[TMP4]], align 4
+; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
+; CHECK-NEXT:    [[TMP7:%.*]] = mul <4 x i32> [[TMP3]], [[TMP6]]
+; CHECK-NEXT:    [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
+; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
+; CHECK-NEXT:    [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
+; CHECK-NEXT:    [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
+; CHECK-NEXT:    [[TMP8:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
+; CHECK-NEXT:    store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4
 ; CHECK-NEXT:    ret i32 undef
 ;
 %in.addr = getelementptr inbounds i32, i32* %in, i64 0

Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/store-jumbled.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/store-jumbled.ll?rev=313736&r1=313735&r2=313736&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/store-jumbled.ll (original)
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/store-jumbled.ll Wed Sep 20 01:18:28 2017
@@ -6,33 +6,26 @@
 define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {
 ; CHECK-LABEL: @jumbled-load(
 ; CHECK-NEXT:    [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
-; CHECK-NEXT:    [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4
 ; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
-; CHECK-NEXT:    [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4
 ; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
-; CHECK-NEXT:    [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4
 ; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
-; CHECK-NEXT:    [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
 ; CHECK-NEXT:    [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
-; CHECK-NEXT:    [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4
 ; CHECK-NEXT:    [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1
-; CHECK-NEXT:    [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4
 ; CHECK-NEXT:    [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2
-; CHECK-NEXT:    [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4
 ; CHECK-NEXT:    [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3
-; CHECK-NEXT:    [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4
-; CHECK-NEXT:    [[MUL_1:%.*]] = mul i32 [[LOAD_1]], [[LOAD_5]]
-; CHECK-NEXT:    [[MUL_2:%.*]] = mul i32 [[LOAD_2]], [[LOAD_6]]
-; CHECK-NEXT:    [[MUL_3:%.*]] = mul i32 [[LOAD_3]], [[LOAD_7]]
-; CHECK-NEXT:    [[MUL_4:%.*]] = mul i32 [[LOAD_4]], [[LOAD_8]]
+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*
+; CHECK-NEXT:    [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[TMP4]], align 4
+; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
+; CHECK-NEXT:    [[TMP7:%.*]] = mul <4 x i32> [[TMP3]], [[TMP6]]
 ; CHECK-NEXT:    [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
 ; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
 ; CHECK-NEXT:    [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
 ; CHECK-NEXT:    [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
-; CHECK-NEXT:    store i32 [[MUL_1]], i32* [[GEP_9]], align 4
-; CHECK-NEXT:    store i32 [[MUL_2]], i32* [[GEP_7]], align 4
-; CHECK-NEXT:    store i32 [[MUL_3]], i32* [[GEP_10]], align 4
-; CHECK-NEXT:    store i32 [[MUL_4]], i32* [[GEP_8]], align 4
+; CHECK-NEXT:    [[TMP8:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
+; CHECK-NEXT:    store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4
 ; CHECK-NEXT:    ret i32 undef
 ;
 %in.addr = getelementptr inbounds i32, i32* %in, i64 0

_______________________________________________
llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits