<div dir="ltr">NP, and awesome work on the SLP vectorizer!</div><br><div class="gmail_quote"><div dir="ltr">On Tue, Apr 3, 2018 at 3:33 AM Alexey Bataev <<a href="mailto:a.bataev@hotmail.com">a.bataev@hotmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">



<div dir="auto">
Chandler, thanks for the fix and sorry for the mess. <br>
<br>
<div id="m_1634489278617791328AppleMailSignature">Best regards,
<div>Alexey Bataev</div>
</div>
<div><br>
On Apr 3, 2018, at 1:30, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>> wrote:<br>
<br>
</div></div><div dir="auto">
<blockquote type="cite">
<div>
<div dir="ltr">Fixed this and some other issues in r329046.</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Mon, Apr 2, 2018 at 8:13 PM Chandler Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">This appears to print to stderr unconditionally in !NDEBUG builds? =[ It's causing lots of full-screen output for me.
<div><br>
</div>
<div>If you or someone else doesn't get to fixing this soon, I guess I will.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Mon, Apr 2, 2018 at 7:54 AM Alexey Bataev via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Author: abataev<br>
Date: Mon Apr  2 07:51:37 2018<br>
New Revision: 328980<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=328980&view=rev" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project?rev=328980&view=rev</a><br>
Log:<br>
[SLP] Fix PR36481: vectorize reassociated instructions.<br>
<br>
Summary:<br>
If the load/extractelement/extractvalue instructions are not originally<br>
consecutive, the SLP vectorizer is unable to vectorize them. Patch<br>
allows reordering of such instructions.<br>
<br>
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid<br>
<br>
Subscribers: llvm-commits<br>
<br>
Differential Revision: <a href="https://reviews.llvm.org/D43776" rel="noreferrer" target="_blank">
https://reviews.llvm.org/D43776</a><br>
<br>
Modified:<br>
    llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h<br>
    llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp<br>
    llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/reassociated-loads.ll<br>
    llvm/trunk/test/Transforms/SLPVectorizer/X86/store-jumbled.ll<br>
<br>
Modified: llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h?rev=328980&r1=328979&r2=328980&view=diff" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h (original)<br>
+++ llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h Mon Apr  2 07:51:37 2018<br>
@@ -667,6 +667,20 @@ int64_t getPtrStride(PredicatedScalarEvo<br>
                      const ValueToValueMap &StridesMap = ValueToValueMap(),<br>
                      bool Assume = false, bool ShouldCheckWrap = true);<br>
<br>
+/// \brief Attempt to sort the pointers in \p VL and return the sorted indices<br>
+/// in \p SortedIndices, if reordering is required.<br>
+///<br>
+/// Returns 'true' if sorting is legal, otherwise returns 'false'.<br>
+///<br>
+/// For example, for a given \p VL of memory accesses in program order, a[i+4],<br>
+/// a[i+0], a[i+1] and a[i+7], this function computes the address order<br>
+/// a[i+0], a[i+1], a[i+4], a[i+7] and saves in \p SortedIndices the positions<br>
+/// in \p VL of the sorted accesses, i.e. <1,2,0,3>. \p SortedIndices is left<br>
+/// empty if the accesses are already in sorted order.<br>
+bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,<br>
+                     ScalarEvolution &SE,<br>
+                     SmallVectorImpl<unsigned> &SortedIndices);<br>
+<br>
 /// \brief Returns true if the memory operations \p A and \p B are consecutive.<br>
 /// This is a simple API that does not depend on the analysis pass.<br>
 bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,<br>
<br>
Modified: llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp?rev=328980&r1=328979&r2=328980&view=diff" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp (original)<br>
+++ llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp Mon Apr  2 07:51:37 2018<br>
@@ -1087,6 +1087,67 @@ int64_t llvm::getPtrStride(PredicatedSca<br>
   return Stride;<br>
 }<br>
<br>
+bool llvm::sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,<br>
+                           ScalarEvolution &SE,<br>
+                           SmallVectorImpl<unsigned> &SortedIndices) {<br>
+  assert(llvm::all_of(<br>
+             VL, [](const Value *V) { return V->getType()->isPointerTy(); }) &&<br>
+         "Expected list of pointer operands.");<br>
+  SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;<br>
+  OffValPairs.reserve(VL.size());<br>
+<br>
+  // Walk over the pointers, and map each of them to an offset relative to<br>
+  // first pointer in the array.<br>
+  Value *Ptr0 = VL[0];<br>
+  const SCEV *Scev0 = SE.getSCEV(Ptr0);<br>
+  Value *Obj0 = GetUnderlyingObject(Ptr0, DL);<br>
+<br>
+  llvm::SmallSet<int64_t, 4> Offsets;<br>
+  for (auto *Ptr : VL) {<br>
+    // TODO: Outline this code as a special, more time consuming, version of<br>
+    // computeConstantDifference() function.<br>
+    if (Ptr->getType()->getPointerAddressSpace() !=<br>
+        Ptr0->getType()->getPointerAddressSpace())<br>
+      return false;<br>
+    // If a pointer refers to a different underlying object, bail - the<br>
+    // pointers are by definition incomparable.<br>
+    Value *CurrObj = GetUnderlyingObject(Ptr, DL);<br>
+    if (CurrObj != Obj0)<br>
+      return false;<br>
+<br>
+    const SCEV *Scev = SE.getSCEV(Ptr);<br>
+    const auto *Diff = dyn_cast<SCEVConstant>(SE.getMinusSCEV(Scev, Scev0));<br>
+    // The pointers may not have a constant offset from each other, or SCEV<br>
+    // may just not be smart enough to figure out they do. Regardless,<br>
+    // there's nothing we can do.<br>
+    if (!Diff)<br>
+      return false;<br>
+<br>
+    // Check if the pointer with the same offset is found.<br>
+    int64_t Offset = Diff->getAPInt().getSExtValue();<br>
+    if (!Offsets.insert(Offset).second)<br>
+      return false;<br>
+    OffValPairs.emplace_back(Offset, Ptr);<br>
+  }<br>
+  SortedIndices.clear();<br>
+  SortedIndices.resize(VL.size());<br>
+  std::iota(SortedIndices.begin(), SortedIndices.end(), 0);<br>
+<br>
+  // Sort the memory accesses by offset and record the order in SortedIndices.<br>
+  std::stable_sort(SortedIndices.begin(), SortedIndices.end(),<br>
+                   [&OffValPairs](unsigned Left, unsigned Right) {<br>
+                     return OffValPairs[Left].first < OffValPairs[Right].first;<br>
+                   });<br>
+<br>
+  // Check if the order is consecutive already.<br>
+  if (llvm::all_of(SortedIndices, [&SortedIndices](const unsigned I) {<br>
+        return I == SortedIndices[I];<br>
+      }))<br>
+    SortedIndices.clear();<br>
+<br>
+  return true;<br>
+}<br>
+<br>
 /// Take the address space operand from the Load/Store instruction.<br>
 /// Returns -1 if this is not a valid Load/Store instruction.<br>
 static unsigned getAddressSpaceOperand(Value *I) {<br>
<br>
Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=328980&r1=328979&r2=328980&view=diff" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)<br>
+++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Mon Apr  2 07:51:37 2018<br>
@@ -452,16 +452,21 @@ static bool allSameType(ArrayRef<Value *<br>
 }<br>
<br>
 /// \returns True if Extract{Value,Element} instruction extracts element Idx.<br>
-static bool matchExtractIndex(Instruction *E, unsigned Idx, unsigned Opcode) {<br>
-  assert(Opcode == Instruction::ExtractElement ||<br>
-         Opcode == Instruction::ExtractValue);<br>
+static Optional<unsigned> getExtractIndex(Instruction *E) {<br>
+  unsigned Opcode = E->getOpcode();<br>
+  assert((Opcode == Instruction::ExtractElement ||<br>
+          Opcode == Instruction::ExtractValue) &&<br>
+         "Expected extractelement or extractvalue instruction.");<br>
   if (Opcode == Instruction::ExtractElement) {<br>
-    ConstantInt *CI = dyn_cast<ConstantInt>(E->getOperand(1));<br>
-    return CI && CI->getZExtValue() == Idx;<br>
-  } else {<br>
-    ExtractValueInst *EI = cast<ExtractValueInst>(E);<br>
-    return EI->getNumIndices() == 1 && *EI->idx_begin() == Idx;<br>
-  }<br>
+    auto *CI = dyn_cast<ConstantInt>(E->getOperand(1));<br>
+    if (!CI)<br>
+      return None;<br>
+    return CI->getZExtValue();<br>
+  }<br>
+  ExtractValueInst *EI = cast<ExtractValueInst>(E);<br>
+  if (EI->getNumIndices() != 1)<br>
+    return None;<br>
+  return *EI->idx_begin();<br>
 }<br>
<br>
 /// \returns True if in-tree use also needs extract. This refers to<br>
@@ -586,6 +591,7 @@ public:<br>
     MustGather.clear();<br>
     ExternalUses.clear();<br>
     NumOpsWantToKeepOrder.clear();<br>
+    NumOpsWantToKeepOriginalOrder = 0;<br>
     for (auto &Iter : BlocksSchedules) {<br>
       BlockScheduling *BS = Iter.second.get();<br>
       BS->clear();<br>
@@ -598,14 +604,18 @@ public:<br>
   /// \brief Perform LICM and CSE on the newly generated gather sequences.<br>
   void optimizeGatherSequence();<br>
<br>
-  /// \returns true if it is beneficial to reverse the vector order.<br>
-  bool shouldReorder() const {<br>
-    return std::accumulate(<br>
-               NumOpsWantToKeepOrder.begin(), NumOpsWantToKeepOrder.end(), 0,<br>
-               [](int Val1,<br>
-                  const decltype(NumOpsWantToKeepOrder)::value_type &Val2) {<br>
-                 return Val1 + (Val2.second < 0 ? 1 : -1);<br>
-               }) > 0;<br>
+  /// \returns The best order of instructions for vectorization.<br>
+  Optional<ArrayRef<unsigned>> bestOrder() const {<br>
+    auto I = std::max_element(<br>
+        NumOpsWantToKeepOrder.begin(), NumOpsWantToKeepOrder.end(),<br>
+        [](const decltype(NumOpsWantToKeepOrder)::value_type &D1,<br>
+           const decltype(NumOpsWantToKeepOrder)::value_type &D2) {<br>
+          return D1.second < D2.second;<br>
+        });<br>
+    if (I == NumOpsWantToKeepOrder.end() || I->getSecond() <= NumOpsWantToKeepOriginalOrder)<br>
+      return None;<br>
+<br>
+    return makeArrayRef(I->getFirst());<br>
   }<br>
<br>
   /// \return The vector element size in bits to use when vectorizing the<br>
@@ -652,9 +662,13 @@ private:<br>
   /// This is the recursive part of buildTree.<br>
   void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth, int);<br>
<br>
-  /// \returns True if the ExtractElement/ExtractValue instructions in VL can<br>
-  /// be vectorized to use the original vector (or aggregate "bitcast" to a vector).<br>
-  bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue) const;<br>
+  /// \returns true if the ExtractElement/ExtractValue instructions in \p VL can<br>
+  /// be vectorized to use the original vector (or aggregate "bitcast" to a<br>
+  /// vector) and sets \p CurrentOrder to the identity permutation; otherwise<br>
+  /// returns false, setting \p CurrentOrder to either an empty vector or a<br>
+  /// non-identity permutation that allows extract instructions to be reused.<br>
+  bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,<br>
+                       SmallVectorImpl<unsigned> &CurrentOrder) const;<br>
<br>
   /// Vectorize a single entry in the tree.<br>
   Value *vectorizeTree(TreeEntry *E);<br>
@@ -718,6 +732,9 @@ private:<br>
     /// Does this sequence require some shuffling?<br>
     SmallVector<unsigned, 4> ReuseShuffleIndices;<br>
<br>
+    /// Does this entry require reordering?<br>
+    ArrayRef<unsigned> ReorderIndices;<br>
+<br>
     /// Points back to the VectorizableTree.<br>
     ///<br>
     /// Only used for Graphviz right now.  Unfortunately GraphTrait::NodeRef has<br>
@@ -733,7 +750,8 @@ private:<br>
<br>
   /// Create a new VectorizableTree entry.<br>
   void newTreeEntry(ArrayRef<Value *> VL, bool Vectorized, int &UserTreeIdx,<br>
-                    ArrayRef<unsigned> ReuseShuffleIndices = None) {<br>
+                    ArrayRef<unsigned> ReuseShuffleIndices = None,<br>
+                    ArrayRef<unsigned> ReorderIndices = None) {<br>
     VectorizableTree.emplace_back(VectorizableTree);<br>
     int idx = VectorizableTree.size() - 1;<br>
     TreeEntry *Last = &VectorizableTree[idx];<br>
@@ -741,6 +759,7 @@ private:<br>
     Last->NeedToGather = !Vectorized;<br>
     Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),<br>
                                      ReuseShuffleIndices.end());<br>
+    Last->ReorderIndices = ReorderIndices;<br>
     if (Vectorized) {<br>
       for (int i = 0, e = VL.size(); i != e; ++i) {<br>
         assert(!getTreeEntry(VL[i]) && "Scalar already in tree!");<br>
@@ -1202,10 +1221,38 @@ private:<br>
   /// List of users to ignore during scheduling and that don't need extracting.<br>
   ArrayRef<Value *> UserIgnoreList;<br>
<br>
-  /// Number of operation bundles that contain consecutive operations - number<br>
-  /// of operation bundles that contain consecutive operations in reversed<br>
-  /// order.<br>
-  DenseMap<unsigned, int> NumOpsWantToKeepOrder;<br>
+  using OrdersType = SmallVector<unsigned, 4>;<br>
+  /// A DenseMapInfo implementation for holding DenseMaps and DenseSets of<br>
+  /// sorted SmallVectors of unsigned.<br>
+  struct OrdersTypeDenseMapInfo {<br>
+    static OrdersType getEmptyKey() {<br>
+      OrdersType V;<br>
+      V.push_back(~1U);<br>
+      return V;<br>
+    }<br>
+<br>
+    static OrdersType getTombstoneKey() {<br>
+      OrdersType V;<br>
+      V.push_back(~2U);<br>
+      return V;<br>
+    }<br>
+<br>
+    static unsigned getHashValue(const OrdersType &V) {<br>
+      return static_cast<unsigned>(hash_combine_range(V.begin(), V.end()));<br>
+    }<br>
+<br>
+    static bool isEqual(const OrdersType &LHS, const OrdersType &RHS) {<br>
+      return LHS == RHS;<br>
+    }<br>
+  };<br>
+<br>
+  /// Contains orders of operations along with the number of bundles that have<br>
+  /// operations in this order. It stores only those orders that require<br>
+  /// reordering, if reordering is not required it is counted using \a<br>
+  /// NumOpsWantToKeepOriginalOrder.<br>
+  DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo> NumOpsWantToKeepOrder;<br>
+  /// Number of bundles that do not require reordering.<br>
+  unsigned NumOpsWantToKeepOriginalOrder = 0;<br>
<br>
   // Analysis and block reference.<br>
   Function *F;<br>
@@ -1557,17 +1604,35 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br>
     }<br>
     case Instruction::ExtractValue:<br>
     case Instruction::ExtractElement: {<br>
-      bool Reuse = canReuseExtract(VL, VL0);<br>
+      OrdersType CurrentOrder;<br>
+      bool Reuse = canReuseExtract(VL, VL0, CurrentOrder);<br>
       if (Reuse) {<br>
         DEBUG(dbgs() << "SLP: Reusing or shuffling extract sequence.\n");<br>
-        ++NumOpsWantToKeepOrder[S.Opcode];<br>
-      } else {<br>
-        SmallVector<Value *, 4> ReverseVL(VL.rbegin(), VL.rend());<br>
-        if (canReuseExtract(ReverseVL, VL0))<br>
-          --NumOpsWantToKeepOrder[S.Opcode];<br>
-        BS.cancelScheduling(VL, VL0);<br>
+        ++NumOpsWantToKeepOriginalOrder;<br>
+        newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,<br>
+                     ReuseShuffleIndicies);<br>
+        return;<br>
       }<br>
-      newTreeEntry(VL, Reuse, UserTreeIdx, ReuseShuffleIndicies);<br>
+      if (!CurrentOrder.empty()) {<br>
+#ifndef NDEBUG<br>
+        dbgs() << "SLP: Reusing or shuffling of reordered extract sequence "<br>
+                  "with order";<br>
+        for (unsigned Idx : CurrentOrder)<br>
+          dbgs() << " " << Idx;<br>
+        dbgs() << "\n";<br>
+#endif // NDEBUG<br>
+        // Insert new order with initial value 0, if it does not exist,<br>
+        // otherwise return the iterator to the existing one.<br>
+        auto StoredCurrentOrderAndNum =<br>
+            NumOpsWantToKeepOrder.try_emplace(CurrentOrder).first;<br>
+        ++StoredCurrentOrderAndNum->getSecond();<br>
+        newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx, ReuseShuffleIndicies,<br>
+                     StoredCurrentOrderAndNum->getFirst());<br>
+        return;<br>
+      }<br>
+      DEBUG(dbgs() << "SLP: Gather extract sequence.\n");<br>
+      newTreeEntry(VL, /*Vectorized=*/false, UserTreeIdx, ReuseShuffleIndicies);<br>
+      BS.cancelScheduling(VL, VL0);<br>
       return;<br>
     }<br>
     case Instruction::Load: {<br>
@@ -1589,51 +1654,55 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val<br>
<br>
       // Make sure all loads in the bundle are simple - we can't vectorize<br>
       // atomic or volatile loads.<br>
-      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {<br>
-        LoadInst *L = cast<LoadInst>(VL[i]);<br>
+      SmallVector<Value *, 4> PointerOps(VL.size());<br>
+      auto POIter = PointerOps.begin();<br>
+      for (Value *V : VL) {<br>
+        auto *L = cast<LoadInst>(V);<br>
         if (!L->isSimple()) {<br>
           BS.cancelScheduling(VL, VL0);<br>
           newTreeEntry(VL, false, UserTreeIdx, ReuseShuffleIndicies);<br>
           DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");<br>
           return;<br>
         }<br>
+        *POIter = L->getPointerOperand();<br>
+        ++POIter;<br>
       }<br>
<br>
-      // Check if the loads are consecutive, reversed, or neither.<br>
-      // TODO: What we really want is to sort the loads, but for now, check<br>
-      // the two likely directions.<br>
-      bool Consecutive = true;<br>
-      bool ReverseConsecutive = true;<br>
-      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {<br>
-        if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE)) {<br>
-          Consecutive = false;<br>
-          break;<br>
+      OrdersType CurrentOrder;<br>
+      // Check the order of pointer operands.<br>
+      if (llvm::sortPtrAccesses(PointerOps, *DL, *SE, CurrentOrder)) {<br>
+        Value *Ptr0;<br>
+        Value *PtrN;<br>
+        if (CurrentOrder.empty()) {<br>
+          Ptr0 = PointerOps.front();<br>
+          PtrN = PointerOps.back();<br>
         } else {<br>
-          ReverseConsecutive = false;<br>
+          Ptr0 = PointerOps[CurrentOrder.front()];<br>
+          PtrN = PointerOps[CurrentOrder.back()];<br>
         }<br>
-      }<br>
-<br>
-      if (Consecutive) {<br>
-        ++NumOpsWantToKeepOrder[S.Opcode];<br>
-        newTreeEntry(VL, true, UserTreeIdx, ReuseShuffleIndicies);<br>
-        DEBUG(dbgs() << "SLP: added a vector of loads.\n");<br>
-        return;<br>
-      }<br>
-<br>
-      // If none of the load pairs were consecutive when checked in order,<br>
-      // check the reverse order.<br>
-      if (ReverseConsecutive)<br>
-        for (unsigned i = VL.size() - 1; i > 0; --i)<br>
-          if (!isConsecutiveAccess(VL[i], VL[i - 1], *DL, *SE)) {<br>
-            ReverseConsecutive = false;<br>
-            break;<br>
+        const SCEV *Scev0 = SE->getSCEV(Ptr0);<br>
+        const SCEV *ScevN = SE->getSCEV(PtrN);<br>
+        const auto *Diff =<br>
+            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));<br>
+        uint64_t Size = DL->getTypeAllocSize(ScalarTy);<br>
+        // Check that the sorted loads are consecutive.<br>
+        if (Diff && Diff->getAPInt().getZExtValue() == (VL.size() - 1) * Size) {<br>
+          if (CurrentOrder.empty()) {<br>
+            // Original loads are consecutive and do not require reordering.<br>
+            ++NumOpsWantToKeepOriginalOrder;<br>
+            newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,<br>
+                         ReuseShuffleIndicies);<br>
+            DEBUG(dbgs() << "SLP: added a vector of loads.\n");<br>
+          } else {<br>
+            // Need to reorder.<br>
+            auto I = NumOpsWantToKeepOrder.try_emplace(CurrentOrder).first;<br>
+            ++I->getSecond();<br>
+            newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,<br>
+                         ReuseShuffleIndicies, I->getFirst());<br>
+            DEBUG(dbgs() << "SLP: added a vector of jumbled loads.\n");<br>
           }<br>
-<br>
-      if (ReverseConsecutive) {<br>
-        --NumOpsWantToKeepOrder[S.Opcode];<br>
-        newTreeEntry(VL, true, UserTreeIdx, ReuseShuffleIndicies);<br>
-        DEBUG(dbgs() << "SLP: added a vector of reversed loads.\n");<br>
-        return;<br>
+          return;<br>
+        }<br>
       }<br>
<br>
       DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");<br>
@@ -1944,7 +2013,8 @@ unsigned BoUpSLP::canMapToVector(Type *T<br>
   return N;<br>
 }<br>
<br>
-bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue) const {<br>
+bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,<br>
+                              SmallVectorImpl<unsigned> &CurrentOrder) const {<br>
   Instruction *E0 = cast<Instruction>(OpValue);<br>
   assert(E0->getOpcode() == Instruction::ExtractElement ||<br>
          E0->getOpcode() == Instruction::ExtractValue);<br>
@@ -1953,6 +2023,8 @@ bool BoUpSLP::canReuseExtract(ArrayRef<V<br>
   // correct offset.<br>
   Value *Vec = E0->getOperand(0);<br>
<br>
+  CurrentOrder.clear();<br>
+<br>
   // We have to extract from a vector/aggregate with the same number of elements.<br>
   unsigned NElts;<br>
   if (E0->getOpcode() == Instruction::ExtractValue) {<br>
@@ -1972,15 +2044,40 @@ bool BoUpSLP::canReuseExtract(ArrayRef<V<br>
     return false;<br>
<br>
   // Check that all of the indices extract from the correct offset.<br>
-  for (unsigned I = 0, E = VL.size(); I < E; ++I) {<br>
-    Instruction *Inst = cast<Instruction>(VL[I]);<br>
-    if (!matchExtractIndex(Inst, I, Inst->getOpcode()))<br>
-      return false;<br>
+  bool ShouldKeepOrder = true;<br>
+  unsigned E = VL.size();<br>
+  // Assign to all items the initial value E + 1 so we can check if the extract<br>
+  // instruction index was used already.<br>
+  // Also, later we can check that all the indices are used and we have a<br>
+  // consecutive access in the extract instructions, by checking that no<br>
+  // element of CurrentOrder still has value E + 1.<br>
+  CurrentOrder.assign(E, E + 1);<br>
+  unsigned I = 0;<br>
+  for (; I < E; ++I) {<br>
+    auto *Inst = cast<Instruction>(VL[I]);<br>
     if (Inst->getOperand(0) != Vec)<br>
-      return false;<br>
+      break;<br>
+    Optional<unsigned> Idx = getExtractIndex(Inst);<br>
+    if (!Idx)<br>
+      break;<br>
+    const unsigned ExtIdx = *Idx;<br>
+    if (ExtIdx != I) {<br>
+      if (ExtIdx >= E || CurrentOrder[ExtIdx] != E + 1)<br>
+        break;<br>
+      ShouldKeepOrder = false;<br>
+      CurrentOrder[ExtIdx] = I;<br>
+    } else {<br>
+      if (CurrentOrder[I] != E + 1)<br>
+        break;<br>
+      CurrentOrder[I] = I;<br>
+    }<br>
+  }<br>
+  if (I < E) {<br>
+    CurrentOrder.clear();<br>
+    return false;<br>
   }<br>
<br>
-  return true;<br>
+  return ShouldKeepOrder;<br>
 }<br>
<br>
 bool BoUpSLP::areAllUsersVectorized(Instruction *I) const {<br>
@@ -2082,8 +2179,13 @@ int BoUpSLP::getEntryCost(TreeEntry *E)<br>
               TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, Idx);<br>
         }<br>
       }<br>
-      if (canReuseExtract(VL, S.OpValue)) {<br>
+      if (!E->NeedToGather) {<br>
         int DeadCost = ReuseShuffleCost;<br>
+        if (!E->ReorderIndices.empty()) {<br>
+          // TODO: Merge this shuffle with the ReuseShuffleCost.<br>
+          DeadCost += TTI->getShuffleCost(<br>
+              TargetTransformInfo::SK_PermuteSingleSrc, VecTy);<br>
+        }<br>
         for (unsigned i = 0, e = VL.size(); i < e; ++i) {<br>
           Instruction *E = cast<Instruction>(VL[i]);<br>
           // If all users are going to be vectorized, instruction can be<br>
@@ -2246,7 +2348,8 @@ int BoUpSLP::getEntryCost(TreeEntry *E)<br>
           TTI->getMemoryOpCost(Instruction::Load, ScalarTy, alignment, 0, VL0);<br>
       int VecLdCost = TTI->getMemoryOpCost(Instruction::Load,<br>
                                            VecTy, alignment, 0, VL0);<br>
-      if (!isConsecutiveAccess(VL[0], VL[1], *DL, *SE)) {<br>
+      if (!E->ReorderIndices.empty()) {<br>
+        // TODO: Merge this shuffle with the ReuseShuffleCost.<br>
         VecLdCost += TTI->getShuffleCost(<br>
             TargetTransformInfo::SK_PermuteSingleSrc, VecTy);<br>
       }<br>
@@ -2944,6 +3047,15 @@ Value *BoUpSLP::vectorizeTree(ArrayRef<V<br>
   return V;<br>
 }<br>
<br>
+static void inversePermutation(ArrayRef<unsigned> Indices,<br>
+                               SmallVectorImpl<unsigned> &Mask) {<br>
+  Mask.clear();<br>
+  const unsigned E = Indices.size();<br>
+  Mask.resize(E);<br>
+  for (unsigned I = 0; I < E; ++I)<br>
+    Mask[Indices[I]] = I;<br>
+}<br>
+<br>
 Value *BoUpSLP::vectorizeTree(TreeEntry *E) {<br>
   IRBuilder<>::InsertPointGuard Guard(Builder);<br>
<br>
@@ -3020,10 +3132,19 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br>
     }<br>
<br>
     case Instruction::ExtractElement: {<br>
-      if (canReuseExtract(E->Scalars, VL0)) {<br>
+      if (!E->NeedToGather) {<br>
         Value *V = VL0->getOperand(0);<br>
-        if (NeedToShuffleReuses) {<br>
+        if (!E->ReorderIndices.empty()) {<br>
+          OrdersType Mask;<br>
+          inversePermutation(E->ReorderIndices, Mask);<br>
           Builder.SetInsertPoint(VL0);<br>
+          V = Builder.CreateShuffleVector(V, UndefValue::get(VecTy), Mask,<br>
+                                          "reorder_shuffle");<br>
+        }<br>
+        if (NeedToShuffleReuses) {<br>
+          // TODO: Merge this shuffle with the ReorderShuffleMask.<br>
+          if (!E->ReorderIndices.empty())<br>
+            Builder.SetInsertPoint(VL0);<br>
           V = Builder.CreateShuffleVector(V, UndefValue::get(VecTy),<br>
                                           E->ReuseShuffleIndices, "shuffle");<br>
         }<br>
@@ -3044,14 +3165,21 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br>
       return V;<br>
     }<br>
     case Instruction::ExtractValue: {<br>
-      if (canReuseExtract(E->Scalars, VL0)) {<br>
+      if (!E->NeedToGather) {<br>
         LoadInst *LI = cast<LoadInst>(VL0->getOperand(0));<br>
         Builder.SetInsertPoint(LI);<br>
         PointerType *PtrTy = PointerType::get(VecTy, LI->getPointerAddressSpace());<br>
         Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);<br>
         LoadInst *V = Builder.CreateAlignedLoad(Ptr, LI->getAlignment());<br>
         Value *NewV = propagateMetadata(V, E->Scalars);<br>
+        if (!E->ReorderIndices.empty()) {<br>
+          OrdersType Mask;<br>
+          inversePermutation(E->ReorderIndices, Mask);<br>
+          NewV = Builder.CreateShuffleVector(NewV, UndefValue::get(VecTy), Mask,<br>
+                                             "reorder_shuffle");<br>
+        }<br>
         if (NeedToShuffleReuses) {<br>
+          // TODO: Merge this shuffle with the ReorderShuffleMask.<br>
           NewV = Builder.CreateShuffleVector(<br>
               NewV, UndefValue::get(VecTy), E->ReuseShuffleIndices, "shuffle");<br>
         }<br>
@@ -3225,10 +3353,9 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br>
     case Instruction::Load: {<br>
       // Loads are inserted at the head of the tree because we don't want to<br>
       // sink them all the way down past store instructions.<br>
-      bool IsReversed =<br>
-          !isConsecutiveAccess(E->Scalars[0], E->Scalars[1], *DL, *SE);<br>
-      if (IsReversed)<br>
-        VL0 = cast<Instruction>(E->Scalars.back());<br>
+      bool IsReorder = !E->ReorderIndices.empty();<br>
+      if (IsReorder)<br>
+        VL0 = cast<Instruction>(E->Scalars[E->ReorderIndices.front()]);<br>
       setInsertPointAfterBundle(E->Scalars, VL0);<br>
<br>
       LoadInst *LI = cast<LoadInst>(VL0);<br>
@@ -3252,12 +3379,14 @@ Value *BoUpSLP::vectorizeTree(TreeEntry<br>
       }<br>
       LI->setAlignment(Alignment);<br>
       Value *V = propagateMetadata(LI, E->Scalars);<br>
-      if (IsReversed) {<br>
-        SmallVector<uint32_t, 4> Mask(E->Scalars.size());<br>
-        std::iota(Mask.rbegin(), Mask.rend(), 0);<br>
-        V = Builder.CreateShuffleVector(V, UndefValue::get(V->getType()), Mask);<br>
+      if (IsReorder) {<br>
+        OrdersType Mask;<br>
+        inversePermutation(E->ReorderIndices, Mask);<br>
+        V = Builder.CreateShuffleVector(V, UndefValue::get(V->getType()),<br>
+                                        Mask, "reorder_shuffle");<br>
       }<br>
       if (NeedToShuffleReuses) {<br>
+        // TODO: Merge this shuffle with the ReorderShuffleMask.<br>
         V = Builder.CreateShuffleVector(V, UndefValue::get(VecTy),<br>
                                         E->ReuseShuffleIndices, "shuffle");<br>
       }<br>
@@ -4836,8 +4965,10 @@ bool SLPVectorizerPass::tryToVectorizeLi<br>
       ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);<br>
<br>
       R.buildTree(Ops);<br>
+      Optional<ArrayRef<unsigned>> Order = R.bestOrder();<br>
       // TODO: check if we can allow reordering for more cases.<br>
-      if (AllowReorder && R.shouldReorder()) {<br>
+      if (AllowReorder && Order) {<br>
+        // TODO: reorder tree nodes without tree rebuilding.<br>
         // Conceptually, there is nothing actually preventing us from trying to<br>
         // reorder a larger list. In fact, we do exactly this when vectorizing<br>
         // reductions. However, at this point, we only expect to get here when<br>
@@ -5583,9 +5714,13 @@ public:<br>
     while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {<br>
       auto VL = makeArrayRef(&ReducedVals[i], ReduxWidth);<br>
       V.buildTree(VL, ExternallyUsedValues, IgnoreList);<br>
-      if (V.shouldReorder()) {<br>
-        SmallVector<Value *, 8> Reversed(VL.rbegin(), VL.rend());<br>
-        V.buildTree(Reversed, ExternallyUsedValues, IgnoreList);<br>
+      Optional<ArrayRef<unsigned>> Order = V.bestOrder();<br>
+      if (Order) {<br>
+        // TODO: reorder tree nodes without tree rebuilding.<br>
+        SmallVector<Value *, 4> ReorderedOps(VL.size());<br>
+        llvm::transform(*Order, ReorderedOps.begin(),<br>
+                        [VL](const unsigned Idx) { return VL[Idx]; });<br>
+        V.buildTree(ReorderedOps, ExternallyUsedValues, IgnoreList);<br>
       }<br>
       if (V.isTreeTinyAndNotFullyVectorizable())<br>
         break;<br>
<br>
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll<br>
URL: <a href="https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FSLPVectorizer%2FX86%2Fexternal_user_jumbled_load.ll%3Frev%3D328980%26r1%3D328979%26r2%3D328980%26view%3Ddiff&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524476822&sdata=rl2mPrFwOGqFpoO4E6C4IILK4nTOcUdfAQJR2eJB6Ik%3D&reserved=0" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll (original)<br>
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll Mon Apr  2 07:51:37 2018<br>
@@ -10,15 +10,16 @@ define void @hoge(i64 %idx, <4 x i32>* %<br>
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds [20 x [13 x i32]], [20 x [13 x i32]]* @array, i64 0, i64 [[IDX]], i64 6<br>
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds [20 x [13 x i32]], [20 x [13 x i32]]* @array, i64 0, i64 [[IDX]], i64 7<br>
 ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds [20 x [13 x i32]], [20 x [13 x i32]]* @array, i64 0, i64 [[IDX]], i64 8<br>
-; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[TMP1]] to <2 x i32>*<br>
-; CHECK-NEXT:    [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4<br>
-; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i32 0<br>
+; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[TMP4]], align 4<br>
+; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0><br>
+; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 0<br>
 ; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> undef, i32 [[TMP6]], i32 0<br>
-; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP5]], i32 1<br>
+; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 1<br>
 ; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[TMP8]], i32 1<br>
-; CHECK-NEXT:    [[TMP10:%.*]] = load i32, i32* [[TMP3]], align 4<br>
+; CHECK-NEXT:    [[TMP10:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 2<br>
 ; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP10]], i32 2<br>
-; CHECK-NEXT:    [[TMP12:%.*]] = load i32, i32* [[TMP0]], align 4<br>
+; CHECK-NEXT:    [[TMP12:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 3<br>
 ; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP12]], i32 3<br>
 ; CHECK-NEXT:    store <4 x i32> [[TMP13]], <4 x i32>* [[SINK:%.*]]<br>
 ; CHECK-NEXT:    ret void<br>
<br>
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll<br>
URL: <a href="https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FSLPVectorizer%2FX86%2Fextract.ll%3Frev%3D328980%26r1%3D328979%26r2%3D328980%26view%3Ddiff&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524476822&sdata=bV1fP8LXc%2FPNvnN5CR921Tz%2B1LPStDKoBxt46frv1mE%3D&reserved=0" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll (original)<br>
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll Mon Apr  2 07:51:37 2018<br>
@@ -30,14 +30,11 @@ define void @fextr1(double* %ptr) {<br>
 ; CHECK-LABEL: @fextr1(<br>
 ; CHECK-NEXT:  entry:<br>
 ; CHECK-NEXT:    [[LD:%.*]] = load <2 x double>, <2 x double>* undef<br>
-; CHECK-NEXT:    [[V0:%.*]] = extractelement <2 x double> [[LD]], i32 0<br>
-; CHECK-NEXT:    [[V1:%.*]] = extractelement <2 x double> [[LD]], i32 1<br>
+; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x double> [[LD]], <2 x double> undef, <2 x i32> <i32 1, i32 0><br>
 ; CHECK-NEXT:    [[P1:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 0<br>
-; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <2 x double> undef, double [[V1]], i32 0<br>
-; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V0]], i32 1<br>
-; CHECK-NEXT:    [[TMP2:%.*]] = fadd <2 x double> <double 3.400000e+00, double 1.200000e+00>, [[TMP1]]<br>
-; CHECK-NEXT:    [[TMP3:%.*]] = bitcast double* [[P1]] to <2 x double>*<br>
-; CHECK-NEXT:    store <2 x double> [[TMP2]], <2 x double>* [[TMP3]], align 4<br>
+; CHECK-NEXT:    [[TMP0:%.*]] = fadd <2 x double> <double 3.400000e+00, double 1.200000e+00>, [[REORDER_SHUFFLE]]<br>
+; CHECK-NEXT:    [[TMP1:%.*]] = bitcast double* [[P1]] to <2 x double>*<br>
+; CHECK-NEXT:    store <2 x double> [[TMP0]], <2 x double>* [[TMP1]], align 4<br>
 ; CHECK-NEXT:    ret void<br>
 ;<br>
 entry:<br>
<br>
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll<br>
URL: <a href="https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FSLPVectorizer%2FX86%2Fjumbled-load-multiuse.ll%3Frev%3D328980%26r1%3D328979%26r2%3D328980%26view%3Ddiff&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524476822&sdata=8XMOu0xXV8jSYkO7kgWzzNidOS2PYWv%2BgOc3bDPQSVI%3D&reserved=0" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll (original)<br>
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll Mon Apr  2 07:51:37 2018<br>
@@ -11,21 +11,16 @@<br>
     define i32 @fn1() {<br>
 ; CHECK-LABEL: @fn1(<br>
 ; CHECK-NEXT:  entry:<br>
-; CHECK-NEXT:    [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4<br>
-; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 1) to <2 x i32>*), align 4<br>
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 3), align 4<br>
-; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0<br>
-; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[TMP3]], i32 0<br>
-; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1<br>
-; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i32 1<br>
-; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP2]], i32 2<br>
-; CHECK-NEXT:    [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[TMP0]], i32 3<br>
-; CHECK-NEXT:    [[TMP9:%.*]] = icmp sgt <4 x i32> [[TMP8]], zeroinitializer<br>
-; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP4]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 1<br>
-; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 2<br>
-; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 8, i32 3<br>
-; CHECK-NEXT:    [[TMP13:%.*]] = select <4 x i1> [[TMP9]], <4 x i32> [[TMP12]], <4 x i32> <i32 6, i32 0, i32 0, i32 0><br>
-; CHECK-NEXT:    store <4 x i32> [[TMP13]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4<br>
+; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4<br>
+; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0><br>
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp sgt <4 x i32> [[REORDER_SHUFFLE]], zeroinitializer<br>
+; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 0<br>
+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <4 x i32> undef, i32 [[TMP2]], i32 0<br>
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 1<br>
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 2<br>
+; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 8, i32 3<br>
+; CHECK-NEXT:    [[TMP7:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP6]], <4 x i32> <i32 6, i32 0, i32 0, i32 0><br>
+; CHECK-NEXT:    store <4 x i32> [[TMP7]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4<br>
 ; CHECK-NEXT:    ret i32 0<br>
 ;<br>
   entry:<br>
<br>
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll<br>
URL: <a href="https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FSLPVectorizer%2FX86%2Fjumbled-load-shuffle-placement.ll%3Frev%3D328980%26r1%3D328979%26r2%3D328980%26view%3Ddiff&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524476822&sdata=fwMEz9oSB29s2M6HUQ7SF9J0cIZsl7eDkpZTQTJQ%2BGQ%3D&reserved=0" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll (original)<br>
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll Mon Apr  2 07:51:37 2018<br>
@@ -21,28 +21,21 @@<br>
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 10<br>
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 11<br>
 ; CHECK-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1<br>
-; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[A]] to <2 x i32>*<br>
-; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4<br>
 ; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 12<br>
 ; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3<br>
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX6]], align 4<br>
 ; CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13<br>
-; CHECK-NEXT:    [[TMP3:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*<br>
-; CHECK-NEXT:    [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4<br>
+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br>
 ; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2<br>
-; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX9]], align 4<br>
-; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0<br>
-; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> undef, i32 [[TMP6]], i32 0<br>
-; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1<br>
-; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[TMP8]], i32 1<br>
-; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP2]], i32 2<br>
-; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP5]], i32 3<br>
-; CHECK-NEXT:    [[TMP12:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP11]]<br>
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4<br>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2><br>
+; CHECK-NEXT:    [[TMP5:%.*]] = mul nsw <4 x i32> [[TMP1]], [[TMP4]]<br>
 ; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1<br>
 ; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2<br>
 ; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3<br>
-; CHECK-NEXT:    [[TMP13:%.*]] = bitcast i32* [[B]] to <4 x i32>*<br>
-; CHECK-NEXT:    store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align 4<br>
+; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[B]] to <4 x i32>*<br>
+; CHECK-NEXT:    store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4<br>
 ; CHECK-NEXT:    ret void<br>
 ;<br>
 entry:<br>
@@ -83,28 +76,21 @@ entry:<br>
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 10<br>
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 11<br>
 ; CHECK-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1<br>
-; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[A]] to <2 x i32>*<br>
-; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4<br>
 ; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 12<br>
 ; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3<br>
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX6]], align 4<br>
 ; CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13<br>
-; CHECK-NEXT:    [[TMP3:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*<br>
-; CHECK-NEXT:    [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4<br>
+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4<br>
 ; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2<br>
-; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX9]], align 4<br>
-; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0<br>
-; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> undef, i32 [[TMP6]], i32 0<br>
-; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1<br>
-; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[TMP8]], i32 1<br>
-; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[TMP2]], i32 2<br>
-; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP5]], i32 3<br>
-; CHECK-NEXT:    [[TMP12:%.*]] = mul nsw <4 x i32> [[TMP11]], [[TMP4]]<br>
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4<br>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2><br>
+; CHECK-NEXT:    [[TMP5:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP1]]<br>
 ; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1<br>
 ; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2<br>
 ; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3<br>
-; CHECK-NEXT:    [[TMP13:%.*]] = bitcast i32* [[B]] to <4 x i32>*<br>
-; CHECK-NEXT:    store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align 4<br>
+; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[B]] to <4 x i32>*<br>
+; CHECK-NEXT:    store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4<br>
 ; CHECK-NEXT:    ret void<br>
 ;<br>
 entry:<br>
<br>
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll<br>
URL: <a href="https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FSLPVectorizer%2FX86%2Fjumbled-load-used-in-phi.ll%3Frev%3D328980%26r1%3D328979%26r2%3D328980%26view%3Ddiff&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524633074&sdata=WE%2BNqAsVbLBFYjU56aW9dvl4n%2BBgIAd3XqLUZvPmMGg%3D&reserved=0" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll (original)<br>
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll Mon Apr  2 07:51:37 2018<br>
@@ -48,11 +48,11 @@ define void @phiUsingLoads(i32* noalias<br>
 ; CHECK-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2<br>
 ; CHECK-NEXT:    [[ARRAYIDX66:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3<br>
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[B]] to <4 x i32>*<br>
-; CHECK-NEXT:    store <4 x i32> [[TMP34:%.*]], <4 x i32>* [[TMP1]], align 4<br>
+; CHECK-NEXT:    store <4 x i32> [[TMP27:%.*]], <4 x i32>* [[TMP1]], align 4<br>
 ; CHECK-NEXT:    ret void<br>
 ; CHECK:       for.body:<br>
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]<br>
-; CHECK-NEXT:    [[TMP2:%.*]] = phi <4 x i32> [ undef, [[ENTRY]] ], [ [[TMP34]], [[FOR_INC]] ]<br>
+; CHECK-NEXT:    [[TMP2:%.*]] = phi <4 x i32> [ undef, [[ENTRY]] ], [ [[TMP27]], [[FOR_INC]] ]<br>
 ; CHECK-NEXT:    br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]<br>
 ; CHECK:       if.then:<br>
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]<br>
@@ -103,23 +103,16 @@ define void @phiUsingLoads(i32* noalias<br>
 ; CHECK-NEXT:    [[ARRAYIDX49:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]<br>
 ; CHECK-NEXT:    [[TMP21:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1<br>
 ; CHECK-NEXT:    [[ARRAYIDX52:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP21]]<br>
-; CHECK-NEXT:    [[TMP22:%.*]] = bitcast i32* [[ARRAYIDX49]] to <2 x i32>*<br>
-; CHECK-NEXT:    [[TMP23:%.*]] = load <2 x i32>, <2 x i32>* [[TMP22]], align 4<br>
-; CHECK-NEXT:    [[TMP24:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3<br>
-; CHECK-NEXT:    [[ARRAYIDX55:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP24]]<br>
-; CHECK-NEXT:    [[TMP25:%.*]] = load i32, i32* [[ARRAYIDX55]], align 4<br>
-; CHECK-NEXT:    [[TMP26:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2<br>
-; CHECK-NEXT:    [[ARRAYIDX58:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP26]]<br>
-; CHECK-NEXT:    [[TMP27:%.*]] = load i32, i32* [[ARRAYIDX58]], align 4<br>
-; CHECK-NEXT:    [[TMP28:%.*]] = extractelement <2 x i32> [[TMP23]], i32 0<br>
-; CHECK-NEXT:    [[TMP29:%.*]] = insertelement <4 x i32> undef, i32 [[TMP28]], i32 0<br>
-; CHECK-NEXT:    [[TMP30:%.*]] = extractelement <2 x i32> [[TMP23]], i32 1<br>
-; CHECK-NEXT:    [[TMP31:%.*]] = insertelement <4 x i32> [[TMP29]], i32 [[TMP30]], i32 1<br>
-; CHECK-NEXT:    [[TMP32:%.*]] = insertelement <4 x i32> [[TMP31]], i32 [[TMP25]], i32 2<br>
-; CHECK-NEXT:    [[TMP33:%.*]] = insertelement <4 x i32> [[TMP32]], i32 [[TMP27]], i32 3<br>
+; CHECK-NEXT:    [[TMP22:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3<br>
+; CHECK-NEXT:    [[ARRAYIDX55:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP22]]<br>
+; CHECK-NEXT:    [[TMP23:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2<br>
+; CHECK-NEXT:    [[ARRAYIDX58:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP23]]<br>
+; CHECK-NEXT:    [[TMP24:%.*]] = bitcast i32* [[ARRAYIDX49]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP25:%.*]] = load <4 x i32>, <4 x i32>* [[TMP24]], align 4<br>
+; CHECK-NEXT:    [[TMP26:%.*]] = shufflevector <4 x i32> [[TMP25]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2><br>
 ; CHECK-NEXT:    br label [[FOR_INC]]<br>
 ; CHECK:       for.inc:<br>
-; CHECK-NEXT:    [[TMP34]] = phi <4 x i32> [ [[TMP7]], [[IF_THEN]] ], [ [[TMP13]], [[IF_THEN14]] ], [ [[TMP19]], [[IF_THEN30]] ], [ [[TMP33]], [[IF_THEN46]] ], [ [[TMP2]], [[IF_ELSE43]] ]<br>
+; CHECK-NEXT:    [[TMP27]] = phi <4 x i32> [ [[TMP7]], [[IF_THEN]] ], [ [[TMP13]], [[IF_THEN14]] ], [ [[TMP19]], [[IF_THEN30]] ], [ [[TMP26]], [[IF_THEN46]] ], [ [[TMP2]], [[IF_ELSE43]] ]<br>
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1<br>
 ; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 100<br>
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]<br>
<br>
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll<br>
URL: <a href="https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fllvm.org%2Fviewvc%2Fllvm-project%2Fllvm%2Ftrunk%2Ftest%2FTransforms%2FSLPVectorizer%2FX86%2Fjumbled-load.ll%3Frev%3D328980%26r1%3D328979%26r2%3D328980%26view%3Ddiff&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524633074&sdata=DV4dtudqqCgMsQ2oDjmymL2O5ZAqEcFYqYSA7Kk5ZVY%3D&reserved=0" rel="noreferrer" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll?rev=328980&r1=328979&r2=328980&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll (original)<br>
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll Mon Apr  2 07:51:37 2018<br>
@@ -6,33 +6,26 @@<br>
 define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {<br>
 ; CHECK-LABEL: @jumbled-load(<br>
 ; CHECK-NEXT:    [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0<br>
-; CHECK-NEXT:    [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4<br>
 ; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3<br>
-; CHECK-NEXT:    [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4<br>
 ; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1<br>
-; CHECK-NEXT:    [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4<br>
 ; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2<br>
-; CHECK-NEXT:    [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4<br>
+; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4<br>
+; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 2, i32 0><br>
 ; CHECK-NEXT:    [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0<br>
-; CHECK-NEXT:    [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4<br>
 ; CHECK-NEXT:    [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2<br>
-; CHECK-NEXT:    [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4<br>
 ; CHECK-NEXT:    [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3<br>
-; CHECK-NEXT:    [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4<br>
 ; CHECK-NEXT:    [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1<br>
-; CHECK-NEXT:    [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4<br>
-; CHECK-NEXT:    [[MUL_1:%.*]] = mul i32 [[LOAD_3]], [[LOAD_5]]<br>
-; CHECK-NEXT:    [[MUL_2:%.*]] = mul i32 [[LOAD_2]], [[LOAD_8]]<br>
-; CHECK-NEXT:    [[MUL_3:%.*]] = mul i32 [[LOAD_4]], [[LOAD_7]]<br>
-; CHECK-NEXT:    [[MUL_4:%.*]] = mul i32 [[LOAD_1]], [[LOAD_6]]<br>
+; CHECK-NEXT:    [[TMP3:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4<br>
+; CHECK-NEXT:    [[REORDER_SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2><br>
+; CHECK-NEXT:    [[TMP5:%.*]] = mul <4 x i32> [[REORDER_SHUFFLE]], [[REORDER_SHUFFLE1]]<br>
 ; CHECK-NEXT:    [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0<br>
-; CHECK-NEXT:    store i32 [[MUL_1]], i32* [[GEP_7]], align 4<br>
 ; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1<br>
-; CHECK-NEXT:    store i32 [[MUL_2]], i32* [[GEP_8]], align 4<br>
 ; CHECK-NEXT:    [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2<br>
-; CHECK-NEXT:    store i32 [[MUL_3]], i32* [[GEP_9]], align 4<br>
 ; CHECK-NEXT:    [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3<br>
-; CHECK-NEXT:    store i32 [[MUL_4]], i32* [[GEP_10]], align 4<br>
+; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*<br>
+; CHECK-NEXT:    store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4<br>
 ; CHECK-NEXT:    ret i32 undef<br>
 ;<br>
   %in.addr = getelementptr inbounds i32, i32* %in, i64 0<br>
@@ -71,25 +64,27 @@ define i32 @jumbled-load(i32* noalias no<br>
 define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {<br>
 ; CHECK-LABEL: @jumbled-load-multiuses(<br>
 ; CHECK-NEXT:    [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0<br>
-; CHECK-NEXT:    [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4<br>
 ; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3<br>
-; CHECK-NEXT:    [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4<br>
 ; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1<br>
-; CHECK-NEXT:    [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4<br>
 ; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2<br>
-; CHECK-NEXT:    [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4<br>
-; CHECK-NEXT:    [[MUL_1:%.*]] = mul i32 [[LOAD_3]], [[LOAD_4]]<br>
-; CHECK-NEXT:    [[MUL_2:%.*]] = mul i32 [[LOAD_2]], [[LOAD_2]]<br>
-; CHECK-NEXT:    [[MUL_3:%.*]] = mul i32 [[LOAD_4]], [[LOAD_1]]<br>
-; CHECK-NEXT:    [[MUL_4:%.*]] = mul i32 [[LOAD_1]], [[LOAD_3]]<br>
+; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*<br>
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4<br>
+; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 2, i32 0><br>
+; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 2<br>
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[TMP3]], i32 0<br>
+; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 1<br>
+; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i32 1<br>
+; CHECK-NEXT:    [[TMP7:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 3<br>
+; CHECK-NEXT:    [[TMP8:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP7]], i32 2<br>
+; CHECK-NEXT:    [[TMP9:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 0<br>
+; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[TMP9]], i32 3<br>
+; CHECK-NEXT:    [[TMP11:%.*]] = mul <4 x i32> [[REORDER_SHUFFLE]], [[TMP10]]<br>
 ; CHECK-NEXT:    [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0<br>
-; CHECK-NEXT:    store i32 [[MUL_1]], i32* [[GEP_7]], align 4<br>
 ; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1<br>
-; CHECK-NEXT:    store i32 [[MUL_2]], i32* [[GEP_8]], align 4<br>
 ; CHECK-NEXT:    [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2<br>
-; CHECK-NEXT:    store i32 [[MUL_3]], i32* [[GEP_9]], align 4<br>
 ; CHECK-NEXT:    [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], <a href="https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmaps.google.com%2F%3Fq%3Di64%2B3%26entry%3Dgmail%26source%3Dg&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524789320&sdata=zku8eabxvJs%2BRdVO02F19fcf76rlLb2q%2BYH%2F2M11fQg%3D&reserved=0" target="_blank"></a></blockquote></div></blockquote></div></div></blockquote></div></blockquote></div>