[llvm] r328980 - [SLP] Fix PR36481: vectorize reassociated instructions.

Chandler Carruth via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 3 03:41:42 PDT 2018


NP, and awesome work on the SLP vectorizer!

On Tue, Apr 3, 2018 at 3:33 AM Alexey Bataev <a.bataev at hotmail.com> wrote:

> Chandler, thanks for the fix and sorry for the mess.
>
> Best regards,
> Alexey Bataev
>
> On Apr 3, 2018, at 1:30, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> Fixed this and some other issues in r329046.
>
> On Mon, Apr 2, 2018 at 8:13 PM Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
>> This appears to print to stderr unconditionally in !NDEBUG builds? =[
>> It's causing lots of full-screen output for me.
>>
>> If you or someone else doesn't get to fixing this soon, I guess I will.
>>
>> On Mon, Apr 2, 2018 at 7:54 AM Alexey Bataev via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: abataev
>>> Date: Mon Apr  2 07:51:37 2018
>>> New Revision: 328980
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=328980&view=rev
>>> Log:
>>> [SLP] Fix PR36481: vectorize reassociated instructions.
>>>
>>> Summary:
>>> If the load/extractelement/extractvalue instructions are not originally
>>> consecutive, the SLP vectorizer is unable to vectorize them. Patch
>>> allows reordering of such instructions.
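>>>
>>> For illustration (a hypothetical example, not part of the original
>>> commit message), a reassociated expression such as
>>>
>>>   int sum4(const int *a) { // loads issued in order 3, 0, 1, 2
>>>     return a[3] + a[0] + a[1] + a[2];
>>>   }
>>>
>>> produces loads whose program order does not match memory order.
>>> Previously the vectorizer only checked the consecutive and fully
>>> reversed orders (see the removed TODO in buildTree_rec below); with
>>> this patch it sorts the pointers, emits a single wide load, and
>>> restores the original order with a shuffle.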
>>>
>>> Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
>>>
>>> Subscribers: llvm-commits
>>>
>>> Differential Revision: https://reviews.llvm.org/D43776
>>>
>>> Modified:
>>>     llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h
>>>     llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp
>>>     llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll
>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll
>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll
>>>
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
>>>
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll
>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll
>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/reassociated-loads.ll
>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/store-jumbled.ll
>>>
>>> Modified: llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h (original)
>>> +++ llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h Mon Apr  2
>>> 07:51:37 2018
>>> @@ -667,6 +667,20 @@ int64_t getPtrStride(PredicatedScalarEvo
>>>                       const ValueToValueMap &StridesMap =
>>> ValueToValueMap(),
>>>                       bool Assume = false, bool ShouldCheckWrap = true);
>>>
>>> +/// \brief Attempt to sort the pointers in \p VL and return the sorted
>>> indices
>>> +/// in \p SortedIndices, if reordering is required.
>>> +///
>>> +/// Returns 'true' if sorting is legal, otherwise returns 'false'.
>>> +///
>>> +/// For example, for a given \p VL of memory accesses in program order,
>>> +/// a[i+4], a[i+0], a[i+1] and a[i+7], this function sorts the accesses
>>> +/// as a[i+0], a[i+1], a[i+4], a[i+7] and saves the mask for the actual
>>> +/// memory accesses in program order in \p SortedIndices as <1,2,0,3>.
>>> +bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
>>> +                     ScalarEvolution &SE,
>>> +                     SmallVectorImpl<unsigned> &SortedIndices);
>>> +
>>>  /// \brief Returns true if the memory operations \p A and \p B are
>>> consecutive.
>>>  /// This is a simple API that does not depend on the analysis pass.
>>>  bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
>>>
>>> Modified: llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp (original)
>>> +++ llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp Mon Apr  2 07:51:37
>>> 2018
>>> @@ -1087,6 +1087,67 @@ int64_t llvm::getPtrStride(PredicatedSca
>>>    return Stride;
>>>  }
>>>
>>> +bool llvm::sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
>>> +                           ScalarEvolution &SE,
>>> +                           SmallVectorImpl<unsigned> &SortedIndices) {
>>> +  assert(llvm::all_of(
>>> +             VL, [](const Value *V) { return
>>> V->getType()->isPointerTy(); }) &&
>>> +         "Expected list of pointer operands.");
>>> +  SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
>>> +  OffValPairs.reserve(VL.size());
>>> +
>>> +  // Walk over the pointers, and map each of them to an offset relative
>>> to
>>> +  // the first pointer in the array.
>>> +  Value *Ptr0 = VL[0];
>>> +  const SCEV *Scev0 = SE.getSCEV(Ptr0);
>>> +  Value *Obj0 = GetUnderlyingObject(Ptr0, DL);
>>> +
>>> +  llvm::SmallSet<int64_t, 4> Offsets;
>>> +  for (auto *Ptr : VL) {
>>> +    // TODO: Outline this code as a special, more time consuming,
>>> version of
>>> +    // computeConstantDifference() function.
>>> +    if (Ptr->getType()->getPointerAddressSpace() !=
>>> +        Ptr0->getType()->getPointerAddressSpace())
>>> +      return false;
>>> +    // If a pointer refers to a different underlying object, bail - the
>>> +    // pointers are by definition incomparable.
>>> +    Value *CurrObj = GetUnderlyingObject(Ptr, DL);
>>> +    if (CurrObj != Obj0)
>>> +      return false;
>>> +
>>> +    const SCEV *Scev = SE.getSCEV(Ptr);
>>> +    const auto *Diff = dyn_cast<SCEVConstant>(SE.getMinusSCEV(Scev,
>>> Scev0));
>>> +    // The pointers may not have a constant offset from each other, or
>>> SCEV
>>> +    // may just not be smart enough to figure out they do. Regardless,
>>> +    // there's nothing we can do.
>>> +    if (!Diff)
>>> +      return false;
>>> +
>>> +    // Check whether a pointer with the same offset was already seen.
>>> +    int64_t Offset = Diff->getAPInt().getSExtValue();
>>> +    if (!Offsets.insert(Offset).second)
>>> +      return false;
>>> +    OffValPairs.emplace_back(Offset, Ptr);
>>> +  }
>>> +  SortedIndices.clear();
>>> +  SortedIndices.resize(VL.size());
>>> +  std::iota(SortedIndices.begin(), SortedIndices.end(), 0);
>>> +
>>> +  // Sort the memory accesses and keep the order of their uses in
>>> UseOrder.
>>> +  std::stable_sort(SortedIndices.begin(), SortedIndices.end(),
>>> +                   [&OffValPairs](unsigned Left, unsigned Right) {
>>> +                     return OffValPairs[Left].first <
>>> OffValPairs[Right].first;
>>> +                   });
>>> +
>>> +  // Check if the order is consecutive already.
>>> +  if (llvm::all_of(SortedIndices, [&SortedIndices](const unsigned I) {
>>> +        return I == SortedIndices[I];
>>> +      }))
>>> +    SortedIndices.clear();
>>> +
>>> +  return true;
>>> +}
>>> +
>>>  /// Take the address space operand from the Load/Store instruction.
>>>  /// Returns -1 if this is not a valid Load/Store instruction.
>>>  static unsigned getAddressSpaceOperand(Value *I) {
>>>
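>>> As a usage sketch of the new API (assumed caller state: PointerOps, DL
>>> and SE stand in for whatever the caller has at hand, as in the SLP
>>> changes below):
>>>
>>>   SmallVector<unsigned, 4> CurrentOrder;
>>>   if (llvm::sortPtrAccesses(PointerOps, *DL, *SE, CurrentOrder)) {
>>>     if (CurrentOrder.empty()) {
>>>       // Accesses are already in increasing address order.
>>>     } else {
>>>       // CurrentOrder[k] is the index in PointerOps of the k-th lowest
>>>       // address, e.g. <1,2,0,3> for a[i+4], a[i+0], a[i+1], a[i+7].
>>>     }
>>>   } // else: no constant pairwise offsets or duplicate offsets; not sortable.
>>>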
>>> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Mon Apr  2
>>> 07:51:37 2018
>>> @@ -452,16 +452,21 @@ static bool allSameType(ArrayRef<Value *
>>>  }
>>>
>>>  /// \returns True if Extract{Value,Element} instruction extracts
>>> element Idx.
>>> -static bool matchExtractIndex(Instruction *E, unsigned Idx, unsigned
>>> Opcode) {
>>> -  assert(Opcode == Instruction::ExtractElement ||
>>> -         Opcode == Instruction::ExtractValue);
>>> +static Optional<unsigned> getExtractIndex(Instruction *E) {
>>> +  unsigned Opcode = E->getOpcode();
>>> +  assert((Opcode == Instruction::ExtractElement ||
>>> +          Opcode == Instruction::ExtractValue) &&
>>> +         "Expected extractelement or extractvalue instruction.");
>>>    if (Opcode == Instruction::ExtractElement) {
>>> -    ConstantInt *CI = dyn_cast<ConstantInt>(E->getOperand(1));
>>> -    return CI && CI->getZExtValue() == Idx;
>>> -  } else {
>>> -    ExtractValueInst *EI = cast<ExtractValueInst>(E);
>>> -    return EI->getNumIndices() == 1 && *EI->idx_begin() == Idx;
>>> -  }
>>> +    auto *CI = dyn_cast<ConstantInt>(E->getOperand(1));
>>> +    if (!CI)
>>> +      return None;
>>> +    return CI->getZExtValue();
>>> +  }
>>> +  ExtractValueInst *EI = cast<ExtractValueInst>(E);
>>> +  if (EI->getNumIndices() != 1)
>>> +    return None;
>>> +  return *EI->idx_begin();
>>>  }
>>>
>>>  /// \returns True if in-tree use also needs extract. This refers to
>>> @@ -586,6 +591,7 @@ public:
>>>      MustGather.clear();
>>>      ExternalUses.clear();
>>>      NumOpsWantToKeepOrder.clear();
>>> +    NumOpsWantToKeepOriginalOrder = 0;
>>>      for (auto &Iter : BlocksSchedules) {
>>>        BlockScheduling *BS = Iter.second.get();
>>>        BS->clear();
>>> @@ -598,14 +604,18 @@ public:
>>>    /// \brief Perform LICM and CSE on the newly generated gather
>>> sequences.
>>>    void optimizeGatherSequence();
>>>
>>> -  /// \returns true if it is beneficial to reverse the vector order.
>>> -  bool shouldReorder() const {
>>> -    return std::accumulate(
>>> -               NumOpsWantToKeepOrder.begin(),
>>> NumOpsWantToKeepOrder.end(), 0,
>>> -               [](int Val1,
>>> -                  const decltype(NumOpsWantToKeepOrder)::value_type
>>> &Val2) {
>>> -                 return Val1 + (Val2.second < 0 ? 1 : -1);
>>> -               }) > 0;
>>> +  /// \returns The best order of instructions for vectorization.
>>> +  Optional<ArrayRef<unsigned>> bestOrder() const {
>>> +    auto I = std::max_element(
>>> +        NumOpsWantToKeepOrder.begin(), NumOpsWantToKeepOrder.end(),
>>> +        [](const decltype(NumOpsWantToKeepOrder)::value_type &D1,
>>> +           const decltype(NumOpsWantToKeepOrder)::value_type &D2) {
>>> +          return D1.second < D2.second;
>>> +        });
>>> +    if (I == NumOpsWantToKeepOrder.end() || I->getSecond() <=
>>> NumOpsWantToKeepOriginalOrder)
>>> +      return None;
>>> +
>>> +    return makeArrayRef(I->getFirst());
>>>    }
>>>
>>>    /// \return The vector element size in bits to use when vectorizing
>>> the
>>> @@ -652,9 +662,13 @@ private:
>>>    /// This is the recursive part of buildTree.
>>>    void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth, int);
>>>
>>> -  /// \returns True if the ExtractElement/ExtractValue instructions in
>>> VL can
>>> -  /// be vectorized to use the original vector (or aggregate "bitcast"
>>> to a vector).
>>> -  bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue) const;
>>> +  /// \returns true if the ExtractElement/ExtractValue instructions in
>>> \p VL can
>>> +  /// be vectorized to use the original vector (or aggregate "bitcast"
>>> to a
>>> +  /// vector) and sets \p CurrentOrder to the identity permutation;
>>> otherwise
>>> +  /// returns false, setting \p CurrentOrder to either an empty vector
>>> or a
>>> +  /// non-identity permutation that allows reusing extract instructions.
>>> +  bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
>>> +                       SmallVectorImpl<unsigned> &CurrentOrder) const;
>>>
>>>    /// Vectorize a single entry in the tree.
>>>    Value *vectorizeTree(TreeEntry *E);
>>> @@ -718,6 +732,9 @@ private:
>>>      /// Does this sequence require some shuffling?
>>>      SmallVector<unsigned, 4> ReuseShuffleIndices;
>>>
>>> +    /// Does this entry require reordering?
>>> +    ArrayRef<unsigned> ReorderIndices;
>>> +
>>>      /// Points back to the VectorizableTree.
>>>      ///
>>>      /// Only used for Graphviz right now.  Unfortunately
>>> GraphTrait::NodeRef has
>>> @@ -733,7 +750,8 @@ private:
>>>
>>>    /// Create a new VectorizableTree entry.
>>>    void newTreeEntry(ArrayRef<Value *> VL, bool Vectorized, int
>>> &UserTreeIdx,
>>> -                    ArrayRef<unsigned> ReuseShuffleIndices = None) {
>>> +                    ArrayRef<unsigned> ReuseShuffleIndices = None,
>>> +                    ArrayRef<unsigned> ReorderIndices = None) {
>>>      VectorizableTree.emplace_back(VectorizableTree);
>>>      int idx = VectorizableTree.size() - 1;
>>>      TreeEntry *Last = &VectorizableTree[idx];
>>> @@ -741,6 +759,7 @@ private:
>>>      Last->NeedToGather = !Vectorized;
>>>      Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
>>>                                       ReuseShuffleIndices.end());
>>> +    Last->ReorderIndices = ReorderIndices;
>>>      if (Vectorized) {
>>>        for (int i = 0, e = VL.size(); i != e; ++i) {
>>>          assert(!getTreeEntry(VL[i]) && "Scalar already in tree!");
>>> @@ -1202,10 +1221,38 @@ private:
>>>    /// List of users to ignore during scheduling and that don't need
>>> extracting.
>>>    ArrayRef<Value *> UserIgnoreList;
>>>
>>> -  /// Number of operation bundles that contain consecutive operations -
>>> number
>>> -  /// of operation bundles that contain consecutive operations in
>>> reversed
>>> -  /// order.
>>> -  DenseMap<unsigned, int> NumOpsWantToKeepOrder;
>>> +  using OrdersType = SmallVector<unsigned, 4>;
>>> +  /// A DenseMapInfo implementation for holding DenseMaps and DenseSets
>>> of
>>> +  /// sorted SmallVectors of unsigned.
>>> +  struct OrdersTypeDenseMapInfo {
>>> +    static OrdersType getEmptyKey() {
>>> +      OrdersType V;
>>> +      V.push_back(~1U);
>>> +      return V;
>>> +    }
>>> +
>>> +    static OrdersType getTombstoneKey() {
>>> +      OrdersType V;
>>> +      V.push_back(~2U);
>>> +      return V;
>>> +    }
>>> +
>>> +    static unsigned getHashValue(const OrdersType &V) {
>>> +      return static_cast<unsigned>(hash_combine_range(V.begin(),
>>> V.end()));
>>> +    }
>>> +
>>> +    static bool isEqual(const OrdersType &LHS, const OrdersType &RHS) {
>>> +      return LHS == RHS;
>>> +    }
>>> +  };
>>> +
>>> +  /// Contains orders of operations along with the number of bundles
>>> that have
>>> +  /// operations in this order. It stores only those orders that require
>>> +  /// reordering; if reordering is not required, it is counted using \a
>>> +  /// NumOpsWantToKeepOriginalOrder.
>>> +  DenseMap<OrdersType, unsigned, OrdersTypeDenseMapInfo>
>>> NumOpsWantToKeepOrder;
>>> +  /// Number of bundles that do not require reordering.
>>> +  unsigned NumOpsWantToKeepOriginalOrder = 0;
>>>
>>>    // Analysis and block reference.
>>>    Function *F;
>>> @@ -1557,17 +1604,35 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>>>      }
>>>      case Instruction::ExtractValue:
>>>      case Instruction::ExtractElement: {
>>> -      bool Reuse = canReuseExtract(VL, VL0);
>>> +      OrdersType CurrentOrder;
>>> +      bool Reuse = canReuseExtract(VL, VL0, CurrentOrder);
>>>        if (Reuse) {
>>>          DEBUG(dbgs() << "SLP: Reusing or shuffling extract
>>> sequence.\n");
>>> -        ++NumOpsWantToKeepOrder[S.Opcode];
>>> -      } else {
>>> -        SmallVector<Value *, 4> ReverseVL(VL.rbegin(), VL.rend());
>>> -        if (canReuseExtract(ReverseVL, VL0))
>>> -          --NumOpsWantToKeepOrder[S.Opcode];
>>> -        BS.cancelScheduling(VL, VL0);
>>> +        ++NumOpsWantToKeepOriginalOrder;
>>> +        newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,
>>> +                     ReuseShuffleIndicies);
>>> +        return;
>>>        }
>>> -      newTreeEntry(VL, Reuse, UserTreeIdx, ReuseShuffleIndicies);
>>> +      if (!CurrentOrder.empty()) {
>>> +#ifndef NDEBUG
>>> +        dbgs() << "SLP: Reusing or shuffling of reordered extract
>>> sequence "
>>> +                  "with order";
>>> +        for (unsigned Idx : CurrentOrder)
>>> +          dbgs() << " " << Idx;
>>> +        dbgs() << "\n";
>>> +#endif // NDEBUG
>>> +        // Insert new order with initial value 0, if it does not exist,
>>> +        // otherwise return the iterator to the existing one.
>>> +        auto StoredCurrentOrderAndNum =
>>> +            NumOpsWantToKeepOrder.try_emplace(CurrentOrder).first;
>>> +        ++StoredCurrentOrderAndNum->getSecond();
>>> +        newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,
>>> ReuseShuffleIndicies,
>>> +                     StoredCurrentOrderAndNum->getFirst());
>>> +        return;
>>> +      }
>>> +      DEBUG(dbgs() << "SLP: Gather extract sequence.\n");
>>> +      newTreeEntry(VL, /*Vectorized=*/false, UserTreeIdx,
>>> ReuseShuffleIndicies);
>>> +      BS.cancelScheduling(VL, VL0);
>>>        return;
>>>      }
>>>      case Instruction::Load: {
>>> @@ -1589,51 +1654,55 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>>>
>>>        // Make sure all loads in the bundle are simple - we can't
>>> vectorize
>>>        // atomic or volatile loads.
>>> -      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
>>> -        LoadInst *L = cast<LoadInst>(VL[i]);
>>> +      SmallVector<Value *, 4> PointerOps(VL.size());
>>> +      auto POIter = PointerOps.begin();
>>> +      for (Value *V : VL) {
>>> +        auto *L = cast<LoadInst>(V);
>>>          if (!L->isSimple()) {
>>>            BS.cancelScheduling(VL, VL0);
>>>            newTreeEntry(VL, false, UserTreeIdx, ReuseShuffleIndicies);
>>>            DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
>>>            return;
>>>          }
>>> +        *POIter = L->getPointerOperand();
>>> +        ++POIter;
>>>        }
>>>
>>> -      // Check if the loads are consecutive, reversed, or neither.
>>> -      // TODO: What we really want is to sort the loads, but for now,
>>> check
>>> -      // the two likely directions.
>>> -      bool Consecutive = true;
>>> -      bool ReverseConsecutive = true;
>>> -      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
>>> -        if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE)) {
>>> -          Consecutive = false;
>>> -          break;
>>> +      OrdersType CurrentOrder;
>>> +      // Check the order of pointer operands.
>>> +      if (llvm::sortPtrAccesses(PointerOps, *DL, *SE, CurrentOrder)) {
>>> +        Value *Ptr0;
>>> +        Value *PtrN;
>>> +        if (CurrentOrder.empty()) {
>>> +          Ptr0 = PointerOps.front();
>>> +          PtrN = PointerOps.back();
>>>          } else {
>>> -          ReverseConsecutive = false;
>>> +          Ptr0 = PointerOps[CurrentOrder.front()];
>>> +          PtrN = PointerOps[CurrentOrder.back()];
>>>          }
>>> -      }
>>> -
>>> -      if (Consecutive) {
>>> -        ++NumOpsWantToKeepOrder[S.Opcode];
>>> -        newTreeEntry(VL, true, UserTreeIdx, ReuseShuffleIndicies);
>>> -        DEBUG(dbgs() << "SLP: added a vector of loads.\n");
>>> -        return;
>>> -      }
>>> -
>>> -      // If none of the load pairs were consecutive when checked in
>>> order,
>>> -      // check the reverse order.
>>> -      if (ReverseConsecutive)
>>> -        for (unsigned i = VL.size() - 1; i > 0; --i)
>>> -          if (!isConsecutiveAccess(VL[i], VL[i - 1], *DL, *SE)) {
>>> -            ReverseConsecutive = false;
>>> -            break;
>>> +        const SCEV *Scev0 = SE->getSCEV(Ptr0);
>>> +        const SCEV *ScevN = SE->getSCEV(PtrN);
>>> +        const auto *Diff =
>>> +            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
>>> +        uint64_t Size = DL->getTypeAllocSize(ScalarTy);
>>> +        // Check that the sorted loads are consecutive.
>>> +        if (Diff && Diff->getAPInt().getZExtValue() == (VL.size() - 1)
>>> * Size) {
>>> +          if (CurrentOrder.empty()) {
>>> +            // Original loads are consecutive and do not require
>>> reordering.
>>> +            ++NumOpsWantToKeepOriginalOrder;
>>> +            newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,
>>> +                         ReuseShuffleIndicies);
>>> +            DEBUG(dbgs() << "SLP: added a vector of loads.\n");
>>> +          } else {
>>> +            // Need to reorder.
>>> +            auto I =
>>> NumOpsWantToKeepOrder.try_emplace(CurrentOrder).first;
>>> +            ++I->getSecond();
>>> +            newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,
>>> +                         ReuseShuffleIndicies, I->getFirst());
>>> +            DEBUG(dbgs() << "SLP: added a vector of jumbled loads.\n");
>>>            }
>>> -
>>> -      if (ReverseConsecutive) {
>>> -        --NumOpsWantToKeepOrder[S.Opcode];
>>> -        newTreeEntry(VL, true, UserTreeIdx, ReuseShuffleIndicies);
>>> -        DEBUG(dbgs() << "SLP: added a vector of reversed loads.\n");
>>> -        return;
>>> +          return;
>>> +        }
>>>        }
>>>
>>>        DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");
>>> @@ -1944,7 +2013,8 @@ unsigned BoUpSLP::canMapToVector(Type *T
>>>    return N;
>>>  }
>>>
>>> -bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue)
>>> const {
>>> +bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
>>> +                              SmallVectorImpl<unsigned> &CurrentOrder)
>>> const {
>>>    Instruction *E0 = cast<Instruction>(OpValue);
>>>    assert(E0->getOpcode() == Instruction::ExtractElement ||
>>>           E0->getOpcode() == Instruction::ExtractValue);
>>> @@ -1953,6 +2023,8 @@ bool BoUpSLP::canReuseExtract(ArrayRef<V
>>>    // correct offset.
>>>    Value *Vec = E0->getOperand(0);
>>>
>>> +  CurrentOrder.clear();
>>> +
>>>    // We have to extract from a vector/aggregate with the same number of
>>> elements.
>>>    unsigned NElts;
>>>    if (E0->getOpcode() == Instruction::ExtractValue) {
>>> @@ -1972,15 +2044,40 @@ bool BoUpSLP::canReuseExtract(ArrayRef<V
>>>      return false;
>>>
>>>    // Check that all of the indices extract from the correct offset.
>>> -  for (unsigned I = 0, E = VL.size(); I < E; ++I) {
>>> -    Instruction *Inst = cast<Instruction>(VL[I]);
>>> -    if (!matchExtractIndex(Inst, I, Inst->getOpcode()))
>>> -      return false;
>>> +  bool ShouldKeepOrder = true;
>>> +  unsigned E = VL.size();
>>> +  // Assign to all items the initial value E + 1 so we can check if the
>>> extract
>>> +  // instruction index was used already.
>>> +  // Also, later we can check that all the indices are used and we have
>>> a
>>> +  // consecutive access in the extract instructions, by checking that no
>>> +  // element of CurrentOrder still has value E + 1.
>>> +  CurrentOrder.assign(E, E + 1);
>>> +  unsigned I = 0;
>>> +  for (; I < E; ++I) {
>>> +    auto *Inst = cast<Instruction>(VL[I]);
>>>      if (Inst->getOperand(0) != Vec)
>>> -      return false;
>>> +      break;
>>> +    Optional<unsigned> Idx = getExtractIndex(Inst);
>>> +    if (!Idx)
>>> +      break;
>>> +    const unsigned ExtIdx = *Idx;
>>> +    if (ExtIdx != I) {
>>> +      if (ExtIdx >= E || CurrentOrder[ExtIdx] != E + 1)
>>> +        break;
>>> +      ShouldKeepOrder = false;
>>> +      CurrentOrder[ExtIdx] = I;
>>> +    } else {
>>> +      if (CurrentOrder[I] != E + 1)
>>> +        break;
>>> +      CurrentOrder[I] = I;
>>> +    }
>>> +  }
>>> +  if (I < E) {
>>> +    CurrentOrder.clear();
>>> +    return false;
>>>    }
>>>
>>> -  return true;
>>> +  return ShouldKeepOrder;
>>>  }
>>>
>>>  bool BoUpSLP::areAllUsersVectorized(Instruction *I) const {
>>> @@ -2082,8 +2179,13 @@ int BoUpSLP::getEntryCost(TreeEntry *E)
>>>                TTI->getVectorInstrCost(Instruction::ExtractElement,
>>> VecTy, Idx);
>>>          }
>>>        }
>>> -      if (canReuseExtract(VL, S.OpValue)) {
>>> +      if (!E->NeedToGather) {
>>>          int DeadCost = ReuseShuffleCost;
>>> +        if (!E->ReorderIndices.empty()) {
>>> +          // TODO: Merge this shuffle with the ReuseShuffleCost.
>>> +          DeadCost += TTI->getShuffleCost(
>>> +              TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
>>> +        }
>>>          for (unsigned i = 0, e = VL.size(); i < e; ++i) {
>>>            Instruction *E = cast<Instruction>(VL[i]);
>>>            // If all users are going to be vectorized, instruction can be
>>> @@ -2246,7 +2348,8 @@ int BoUpSLP::getEntryCost(TreeEntry *E)
>>>            TTI->getMemoryOpCost(Instruction::Load, ScalarTy, alignment,
>>> 0, VL0);
>>>        int VecLdCost = TTI->getMemoryOpCost(Instruction::Load,
>>>                                             VecTy, alignment, 0, VL0);
>>> -      if (!isConsecutiveAccess(VL[0], VL[1], *DL, *SE)) {
>>> +      if (!E->ReorderIndices.empty()) {
>>> +        // TODO: Merge this shuffle with the ReuseShuffleCost.
>>>          VecLdCost += TTI->getShuffleCost(
>>>              TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
>>>        }
>>> @@ -2944,6 +3047,15 @@ Value *BoUpSLP::vectorizeTree(ArrayRef<V
>>>    return V;
>>>  }
>>>
>>> +static void inversePermutation(ArrayRef<unsigned> Indices,
>>> +                               SmallVectorImpl<unsigned> &Mask) {
>>> +  Mask.clear();
>>> +  const unsigned E = Indices.size();
>>> +  Mask.resize(E);
>>> +  for (unsigned I = 0; I < E; ++I)
>>> +    Mask[Indices[I]] = I;
>>> +}
>>> +
>>>  Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
>>>    IRBuilder<>::InsertPointGuard Guard(Builder);
>>>
>>> @@ -3020,10 +3132,19 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
>>>      }
>>>
>>>      case Instruction::ExtractElement: {
>>> -      if (canReuseExtract(E->Scalars, VL0)) {
>>> +      if (!E->NeedToGather) {
>>>          Value *V = VL0->getOperand(0);
>>> -        if (NeedToShuffleReuses) {
>>> +        if (!E->ReorderIndices.empty()) {
>>> +          OrdersType Mask;
>>> +          inversePermutation(E->ReorderIndices, Mask);
>>>            Builder.SetInsertPoint(VL0);
>>> +          V = Builder.CreateShuffleVector(V, UndefValue::get(VecTy),
>>> Mask,
>>> +                                          "reorder_shuffle");
>>> +        }
>>> +        if (NeedToShuffleReuses) {
>>> +          // TODO: Merge this shuffle with the ReorderShuffleMask.
>>> +          if (!E->ReorderIndices.empty())
>>> +            Builder.SetInsertPoint(VL0);
>>>            V = Builder.CreateShuffleVector(V, UndefValue::get(VecTy),
>>>                                            E->ReuseShuffleIndices,
>>> "shuffle");
>>>          }
>>> @@ -3044,14 +3165,21 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
>>>        return V;
>>>      }
>>>      case Instruction::ExtractValue: {
>>> -      if (canReuseExtract(E->Scalars, VL0)) {
>>> +      if (!E->NeedToGather) {
>>>          LoadInst *LI = cast<LoadInst>(VL0->getOperand(0));
>>>          Builder.SetInsertPoint(LI);
>>>          PointerType *PtrTy = PointerType::get(VecTy,
>>> LI->getPointerAddressSpace());
>>>          Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);
>>>          LoadInst *V = Builder.CreateAlignedLoad(Ptr,
>>> LI->getAlignment());
>>>          Value *NewV = propagateMetadata(V, E->Scalars);
>>> +        if (!E->ReorderIndices.empty()) {
>>> +          OrdersType Mask;
>>> +          inversePermutation(E->ReorderIndices, Mask);
>>> +          NewV = Builder.CreateShuffleVector(NewV,
>>> UndefValue::get(VecTy), Mask,
>>> +                                             "reorder_shuffle");
>>> +        }
>>>          if (NeedToShuffleReuses) {
>>> +          // TODO: Merge this shuffle with the ReorderShuffleMask.
>>>            NewV = Builder.CreateShuffleVector(
>>>                NewV, UndefValue::get(VecTy), E->ReuseShuffleIndices,
>>> "shuffle");
>>>          }
>>> @@ -3225,10 +3353,9 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
>>>      case Instruction::Load: {
>>>        // Loads are inserted at the head of the tree because we don't
>>> want to
>>>        // sink them all the way down past store instructions.
>>> -      bool IsReversed =
>>> -          !isConsecutiveAccess(E->Scalars[0], E->Scalars[1], *DL, *SE);
>>> -      if (IsReversed)
>>> -        VL0 = cast<Instruction>(E->Scalars.back());
>>> +      bool IsReorder = !E->ReorderIndices.empty();
>>> +      if (IsReorder)
>>> +        VL0 = cast<Instruction>(E->Scalars[E->ReorderIndices.front()]);
>>>        setInsertPointAfterBundle(E->Scalars, VL0);
>>>
>>>        LoadInst *LI = cast<LoadInst>(VL0);
>>> @@ -3252,12 +3379,14 @@ Value *BoUpSLP::vectorizeTree(TreeEntry
>>>        }
>>>        LI->setAlignment(Alignment);
>>>        Value *V = propagateMetadata(LI, E->Scalars);
>>> -      if (IsReversed) {
>>> -        SmallVector<uint32_t, 4> Mask(E->Scalars.size());
>>> -        std::iota(Mask.rbegin(), Mask.rend(), 0);
>>> -        V = Builder.CreateShuffleVector(V,
>>> UndefValue::get(V->getType()), Mask);
>>> +      if (IsReorder) {
>>> +        OrdersType Mask;
>>> +        inversePermutation(E->ReorderIndices, Mask);
>>> +        V = Builder.CreateShuffleVector(V,
>>> UndefValue::get(V->getType()),
>>> +                                        Mask, "reorder_shuffle");
>>>        }
>>>        if (NeedToShuffleReuses) {
>>> +        // TODO: Merge this shuffle with the ReorderShuffleMask.
>>>          V = Builder.CreateShuffleVector(V, UndefValue::get(VecTy),
>>>                                          E->ReuseShuffleIndices,
>>> "shuffle");
>>>        }
>>> @@ -4836,8 +4965,10 @@ bool SLPVectorizerPass::tryToVectorizeLi
>>>        ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);
>>>
>>>        R.buildTree(Ops);
>>> +      Optional<ArrayRef<unsigned>> Order = R.bestOrder();
>>>        // TODO: check if we can allow reordering for more cases.
>>> -      if (AllowReorder && R.shouldReorder()) {
>>> +      if (AllowReorder && Order) {
>>> +        // TODO: reorder tree nodes without tree rebuilding.
>>>          // Conceptually, there is nothing actually preventing us from
>>> trying to
>>>          // reorder a larger list. In fact, we do exactly this when
>>> vectorizing
>>>          // reductions. However, at this point, we only expect to get
>>> here when
>>> @@ -5583,9 +5714,13 @@ public:
>>>      while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {
>>>        auto VL = makeArrayRef(&ReducedVals[i], ReduxWidth);
>>>        V.buildTree(VL, ExternallyUsedValues, IgnoreList);
>>> -      if (V.shouldReorder()) {
>>> -        SmallVector<Value *, 8> Reversed(VL.rbegin(), VL.rend());
>>> -        V.buildTree(Reversed, ExternallyUsedValues, IgnoreList);
>>> +      Optional<ArrayRef<unsigned>> Order = V.bestOrder();
>>> +      if (Order) {
>>> +        // TODO: reorder tree nodes without tree rebuilding.
>>> +        SmallVector<Value *, 4> ReorderedOps(VL.size());
>>> +        llvm::transform(*Order, ReorderedOps.begin(),
>>> +                        [VL](const unsigned Idx) { return VL[Idx]; });
>>> +        V.buildTree(ReorderedOps, ExternallyUsedValues, IgnoreList);
>>>        }
>>>        if (V.isTreeTinyAndNotFullyVectorizable())
>>>          break;
>>>
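>>> A minimal standalone sketch of the reorder mask computed by the new
>>> inversePermutation helper (the std::vector rewrite and the main()
>>> driver are illustrative only; values are taken from the
>>> sortPtrAccesses example above):
>>>
>>>   #include <cstdio>
>>>   #include <vector>
>>>
>>>   // Same logic as the patch's inversePermutation: Mask becomes the
>>>   // inverse of the Indices permutation.
>>>   static void inversePermutation(const std::vector<unsigned> &Indices,
>>>                                  std::vector<unsigned> &Mask) {
>>>     Mask.assign(Indices.size(), 0);
>>>     for (unsigned I = 0, E = Indices.size(); I < E; ++I)
>>>       Mask[Indices[I]] = I;
>>>   }
>>>
>>>   int main() {
>>>     // ReorderIndices <1,2,0,3>: entry k is the program-order position
>>>     // of the k-th lowest address.
>>>     std::vector<unsigned> Mask;
>>>     inversePermutation({1, 2, 0, 3}, Mask);
>>>     // Prints "2 0 1 3": the shufflevector mask that restores program
>>>     // order after the wide load performed in memory order.
>>>     for (unsigned M : Mask)
>>>       std::printf("%u ", M);
>>>     std::printf("\n");
>>>     return 0;
>>>   }
>>>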
>>> Modified:
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> ---
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll
>>> (original)
>>> +++
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll
>>> Mon Apr  2 07:51:37 2018
>>> @@ -10,15 +10,16 @@ define void @hoge(i64 %idx, <4 x i32>* %
>>>  ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds [20 x [13 x
>>> i32]], [20 x [13 x i32]]* @array, i64 0, i64 [[IDX]], i64 6
>>>  ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds [20 x [13 x
>>> i32]], [20 x [13 x i32]]* @array, i64 0, i64 [[IDX]], i64 7
>>>  ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds [20 x [13 x
>>> i32]], [20 x [13 x i32]]* @array, i64 0, i64 [[IDX]], i64 8
>>> -; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[TMP1]] to <2 x i32>*
>>> -; CHECK-NEXT:    [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]],
>>> align 4
>>> -; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i32 0
>>> +; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[TMP4]],
>>> align 4
>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32>
>>> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
>>> +; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x i32>
>>> [[REORDER_SHUFFLE]], i32 0
>>>  ; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> undef, i32
>>> [[TMP6]], i32 0
>>> -; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP5]], i32 1
>>> +; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <4 x i32>
>>> [[REORDER_SHUFFLE]], i32 1
>>>  ; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <4 x i32> [[TMP7]], i32
>>> [[TMP8]], i32 1
>>> -; CHECK-NEXT:    [[TMP10:%.*]] = load i32, i32* [[TMP3]], align 4
>>> +; CHECK-NEXT:    [[TMP10:%.*]] = extractelement <4 x i32>
>>> [[REORDER_SHUFFLE]], i32 2
>>>  ; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP9]], i32
>>> [[TMP10]], i32 2
>>> -; CHECK-NEXT:    [[TMP12:%.*]] = load i32, i32* [[TMP0]], align 4
>>> +; CHECK-NEXT:    [[TMP12:%.*]] = extractelement <4 x i32>
>>> [[REORDER_SHUFFLE]], i32 3
>>>  ; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP11]], i32
>>> [[TMP12]], i32 3
>>>  ; CHECK-NEXT:    store <4 x i32> [[TMP13]], <4 x i32>* [[SINK:%.*]]
>>>  ; CHECK-NEXT:    ret void
>>>
>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll (original)
>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/extract.ll Mon Apr  2
>>> 07:51:37 2018
>>> @@ -30,14 +30,11 @@ define void @fextr1(double* %ptr) {
>>>  ; CHECK-LABEL: @fextr1(
>>>  ; CHECK-NEXT:  entry:
>>>  ; CHECK-NEXT:    [[LD:%.*]] = load <2 x double>, <2 x double>* undef
>>> -; CHECK-NEXT:    [[V0:%.*]] = extractelement <2 x double> [[LD]], i32 0
>>> -; CHECK-NEXT:    [[V1:%.*]] = extractelement <2 x double> [[LD]], i32 1
>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x double>
>>> [[LD]], <2 x double> undef, <2 x i32> <i32 1, i32 0>
>>>  ; CHECK-NEXT:    [[P1:%.*]] = getelementptr inbounds double, double*
>>> [[PTR:%.*]], i64 0
>>> -; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <2 x double> undef,
>>> double [[V1]], i32 0
>>> -; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]],
>>> double [[V0]], i32 1
>>> -; CHECK-NEXT:    [[TMP2:%.*]] = fadd <2 x double> <double 3.400000e+00,
>>> double 1.200000e+00>, [[TMP1]]
>>> -; CHECK-NEXT:    [[TMP3:%.*]] = bitcast double* [[P1]] to <2 x double>*
>>> -; CHECK-NEXT:    store <2 x double> [[TMP2]], <2 x double>* [[TMP3]],
>>> align 4
>>> +; CHECK-NEXT:    [[TMP0:%.*]] = fadd <2 x double> <double 3.400000e+00,
>>> double 1.200000e+00>, [[REORDER_SHUFFLE]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast double* [[P1]] to <2 x double>*
>>> +; CHECK-NEXT:    store <2 x double> [[TMP0]], <2 x double>* [[TMP1]],
>>> align 4
>>>  ; CHECK-NEXT:    ret void
>>>  ;
>>>  entry:
>>>
>>> Modified:
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> ---
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll
>>> (original)
>>> +++
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll Mon
>>> Apr  2 07:51:37 2018
>>> @@ -11,21 +11,16 @@
>>>      define i32 @fn1() {
>>>  ; CHECK-LABEL: @fn1(
>>>  ; CHECK-NEXT:  entry:
>>> -; CHECK-NEXT:    [[TMP0:%.*]] = load i32, i32* getelementptr inbounds
>>> ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast
>>> (i32* getelementptr inbounds ([4 x i32], [4 x i32]* @b, i64 0, i32 1) to <2
>>> x i32>*), align 4
>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* getelementptr inbounds
>>> ([4 x i32], [4 x i32]* @b, i64 0, i32 3), align 4
>>> -; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
>>> -; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> undef, i32
>>> [[TMP3]], i32 0
>>> -; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
>>> -; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32
>>> [[TMP5]], i32 1
>>> -; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32
>>> [[TMP2]], i32 2
>>> -; CHECK-NEXT:    [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32
>>> [[TMP0]], i32 3
>>> -; CHECK-NEXT:    [[TMP9:%.*]] = icmp sgt <4 x i32> [[TMP8]],
>>> zeroinitializer
>>> -; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP4]], i32
>>> ptrtoint (i32 ()* @fn1 to i32), i32 1
>>> -; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32
>>> ptrtoint (i32 ()* @fn1 to i32), i32 2
>>> -; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32
>>> 8, i32 3
>>> -; CHECK-NEXT:    [[TMP13:%.*]] = select <4 x i1> [[TMP9]], <4 x i32>
>>> [[TMP12]], <4 x i32> <i32 6, i32 0, i32 0, i32 0>
>>> -; CHECK-NEXT:    store <4 x i32> [[TMP13]], <4 x i32>* bitcast ([4 x
>>> i32]* @a to <4 x i32>*), align 4
>>> +; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4
>>> x i32]* @b to <4 x i32>*), align 4
>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32>
>>> [[TMP0]], <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = icmp sgt <4 x i32> [[REORDER_SHUFFLE]],
>>> zeroinitializer
>>> +; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <4 x i32>
>>> [[REORDER_SHUFFLE]], i32 0
>>> +; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <4 x i32> undef, i32
>>> [[TMP2]], i32 0
>>> +; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32
>>> ptrtoint (i32 ()* @fn1 to i32), i32 1
>>> +; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32
>>> ptrtoint (i32 ()* @fn1 to i32), i32 2
>>> +; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32
>>> 8, i32 3
>>> +; CHECK-NEXT:    [[TMP7:%.*]] = select <4 x i1> [[TMP1]], <4 x i32>
>>> [[TMP6]], <4 x i32> <i32 6, i32 0, i32 0, i32 0>
>>> +; CHECK-NEXT:    store <4 x i32> [[TMP7]], <4 x i32>* bitcast ([4 x
>>> i32]* @a to <4 x i32>*), align 4
>>>  ; CHECK-NEXT:    ret i32 0
>>>  ;
>>>    entry:
>>>
>>> Modified:
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> ---
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
>>> (original)
>>> +++
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
>>> Mon Apr  2 07:51:37 2018
>>> @@ -21,28 +21,21 @@
>>>  ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32*
>>> [[A:%.*]], i64 10
>>>  ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 11
>>>  ; CHECK-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 1
>>> -; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[A]] to <2 x i32>*
>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]],
>>> align 4
>>>  ; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 12
>>>  ; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 3
>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX6]], align 4
>>>  ; CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 13
>>> -; CHECK-NEXT:    [[TMP3:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
>>> -; CHECK-NEXT:    [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]],
>>> align 4
>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]],
>>> align 4
>>>  ; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 2
>>> -; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX9]], align 4
>>> -; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
>>> -; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> undef, i32
>>> [[TMP6]], i32 0
>>> -; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
>>> -; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <4 x i32> [[TMP7]], i32
>>> [[TMP8]], i32 1
>>> -; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32
>>> [[TMP2]], i32 2
>>> -; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32
>>> [[TMP5]], i32 3
>>> -; CHECK-NEXT:    [[TMP12:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP11]]
>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]],
>>> align 4
>>> +; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x
>>> i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
>>> +; CHECK-NEXT:    [[TMP5:%.*]] = mul nsw <4 x i32> [[TMP1]], [[TMP4]]
>>>  ; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32*
>>> [[B:%.*]], i64 1
>>>  ; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32*
>>> [[B]], i64 2
>>>  ; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32*
>>> [[B]], i64 3
>>> <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmaps.google.com%2F%3Fq%3Di64%2B3%26entry%3Dgmail%26source%3Dg&data=02%7C01%7C%7C956036c2898b4a2badde08d599240fc1%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636583302524633074&sdata=J3aT4epTAe9vc9vKYP9%2BqS1OzbkHdCysrL7iHgDahbc%3D&reserved=0>
>>> -; CHECK-NEXT:    [[TMP13:%.*]] = bitcast i32* [[B]] to <4 x i32>*
>>> -; CHECK-NEXT:    store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align
>>> 4
>>> +; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[B]] to <4 x i32>*
>>> +; CHECK-NEXT:    store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
>>>  ; CHECK-NEXT:    ret void
>>>  ;
>>>  entry:
>>> @@ -83,28 +76,21 @@ entry:
>>>  ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32*
>>> [[A:%.*]], i64 10
>>>  ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 11
>>>  ; CHECK-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 1
>>> -; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[A]] to <2 x i32>*
>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]],
>>> align 4
>>>  ; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 12
>>>  ; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 3
>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX6]], align 4
>>>  ; CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 13
>>> -; CHECK-NEXT:    [[TMP3:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
>>> -; CHECK-NEXT:    [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]],
>>> align 4
>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]],
>>> align 4
>>>  ; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 2
>>> -; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX9]], align 4
>>> -; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
>>> -; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> undef, i32
>>> [[TMP6]], i32 0
>>> -; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
>>> -; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <4 x i32> [[TMP7]], i32
>>> [[TMP8]], i32 1
>>> -; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32
>>> [[TMP2]], i32 2
>>> -; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32
>>> [[TMP5]], i32 3
>>> -; CHECK-NEXT:    [[TMP12:%.*]] = mul nsw <4 x i32> [[TMP11]], [[TMP4]]
>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]],
>>> align 4
>>> +; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x
>>> i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
>>> +; CHECK-NEXT:    [[TMP5:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP1]]
>>>  ; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32*
>>> [[B:%.*]], i64 1
>>>  ; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32*
>>> [[B]], i64 2
>>>  ; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32*
>>> [[B]], i64 3
>>> -; CHECK-NEXT:    [[TMP13:%.*]] = bitcast i32* [[B]] to <4 x i32>*
>>> -; CHECK-NEXT:    store <4 x i32> [[TMP12]], <4 x i32>* [[TMP13]], align
>>> 4
>>> +; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[B]] to <4 x i32>*
>>> +; CHECK-NEXT:    store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
>>>  ; CHECK-NEXT:    ret void
>>>  ;
>>>  entry:
>>>
>>> Modified:
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> ---
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll
>>> (original)
>>> +++
>>> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll
>>> Mon Apr  2 07:51:37 2018
>>> @@ -48,11 +48,11 @@ define void @phiUsingLoads(i32* noalias
>>>  ; CHECK-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds i32, i32*
>>> [[B]], i64 2
>>>  ; CHECK-NEXT:    [[ARRAYIDX66:%.*]] = getelementptr inbounds i32, i32*
>>> [[B]], i64 3
>>>  ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[B]] to <4 x i32>*
>>> -; CHECK-NEXT:    store <4 x i32> [[TMP34:%.*]], <4 x i32>* [[TMP1]],
>>> align 4
>>> +; CHECK-NEXT:    store <4 x i32> [[TMP27:%.*]], <4 x i32>* [[TMP1]],
>>> align 4
>>>  ; CHECK-NEXT:    ret void
>>>  ; CHECK:       for.body:
>>>  ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [
>>> [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]
>>> -; CHECK-NEXT:    [[TMP2:%.*]] = phi <4 x i32> [ undef, [[ENTRY]] ], [
>>> [[TMP34]], [[FOR_INC]] ]
>>> +; CHECK-NEXT:    [[TMP2:%.*]] = phi <4 x i32> [ undef, [[ENTRY]] ], [
>>> [[TMP27]], [[FOR_INC]] ]
>>>  ; CHECK-NEXT:    br i1 [[CMP1]], label [[IF_THEN:%.*]], label
>>> [[IF_ELSE:%.*]]
>>>  ; CHECK:       if.then:
>>>  ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 [[INDVARS_IV]]
>>> @@ -103,23 +103,16 @@ define void @phiUsingLoads(i32* noalias
>>>  ; CHECK-NEXT:    [[ARRAYIDX49:%.*]] = getelementptr inbounds i32, i32*
>>> [[A]], i64 [[INDVARS_IV]]
>>>  ; CHECK-NEXT:    [[TMP21:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
>>>  ; CHECK-NEXT:    [[ARRAYIDX52:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP21]]
>>> -; CHECK-NEXT:    [[TMP22:%.*]] = bitcast i32* [[ARRAYIDX49]] to <2 x i32>*
>>> -; CHECK-NEXT:    [[TMP23:%.*]] = load <2 x i32>, <2 x i32>* [[TMP22]], align 4
>>> -; CHECK-NEXT:    [[TMP24:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3
>>> -; CHECK-NEXT:    [[ARRAYIDX55:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP24]]
>>> -; CHECK-NEXT:    [[TMP25:%.*]] = load i32, i32* [[ARRAYIDX55]], align 4
>>> -; CHECK-NEXT:    [[TMP26:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2
>>> -; CHECK-NEXT:    [[ARRAYIDX58:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP26]]
>>> -; CHECK-NEXT:    [[TMP27:%.*]] = load i32, i32* [[ARRAYIDX58]], align 4
>>> -; CHECK-NEXT:    [[TMP28:%.*]] = extractelement <2 x i32> [[TMP23]], i32 0
>>> -; CHECK-NEXT:    [[TMP29:%.*]] = insertelement <4 x i32> undef, i32 [[TMP28]], i32 0
>>> -; CHECK-NEXT:    [[TMP30:%.*]] = extractelement <2 x i32> [[TMP23]], i32 1
>>> -; CHECK-NEXT:    [[TMP31:%.*]] = insertelement <4 x i32> [[TMP29]], i32 [[TMP30]], i32 1
>>> -; CHECK-NEXT:    [[TMP32:%.*]] = insertelement <4 x i32> [[TMP31]], i32 [[TMP25]], i32 2
>>> -; CHECK-NEXT:    [[TMP33:%.*]] = insertelement <4 x i32> [[TMP32]], i32 [[TMP27]], i32 3
>>> +; CHECK-NEXT:    [[TMP22:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3
>>> +; CHECK-NEXT:    [[ARRAYIDX55:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP22]]
>>> +; CHECK-NEXT:    [[TMP23:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2
>>> +; CHECK-NEXT:    [[ARRAYIDX58:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP23]]
>>> +; CHECK-NEXT:    [[TMP24:%.*]] = bitcast i32* [[ARRAYIDX49]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP25:%.*]] = load <4 x i32>, <4 x i32>* [[TMP24]], align 4
>>> +; CHECK-NEXT:    [[TMP26:%.*]] = shufflevector <4 x i32> [[TMP25]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
>>>  ; CHECK-NEXT:    br label [[FOR_INC]]
>>>  ; CHECK:       for.inc:
>>> -; CHECK-NEXT:    [[TMP34]] = phi <4 x i32> [ [[TMP7]], [[IF_THEN]] ], [ [[TMP13]], [[IF_THEN14]] ], [ [[TMP19]], [[IF_THEN30]] ], [ [[TMP33]], [[IF_THEN46]] ], [ [[TMP2]], [[IF_ELSE43]] ]
>>> +; CHECK-NEXT:    [[TMP27]] = phi <4 x i32> [ [[TMP7]], [[IF_THEN]] ], [ [[TMP13]], [[IF_THEN14]] ], [ [[TMP19]], [[IF_THEN30]] ], [ [[TMP26]], [[IF_THEN46]] ], [ [[TMP2]], [[IF_ELSE43]] ]
>>>  ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
>>>  ; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 100
>>>  ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
>>>
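The hunk above shows the load side of the patch: instead of giving up on
loads that are not consecutive in program order, the vectorizer now emits
one wide load plus a shufflevector that restores the original lane order
(here the <0, 1, 3, 2> mask swaps the last two lanes). A minimal sketch of
the vectorized form (hypothetical names, mask taken from the case above):

  define <4 x i32> @load_side(i32* %a) {
    ; One wide load replaces four scalar loads from a[0..3].
    %vec.ptr = bitcast i32* %a to <4 x i32>*
    %wide = load <4 x i32>, <4 x i32>* %vec.ptr, align 4
    ; Swap lanes 2 and 3 to undo the jumbled access order.
    %reord = shufflevector <4 x i32> %wide, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
    ret <4 x i32> %reord
  }
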
>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll?rev=328980&r1=328979&r2=328980&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll (original)
>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll Mon Apr  2 07:51:37 2018
>>> @@ -6,33 +6,26 @@
>>>  define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {
>>>  ; CHECK-LABEL: @jumbled-load(
>>>  ; CHECK-NEXT:    [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
>>> -; CHECK-NEXT:    [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4
>>>  ; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
>>> -; CHECK-NEXT:    [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4
>>>  ; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
>>> -; CHECK-NEXT:    [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4
>>>  ; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
>>> -; CHECK-NEXT:    [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
>>>  ; CHECK-NEXT:    [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
>>> -; CHECK-NEXT:    [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4
>>>  ; CHECK-NEXT:    [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2
>>> -; CHECK-NEXT:    [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4
>>>  ; CHECK-NEXT:    [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3
>>> -; CHECK-NEXT:    [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4
>>>  ; CHECK-NEXT:    [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1
>>> -; CHECK-NEXT:    [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4
>>> -; CHECK-NEXT:    [[MUL_1:%.*]] = mul i32 [[LOAD_3]], [[LOAD_5]]
>>> -; CHECK-NEXT:    [[MUL_2:%.*]] = mul i32 [[LOAD_2]], [[LOAD_8]]
>>> -; CHECK-NEXT:    [[MUL_3:%.*]] = mul i32 [[LOAD_4]], [[LOAD_7]]
>>> -; CHECK-NEXT:    [[MUL_4:%.*]] = mul i32 [[LOAD_1]], [[LOAD_6]]
>>> +; CHECK-NEXT:    [[TMP3:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
>>> +; CHECK-NEXT:    [[TMP5:%.*]] = mul <4 x i32> [[REORDER_SHUFFLE]], [[REORDER_SHUFFLE1]]
>>>  ; CHECK-NEXT:    [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
>>> -; CHECK-NEXT:    store i32 [[MUL_1]], i32* [[GEP_7]], align 4
>>>  ; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
>>> -; CHECK-NEXT:    store i32 [[MUL_2]], i32* [[GEP_8]], align 4
>>>  ; CHECK-NEXT:    [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
>>> -; CHECK-NEXT:    store i32 [[MUL_3]], i32* [[GEP_9]], align 4
>>>  ; CHECK-NEXT:    [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
>>> -; CHECK-NEXT:    store i32 [[MUL_4]], i32* [[GEP_10]], align 4
>>> +; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
>>> +; CHECK-NEXT:    store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
>>>  ; CHECK-NEXT:    ret i32 undef
>>>  ;
>>>    %in.addr = getelementptr inbounds i32, i32* %in, i64 0
>>> @@ -71,25 +64,27 @@ define i32 @jumbled-load(i32* noalias no
>>>  define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {
>>>  ; CHECK-LABEL: @jumbled-load-multiuses(
>>>  ; CHECK-NEXT:    [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
>>> -; CHECK-NEXT:    [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4
>>>  ; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
>>> -; CHECK-NEXT:    [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4
>>>  ; CHECK-NEXT:    [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
>>> -; CHECK-NEXT:    [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4
>>>  ; CHECK-NEXT:    [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
>>> -; CHECK-NEXT:    [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4
>>> -; CHECK-NEXT:    [[MUL_1:%.*]] = mul i32 [[LOAD_3]], [[LOAD_4]]
>>> -; CHECK-NEXT:    [[MUL_2:%.*]] = mul i32 [[LOAD_2]], [[LOAD_2]]
>>> -; CHECK-NEXT:    [[MUL_3:%.*]] = mul i32 [[LOAD_4]], [[LOAD_1]]
>>> -; CHECK-NEXT:    [[MUL_4:%.*]] = mul i32 [[LOAD_1]], [[LOAD_3]]
>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
>>> +; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 2
>>> +; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[TMP3]], i32 0
>>> +; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 1
>>> +; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i32 1
>>> +; CHECK-NEXT:    [[TMP7:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 3
>>> +; CHECK-NEXT:    [[TMP8:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP7]], i32 2
>>> +; CHECK-NEXT:    [[TMP9:%.*]] = extractelement <4 x i32> [[REORDER_SHUFFLE]], i32 0
>>> +; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[TMP9]], i32 3
>>> +; CHECK-NEXT:    [[TMP11:%.*]] = mul <4 x i32> [[REORDER_SHUFFLE]], [[TMP10]]
>>>  ; CHECK-NEXT:    [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
>>> -; CHECK-NEXT:    store i32 [[MUL_1]], i32* [[GEP_7]], align 4
>>>  ; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
>>> -; CHECK-NEXT:    store i32 [[MUL_2]], i32* [[GEP_8]], align 4
>>>  ; CHECK-NEXT:    [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
>>> -; CHECK-NEXT:    store i32 [[MUL_3]], i32* [[GEP_9]], align 4
>>>  ; CHECK-NEXT:    [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180403/b7cad5cc/attachment.html>


More information about the llvm-commits mailing list