[llvm] r296863 - [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 10 19:39:35 PST 2017
No need to apologise. I realize you're new to the project; I just wanted to
explain how we work.
Michael
On Mar 10, 2017 19:00, "Shahid, Asghar-ahmad" <Asghar-ahmad.Shahid at amd.com>
wrote:
> Hi Michael,
>
>
>
> My sincere apologies for the inconvenience, I understand my request was
> not proper.
>
>
>
> Regards,
>
> Shahid
>
>
>
> *From:* Michael Kuperstein [mailto:mkuper at google.com]
> *Sent:* Saturday, March 11, 2017 1:05 AM
> *To:* Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>
> *Cc:* Daniel Jasper <djasper at google.com>; llvm-commits <
> llvm-commits at lists.llvm.org>; Hans Wennborg <hwennborg at google.com>;
> Matthias Braun <matze at braunis.de>; Steven Wu <stevenwu at apple.com>; Adam
> Nemet <anemet at apple.com>
> *Subject:* Re: [llvm] r296863 - [SLP] Fixes the bug due to absence of in
> order uses of scalars which needs to be available
>
>
>
> Shahid, I've reverted everything in r297493.
>
>
>
> A note for the future, though - this is not how this project works.
>
>
>
> We try hard not to have correctness regressions. That means that when
> people find out a recent commit breaks something, there are usually two
> possible courses of action:
>
> 1) If this breakage is blocking people, or if there is no easy fix in
> sight, we revert immediately.
>
> 2) If not, the author can choose between reverting, or trying to fix the
> bug in-tree, if they see a quick fix.
>
>
>
> But the underlying assumption is that if fixing the bug in-tree fails
> (or takes too long), the author will revert their commit. And the revert
> may be much messier, because more time has passed, and more work was
> layered on top of the original commit. This is a risk the patch author
> takes if they decide to fix in-tree, and then has to own - everyone here
> either has a schedule, or is volunteering their time to the project, so
> people don't normally have the spare time or the will to do this sort of
> cleanup work.
>
>
>
> There are exceptions, of course - for example, if the patch fixes another
> critical bug. But none of that applies if the breaking commit is a performance
> improvement.
>
>
>
> Thanks,
>
> Michael
>
>
>
> On Fri, Mar 10, 2017 at 1:34 AM, Shahid, Asghar-ahmad <
> Asghar-ahmad.Shahid at amd.com> wrote:
>
> Ok to revert, but I am a little tight on my schedule. Can you / Adam do the
> revert? … appreciate your support.
>
>
>
> Thanks,
>
> Shahid
>
>
>
> *From:* Michael Kuperstein [mailto:mkuper at google.com]
> *Sent:* Friday, March 10, 2017 5:19 AM
> *To:* Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>
> *Cc:* Daniel Jasper <djasper at google.com>; llvm-commits <
> llvm-commits at lists.llvm.org>; Hans Wennborg <hwennborg at google.com>;
> Matthias Braun <matze at braunis.de>; Steven Wu <stevenwu at apple.com>; Adam
> Nemet <anemet at apple.com>
>
>
> *Subject:* Re: [llvm] r296863 - [SLP] Fixes the bug due to absence of in
> order uses of scalars which needs to be available
>
>
>
> I'm sort of agnostic on having multiple tree nodes for the same set of
> scalar values vs. having a DAG - using a tree won't blow the graph up too
> much, since we're only talking about the leaves, and I don't expect high
> duplication of those either...
>
> If we have several tree nodes, we can keep the order on the node; if we
> have a single node, we can keep the order on the edges.
>
> I do think it would be good to have an explicit graph / tree, instead of
> making the scalar-based lookup even more complicated.
>
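> A minimal sketch of the two alternatives, with made-up names and a
> stand-in Value type (this is not the actual SLPVectorizer data structure,
> just an illustration):
>
>   #include <map>
>   #include <utility>
>   #include <vector>
>
>   struct Value {};  // stand-in for llvm::Value
>
>   // Alternative A: several tree entries may exist for the same set of
>   // scalars; each entry records its own use order.
>   struct OrderOnNode {
>     std::vector<Value *> Scalars;    // bundle in sorted (memory) order
>     std::vector<unsigned> UseOrder;  // this entry's use order
>   };
>
>   // Alternative B: a single entry per bundle; the use order lives on
>   // each incoming edge, keyed here by a (user id, operand index) pair.
>   struct OrderOnEdges {
>     std::vector<Value *> Scalars;
>     std::map<std::pair<int, unsigned>, std::vector<unsigned>> EdgeOrder;
>   };
>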
> In any case, I'd strongly prefer to have this discussion while we don't
> have a known mis-compile affecting SPEC in tree.
> Would you mind reverting for now? Note that reverting just became slightly
> more difficult because of conflicts with r297303...
>
>
>
> Thanks,
>
> Michael
>
>
>
> On Thu, Mar 9, 2017 at 3:02 AM, Shahid, Asghar-ahmad <
> Asghar-ahmad.Shahid at amd.com> wrote:
>
> The other probable ideas regarding the resolution are as below:
>
>
>
> Allow multiple TreeEntries for different UseMasks, and either:
>
> 1. Maintain a Hash table for multiple UseMasks and query the tree
> index using a hash function which is based on the UseMasks, OR
>
>
>
> 2. Traverse the vectorizable tree and look for the proper tree
> entry index after checking that all the scalars' indexes are the same, i.e.
> isEqual(INDEX(E->Scalars[i]))
>
>
>
> Both of the above may be expensive, but perhaps the cost can be amortized
> based on the OpCode and early exits? (A rough sketch of option 1 is below.)
>
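> A rough, hypothetical sketch of option 1 (BundleKey and
> MaskedScalarToTreeEntry are made-up names; the existing ScalarToTreeEntry
> lookup in the diff below maps a scalar to a single index, this just adds
> the UseMask to the key):
>
>   #include <cstddef>
>   #include <functional>
>   #include <unordered_map>
>   #include <vector>
>
>   // Hypothetical key: the first scalar of the bundle plus its UseMask.
>   struct BundleKey {
>     const void *FirstScalar;  // stand-in for llvm::Value *
>     std::vector<unsigned> UseMask;
>     bool operator==(const BundleKey &O) const {
>       return FirstScalar == O.FirstScalar && UseMask == O.UseMask;
>     }
>   };
>
>   struct BundleKeyHash {
>     std::size_t operator()(const BundleKey &K) const {
>       std::size_t H = std::hash<const void *>()(K.FirstScalar);
>       for (unsigned Lane : K.UseMask)
>         H = H * 31 + Lane;  // any stable mix of the mask works
>       return H;
>     }
>   };
>
>   // In-order and out-of-order uses of the same load bundle now map to
>   // distinct tree entry indices.
>   using MaskedScalarToTreeEntry =
>       std::unordered_map<BundleKey, int, BundleKeyHash>;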
>
>
> Regards,
>
> Shahid
>
>
>
> *From:* Michael Kuperstein [mailto:mkuper at google.com]
> *Sent:* Wednesday, March 8, 2017 12:30 AM
> *To:* Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>
> *Cc:* Daniel Jasper <djasper at google.com>; llvm-commits <
> llvm-commits at lists.llvm.org>; Hans Wennborg <hwennborg at google.com>;
> Matthias Braun <matze at braunis.de>; Steven Wu <stevenwu at apple.com>
>
>
> *Subject:* Re: [llvm] r296863 - [SLP] Fixes the bug due to absence of in
> order uses of scalars which needs to be available
>
>
>
> I think we're going to have to revert the entire patch set (r293386,
> r294027, r296411, anything else?) and go back to the drawing board on
> this...
>
>
>
> Any other ideas?
>
>
>
> On Mon, Mar 6, 2017 at 10:44 PM, Shahid, Asghar-ahmad <
> Asghar-ahmad.Shahid at amd.com> wrote:
>
> Hi Michael,
>
>
>
> Apart from the single load bundle tree entry, another problem is the
> mapping of scalars to tree entries.
>
> Currently, even if we allow multiple load bundle tree entries for in-order
> & out-of-order uses, queries for the tree entry index will not be proper,
> as they are based on scalars, which are assigned the same index
> irrespective of the order.
>
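> For illustration only (a simplified stand-in, not the actual SLPVectorizer
> code): the query is essentially a map from a scalar to a single tree
> index, so two entries built from the same scalars in different orders
> cannot be distinguished:
>
>   #include <map>
>
>   struct Value {};  // stand-in for llvm::Value
>
>   // Simplified model of ScalarToTreeEntry: each scalar maps to exactly
>   // one tree index, regardless of which use order created the entry.
>   std::map<const Value *, int> ScalarToTreeEntry;
>
>   int lookupEntryIdx(const Value *Scalar) {
>     auto It = ScalarToTreeEntry.find(Scalar);
>     // An in-order use and an out-of-order use of the same loads both
>     // land on the same index here, which is the problem described above.
>     return It == ScalarToTreeEntry.end() ? -1 : It->second;
>   }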
>
>
> -Shahid
>
>
>
> *From:* Michael Kuperstein [mailto:mkuper at google.com]
> *Sent:* Tuesday, March 7, 2017 5:15 AM
> *To:* Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>; Daniel Jasper <
> djasper at google.com>
> *Cc:* llvm-commits <llvm-commits at lists.llvm.org>; Hans Wennborg <
> hwennborg at google.com>; Matthias Braun <matze at braunis.de>; Steven Wu <
> stevenwu at apple.com>
> *Subject:* Re: [llvm] r296863 - [SLP] Fixes the bug due to absence of in
> order uses of scalars which needs to be available
>
>
>
> Hi Shahid,
>
>
>
> Unfortunately, this still fails in some cases - and it's partially my
> fault, since it's caused by a change I suggested on the review.
>
>
>
> The problem is that we end up having a single load bundle that's used in
> two different orders. Since we only have a single tree entry for that
> bundle, we generate a single mask, and use the same shuffle for both uses.
> Note that I'm not sure computing the mask on-demand (like we did before)
> would always help. I think it would break if one of the uses was in-order,
> another was out-of-order, and we reached buildTree_vec with the in-order
> use first. We'd then set NeedToShuffle to false, and wouldn't generate a
> shuffle for the out-of-order use.
>
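> A simplified sketch of the problem (hypothetical names, not the trunk
> code): with one mask stored per tree entry, every use of the bundle gets
> the same shuffle, regardless of the order that use actually needs:
>
>   #include <vector>
>
>   struct Value {};  // stand-in for llvm::Value
>
>   struct LoadEntry {
>     std::vector<Value *> Scalars;  // bundle in sorted (memory) order
>     std::vector<unsigned> Mask;    // single mask for the whole bundle
>   };
>
>   // Both uses query the same entry, so both get Entry.Mask even if one
>   // needs {0,1,2,3} and the other needs {1,0,3,2}.
>   const std::vector<unsigned> &maskFor(const LoadEntry &Entry,
>                                        const std::vector<Value *> &Use) {
>     (void)Use;  // the requested use order is never consulted
>     return Entry.Mask;
>   }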
>
>
> I'm not sure if the right solution is to generate the mask on demand like
> we did before (but then we need to make sure the issue above doesn't
> happen), or have distinct tree entries.
>
>
>
> Reduced example:
>
>
>
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>
> target triple = "x86_64-pc-linux-gnu"
>
>
>
> %complex = type { { double, double } }
>
>
>
> define void @eggs(%complex* %p) local_unnamed_addr #0 {
>
> bb:
>
> br i1 undef, label %bb2, label %bb1
>
>
>
> bb1: ; preds = %bb
>
> ret void
>
>
>
> bb2: ; preds = %bb2, %bb
>
> %phi0 = phi double [ %tmp21, %bb2 ], [ 0.000000e+00, %bb ]
>
> %phi1 = phi double [ %tmp20, %bb2 ], [ 0.000000e+00, %bb ]
>
> %phi2 = phi double [ %tmp29, %bb2 ], [ 0.000000e+00, %bb ]
>
> %phi3 = phi double [ %tmp28, %bb2 ], [ 0.000000e+00, %bb ]
>
>
>
> %gep0 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32
> 0
>
> %load0 = load double, double* %gep0, align 8
>
> %gep1 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32
> 1
>
> %load1 = load double, double* %gep1, align 8
>
> %gep2 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32
> 0
>
> %load2 = load double, double* %gep2, align 8
>
> %gep3 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32
> 1
>
> %load3 = load double, double* %gep3, align 8
>
>
>
> %tmp14 = fmul double 1.0, %load0
>
> %tmp15 = fmul double 2.0, %load1
>
> %tmp16 = fsub double %tmp14, %tmp15
>
> %tmp17 = fmul double 1.0, %load1
>
> %tmp18 = fmul double 2.0, %load0
>
> %tmp19 = fsub double %tmp17, %tmp18
>
> %tmp20 = fadd double %phi1, %tmp16
>
> %tmp21 = fadd double %phi0, %tmp19
>
> %tmp22 = fmul double 1.0, %load2
>
> %tmp23 = fmul double 2.0, %load3
>
> %tmp24 = fsub double %tmp22, %tmp23
>
> %tmp25 = fmul double 1.0, %load3
>
> %tmp26 = fmul double 2.0, %load2
>
> %tmp27 = fsub double %tmp25, %tmp26
>
> %tmp28 = fadd double %phi3, %tmp24
>
> %tmp29 = fadd double %phi2, %tmp27
>
> br label %bb2
>
> }
>
>
>
> Good output:
>
>
>
> bb2: ; preds = %bb2, %bb
>
> %0 = phi <4 x double> [ %10, %bb2 ], [ zeroinitializer, %bb ]
>
> %gep0 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32
> 0
>
> %gep1 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32
> 1
>
> %gep2 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32
> 0
>
> %gep3 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32
> 1
>
> %1 = bitcast double* %gep0 to <4 x double>*
>
> %2 = load <4 x double>, <4 x double>* %1, align 8
>
> %3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32
> 0, i32 1, i32 2, i32 3>
>
> %4 = bitcast double* %gep0 to <4 x double>*
>
> %5 = load <4 x double>, <4 x double>* %4, align 8
>
> %6 = shufflevector <4 x double> %5, <4 x double> undef, <4 x i32> <i32
> 1, i32 0, i32 3, i32 2>
>
> %7 = fmul <4 x double> <double 1.000000e+00, double 1.000000e+00, double
> 1.000000e+00, double 1.000000e+00>, %6
>
> %8 = fmul <4 x double> <double 2.000000e+00, double 2.000000e+00, double
> 2.000000e+00, double 2.000000e+00>, %3
>
> %9 = fsub <4 x double> %7, %8
>
> %10 = fadd <4 x double> %0, %9
>
> br label %bb2
>
>
>
> Bad output:
>
> bb2: ; preds = %bb2, %bb
>
> %0 = phi <4 x double> [ %10, %bb2 ], [ zeroinitializer, %bb ]
>
> %gep0 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32
> 0
>
> %gep1 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32
> 1
>
> %gep2 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32
> 0
>
> %gep3 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32
> 1
>
> %1 = bitcast double* %gep0 to <4 x double>*
>
> %2 = load <4 x double>, <4 x double>* %1, align 8
>
> %3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32
> 1, i32 0, i32 3, i32 2>
>
> %4 = bitcast double* %gep0 to <4 x double>*
>
> %5 = load <4 x double>, <4 x double>* %4, align 8
>
> %6 = shufflevector <4 x double> %5, <4 x double> undef, <4 x i32> <i32
> 1, i32 0, i32 3, i32 2>
>
> %7 = fmul <4 x double> <double 1.000000e+00, double 1.000000e+00, double
> 1.000000e+00, double 1.000000e+00>, %6
>
> %8 = fmul <4 x double> <double 2.000000e+00, double 2.000000e+00, double
> 2.000000e+00, double 2.000000e+00>, %3
>
> %9 = fsub <4 x double> %7, %8
>
> %10 = fadd <4 x double> %0, %9
>
> br label %bb2
>
>
>
> Note the difference in %3.
>
> (In this reproducer, the "good" %3 shuffle is a nop, but the original case
> I saw had a different order.)
>
>
>
>
>
>
>
> On Fri, Mar 3, 2017 at 2:02 AM, Mohammad Shahid via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
> Author: ashahid
> Date: Fri Mar 3 04:02:47 2017
> New Revision: 296863
>
> URL: http://llvm.org/viewvc/llvm-project?rev=296863&view=rev
> Log:
> [SLP] Fixes the bug due to absence of in order uses of scalars which needs
> to be available
> for VectorizeTree() API. This API uses it for proper mask computation to be
> used in shufflevector IR.
> The fix is to compute the mask for out of order memory accesses while
> building the vectorizable tree
> instead of actual vectorization of vectorizable tree. It also needs to
> recompute the proper Lane for
> external use of vectorizable scalars based on shuffle mask.
>
> Reviewers: mkuper
>
> Differential Revision: https://reviews.llvm.org/D30159
>
> Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d
>
> Added:
> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll
> Modified:
> llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h
> llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp
> llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
> llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll
>
> Modified: llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/
> LoopAccessAnalysis.h?rev=296863&r1=296862&r2=296863&view=diff
> ============================================================
> ==================
> --- llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h (original)
> +++ llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h Fri Mar 3
> 04:02:47 2017
> @@ -660,12 +660,15 @@ int64_t getPtrStride(PredicatedScalarEvo
> /// \brief Try to sort an array of loads / stores.
> ///
> /// An array of loads / stores can only be sorted if all pointer operands
> -/// refer to the same object, and the differences between these pointers
> +/// refer to the same object, and the differences between these pointers
> /// are known to be constant. If that is the case, this returns true, and
> the
> /// sorted array is returned in \p Sorted. Otherwise, this returns false,
> and
> /// \p Sorted is invalid.
> +// If \p Mask is not null, it also returns the \p Mask which is the
> shuffle
> +// mask for actual memory access order.
> bool sortMemAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
> - ScalarEvolution &SE, SmallVectorImpl<Value *>
> &Sorted);
> + ScalarEvolution &SE, SmallVectorImpl<Value *>
> &Sorted,
> + SmallVectorImpl<unsigned> *Mask = nullptr);
>
> /// \brief Returns true if the memory operations \p A and \p B are
> consecutive.
> /// This is a simple API that does not depend on the analysis pass.
>
> Modified: llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/
> Analysis/LoopAccessAnalysis.cpp?rev=296863&r1=296862&r2=296863&view=diff
> ============================================================
> ==================
> --- llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp (original)
> +++ llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp Fri Mar 3 04:02:47
> 2017
> @@ -1040,7 +1040,8 @@ static unsigned getAddressSpaceOperand(V
>
> bool llvm::sortMemAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
> ScalarEvolution &SE,
> - SmallVectorImpl<Value *> &Sorted) {
> + SmallVectorImpl<Value *> &Sorted,
> + SmallVectorImpl<unsigned> *Mask) {
> SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
> OffValPairs.reserve(VL.size());
> Sorted.reserve(VL.size());
> @@ -1050,7 +1051,6 @@ bool llvm::sortMemAccesses(ArrayRef<Valu
> Value *Ptr0 = getPointerOperand(VL[0]);
> const SCEV *Scev0 = SE.getSCEV(Ptr0);
> Value *Obj0 = GetUnderlyingObject(Ptr0, DL);
> -
> for (auto *Val : VL) {
> // The only kind of access we care about here is load.
> if (!isa<LoadInst>(Val))
> @@ -1077,14 +1077,30 @@ bool llvm::sortMemAccesses(ArrayRef<Valu
> OffValPairs.emplace_back(Diff->getAPInt().getSExtValue(), Val);
> }
>
> - std::sort(OffValPairs.begin(), OffValPairs.end(),
> - [](const std::pair<int64_t, Value *> &Left,
> - const std::pair<int64_t, Value *> &Right) {
> - return Left.first < Right.first;
> + SmallVector<unsigned, 4> UseOrder(VL.size());
> + for (unsigned i = 0; i < VL.size(); i++) {
> + UseOrder[i] = i;
> + }
> +
> + // Sort the memory accesses and keep the order of their uses in
> UseOrder.
> + std::sort(UseOrder.begin(), UseOrder.end(),
> + [&OffValPairs](unsigned Left, unsigned Right) {
> + return OffValPairs[Left].first < OffValPairs[Right].first;
> });
>
> - for (auto &it : OffValPairs)
> - Sorted.push_back(it.second);
> + for (unsigned i = 0; i < VL.size(); i++)
> + Sorted.emplace_back(OffValPairs[UseOrder[i]].second);
> +
> + // Sort UseOrder to compute the Mask.
> + if (Mask) {
> + Mask->reserve(VL.size());
> + for (unsigned i = 0; i < VL.size(); i++)
> + Mask->emplace_back(i);
> + std::sort(Mask->begin(), Mask->end(),
> + [&UseOrder](unsigned Left, unsigned Right) {
> + return UseOrder[Left] < UseOrder[Right];
> + });
> + }
>
> return true;
> }
>
> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/
> Transforms/Vectorize/SLPVectorizer.cpp?rev=296863&
> r1=296862&r2=296863&view=diff
> ============================================================
> ==================
> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Mar 3
> 04:02:47 2017
> @@ -423,10 +423,8 @@ private:
> /// be vectorized to use the original vector (or aggregate "bitcast" to
> a vector).
> bool canReuseExtract(ArrayRef<Value *> VL, unsigned Opcode) const;
>
> - /// Vectorize a single entry in the tree. VL icontains all isomorphic
> scalars
> - /// in order of its usage in a user program, for example ADD1, ADD2 and
> so on
> - /// or LOAD1 , LOAD2 etc.
> - Value *vectorizeTree(ArrayRef<Value *> VL, TreeEntry *E);
> + /// Vectorize a single entry in the tree.
> + Value *vectorizeTree(TreeEntry *E);
>
> /// Vectorize a single entry in the tree, starting in \p VL.
> Value *vectorizeTree(ArrayRef<Value *> VL);
> @@ -466,8 +464,8 @@ private:
> SmallVectorImpl<Value *> &Left,
> SmallVectorImpl<Value *> &Right);
> struct TreeEntry {
> - TreeEntry() : Scalars(), VectorizedValue(nullptr),
> - NeedToGather(0), NeedToShuffle(0) {}
> + TreeEntry()
> + : Scalars(), VectorizedValue(nullptr), NeedToGather(0),
> ShuffleMask() {}
>
> /// \returns true if the scalars in VL are equal to this entry.
> bool isSame(ArrayRef<Value *> VL) const {
> @@ -495,19 +493,23 @@ private:
> /// Do we need to gather this sequence ?
> bool NeedToGather;
>
> - /// Do we need to shuffle the load ?
> - bool NeedToShuffle;
> + /// Records optional suffle mask for jumbled memory accesses in this.
> + SmallVector<unsigned, 8> ShuffleMask;
> +
> };
>
> /// Create a new VectorizableTree entry.
> TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,
> - bool NeedToShuffle) {
> + ArrayRef<unsigned> ShuffleMask = None) {
> VectorizableTree.emplace_back();
> int idx = VectorizableTree.size() - 1;
> TreeEntry *Last = &VectorizableTree[idx];
> Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
> Last->NeedToGather = !Vectorized;
> - Last->NeedToShuffle = NeedToShuffle;
> + if (!ShuffleMask.empty())
> + Last->ShuffleMask.insert(Last->ShuffleMask.begin(),
> ShuffleMask.begin(),
> + ShuffleMask.end());
> +
> if (Vectorized) {
> for (int i = 0, e = VL.size(); i != e; ++i) {
> assert(!ScalarToTreeEntry.count(VL[i]) && "Scalar already in
> tree!");
> @@ -1030,21 +1032,21 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>
> if (Depth == RecursionMaxDepth) {
> DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
>
> // Don't handle vectors.
> if (VL[0]->getType()->isVectorTy()) {
> DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
>
> if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
> if (SI->getValueOperand()->getType()->isVectorTy()) {
> DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> unsigned Opcode = getSameOpcode(VL);
> @@ -1061,7 +1063,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> // If all of the operands are identical or constant we have a simple
> solution.
> if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !Opcode) {
> DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
>
> @@ -1073,7 +1075,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (EphValues.count(VL[i])) {
> DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
> ") is ephemeral.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
> @@ -1086,7 +1088,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");
> if (E->Scalars[i] != VL[i]) {
> DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
> @@ -1099,7 +1101,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (ScalarToTreeEntry.count(VL[i])) {
> DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
> ") is already in tree.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
> @@ -1109,7 +1111,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> for (unsigned i = 0, e = VL.size(); i != e; ++i) {
> if (MustGather.count(VL[i])) {
> DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
> @@ -1123,7 +1125,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> // Don't go into unreachable blocks. They may contain instructions
> with
> // dependency cycles which confuse the final scheduling.
> DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
>
> @@ -1132,7 +1134,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> for (unsigned j = i+1; j < e; ++j)
> if (VL[i] == VL[j]) {
> DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
>
> @@ -1147,7 +1149,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> assert((!BS.getScheduleData(VL[0]) ||
> !BS.getScheduleData(VL[0])->isPartOfBundle()) &&
> "tryScheduleBundle should cancelScheduling on failure");
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");
> @@ -1164,12 +1166,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (Term) {
> DEBUG(dbgs() << "SLP: Need to swizzle PHINodes
> (TerminatorInst use).\n");
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
>
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");
>
> for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
> @@ -1191,7 +1193,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> } else {
> BS.cancelScheduling(VL);
> }
> - newTreeEntry(VL, Reuse, false);
> + newTreeEntry(VL, Reuse);
> return;
> }
> case Instruction::Load: {
> @@ -1207,7 +1209,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (DL->getTypeSizeInBits(ScalarTy) !=
> DL->getTypeAllocSizeInBits(ScalarTy)) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");
> return;
> }
> @@ -1218,7 +1220,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> LoadInst *L = cast<LoadInst>(VL[i]);
> if (!L->isSimple()) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
> return;
> }
> @@ -1238,7 +1240,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>
> if (Consecutive) {
> ++NumLoadsWantToKeepOrder;
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a vector of loads.\n");
> return;
> }
> @@ -1255,7 +1257,8 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (VL.size() > 2 && !ReverseConsecutive) {
> bool ShuffledLoads = true;
> SmallVector<Value *, 8> Sorted;
> - if (sortMemAccesses(VL, *DL, *SE, Sorted)) {
> + SmallVector<unsigned, 4> Mask;
> + if (sortMemAccesses(VL, *DL, *SE, Sorted, &Mask)) {
> auto NewVL = makeArrayRef(Sorted.begin(), Sorted.end());
> for (unsigned i = 0, e = NewVL.size() - 1; i < e; ++i) {
> if (!isConsecutiveAccess(NewVL[i], NewVL[i + 1], *DL, *SE)) {
> @@ -1264,14 +1267,14 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> }
> }
> if (ShuffledLoads) {
> - newTreeEntry(NewVL, true, true);
> + newTreeEntry(NewVL, true, makeArrayRef(Mask.begin(),
> Mask.end()));
> return;
> }
> }
> }
>
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
>
> if (ReverseConsecutive) {
> ++NumLoadsWantToChangeOrder;
> @@ -1298,12 +1301,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> Type *Ty = cast<Instruction>(Val)->getOperand(0)->getType();
> if (Ty != SrcTy || !isValidElementType(Ty)) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: Gathering casts with different src
> types.\n");
> return;
> }
> }
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a vector of casts.\n");
>
> for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
> @@ -1326,13 +1329,13 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (Cmp->getPredicate() != P0 ||
> Cmp->getOperand(0)->getType() != ComparedTy) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: Gathering cmp with different
> predicate.\n");
> return;
> }
> }
>
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a vector of compares.\n");
>
> for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
> @@ -1364,7 +1367,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> case Instruction::And:
> case Instruction::Or:
> case Instruction::Xor: {
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a vector of bin op.\n");
>
> // Sort operands of the instructions so that each side is more
> likely to
> @@ -1393,7 +1396,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (cast<Instruction>(Val)->getNumOperands() != 2) {
> DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested
> indexes).\n");
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
> @@ -1406,7 +1409,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> if (Ty0 != CurTy) {
> DEBUG(dbgs() << "SLP: not-vectorizable GEP (different
> types).\n");
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
> @@ -1418,12 +1421,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> DEBUG(
> dbgs() << "SLP: not-vectorizable GEP (non-constant
> indexes).\n");
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> return;
> }
> }
>
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");
> for (unsigned i = 0, e = 2; i < e; ++i) {
> ValueList Operands;
> @@ -1440,12 +1443,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)
> if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE)) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
> return;
> }
>
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a vector of stores.\n");
>
> ValueList Operands;
> @@ -1463,7 +1466,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
> if (!isTriviallyVectorizable(ID)) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
> return;
> }
> @@ -1477,7 +1480,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> getVectorIntrinsicIDForCall(CI2, TLI) != ID ||
> !CI->hasIdenticalOperandBundleSchema(*CI2)) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: mismatched calls:" << *CI << "!=" <<
> *VL[i]
> << "\n");
> return;
> @@ -1488,7 +1491,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> Value *A1J = CI2->getArgOperand(1);
> if (A1I != A1J) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI
> << " argument "<< A1I<<"!=" << A1J
> << "\n");
> @@ -1501,14 +1504,14 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> CI->op_begin() + CI->getBundleOperandsEndIndex(),
> CI2->op_begin() + CI2->
> getBundleOperandsStartIndex())) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" <<
> *CI << "!="
> << *VL[i] << '\n');
> return;
> }
> }
>
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
> ValueList Operands;
> // Prepare the operand vector.
> @@ -1525,11 +1528,11 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> // then do not vectorize this instruction.
> if (!isAltShuffle) {
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");
> return;
> }
> - newTreeEntry(VL, true, false);
> + newTreeEntry(VL, true);
> DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");
>
> // Reorder operands if reordering would enable vectorization.
> @@ -1553,7 +1556,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
> }
> default:
> BS.cancelScheduling(VL);
> - newTreeEntry(VL, false, false);
> + newTreeEntry(VL, false);
> DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");
> return;
> }
> @@ -1792,7 +1795,7 @@ int BoUpSLP::getEntryCost(TreeEntry *E)
> TTI->getMemoryOpCost(Instruction::Load, ScalarTy, alignment,
> 0);
> int VecLdCost = TTI->getMemoryOpCost(Instruction::Load,
> VecTy, alignment, 0);
> - if (E->NeedToShuffle) {
> + if (!E->ShuffleMask.empty()) {
> VecLdCost += TTI->getShuffleCost(
> TargetTransformInfo::SK_PermuteSingleSrc, VecTy, 0);
> }
> @@ -2358,8 +2361,9 @@ Value *BoUpSLP::vectorizeTree(ArrayRef<V
> if (ScalarToTreeEntry.count(VL[0])) {
> int Idx = ScalarToTreeEntry[VL[0]];
> TreeEntry *E = &VectorizableTree[Idx];
> - if (E->isSame(VL) || (E->NeedToShuffle && E->isFoundJumbled(VL, *DL,
> *SE)))
> - return vectorizeTree(VL, E);
> + if (E->isSame(VL) ||
> + (!E->ShuffleMask.empty() && E->isFoundJumbled(VL, *DL, *SE)))
> + return vectorizeTree(E);
> }
>
> Type *ScalarTy = VL[0]->getType();
> @@ -2370,10 +2374,10 @@ Value *BoUpSLP::vectorizeTree(ArrayRef<V
> return Gather(VL, VecTy);
> }
>
> -Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL, TreeEntry *E) {
> +Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
> IRBuilder<>::InsertPointGuard Guard(Builder);
>
> - if (E->VectorizedValue && !E->NeedToShuffle) {
> + if (E->VectorizedValue && E->ShuffleMask.empty()) {
> DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] <<
> ".\n");
> return E->VectorizedValue;
> }
> @@ -2611,27 +2615,18 @@ Value *BoUpSLP::vectorizeTree(ArrayRef<V
>
> // As program order of scalar loads are jumbled, the vectorized
> 'load'
> // must be followed by a 'shuffle' with the required jumbled mask.
> - if (!VL.empty() && (E->NeedToShuffle)) {
> - assert(VL.size() == E->Scalars.size() &&
> - "Equal number of scalars expected");
> + if (!E->ShuffleMask.empty()) {
> SmallVector<Constant *, 8> Mask;
> - for (Value *Val : VL) {
> - if (ScalarToTreeEntry.count(Val)) {
> - int Idx = ScalarToTreeEntry[Val];
> - TreeEntry *E = &VectorizableTree[Idx];
> - for (unsigned Lane = 0, LE = VL.size(); Lane != LE; ++Lane) {
> - if (E->Scalars[Lane] == Val) {
> - Mask.push_back(Builder.getInt32(Lane));
> - break;
> - }
> - }
> - }
> + for (unsigned Lane = 0, LE = E->ShuffleMask.size(); Lane != LE;
> + ++Lane) {
> + Mask.push_back(Builder.getInt32(E->ShuffleMask[Lane]));
> }
> -
> // Generate shuffle for jumbled memory access
> Value *Undef = UndefValue::get(VecTy);
> Value *Shuf = Builder.CreateShuffleVector((Value *)LI, Undef,
>
> ConstantVector::get(Mask));
> + E->VectorizedValue = Shuf;
> + ++NumVectorInstructions;
> return Shuf;
> }
>
> @@ -2816,7 +2811,7 @@ BoUpSLP::vectorizeTree(ExtraValueToDebug
> }
>
> Builder.SetInsertPoint(&F->getEntryBlock().front());
> - auto *VectorRoot = vectorizeTree(ArrayRef<Value *>(),
> &VectorizableTree[0]);
> + auto *VectorRoot = vectorizeTree(&VectorizableTree[0]);
>
> // If the vectorized tree can be rewritten in a smaller type, we
> truncate the
> // vectorized root. InstCombine will then rewrite the entire
> expression. We
> @@ -2861,8 +2856,20 @@ BoUpSLP::vectorizeTree(ExtraValueToDebug
>
> Value *Vec = E->VectorizedValue;
> assert(Vec && "Can't find vectorizable value");
> -
> - Value *Lane = Builder.getInt32(ExternalUse.Lane);
> + unsigned i = 0;
> + Value *Lane;
> + // In case vectorizable scalars use are not in-order, scalars would
> have
> + // been shuffled.Recompute the proper Lane of ExternalUse.
> + if (!E->ShuffleMask.empty()) {
> + SmallVector<unsigned, 4> Val(E->ShuffleMask.size());
> + for (; i < E->ShuffleMask.size(); i++) {
> + if (E->ShuffleMask[i] == (unsigned)ExternalUse.Lane)
> + break;
> + }
> + Lane = Builder.getInt32(i);
> + } else {
> + Lane = Builder.getInt32(ExternalUse.Lane);
> + }
> // If User == nullptr, the Scalar is used as extra arg. Generate
> // ExtractElement instruction and update the record for this scalar in
> // ExternallyUsedValues.
>
> Added: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/
> Transforms/SLPVectorizer/X86/jumbled-load-bug.ll?rev=296863&view=auto
> ============================================================
> ==================
> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll
> (added)
> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll Fri
> Mar 3 04:02:47 2017
> @@ -0,0 +1,43 @@
> +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
> +; RUN: opt < %s -S -slp-vectorizer | FileCheck %s
> +
> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> +target triple = "x86_64-unknown-linux-gnu"
> +
> +define <4 x i32> @zot() #0 {
> +; CHECK-LABEL: @zot(
> +; CHECK-NEXT: bb:
> +; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]*
> undef, i64 undef, i64 0
> +; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]*
> undef, i64 undef, i64 1
> +; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]*
> undef, i64 undef, i64 2
> +; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]*
> undef, i64 undef, i64 3
> +; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[P0]] to <4 x i8>*
> +; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
> +; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8>
> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
> +; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i8> [[TMP2]], i32 0
> +; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x i8> undef, i8 [[TMP3]],
> i32 0
> +; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1
> +; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x i8> [[I0]], i8 [[TMP4]],
> i32 1
> +; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2
> +; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x i8> [[I1]], i8 [[TMP5]],
> i32 2
> +; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 3
> +; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x i8> [[I2]], i8 [[TMP6]],
> i32 3
> +; CHECK-NEXT: [[RET:%.*]] = zext <4 x i8> [[I3]] to <4 x i32>
> +; CHECK-NEXT: ret <4 x i32> [[RET]]
> +;
> +bb:
> + %p0 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 0
> + %p1 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 1
> + %p2 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 2
> + %p3 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 3
> + %v3 = load i8, i8* %p3, align 1
> + %v2 = load i8, i8* %p2, align 1
> + %v0 = load i8, i8* %p0, align 1
> + %v1 = load i8, i8* %p1, align 1
> + %i0 = insertelement <4 x i8> undef, i8 %v1, i32 0
> + %i1 = insertelement <4 x i8> %i0, i8 %v0, i32 1
> + %i2 = insertelement <4 x i8> %i1, i8 %v2, i32 2
> + %i3 = insertelement <4 x i8> %i2, i8 %v3, i32 3
> + %ret = zext <4 x i8> %i3 to <4 x i32>
> + ret <4 x i32> %ret
> +}
>
> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/
> Transforms/SLPVectorizer/X86/jumbled-same.ll?rev=296863&r1=
> 296862&r2=296863&view=diff
> ============================================================
> ==================
> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll
> (original)
> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll Fri Mar
> 3 04:02:47 2017
> @@ -13,7 +13,7 @@ define i32 @fn1() {
> ; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x
> i32]* @b to <4 x i32>*), align 4
> ; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x
> i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
> ; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[TMP1]],
> zeroinitializer
> -; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP0]], i32 1
> +; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 0
> ; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32
> [[TMP3]], i32 0
> ; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32
> ptrtoint (i32 ()* @fn1 to i32), i32 1
> ; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32
> ptrtoint (i32 ()* @fn1 to i32), i32 2
>
>