[llvm] 99203f2 - [Analysis]Add getPointersDiff function to improve compile time.

Alexey Bataev via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 25 17:29:58 PDT 2021


Yes, it should be. There was just one problem with pointers in load instructions (I just forgot about it, because I saw exactly the same problem when I initially prepared this fix several months ago as part of another big patch). In other places we already checked that the pointers are strictly aligned and we can use relaxed version of the call.

Best regards,
Alexey Bataev

25 марта 2021 г., в 19:55, Richard Smith <richard at metafoo.co.uk> написал(а):


Landed in 040c60d9b69e2ad570556f255a746929a4b10e82.

I didn't look at the other callers of getPointersDiff to see if they are OK passing StrictCheck == false. Is it OK for the returned value to be rounded down to a multiple of the type size in the other cases?

On Thu, 25 Mar 2021 at 16:44, Alexey Bataev <a.bataev at outlook.com<mailto:a.bataev at outlook.com>> wrote:
Sure!

Best regards,
Alexey Bataev

25 марта 2021 г., в 19:41, Richard Smith <richard at metafoo.co.uk<mailto:richard at metafoo.co.uk>> написал(а):


Sure, I'm currently running the tests on that change and will land that once they pass. However, I've not figured out how to observe the bug with an LLVM test yet (the testcase in question was extracted from the middle of a JIT process). Do you mind if I land this without a test and leave the test coverage with you? This is currently blocking us, so I'd like to get the bug resolved as soon as possible.

On Thu, 25 Mar 2021 at 16:34, Alexey Bataev <a.bataev at outlook.com<mailto:a.bataev at outlook.com>> wrote:
Hi Richard, yes, that’s what I thought. Can you commit this fix?

Best regards,
Alexey Bataev

25 марта 2021 г., в 19:29, Richard Smith <richard at metafoo.co.uk<mailto:richard at metafoo.co.uk>> написал(а):


On Thu, 25 Mar 2021 at 16:24, Richard Smith <richard at metafoo.co.uk<mailto:richard at metafoo.co.uk>> wrote:
Hi Alexey, this patch has introduced a miscompile for us.

On Tue, 23 Mar 2021 at 14:26, Alexey Bataev via llvm-commits <llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>> wrote:

Author: Alexey Bataev
Date: 2021-03-23T14:25:36-07:00
New Revision: 99203f2004d031f2ef22f01e3c569d2775de1836

URL: https://github.com/llvm/llvm-project/commit/99203f2004d031f2ef22f01e3c569d2775de1836
DIFF: https://github.com/llvm/llvm-project/commit/99203f2004d031f2ef22f01e3c569d2775de1836.diff

LOG: [Analysis]Add getPointersDiff function to improve compile time.

Added getPointersDiff function to LoopAccessAnalysis and used it instead
direct calculatoin of the distance between pointers and/or
isConsecutiveAccess function in SLP vectorizer to improve compile time
and detection of stores consecutive chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967

Added:


Modified:
    llvm/include/llvm/Analysis/LoopAccessAnalysis.h
    llvm/lib/Analysis/LoopAccessAnalysis.cpp
    llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
    llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

Removed:



################################################################################
diff  --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index 13fbe884eddf..39acfd5bbbee 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -679,6 +679,15 @@ int64_t getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr, const Loop *Lp,
                      const ValueToValueMap &StridesMap = ValueToValueMap(),
                      bool Assume = false, bool ShouldCheckWrap = true);

+/// Returns the distance between the pointers \p PtrA and \p PtrB iff they are
+/// compatible and it is possible to calculate the distance between them. This
+/// is a simple API that does not depend on the analysis pass.
+/// \param StrictCheck Ensure that the calculated distance matches the
+/// type-based one after all the bitcasts removal in the provided pointers.
+Optional<int> getPointersDiff(Value *PtrA, Value *PtrB, const DataLayout &DL,
+                              ScalarEvolution &SE, bool StrictCheck = false,
+                              bool CheckType = true);
+
 /// Attempt to sort the pointers in \p VL and return the sorted indices
 /// in \p SortedIndices, if reordering is required.
 ///

diff  --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index e632fe25c24c..997d4474a448 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1124,139 +1124,123 @@ int64_t llvm::getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr,
   return Stride;
 }

+Optional<int> llvm::getPointersDiff(Value *PtrA, Value *PtrB,
+                                    const DataLayout &DL, ScalarEvolution &SE,
+                                    bool StrictCheck, bool CheckType) {
+  assert(PtrA && PtrB && "Expected non-nullptr pointers.");
+  // Make sure that A and B are
diff erent pointers.
+  if (PtrA == PtrB)
+    return 0;
+
+  // Make sure that PtrA and PtrB have the same type if required
+  if (CheckType && PtrA->getType() != PtrB->getType())
+    return None;
+
+  unsigned ASA = PtrA->getType()->getPointerAddressSpace();
+  unsigned ASB = PtrB->getType()->getPointerAddressSpace();
+
+  // Check that the address spaces match.
+  if (ASA != ASB)
+    return None;
+  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
+
+  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
+  Value *PtrA1 = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
+  Value *PtrB1 = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
+
+  int Val;
+  if (PtrA1 == PtrB1) {
+    // Retrieve the address space again as pointer stripping now tracks through
+    // `addrspacecast`.
+    ASA = cast<PointerType>(PtrA1->getType())->getAddressSpace();
+    ASB = cast<PointerType>(PtrB1->getType())->getAddressSpace();
+    // Check that the address spaces match and that the pointers are valid.
+    if (ASA != ASB)
+      return None;
+
+    IdxWidth = DL.getIndexSizeInBits(ASA);
+    OffsetA = OffsetA.sextOrTrunc(IdxWidth);
+    OffsetB = OffsetB.sextOrTrunc(IdxWidth);
+
+    OffsetB -= OffsetA;
+    Val = OffsetB.getSExtValue();
+  } else {
+    // Otherwise compute the distance with SCEV between the base pointers.
+    const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
+    const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
+    const auto *Diff =
+        dyn_cast<SCEVConstant>(SE.getMinusSCEV(PtrSCEVB, PtrSCEVA));
+    if (!Diff)
+      return None;
+    Val = Diff->getAPInt().getSExtValue();
+  }
+  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
+  int Size = DL.getTypeStoreSize(Ty);
+  int Dist = Val / Size;
+
+  // Ensure that the calculated distance matches the type-based one after all
+  // the bitcasts removal in the provided pointers.
+  if (!StrictCheck || Dist * Size == Val)
+    return Dist;

Specifically, we get here, with Val == 12 and Size == 8, resulting in an incorrect computation of Dist, for the following testcase:

; Function Attrs: nofree norecurse nounwind uwtable
define void @Int32R2.5(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 {
entry:
  %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
  %1 = bitcast i8** %0 to i32**
  %2 = load i32*, i32** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !1
  %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
  %4 = bitcast i8** %3 to i32**
  %5 = load i32*, i32** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !1
  %6 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
  %7 = bitcast i8** %6 to [2 x [2 x i32]]**
  %8 = load [2 x [2 x i32]]*, [2 x [2 x i32]]** %7, align 8, !invariant.load !0, !dereferenceable !2, !align !2
  %9 = load i32, i32* %2, align 4, !invariant.load !0, !noalias !3
  %10 = icmp sgt i32 %9, 0
  %11 = load i32, i32* %5, align 4, !invariant.load !0, !noalias !3
  %12 = icmp sgt i32 %11, 0
  %13 = select i1 %10, i64 12, i64 0
  %14 = select i1 %12, i64 4, i64 0
  %15 = add nuw nsw i64 %13, %14
  %scevgep6 = getelementptr [36 x i8], [36 x i8]* @0, i64 0, i64 %15
  %16 = bitcast i8* %scevgep6 to i64*
  %17 = bitcast [2 x [2 x i32]]* %8 to i64*
  %18 = load i64, i64* %16, align 4
  store i64 %18, i64* %17, align 16
  %scevgep.1 = getelementptr [2 x [2 x i32]], [2 x [2 x i32]]* %8, i64 0, i64 1, i64 0
  %19 = add nuw nsw i64 %15, 12
  %scevgep6.1 = getelementptr [36 x i8], [36 x i8]* @0, i64 0, i64 %19
  %20 = bitcast i8* %scevgep6.1 to i64*
  %21 = bitcast i32* %scevgep.1 to i64*
  %22 = load i64, i64* %20, align 4
  store i64 %22, i64* %21, align 8
  ret void
}

... when computing the difference between %16 and %20 in the call from LoopAccessAnalysis.cpp:1207.

Perhaps StrictCheck should always be enabled, given that this function truncates its result and produces a wrong answer if not?

+  return None;
+}
+
 bool llvm::sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
                            ScalarEvolution &SE,
                            SmallVectorImpl<unsigned> &SortedIndices) {
   assert(llvm::all_of(
              VL, [](const Value *V) { return V->getType()->isPointerTy(); }) &&
          "Expected list of pointer operands.");
-  SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
-  OffValPairs.reserve(VL.size());
-
   // Walk over the pointers, and map each of them to an offset relative to
   // first pointer in the array.
   Value *Ptr0 = VL[0];
-  const SCEV *Scev0 = SE.getSCEV(Ptr0);
-  Value *Obj0 = getUnderlyingObject(Ptr0);
-
-  llvm::SmallSet<int64_t, 4> Offsets;
-  for (auto *Ptr : VL) {
-    // TODO: Outline this code as a special, more time consuming, version of
-    // computeConstantDifference() function.
-    if (Ptr->getType()->getPointerAddressSpace() !=
-        Ptr0->getType()->getPointerAddressSpace())
-      return false;
-    // If a pointer refers to a
diff erent underlying object, bail - the
-    // pointers are by definition incomparable.
-    Value *CurrObj = getUnderlyingObject(Ptr);
-    if (CurrObj != Obj0)
-      return false;

-    const SCEV *Scev = SE.getSCEV(Ptr);
-    const auto *Diff = dyn_cast<SCEVConstant>(SE.getMinusSCEV(Scev, Scev0));
-    // The pointers may not have a constant offset from each other, or SCEV
-    // may just not be smart enough to figure out they do. Regardless,
-    // there's nothing we can do.
+  using DistOrdPair = std::pair<int64_t, int>;
+  auto Compare = [](const DistOrdPair &L, const DistOrdPair &R) {
+    return L.first < R.first;
+  };
+  std::set<DistOrdPair, decltype(Compare)> Offsets(Compare);
+  Offsets.emplace(0, 0);
+  int Cnt = 1;
+  bool IsConsecutive = true;
+  for (auto *Ptr : VL.drop_front()) {
+    Optional<int> Diff = getPointersDiff(Ptr0, Ptr, DL, SE);

Passing StrictCheck = true here fixes the miscompile.

     if (!Diff)
       return false;

     // Check if the pointer with the same offset is found.
-    int64_t Offset = Diff->getAPInt().getSExtValue();
-    if (!Offsets.insert(Offset).second)
+    int64_t Offset = *Diff;
+    auto Res = Offsets.emplace(Offset, Cnt);
+    if (!Res.second)
       return false;
-    OffValPairs.emplace_back(Offset, Ptr);
+    // Consecutive order if the inserted element is the last one.
+    IsConsecutive = IsConsecutive && std::next(Res.first) == Offsets.end();
+    ++Cnt;
   }
   SortedIndices.clear();
-  SortedIndices.resize(VL.size());
-  std::iota(SortedIndices.begin(), SortedIndices.end(), 0);
-
-  // Sort the memory accesses and keep the order of their uses in UseOrder.
-  llvm::stable_sort(SortedIndices, [&](unsigned Left, unsigned Right) {
-    return OffValPairs[Left].first < OffValPairs[Right].first;
-  });
-
-  // Check if the order is consecutive already.
-  if (llvm::all_of(SortedIndices, [&SortedIndices](const unsigned I) {
-        return I == SortedIndices[I];
-      }))
-    SortedIndices.clear();
-
+  if (!IsConsecutive) {
+    // Fill SortedIndices array only if it is non-consecutive.
+    SortedIndices.resize(VL.size());
+    Cnt = 0;
+    for (const std::pair<int64_t, int> &Pair : Offsets) {
+      IsConsecutive = IsConsecutive && Cnt == Pair.second;
+      SortedIndices[Cnt] = Pair.second;
+      ++Cnt;
+    }
+  }
   return true;
 }

-/// Take the address space operand from the Load/Store instruction.
-/// Returns -1 if this is not a valid Load/Store instruction.
-static unsigned getAddressSpaceOperand(Value *I) {
-  if (LoadInst *L = dyn_cast<LoadInst>(I))
-    return L->getPointerAddressSpace();
-  if (StoreInst *S = dyn_cast<StoreInst>(I))
-    return S->getPointerAddressSpace();
-  return -1;
-}
-
 /// Returns true if the memory operations \p A and \p B are consecutive.
 bool llvm::isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
                                ScalarEvolution &SE, bool CheckType) {
   Value *PtrA = getLoadStorePointerOperand(A);
   Value *PtrB = getLoadStorePointerOperand(B);
-  unsigned ASA = getAddressSpaceOperand(A);
-  unsigned ASB = getAddressSpaceOperand(B);
-
-  // Check that the address spaces match and that the pointers are valid.
-  if (!PtrA || !PtrB || (ASA != ASB))
-    return false;
-
-  // Make sure that A and B are
diff erent pointers.
-  if (PtrA == PtrB)
-    return false;
-
-  // Make sure that A and B have the same type if required.
-  if (CheckType && PtrA->getType() != PtrB->getType())
+  if (!PtrA || !PtrB)
     return false;
-
-  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
-  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
-
-  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
-  PtrA = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
-  PtrB = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
-
-  // Retrieve the address space again as pointer stripping now tracks through
-  // `addrspacecast`.
-  ASA = cast<PointerType>(PtrA->getType())->getAddressSpace();
-  ASB = cast<PointerType>(PtrB->getType())->getAddressSpace();
-  // Check that the address spaces match and that the pointers are valid.
-  if (ASA != ASB)
-    return false;
-
-  IdxWidth = DL.getIndexSizeInBits(ASA);
-  OffsetA = OffsetA.sextOrTrunc(IdxWidth);
-  OffsetB = OffsetB.sextOrTrunc(IdxWidth);
-
-  APInt Size(IdxWidth, DL.getTypeStoreSize(Ty));
-
-  //  OffsetDelta = OffsetB - OffsetA;
-  const SCEV *OffsetSCEVA = SE.getConstant(OffsetA);
-  const SCEV *OffsetSCEVB = SE.getConstant(OffsetB);
-  const SCEV *OffsetDeltaSCEV = SE.getMinusSCEV(OffsetSCEVB, OffsetSCEVA);
-  const APInt &OffsetDelta = cast<SCEVConstant>(OffsetDeltaSCEV)->getAPInt();
-
-  // Check if they are based on the same pointer. That makes the offsets
-  // sufficient.
-  if (PtrA == PtrB)
-    return OffsetDelta == Size;
-
-  // Compute the necessary base pointer delta to have the necessary final delta
-  // equal to the size.
-  // BaseDelta = Size - OffsetDelta;
-  const SCEV *SizeSCEV = SE.getConstant(Size);
-  const SCEV *BaseDelta = SE.getMinusSCEV(SizeSCEV, OffsetDeltaSCEV);
-
-  // Otherwise compute the distance with SCEV between the base pointers.
-  const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
-  const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
-  const SCEV *X = SE.getAddExpr(PtrSCEVA, BaseDelta);
-  return X == PtrSCEVB;
+  Optional<int> Diff =
+      getPointersDiff(PtrA, PtrB, DL, SE, /*StrictCheck=*/true, CheckType);
+  return Diff && *Diff == 1;
 }

 MemoryDepChecker::VectorizationSafetyStatus

diff  --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 385b6f30dc0f..78d2ea0032db 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -941,10 +941,16 @@ class BoUpSLP {
                                ScalarEvolution &SE) {
       auto *LI1 = dyn_cast<LoadInst>(V1);
       auto *LI2 = dyn_cast<LoadInst>(V2);
-      if (LI1 && LI2)
-        return isConsecutiveAccess(LI1, LI2, DL, SE)
-                   ? VLOperands::ScoreConsecutiveLoads
-                   : VLOperands::ScoreFail;
+      if (LI1 && LI2) {
+        if (LI1->getParent() != LI2->getParent())
+          return VLOperands::ScoreFail;
+
+        Optional<int> Dist =
+            getPointersDiff(LI1->getPointerOperand(), LI2->getPointerOperand(),
+                            DL, SE, /*StrictCheck=*/true);
+        return (Dist && *Dist == 1) ? VLOperands::ScoreConsecutiveLoads
+                                    : VLOperands::ScoreFail;
+      }

       auto *C1 = dyn_cast<Constant>(V1);
       auto *C2 = dyn_cast<Constant>(V2);
@@ -2871,13 +2877,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
           Ptr0 = PointerOps[CurrentOrder.front()];
           PtrN = PointerOps[CurrentOrder.back()];
         }
-        const SCEV *Scev0 = SE->getSCEV(Ptr0);
-        const SCEV *ScevN = SE->getSCEV(PtrN);
-        const auto *Diff =
-            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
-        uint64_t Size = DL->getTypeAllocSize(ScalarTy);
+        Optional<int> Diff = getPointersDiff(Ptr0, PtrN, *DL, *SE);
         // Check that the sorted loads are consecutive.
-        if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {
+        if (static_cast<unsigned>(*Diff) == VL.size() - 1) {
           if (CurrentOrder.empty()) {
             // Original loads are consecutive and does not require reordering.
             ++NumOpsWantToKeepOriginalOrder;
@@ -3150,13 +3152,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
           Ptr0 = PointerOps[CurrentOrder.front()];
           PtrN = PointerOps[CurrentOrder.back()];
         }
-        const SCEV *Scev0 = SE->getSCEV(Ptr0);
-        const SCEV *ScevN = SE->getSCEV(PtrN);
-        const auto *Diff =
-            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
-        uint64_t Size = DL->getTypeAllocSize(ScalarTy);
+        Optional<int> Dist = getPointersDiff(Ptr0, PtrN, *DL, *SE);
         // Check that the sorted pointer operands are consecutive.
-        if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {
+        if (static_cast<unsigned>(*Dist) == VL.size() - 1) {
           if (CurrentOrder.empty()) {
             // Original stores are consecutive and does not require reordering.
             ++NumOpsWantToKeepOriginalOrder;
@@ -6107,20 +6105,41 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,

   int E = Stores.size();
   SmallBitVector Tails(E, false);
-  SmallVector<int, 16> ConsecutiveChain(E, E + 1);
   int MaxIter = MaxStoreLookup.getValue();
+  SmallVector<std::pair<int, int>, 16> ConsecutiveChain(
+      E, std::make_pair(E, INT_MAX));
+  SmallVector<SmallBitVector, 4> CheckedPairs(E, SmallBitVector(E, false));
   int IterCnt;
   auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt, MaxIter,
+                                  &CheckedPairs,
                                   &ConsecutiveChain](int K, int Idx) {
     if (IterCnt >= MaxIter)
       return true;
+    if (CheckedPairs[Idx].test(K))
+      return ConsecutiveChain[K].second == 1 &&
+             ConsecutiveChain[K].first == Idx;
     ++IterCnt;
-    if (!isConsecutiveAccess(Stores[K], Stores[Idx], *DL, *SE))
+    CheckedPairs[Idx].set(K);
+    CheckedPairs[K].set(Idx);
+    Optional<int> Diff = getPointersDiff(Stores[K]->getPointerOperand(),
+                                         Stores[Idx]->getPointerOperand(), *DL,
+                                         *SE, /*StrictCheck=*/true);
+    if (!Diff || *Diff == 0)
+      return false;
+    int Val = *Diff;
+    if (Val < 0) {
+      if (ConsecutiveChain[Idx].second > -Val) {
+        Tails.set(K);
+        ConsecutiveChain[Idx] = std::make_pair(K, -Val);
+      }
+      return false;
+    }
+    if (ConsecutiveChain[K].second <= Val)
       return false;

     Tails.set(Idx);
-    ConsecutiveChain[K] = Idx;
-    return true;
+    ConsecutiveChain[K] = std::make_pair(Idx, Val);
+    return Val == 1;
   };
   // Do a quadratic search on all of the given stores in reverse order and find
   // all of the pairs of stores that follow each other.
@@ -6140,17 +6159,31 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
   // For stores that start but don't end a link in the chain:
   for (int Cnt = E; Cnt > 0; --Cnt) {
     int I = Cnt - 1;
-    if (ConsecutiveChain[I] == E + 1 || Tails.test(I))
+    if (ConsecutiveChain[I].first == E || Tails.test(I))
       continue;
     // We found a store instr that starts a chain. Now follow the chain and try
     // to vectorize it.
     BoUpSLP::ValueList Operands;
     // Collect the chain into a list.
-    while (I != E + 1 && !VectorizedStores.count(Stores[I])) {
+    while (I != E && !VectorizedStores.count(Stores[I])) {
       Operands.push_back(Stores[I]);
+      Tails.set(I);
+      if (ConsecutiveChain[I].second != 1) {
+        // Mark the new end in the chain and go back, if required. It might be
+        // required if the original stores come in reversed order, for example.
+        if (ConsecutiveChain[I].first != E &&
+            Tails.test(ConsecutiveChain[I].first) &&
+            !VectorizedStores.count(Stores[ConsecutiveChain[I].first])) {
+          Tails.reset(ConsecutiveChain[I].first);
+          if (Cnt < ConsecutiveChain[I].first + 2)
+            Cnt = ConsecutiveChain[I].first + 2;
+        }
+        break;
+      }
       // Move to the next value in the chain.
-      I = ConsecutiveChain[I];
+      I = ConsecutiveChain[I].first;
     }
+    assert(!Operands.empty() && "Expected non-empty list of stores.");

     unsigned MaxVecRegSize = R.getMaxVecRegSize();
     unsigned EltSize = R.getVectorElementSize(Operands[0]);

diff  --git a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
index 267cf1a02c29..e28362894910 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE
-; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+avx  -S | FileCheck %s --check-prefix=AVX
-; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX
+; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE
+; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+avx  -S | FileCheck %s --check-prefix=AVX
+; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX

 %class.1 = type { %class.2 }
 %class.2 = type { %"class.3" }
@@ -117,13 +117,10 @@ define void @pr35497() local_unnamed_addr #0 {
 ; AVX-NEXT:    [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
 ; AVX-NEXT:    [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
 ; AVX-NEXT:    store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
-; AVX-NEXT:    [[TMP11:%.*]] = extractelement <2 x i64> [[TMP4]], i32 0
-; AVX-NEXT:    [[TMP12:%.*]] = insertelement <2 x i64> poison, i64 [[TMP11]], i32 0
-; AVX-NEXT:    [[TMP13:%.*]] = insertelement <2 x i64> [[TMP12]], i64 [[TMP5]], i32 1
-; AVX-NEXT:    [[TMP14:%.*]] = lshr <2 x i64> [[TMP13]], <i64 6, i64 6>
-; AVX-NEXT:    [[TMP15:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP14]]
-; AVX-NEXT:    [[TMP16:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
-; AVX-NEXT:    store <2 x i64> [[TMP15]], <2 x i64>* [[TMP16]], align 1
+; AVX-NEXT:    [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
+; AVX-NEXT:    [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]
+; AVX-NEXT:    [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
+; AVX-NEXT:    store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
 ; AVX-NEXT:    ret void
 ;
 entry:



_______________________________________________
llvm-commits mailing list
llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210326/8dd7a80f/attachment.html>


More information about the llvm-commits mailing list