[llvm] 99203f2 - [Analysis] Add getPointersDiff function to improve compile time.

Richard Smith via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 25 18:12:35 PDT 2021


On Thu, 25 Mar 2021 at 17:30, Alexey Bataev <a.bataev at outlook.com> wrote:

> Yes, it should be. There was just one problem with pointers in load
> instructions (I had just forgotten about it, because I saw exactly the
> same problem when I initially prepared this fix several months ago as
> part of another big patch). In other places we have already checked
> that the pointers are strictly aligned, so we can use the relaxed
> version of the call.
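>
> A minimal sketch of the distinction (hypothetical call site; the
> alignment fact stands in for whatever the caller has already
> established):
>
>   // Both pointers are already known to be type-aligned, so any
>   // constant distance between them is a multiple of the element
>   // size, and the relaxed (non-strict) call cannot round down:
>   Optional<int> Diff = getPointersDiff(PtrA, PtrB, DL, SE);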
>

OK, thanks. Can we assert that truncation doesn't happen when strict
checking is disabled?
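
One possible shape for such an assertion (a sketch only, using the
names from the patch below and placed just before the non-strict
return in getPointersDiff):

  assert((StrictCheck || Dist * Size == Val) &&
         "non-strict getPointersDiff truncated the pointer distance");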


> Best regards,
> Alexey Bataev
>
> On 25 Mar 2021, at 19:55, Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>
> Landed in 040c60d9b69e2ad570556f255a746929a4b10e82.
>
> I didn't look at the other callers of getPointersDiff to see if they are
> OK passing StrictCheck == false. Is it OK for the returned value to be
> rounded down to a multiple of the type size in the other cases?
>
> On Thu, 25 Mar 2021 at 16:44, Alexey Bataev <a.bataev at outlook.com> wrote:
>
>> Sure!
>>
>> Best regards,
>> Alexey Bataev
>>
>> On 25 Mar 2021, at 19:41, Richard Smith <richard at metafoo.co.uk>
>> wrote:
>>
>>
>> Sure, I'm currently running the tests on that change and will land it
>> once they pass. However, I've not figured out how to observe the bug with
>> an LLVM test yet (the testcase in question was extracted from the middle of
>> a JIT process). Do you mind if I land this without a test and leave the
>> test coverage with you? This is currently blocking us, so I'd like to get
>> the bug resolved as soon as possible.
>>
>> On Thu, 25 Mar 2021 at 16:34, Alexey Bataev <a.bataev at outlook.com> wrote:
>>
>>> Hi Richard, yes, that’s what I thought. Can you commit this fix?
>>>
>>> Best regards,
>>> Alexey Bataev
>>>
>>> On 25 Mar 2021, at 19:29, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>>
>>> On Thu, 25 Mar 2021 at 16:24, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>>> Hi Alexey, this patch has introduced a miscompile for us.
>>>>
>>>> On Tue, 23 Mar 2021 at 14:26, Alexey Bataev via llvm-commits <
>>>> llvm-commits at lists.llvm.org> wrote:
>>>>
>>>>>
>>>>> Author: Alexey Bataev
>>>>> Date: 2021-03-23T14:25:36-07:00
>>>>> New Revision: 99203f2004d031f2ef22f01e3c569d2775de1836
>>>>>
>>>>> URL:
>>>>> https://github.com/llvm/llvm-project/commit/99203f2004d031f2ef22f01e3c569d2775de1836
>>>>> DIFF:
>>>>> https://github.com/llvm/llvm-project/commit/99203f2004d031f2ef22f01e3c569d2775de1836.diff
>>>>>
>>>>> LOG: [Analysis] Add getPointersDiff function to improve compile time.
>>>>>
>>>>> Added a getPointersDiff function to LoopAccessAnalysis and used it
>>>>> instead of the direct calculation of the distance between pointers
>>>>> and/or the isConsecutiveAccess function in the SLP vectorizer, to
>>>>> improve compile time and the detection of consecutive store chains.
>>>>>
>>>>> Part of D57059
>>>>>
>>>>> Differential Revision: https://reviews.llvm.org/D98967
>>>>>
>>>>> Added:
>>>>>
>>>>>
>>>>> Modified:
>>>>>     llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>>>     llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>>>     llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>>>     llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>>>
>>>>> Removed:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ################################################################################
>>>>> diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>>> b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>>> index 13fbe884eddf..39acfd5bbbee 100644
>>>>> --- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>>> +++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>>> @@ -679,6 +679,15 @@ int64_t getPtrStride(PredicatedScalarEvolution
>>>>> &PSE, Value *Ptr, const Loop *Lp,
>>>>>                       const ValueToValueMap &StridesMap =
>>>>> ValueToValueMap(),
>>>>>                       bool Assume = false, bool ShouldCheckWrap =
>>>>> true);
>>>>>
>>>>> +/// Returns the distance between the pointers \p PtrA and \p PtrB iff
>>>>> they are
>>>>> +/// compatible and it is possible to calculate the distance between
>>>>> them. This
>>>>> +/// is a simple API that does not depend on the analysis pass.
>>>>> +/// \param StrictCheck Ensure that the calculated distance matches the
>>>>> +/// type-based one after all bitcasts have been removed from the provided pointers.
>>>>> +Optional<int> getPointersDiff(Value *PtrA, Value *PtrB, const
>>>>> DataLayout &DL,
>>>>> +                              ScalarEvolution &SE, bool StrictCheck =
>>>>> false,
>>>>> +                              bool CheckType = true);
>>>>> +
>>>>>  /// Attempt to sort the pointers in \p VL and return the sorted
>>>>> indices
>>>>>  /// in \p SortedIndices, if reordering is required.
>>>>>  ///
>>>>>
>>>>> diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>>> b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>>> index e632fe25c24c..997d4474a448 100644
>>>>> --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>>> +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>>> @@ -1124,139 +1124,123 @@ int64_t
>>>>> llvm::getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr,
>>>>>    return Stride;
>>>>>  }
>>>>>
>>>>> +Optional<int> llvm::getPointersDiff(Value *PtrA, Value *PtrB,
>>>>> +                                    const DataLayout &DL,
>>>>> ScalarEvolution &SE,
>>>>> +                                    bool StrictCheck, bool CheckType)
>>>>> {
>>>>> +  assert(PtrA && PtrB && "Expected non-nullptr pointers.");
>>>>> +  // Make sure that A and B are different pointers.
>>>>> +  if (PtrA == PtrB)
>>>>> +    return 0;
>>>>> +
>>>>> +  // Make sure that PtrA and PtrB have the same type if required
>>>>> +  if (CheckType && PtrA->getType() != PtrB->getType())
>>>>> +    return None;
>>>>> +
>>>>> +  unsigned ASA = PtrA->getType()->getPointerAddressSpace();
>>>>> +  unsigned ASB = PtrB->getType()->getPointerAddressSpace();
>>>>> +
>>>>> +  // Check that the address spaces match.
>>>>> +  if (ASA != ASB)
>>>>> +    return None;
>>>>> +  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
>>>>> +
>>>>> +  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
>>>>> +  Value *PtrA1 = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL,
>>>>> OffsetA);
>>>>> +  Value *PtrB1 = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL,
>>>>> OffsetB);
>>>>> +
>>>>> +  int Val;
>>>>> +  if (PtrA1 == PtrB1) {
>>>>> +    // Retrieve the address space again as pointer stripping now
>>>>> tracks through
>>>>> +    // `addrspacecast`.
>>>>> +    ASA = cast<PointerType>(PtrA1->getType())->getAddressSpace();
>>>>> +    ASB = cast<PointerType>(PtrB1->getType())->getAddressSpace();
>>>>> +    // Check that the address spaces match and that the pointers are
>>>>> valid.
>>>>> +    if (ASA != ASB)
>>>>> +      return None;
>>>>> +
>>>>> +    IdxWidth = DL.getIndexSizeInBits(ASA);
>>>>> +    OffsetA = OffsetA.sextOrTrunc(IdxWidth);
>>>>> +    OffsetB = OffsetB.sextOrTrunc(IdxWidth);
>>>>> +
>>>>> +    OffsetB -= OffsetA;
>>>>> +    Val = OffsetB.getSExtValue();
>>>>> +  } else {
>>>>> +    // Otherwise compute the distance with SCEV between the base
>>>>> pointers.
>>>>> +    const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
>>>>> +    const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
>>>>> +    const auto *Diff =
>>>>> +        dyn_cast<SCEVConstant>(SE.getMinusSCEV(PtrSCEVB, PtrSCEVA));
>>>>> +    if (!Diff)
>>>>> +      return None;
>>>>> +    Val = Diff->getAPInt().getSExtValue();
>>>>> +  }
>>>>> +  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
>>>>> +  int Size = DL.getTypeStoreSize(Ty);
>>>>> +  int Dist = Val / Size;
>>>>> +
>>>>> +  // Ensure that the calculated distance matches the type-based one
>>>>> +  // after all bitcasts have been removed from the provided pointers.
>>>>> +  if (!StrictCheck || Dist * Size == Val)
>>>>> +    return Dist;
>>>>>
>>>>
>>>> Specifically, we get here, with Val == 12 and Size == 8, resulting in
>>>> an incorrect computation of Dist, for the following testcase:
>>>>
>>>> ; Function Attrs: nofree norecurse nounwind uwtable
>>>> define void @Int32R2.5(i8* nocapture readnone %retval, i8* noalias
>>>> nocapture readnone %run_options, i8** noalias nocapture readnone %params,
>>>> i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture
>>>> readnone %prof_counters) local_unnamed_addr #0 {
>>>> entry:
>>>>   %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
>>>>   %1 = bitcast i8** %0 to i32**
>>>>   %2 = load i32*, i32** %1, align 8, !invariant.load !0,
>>>> !dereferenceable !1, !align !1
>>>>   %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
>>>>   %4 = bitcast i8** %3 to i32**
>>>>   %5 = load i32*, i32** %4, align 8, !invariant.load !0,
>>>> !dereferenceable !1, !align !1
>>>>   %6 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
>>>>   %7 = bitcast i8** %6 to [2 x [2 x i32]]**
>>>>   %8 = load [2 x [2 x i32]]*, [2 x [2 x i32]]** %7, align 8,
>>>> !invariant.load !0, !dereferenceable !2, !align !2
>>>>   %9 = load i32, i32* %2, align 4, !invariant.load !0, !noalias !3
>>>>   %10 = icmp sgt i32 %9, 0
>>>>   %11 = load i32, i32* %5, align 4, !invariant.load !0, !noalias !3
>>>>   %12 = icmp sgt i32 %11, 0
>>>>   %13 = select i1 %10, i64 12, i64 0
>>>>   %14 = select i1 %12, i64 4, i64 0
>>>>   %15 = add nuw nsw i64 %13, %14
>>>>   %scevgep6 = getelementptr [36 x i8], [36 x i8]* @0, i64 0, i64 %15
>>>>   %16 = bitcast i8* %scevgep6 to i64*
>>>>   %17 = bitcast [2 x [2 x i32]]* %8 to i64*
>>>>   %18 = load i64, i64* %16, align 4
>>>>   store i64 %18, i64* %17, align 16
>>>>   %scevgep.1 = getelementptr [2 x [2 x i32]], [2 x [2 x i32]]* %8, i64
>>>> 0, i64 1, i64 0
>>>>   %19 = add nuw nsw i64 %15, 12
>>>>   %scevgep6.1 = getelementptr [36 x i8], [36 x i8]* @0, i64 0, i64 %19
>>>>   %20 = bitcast i8* %scevgep6.1 to i64*
>>>>   %21 = bitcast i32* %scevgep.1 to i64*
>>>>   %22 = load i64, i64* %20, align 4
>>>>   store i64 %22, i64* %21, align 8
>>>>   ret void
>>>> }
>>>>
>>>> ... when computing the difference between %16 and %20 in the call from
>>>> LoopAccessAnalysis.cpp:1207.
>>>>
>>>> Perhaps StrictCheck should always be enabled, given that this function
>>>> truncates its result and produces a wrong answer if not?
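>>>>
>>>> To spell out the arithmetic for the testcase above (i64 stores, so
>>>> Size == 8, and a raw byte distance Val == 12 between %16 and %20):
>>>>
>>>>   int Dist = Val / Size;  // 12 / 8 == 1, silently rounded down
>>>>
>>>> With StrictCheck disabled the function returns Dist == 1, which
>>>> callers read as "exactly one element apart", i.e. consecutive,
>>>> even though the pointers are 12 bytes apart rather than 8. The
>>>> Dist * Size == Val guard (here 1 * 8 != 12) is what catches this.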
>>>>
>>>>
>>>>> +  return None;
>>>>> +}
>>>>> +
>>>>>  bool llvm::sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
>>>>>                             ScalarEvolution &SE,
>>>>>                             SmallVectorImpl<unsigned> &SortedIndices) {
>>>>>    assert(llvm::all_of(
>>>>>               VL, [](const Value *V) { return
>>>>> V->getType()->isPointerTy(); }) &&
>>>>>           "Expected list of pointer operands.");
>>>>> -  SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
>>>>> -  OffValPairs.reserve(VL.size());
>>>>> -
>>>>>    // Walk over the pointers, and map each of them to an offset
>>>>> relative to
>>>>>    // first pointer in the array.
>>>>>    Value *Ptr0 = VL[0];
>>>>> -  const SCEV *Scev0 = SE.getSCEV(Ptr0);
>>>>> -  Value *Obj0 = getUnderlyingObject(Ptr0);
>>>>> -
>>>>> -  llvm::SmallSet<int64_t, 4> Offsets;
>>>>> -  for (auto *Ptr : VL) {
>>>>> -    // TODO: Outline this code as a special, more time consuming,
>>>>> version of
>>>>> -    // computeConstantDifference() function.
>>>>> -    if (Ptr->getType()->getPointerAddressSpace() !=
>>>>> -        Ptr0->getType()->getPointerAddressSpace())
>>>>> -      return false;
>>>>> -    // If a pointer refers to a different underlying object, bail - the
>>>>> -    // pointers are by definition incomparable.
>>>>> -    Value *CurrObj = getUnderlyingObject(Ptr);
>>>>> -    if (CurrObj != Obj0)
>>>>> -      return false;
>>>>>
>>>>> -    const SCEV *Scev = SE.getSCEV(Ptr);
>>>>> -    const auto *Diff = dyn_cast<SCEVConstant>(SE.getMinusSCEV(Scev,
>>>>> Scev0));
>>>>> -    // The pointers may not have a constant offset from each other,
>>>>> or SCEV
>>>>> -    // may just not be smart enough to figure out they do. Regardless,
>>>>> -    // there's nothing we can do.
>>>>> +  using DistOrdPair = std::pair<int64_t, int>;
>>>>> +  auto Compare = [](const DistOrdPair &L, const DistOrdPair &R) {
>>>>> +    return L.first < R.first;
>>>>> +  };
>>>>> +  std::set<DistOrdPair, decltype(Compare)> Offsets(Compare);
>>>>> +  Offsets.emplace(0, 0);
>>>>> +  int Cnt = 1;
>>>>> +  bool IsConsecutive = true;
>>>>> +  for (auto *Ptr : VL.drop_front()) {
>>>>> +    Optional<int> Diff = getPointersDiff(Ptr0, Ptr, DL, SE);
>>>>>
>>>>
>>> Passing StrictCheck = true here fixes the miscompile.
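>>>
>>> That is, the call in sortPtrAccesses becomes (a sketch of the
>>> one-line change, using the names from the hunk above):
>>>
>>>   Optional<int> Diff = getPointersDiff(Ptr0, Ptr, DL, SE,
>>>                                        /*StrictCheck=*/true);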
>>>
>>>
>>>>      if (!Diff)
>>>>>        return false;
>>>>>
>>>>>      // Check if the pointer with the same offset is found.
>>>>> -    int64_t Offset = Diff->getAPInt().getSExtValue();
>>>>> -    if (!Offsets.insert(Offset).second)
>>>>> +    int64_t Offset = *Diff;
>>>>> +    auto Res = Offsets.emplace(Offset, Cnt);
>>>>> +    if (!Res.second)
>>>>>        return false;
>>>>> -    OffValPairs.emplace_back(Offset, Ptr);
>>>>> +    // Consecutive order if the inserted element is the last one.
>>>>> +    IsConsecutive = IsConsecutive && std::next(Res.first) ==
>>>>> Offsets.end();
>>>>> +    ++Cnt;
>>>>>    }
>>>>>    SortedIndices.clear();
>>>>> -  SortedIndices.resize(VL.size());
>>>>> -  std::iota(SortedIndices.begin(), SortedIndices.end(), 0);
>>>>> -
>>>>> -  // Sort the memory accesses and keep the order of their uses in
>>>>> UseOrder.
>>>>> -  llvm::stable_sort(SortedIndices, [&](unsigned Left, unsigned Right)
>>>>> {
>>>>> -    return OffValPairs[Left].first < OffValPairs[Right].first;
>>>>> -  });
>>>>> -
>>>>> -  // Check if the order is consecutive already.
>>>>> -  if (llvm::all_of(SortedIndices, [&SortedIndices](const unsigned I) {
>>>>> -        return I == SortedIndices[I];
>>>>> -      }))
>>>>> -    SortedIndices.clear();
>>>>> -
>>>>> +  if (!IsConsecutive) {
>>>>> +    // Fill SortedIndices array only if it is non-consecutive.
>>>>> +    SortedIndices.resize(VL.size());
>>>>> +    Cnt = 0;
>>>>> +    for (const std::pair<int64_t, int> &Pair : Offsets) {
>>>>> +      IsConsecutive = IsConsecutive && Cnt == Pair.second;
>>>>> +      SortedIndices[Cnt] = Pair.second;
>>>>> +      ++Cnt;
>>>>> +    }
>>>>> +  }
>>>>>    return true;
>>>>>  }
>>>>>
>>>>> -/// Take the address space operand from the Load/Store instruction.
>>>>> -/// Returns -1 if this is not a valid Load/Store instruction.
>>>>> -static unsigned getAddressSpaceOperand(Value *I) {
>>>>> -  if (LoadInst *L = dyn_cast<LoadInst>(I))
>>>>> -    return L->getPointerAddressSpace();
>>>>> -  if (StoreInst *S = dyn_cast<StoreInst>(I))
>>>>> -    return S->getPointerAddressSpace();
>>>>> -  return -1;
>>>>> -}
>>>>> -
>>>>>  /// Returns true if the memory operations \p A and \p B are
>>>>> consecutive.
>>>>>  bool llvm::isConsecutiveAccess(Value *A, Value *B, const DataLayout
>>>>> &DL,
>>>>>                                 ScalarEvolution &SE, bool CheckType) {
>>>>>    Value *PtrA = getLoadStorePointerOperand(A);
>>>>>    Value *PtrB = getLoadStorePointerOperand(B);
>>>>> -  unsigned ASA = getAddressSpaceOperand(A);
>>>>> -  unsigned ASB = getAddressSpaceOperand(B);
>>>>> -
>>>>> -  // Check that the address spaces match and that the pointers are
>>>>> valid.
>>>>> -  if (!PtrA || !PtrB || (ASA != ASB))
>>>>> -    return false;
>>>>> -
>>>>> -  // Make sure that A and B are different pointers.
>>>>> -  if (PtrA == PtrB)
>>>>> -    return false;
>>>>> -
>>>>> -  // Make sure that A and B have the same type if required.
>>>>> -  if (CheckType && PtrA->getType() != PtrB->getType())
>>>>> +  if (!PtrA || !PtrB)
>>>>>      return false;
>>>>> -
>>>>> -  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
>>>>> -  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
>>>>> -
>>>>> -  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
>>>>> -  PtrA = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
>>>>> -  PtrB = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
>>>>> -
>>>>> -  // Retrieve the address space again as pointer stripping now tracks
>>>>> through
>>>>> -  // `addrspacecast`.
>>>>> -  ASA = cast<PointerType>(PtrA->getType())->getAddressSpace();
>>>>> -  ASB = cast<PointerType>(PtrB->getType())->getAddressSpace();
>>>>> -  // Check that the address spaces match and that the pointers are
>>>>> valid.
>>>>> -  if (ASA != ASB)
>>>>> -    return false;
>>>>> -
>>>>> -  IdxWidth = DL.getIndexSizeInBits(ASA);
>>>>> -  OffsetA = OffsetA.sextOrTrunc(IdxWidth);
>>>>> -  OffsetB = OffsetB.sextOrTrunc(IdxWidth);
>>>>> -
>>>>> -  APInt Size(IdxWidth, DL.getTypeStoreSize(Ty));
>>>>> -
>>>>> -  //  OffsetDelta = OffsetB - OffsetA;
>>>>> -  const SCEV *OffsetSCEVA = SE.getConstant(OffsetA);
>>>>> -  const SCEV *OffsetSCEVB = SE.getConstant(OffsetB);
>>>>> -  const SCEV *OffsetDeltaSCEV = SE.getMinusSCEV(OffsetSCEVB,
>>>>> OffsetSCEVA);
>>>>> -  const APInt &OffsetDelta =
>>>>> cast<SCEVConstant>(OffsetDeltaSCEV)->getAPInt();
>>>>> -
>>>>> -  // Check if they are based on the same pointer. That makes the
>>>>> offsets
>>>>> -  // sufficient.
>>>>> -  if (PtrA == PtrB)
>>>>> -    return OffsetDelta == Size;
>>>>> -
>>>>> -  // Compute the necessary base pointer delta to have the necessary
>>>>> final delta
>>>>> -  // equal to the size.
>>>>> -  // BaseDelta = Size - OffsetDelta;
>>>>> -  const SCEV *SizeSCEV = SE.getConstant(Size);
>>>>> -  const SCEV *BaseDelta = SE.getMinusSCEV(SizeSCEV, OffsetDeltaSCEV);
>>>>> -
>>>>> -  // Otherwise compute the distance with SCEV between the base
>>>>> pointers.
>>>>> -  const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
>>>>> -  const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
>>>>> -  const SCEV *X = SE.getAddExpr(PtrSCEVA, BaseDelta);
>>>>> -  return X == PtrSCEVB;
>>>>> +  Optional<int> Diff =
>>>>> +      getPointersDiff(PtrA, PtrB, DL, SE, /*StrictCheck=*/true,
>>>>> CheckType);
>>>>> +  return Diff && *Diff == 1;
>>>>>  }
>>>>>
>>>>>  MemoryDepChecker::VectorizationSafetyStatus
>>>>>
>>>>> diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>>> b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>>> index 385b6f30dc0f..78d2ea0032db 100644
>>>>> --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>>> +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>>> @@ -941,10 +941,16 @@ class BoUpSLP {
>>>>>                                 ScalarEvolution &SE) {
>>>>>        auto *LI1 = dyn_cast<LoadInst>(V1);
>>>>>        auto *LI2 = dyn_cast<LoadInst>(V2);
>>>>> -      if (LI1 && LI2)
>>>>> -        return isConsecutiveAccess(LI1, LI2, DL, SE)
>>>>> -                   ? VLOperands::ScoreConsecutiveLoads
>>>>> -                   : VLOperands::ScoreFail;
>>>>> +      if (LI1 && LI2) {
>>>>> +        if (LI1->getParent() != LI2->getParent())
>>>>> +          return VLOperands::ScoreFail;
>>>>> +
>>>>> +        Optional<int> Dist =
>>>>> +            getPointersDiff(LI1->getPointerOperand(),
>>>>> LI2->getPointerOperand(),
>>>>> +                            DL, SE, /*StrictCheck=*/true);
>>>>> +        return (Dist && *Dist == 1) ?
>>>>> VLOperands::ScoreConsecutiveLoads
>>>>> +                                    : VLOperands::ScoreFail;
>>>>> +      }
>>>>>
>>>>>        auto *C1 = dyn_cast<Constant>(V1);
>>>>>        auto *C2 = dyn_cast<Constant>(V2);
>>>>> @@ -2871,13 +2877,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *>
>>>>> VL, unsigned Depth,
>>>>>            Ptr0 = PointerOps[CurrentOrder.front()];
>>>>>            PtrN = PointerOps[CurrentOrder.back()];
>>>>>          }
>>>>> -        const SCEV *Scev0 = SE->getSCEV(Ptr0);
>>>>> -        const SCEV *ScevN = SE->getSCEV(PtrN);
>>>>> -        const auto *Diff =
>>>>> -            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
>>>>> -        uint64_t Size = DL->getTypeAllocSize(ScalarTy);
>>>>> +        Optional<int> Diff = getPointersDiff(Ptr0, PtrN, *DL, *SE);
>>>>>          // Check that the sorted loads are consecutive.
>>>>> -        if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {
>>>>> +        if (static_cast<unsigned>(*Diff) == VL.size() - 1) {
>>>>>            if (CurrentOrder.empty()) {
>>>>>              // Original loads are consecutive and do not require
>>>>> reordering.
>>>>>              ++NumOpsWantToKeepOriginalOrder;
>>>>> @@ -3150,13 +3152,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *>
>>>>> VL, unsigned Depth,
>>>>>            Ptr0 = PointerOps[CurrentOrder.front()];
>>>>>            PtrN = PointerOps[CurrentOrder.back()];
>>>>>          }
>>>>> -        const SCEV *Scev0 = SE->getSCEV(Ptr0);
>>>>> -        const SCEV *ScevN = SE->getSCEV(PtrN);
>>>>> -        const auto *Diff =
>>>>> -            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
>>>>> -        uint64_t Size = DL->getTypeAllocSize(ScalarTy);
>>>>> +        Optional<int> Dist = getPointersDiff(Ptr0, PtrN, *DL, *SE);
>>>>>          // Check that the sorted pointer operands are consecutive.
>>>>> -        if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {
>>>>> +        if (static_cast<unsigned>(*Dist) == VL.size() - 1) {
>>>>>            if (CurrentOrder.empty()) {
>>>>>              // Original stores are consecutive and do not require
>>>>> reordering.
>>>>>              ++NumOpsWantToKeepOriginalOrder;
>>>>> @@ -6107,20 +6105,41 @@ bool
>>>>> SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
>>>>>
>>>>>    int E = Stores.size();
>>>>>    SmallBitVector Tails(E, false);
>>>>> -  SmallVector<int, 16> ConsecutiveChain(E, E + 1);
>>>>>    int MaxIter = MaxStoreLookup.getValue();
>>>>> +  SmallVector<std::pair<int, int>, 16> ConsecutiveChain(
>>>>> +      E, std::make_pair(E, INT_MAX));
>>>>> +  SmallVector<SmallBitVector, 4> CheckedPairs(E, SmallBitVector(E,
>>>>> false));
>>>>>    int IterCnt;
>>>>>    auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt,
>>>>> MaxIter,
>>>>> +                                  &CheckedPairs,
>>>>>                                    &ConsecutiveChain](int K, int Idx) {
>>>>>      if (IterCnt >= MaxIter)
>>>>>        return true;
>>>>> +    if (CheckedPairs[Idx].test(K))
>>>>> +      return ConsecutiveChain[K].second == 1 &&
>>>>> +             ConsecutiveChain[K].first == Idx;
>>>>>      ++IterCnt;
>>>>> -    if (!isConsecutiveAccess(Stores[K], Stores[Idx], *DL, *SE))
>>>>> +    CheckedPairs[Idx].set(K);
>>>>> +    CheckedPairs[K].set(Idx);
>>>>> +    Optional<int> Diff =
>>>>> getPointersDiff(Stores[K]->getPointerOperand(),
>>>>> +
>>>>>  Stores[Idx]->getPointerOperand(), *DL,
>>>>> +                                         *SE, /*StrictCheck=*/true);
>>>>> +    if (!Diff || *Diff == 0)
>>>>> +      return false;
>>>>> +    int Val = *Diff;
>>>>> +    if (Val < 0) {
>>>>> +      if (ConsecutiveChain[Idx].second > -Val) {
>>>>> +        Tails.set(K);
>>>>> +        ConsecutiveChain[Idx] = std::make_pair(K, -Val);
>>>>> +      }
>>>>> +      return false;
>>>>> +    }
>>>>> +    if (ConsecutiveChain[K].second <= Val)
>>>>>        return false;
>>>>>
>>>>>      Tails.set(Idx);
>>>>> -    ConsecutiveChain[K] = Idx;
>>>>> -    return true;
>>>>> +    ConsecutiveChain[K] = std::make_pair(Idx, Val);
>>>>> +    return Val == 1;
>>>>>    };
>>>>>    // Do a quadratic search on all of the given stores in reverse
>>>>> order and find
>>>>>    // all of the pairs of stores that follow each other.
>>>>> @@ -6140,17 +6159,31 @@ bool
>>>>> SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
>>>>>    // For stores that start but don't end a link in the chain:
>>>>>    for (int Cnt = E; Cnt > 0; --Cnt) {
>>>>>      int I = Cnt - 1;
>>>>> -    if (ConsecutiveChain[I] == E + 1 || Tails.test(I))
>>>>> +    if (ConsecutiveChain[I].first == E || Tails.test(I))
>>>>>        continue;
>>>>>      // We found a store instr that starts a chain. Now follow the
>>>>> chain and try
>>>>>      // to vectorize it.
>>>>>      BoUpSLP::ValueList Operands;
>>>>>      // Collect the chain into a list.
>>>>> -    while (I != E + 1 && !VectorizedStores.count(Stores[I])) {
>>>>> +    while (I != E && !VectorizedStores.count(Stores[I])) {
>>>>>        Operands.push_back(Stores[I]);
>>>>> +      Tails.set(I);
>>>>> +      if (ConsecutiveChain[I].second != 1) {
>>>>> +        // Mark the new end in the chain and go back, if required. It
>>>>> might be
>>>>> +        // required if the original stores come in reversed order,
>>>>> for example.
>>>>> +        if (ConsecutiveChain[I].first != E &&
>>>>> +            Tails.test(ConsecutiveChain[I].first) &&
>>>>> +
>>>>> !VectorizedStores.count(Stores[ConsecutiveChain[I].first])) {
>>>>> +          Tails.reset(ConsecutiveChain[I].first);
>>>>> +          if (Cnt < ConsecutiveChain[I].first + 2)
>>>>> +            Cnt = ConsecutiveChain[I].first + 2;
>>>>> +        }
>>>>> +        break;
>>>>> +      }
>>>>>        // Move to the next value in the chain.
>>>>> -      I = ConsecutiveChain[I];
>>>>> +      I = ConsecutiveChain[I].first;
>>>>>      }
>>>>> +    assert(!Operands.empty() && "Expected non-empty list of stores.");
>>>>>
>>>>>      unsigned MaxVecRegSize = R.getMaxVecRegSize();
>>>>>      unsigned EltSize = R.getVectorElementSize(Operands[0]);
>>>>>
>>>>> diff --git a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>>> b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>>> index 267cf1a02c29..e28362894910 100644
>>>>> --- a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>>> +++ b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>>> @@ -1,7 +1,7 @@
>>>>>  ; NOTE: Assertions have been autogenerated by
>>>>> utils/update_test_checks.py
>>>>> -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer
>>>>> -slp-vectorizer -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE
>>>>> -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer
>>>>> -slp-vectorizer -mattr=+avx  -S | FileCheck %s --check-prefix=AVX
>>>>> -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer
>>>>> -slp-vectorizer -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX
>>>>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer
>>>>> -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE
>>>>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer
>>>>> -mattr=+avx  -S | FileCheck %s --check-prefix=AVX
>>>>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer
>>>>> -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX
>>>>>
>>>>>  %class.1 = type { %class.2 }
>>>>>  %class.2 = type { %"class.3" }
>>>>> @@ -117,13 +117,10 @@ define void @pr35497() local_unnamed_addr #0 {
>>>>>  ; AVX-NEXT:    [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x
>>>>> i64], [0 x i64]* undef, i64 0, i64 0
>>>>>  ; AVX-NEXT:    [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x
>>>>> i64>*
>>>>>  ; AVX-NEXT:    store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
>>>>> -; AVX-NEXT:    [[TMP11:%.*]] = extractelement <2 x i64> [[TMP4]], i32
>>>>> 0
>>>>> -; AVX-NEXT:    [[TMP12:%.*]] = insertelement <2 x i64> poison, i64
>>>>> [[TMP11]], i32 0
>>>>> -; AVX-NEXT:    [[TMP13:%.*]] = insertelement <2 x i64> [[TMP12]], i64
>>>>> [[TMP5]], i32 1
>>>>> -; AVX-NEXT:    [[TMP14:%.*]] = lshr <2 x i64> [[TMP13]], <i64 6, i64
>>>>> 6>
>>>>> -; AVX-NEXT:    [[TMP15:%.*]] = add nuw nsw <2 x i64> [[TMP9]],
>>>>> [[TMP14]]
>>>>> -; AVX-NEXT:    [[TMP16:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x
>>>>> i64>*
>>>>> -; AVX-NEXT:    store <2 x i64> [[TMP15]], <2 x i64>* [[TMP16]], align
>>>>> 1
>>>>> +; AVX-NEXT:    [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
>>>>> +; AVX-NEXT:    [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]],
>>>>> [[TMP11]]
>>>>> +; AVX-NEXT:    [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x
>>>>> i64>*
>>>>> +; AVX-NEXT:    store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align
>>>>> 1
>>>>>  ; AVX-NEXT:    ret void
>>>>>  ;
>>>>>  entry:
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>
>>>>