[llvm] 99203f2 - [Analysis] Add getPointersDiff function to improve compile time.
Richard Smith via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 25 16:55:22 PDT 2021
Landed in 040c60d9b69e2ad570556f255a746929a4b10e82.
I didn't look at the other callers of getPointersDiff to see if they are OK
passing StrictCheck == false. Is it OK for the returned value to be rounded
down to a multiple of the type size in the other cases?
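
For concreteness, here is a self-contained sketch of the rounding behavior in
question (my own reduction of the Val / Size computation quoted below, not the
actual LLVM code):

  #include <cstdio>
  #include <optional>

  // Mimics the tail of getPointersDiff: Val is the byte distance between
  // the two pointers, Size is the store size of the pointee type. With
  // Strict == false the byte distance is silently rounded down to a whole
  // number of elements.
  static std::optional<int> pointersDiff(int Val, int Size, bool Strict) {
    int Dist = Val / Size;
    if (!Strict || Dist * Size == Val)
      return Dist;
    return std::nullopt;
  }

  int main() {
    // A 12-byte gap between two i64* (store size 8), as in the testcase
    // further down this thread:
    std::printf("non-strict: %d\n", *pointersDiff(12, 8, false)); // prints 1
    std::printf("strict has value: %d\n",
                (int)pointersDiff(12, 8, true).has_value());      // prints 0
  }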
On Thu, 25 Mar 2021 at 16:44, Alexey Bataev <a.bataev at outlook.com> wrote:
> Sure!
>
> Best regards,
> Alexey Bataev
>
> On 25 Mar 2021, at 19:41, Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>
> Sure, I'm currently running the tests on that change and will land that
> once they pass. However, I've not figured out how to observe the bug with
> an LLVM test yet (the testcase in question was extracted from the middle of
> a JIT process). Do you mind if I land this without a test and leave the
> test coverage with you? This is currently blocking us, so I'd like to get
> the bug resolved as soon as possible.
>
> On Thu, 25 Mar 2021 at 16:34, Alexey Bataev <a.bataev at outlook.com> wrote:
>
>> Hi Richard, yes, that’s what I thought. Can you commit this fix?
>>
>> Best regards,
>> Alexey Bataev
>>
>> On 25 Mar 2021, at 19:29, Richard Smith <richard at metafoo.co.uk>
>> wrote:
>>
>>
>> On Thu, 25 Mar 2021 at 16:24, Richard Smith <richard at metafoo.co.uk>
>> wrote:
>>
>>> Hi Alexey, this patch has introduced a miscompile for us.
>>>
>>> On Tue, 23 Mar 2021 at 14:26, Alexey Bataev via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>>
>>>> Author: Alexey Bataev
>>>> Date: 2021-03-23T14:25:36-07:00
>>>> New Revision: 99203f2004d031f2ef22f01e3c569d2775de1836
>>>>
>>>> URL:
>>>> https://github.com/llvm/llvm-project/commit/99203f2004d031f2ef22f01e3c569d2775de1836
>>>> DIFF:
>>>> https://github.com/llvm/llvm-project/commit/99203f2004d031f2ef22f01e3c569d2775de1836.diff
>>>>
>>>> LOG: [Analysis] Add getPointersDiff function to improve compile time.
>>>>
>>>> Added a getPointersDiff function to LoopAccessAnalysis and used it in the
>>>> SLP vectorizer, instead of direct calculation of the distance between
>>>> pointers and/or the isConsecutiveAccess function, to improve compile time
>>>> and the detection of consecutive store chains.
>>>>
>>>> Part of D57059
>>>>
>>>> Differential Revision: https://reviews.llvm.org/D98967
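>>>
>>> As a usage sketch (my paraphrase, not part of the commit): the helper
>>> reports the distance between two pointers in elements of the pointee
>>> type, or None when it cannot be computed, so a consecutiveness check
>>> becomes a comparison against 1:
>>>
>>>   Optional<int> D = getPointersDiff(PtrA, PtrB, DL, SE);
>>>   if (D && *D == 1) {
>>>     // PtrB is exactly one element past PtrA; this is what
>>>     // isConsecutiveAccess now checks (with StrictCheck enabled).
>>>   }
>>>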
>>>>
>>>> Added:
>>>>
>>>>
>>>> Modified:
>>>> llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>> llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>> llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>> llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>>
>>>> Removed:
>>>>
>>>>
>>>>
>>>>
>>>> ################################################################################
>>>> diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>> index 13fbe884eddf..39acfd5bbbee 100644
>>>> --- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>> +++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
>>>> @@ -679,6 +679,15 @@ int64_t getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr, const Loop *Lp,
>>>>                       const ValueToValueMap &StridesMap = ValueToValueMap(),
>>>>                       bool Assume = false, bool ShouldCheckWrap = true);
>>>>
>>>> +/// Returns the distance between the pointers \p PtrA and \p PtrB iff they are
>>>> +/// compatible and it is possible to calculate the distance between them. This
>>>> +/// is a simple API that does not depend on the analysis pass.
>>>> +/// \param StrictCheck Ensure that the calculated distance matches the
>>>> +/// type-based one after all the bitcasts removal in the provided pointers.
>>>> +Optional<int> getPointersDiff(Value *PtrA, Value *PtrB, const DataLayout &DL,
>>>> +                              ScalarEvolution &SE, bool StrictCheck = false,
>>>> +                              bool CheckType = true);
>>>> +
>>>> /// Attempt to sort the pointers in \p VL and return the sorted indices
>>>> /// in \p SortedIndices, if reordering is required.
>>>> ///
>>>>
>>>> diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>> index e632fe25c24c..997d4474a448 100644
>>>> --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>> +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
>>>> @@ -1124,139 +1124,123 @@ int64_t llvm::getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr,
>>>>    return Stride;
>>>>  }
>>>>
>>>> +Optional<int> llvm::getPointersDiff(Value *PtrA, Value *PtrB,
>>>> +                                    const DataLayout &DL, ScalarEvolution &SE,
>>>> +                                    bool StrictCheck, bool CheckType) {
>>>> +  assert(PtrA && PtrB && "Expected non-nullptr pointers.");
>>>> +  // Make sure that A and B are different pointers.
>>>> +  if (PtrA == PtrB)
>>>> +    return 0;
>>>> +
>>>> +  // Make sure that PtrA and PtrB have the same type if required
>>>> +  if (CheckType && PtrA->getType() != PtrB->getType())
>>>> +    return None;
>>>> +
>>>> +  unsigned ASA = PtrA->getType()->getPointerAddressSpace();
>>>> +  unsigned ASB = PtrB->getType()->getPointerAddressSpace();
>>>> +
>>>> +  // Check that the address spaces match.
>>>> +  if (ASA != ASB)
>>>> +    return None;
>>>> +  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
>>>> +
>>>> +  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
>>>> +  Value *PtrA1 = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
>>>> +  Value *PtrB1 = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
>>>> +
>>>> +  int Val;
>>>> +  if (PtrA1 == PtrB1) {
>>>> +    // Retrieve the address space again as pointer stripping now tracks through
>>>> +    // `addrspacecast`.
>>>> +    ASA = cast<PointerType>(PtrA1->getType())->getAddressSpace();
>>>> +    ASB = cast<PointerType>(PtrB1->getType())->getAddressSpace();
>>>> +    // Check that the address spaces match and that the pointers are valid.
>>>> +    if (ASA != ASB)
>>>> +      return None;
>>>> +
>>>> +    IdxWidth = DL.getIndexSizeInBits(ASA);
>>>> +    OffsetA = OffsetA.sextOrTrunc(IdxWidth);
>>>> +    OffsetB = OffsetB.sextOrTrunc(IdxWidth);
>>>> +
>>>> +    OffsetB -= OffsetA;
>>>> +    Val = OffsetB.getSExtValue();
>>>> +  } else {
>>>> +    // Otherwise compute the distance with SCEV between the base pointers.
>>>> +    const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
>>>> +    const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
>>>> +    const auto *Diff =
>>>> +        dyn_cast<SCEVConstant>(SE.getMinusSCEV(PtrSCEVB, PtrSCEVA));
>>>> +    if (!Diff)
>>>> +      return None;
>>>> +    Val = Diff->getAPInt().getSExtValue();
>>>> +  }
>>>> +  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
>>>> +  int Size = DL.getTypeStoreSize(Ty);
>>>> +  int Dist = Val / Size;
>>>> +
>>>> +  // Ensure that the calculated distance matches the type-based one after all
>>>> +  // the bitcasts removal in the provided pointers.
>>>> +  if (!StrictCheck || Dist * Size == Val)
>>>> +    return Dist;
>>>>
>>>
>>> Specifically, we get here, with Val == 12 and Size == 8, resulting in an
>>> incorrect computation of Dist, for the following testcase:
>>>
>>> ; Function Attrs: nofree norecurse nounwind uwtable
>>> define void @Int32R2.5(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 {
>>> entry:
>>>   %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
>>>   %1 = bitcast i8** %0 to i32**
>>>   %2 = load i32*, i32** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !1
>>>   %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
>>>   %4 = bitcast i8** %3 to i32**
>>>   %5 = load i32*, i32** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !1
>>>   %6 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
>>>   %7 = bitcast i8** %6 to [2 x [2 x i32]]**
>>>   %8 = load [2 x [2 x i32]]*, [2 x [2 x i32]]** %7, align 8, !invariant.load !0, !dereferenceable !2, !align !2
>>>   %9 = load i32, i32* %2, align 4, !invariant.load !0, !noalias !3
>>>   %10 = icmp sgt i32 %9, 0
>>>   %11 = load i32, i32* %5, align 4, !invariant.load !0, !noalias !3
>>>   %12 = icmp sgt i32 %11, 0
>>>   %13 = select i1 %10, i64 12, i64 0
>>>   %14 = select i1 %12, i64 4, i64 0
>>>   %15 = add nuw nsw i64 %13, %14
>>>   %scevgep6 = getelementptr [36 x i8], [36 x i8]* @0, i64 0, i64 %15
>>>   %16 = bitcast i8* %scevgep6 to i64*
>>>   %17 = bitcast [2 x [2 x i32]]* %8 to i64*
>>>   %18 = load i64, i64* %16, align 4
>>>   store i64 %18, i64* %17, align 16
>>>   %scevgep.1 = getelementptr [2 x [2 x i32]], [2 x [2 x i32]]* %8, i64 0, i64 1, i64 0
>>>   %19 = add nuw nsw i64 %15, 12
>>>   %scevgep6.1 = getelementptr [36 x i8], [36 x i8]* @0, i64 0, i64 %19
>>>   %20 = bitcast i8* %scevgep6.1 to i64*
>>>   %21 = bitcast i32* %scevgep.1 to i64*
>>>   %22 = load i64, i64* %20, align 4
>>>   store i64 %22, i64* %21, align 8
>>>   ret void
>>> }
>>>
>>> ... when computing the difference between %16 and %20 in the call from
>>> LoopAccessAnalysis.cpp:1207.
>>>
>>> Perhaps StrictCheck should always be enabled, given that this function
>>> truncates its result and produces a wrong answer if not?
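>>>
>>> Concretely, with the values above (back-of-envelope arithmetic, not the
>>> LLVM code itself):
>>>
>>>   int Val = 12;          // byte distance between %16 and %20
>>>   int Size = 8;          // DL.getTypeStoreSize(i64)
>>>   int Dist = Val / Size; // == 1, so the loads are treated as consecutive
>>>   // Dist * Size == 8 != 12, which is exactly the mismatch that
>>>   // StrictCheck rejects.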
>>>
>>>
>>>> + return None;
>>>> +}
>>>> +
>>>>  bool llvm::sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
>>>>                             ScalarEvolution &SE,
>>>>                             SmallVectorImpl<unsigned> &SortedIndices) {
>>>>    assert(llvm::all_of(
>>>>               VL, [](const Value *V) { return V->getType()->isPointerTy(); }) &&
>>>>           "Expected list of pointer operands.");
>>>> -  SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
>>>> -  OffValPairs.reserve(VL.size());
>>>> -
>>>>    // Walk over the pointers, and map each of them to an offset relative to
>>>>    // first pointer in the array.
>>>>    Value *Ptr0 = VL[0];
>>>> -  const SCEV *Scev0 = SE.getSCEV(Ptr0);
>>>> -  Value *Obj0 = getUnderlyingObject(Ptr0);
>>>> -
>>>> -  llvm::SmallSet<int64_t, 4> Offsets;
>>>> -  for (auto *Ptr : VL) {
>>>> -    // TODO: Outline this code as a special, more time consuming, version of
>>>> -    // computeConstantDifference() function.
>>>> -    if (Ptr->getType()->getPointerAddressSpace() !=
>>>> -        Ptr0->getType()->getPointerAddressSpace())
>>>> -      return false;
>>>> -    // If a pointer refers to a different underlying object, bail - the
>>>> -    // pointers are by definition incomparable.
>>>> -    Value *CurrObj = getUnderlyingObject(Ptr);
>>>> -    if (CurrObj != Obj0)
>>>> -      return false;
>>>>
>>>> -    const SCEV *Scev = SE.getSCEV(Ptr);
>>>> -    const auto *Diff = dyn_cast<SCEVConstant>(SE.getMinusSCEV(Scev, Scev0));
>>>> -    // The pointers may not have a constant offset from each other, or SCEV
>>>> -    // may just not be smart enough to figure out they do. Regardless,
>>>> -    // there's nothing we can do.
>>>> +  using DistOrdPair = std::pair<int64_t, int>;
>>>> +  auto Compare = [](const DistOrdPair &L, const DistOrdPair &R) {
>>>> +    return L.first < R.first;
>>>> +  };
>>>> +  std::set<DistOrdPair, decltype(Compare)> Offsets(Compare);
>>>> +  Offsets.emplace(0, 0);
>>>> +  int Cnt = 1;
>>>> +  bool IsConsecutive = true;
>>>> +  for (auto *Ptr : VL.drop_front()) {
>>>> +    Optional<int> Diff = getPointersDiff(Ptr0, Ptr, DL, SE);
>>>>
>>>
>> Passing StrictCheck = true here fixes the miscompile.
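>>
>> That is, the call above gains an explicit StrictCheck argument (a sketch
>> of the fix; see 040c60d9b69e2 for what actually landed):
>>
>>   Optional<int> Diff = getPointersDiff(Ptr0, Ptr, DL, SE,
>>                                        /*StrictCheck=*/true);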
>>
>>
>>>>      if (!Diff)
>>>>        return false;
>>>>
>>>>      // Check if the pointer with the same offset is found.
>>>> -    int64_t Offset = Diff->getAPInt().getSExtValue();
>>>> -    if (!Offsets.insert(Offset).second)
>>>> +    int64_t Offset = *Diff;
>>>> +    auto Res = Offsets.emplace(Offset, Cnt);
>>>> +    if (!Res.second)
>>>>        return false;
>>>> -    OffValPairs.emplace_back(Offset, Ptr);
>>>> +    // Consecutive order if the inserted element is the last one.
>>>> +    IsConsecutive = IsConsecutive && std::next(Res.first) == Offsets.end();
>>>> +    ++Cnt;
>>>>    }
>>>>    SortedIndices.clear();
>>>> -  SortedIndices.resize(VL.size());
>>>> -  std::iota(SortedIndices.begin(), SortedIndices.end(), 0);
>>>> -
>>>> -  // Sort the memory accesses and keep the order of their uses in UseOrder.
>>>> -  llvm::stable_sort(SortedIndices, [&](unsigned Left, unsigned Right) {
>>>> -    return OffValPairs[Left].first < OffValPairs[Right].first;
>>>> -  });
>>>> -
>>>> -  // Check if the order is consecutive already.
>>>> -  if (llvm::all_of(SortedIndices, [&SortedIndices](const unsigned I) {
>>>> -        return I == SortedIndices[I];
>>>> -      }))
>>>> -    SortedIndices.clear();
>>>> -
>>>> +  if (!IsConsecutive) {
>>>> +    // Fill SortedIndices array only if it is non-consecutive.
>>>> +    SortedIndices.resize(VL.size());
>>>> +    Cnt = 0;
>>>> +    for (const std::pair<int64_t, int> &Pair : Offsets) {
>>>> +      IsConsecutive = IsConsecutive && Cnt == Pair.second;
>>>> +      SortedIndices[Cnt] = Pair.second;
>>>> +      ++Cnt;
>>>> +    }
>>>> +  }
>>>>    return true;
>>>>  }
>>>>
>>>> -/// Take the address space operand from the Load/Store instruction.
>>>> -/// Returns -1 if this is not a valid Load/Store instruction.
>>>> -static unsigned getAddressSpaceOperand(Value *I) {
>>>> -  if (LoadInst *L = dyn_cast<LoadInst>(I))
>>>> -    return L->getPointerAddressSpace();
>>>> -  if (StoreInst *S = dyn_cast<StoreInst>(I))
>>>> -    return S->getPointerAddressSpace();
>>>> -  return -1;
>>>> -}
>>>> -
>>>>  /// Returns true if the memory operations \p A and \p B are consecutive.
>>>>  bool llvm::isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
>>>>                                 ScalarEvolution &SE, bool CheckType) {
>>>>    Value *PtrA = getLoadStorePointerOperand(A);
>>>>    Value *PtrB = getLoadStorePointerOperand(B);
>>>> -  unsigned ASA = getAddressSpaceOperand(A);
>>>> -  unsigned ASB = getAddressSpaceOperand(B);
>>>> -
>>>> -  // Check that the address spaces match and that the pointers are valid.
>>>> -  if (!PtrA || !PtrB || (ASA != ASB))
>>>> -    return false;
>>>> -
>>>> -  // Make sure that A and B are different pointers.
>>>> -  if (PtrA == PtrB)
>>>> -    return false;
>>>> -
>>>> -  // Make sure that A and B have the same type if required.
>>>> -  if (CheckType && PtrA->getType() != PtrB->getType())
>>>> +  if (!PtrA || !PtrB)
>>>>      return false;
>>>> -
>>>> -  unsigned IdxWidth = DL.getIndexSizeInBits(ASA);
>>>> -  Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
>>>> -
>>>> -  APInt OffsetA(IdxWidth, 0), OffsetB(IdxWidth, 0);
>>>> -  PtrA = PtrA->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetA);
>>>> -  PtrB = PtrB->stripAndAccumulateInBoundsConstantOffsets(DL, OffsetB);
>>>> -
>>>> -  // Retrieve the address space again as pointer stripping now tracks through
>>>> -  // `addrspacecast`.
>>>> -  ASA = cast<PointerType>(PtrA->getType())->getAddressSpace();
>>>> -  ASB = cast<PointerType>(PtrB->getType())->getAddressSpace();
>>>> -  // Check that the address spaces match and that the pointers are valid.
>>>> -  if (ASA != ASB)
>>>> -    return false;
>>>> -
>>>> -  IdxWidth = DL.getIndexSizeInBits(ASA);
>>>> -  OffsetA = OffsetA.sextOrTrunc(IdxWidth);
>>>> -  OffsetB = OffsetB.sextOrTrunc(IdxWidth);
>>>> -
>>>> -  APInt Size(IdxWidth, DL.getTypeStoreSize(Ty));
>>>> -
>>>> -  // OffsetDelta = OffsetB - OffsetA;
>>>> -  const SCEV *OffsetSCEVA = SE.getConstant(OffsetA);
>>>> -  const SCEV *OffsetSCEVB = SE.getConstant(OffsetB);
>>>> -  const SCEV *OffsetDeltaSCEV = SE.getMinusSCEV(OffsetSCEVB, OffsetSCEVA);
>>>> -  const APInt &OffsetDelta = cast<SCEVConstant>(OffsetDeltaSCEV)->getAPInt();
>>>> -
>>>> -  // Check if they are based on the same pointer. That makes the offsets
>>>> -  // sufficient.
>>>> -  if (PtrA == PtrB)
>>>> -    return OffsetDelta == Size;
>>>> -
>>>> -  // Compute the necessary base pointer delta to have the necessary final delta
>>>> -  // equal to the size.
>>>> -  // BaseDelta = Size - OffsetDelta;
>>>> -  const SCEV *SizeSCEV = SE.getConstant(Size);
>>>> -  const SCEV *BaseDelta = SE.getMinusSCEV(SizeSCEV, OffsetDeltaSCEV);
>>>> -
>>>> -  // Otherwise compute the distance with SCEV between the base pointers.
>>>> -  const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
>>>> -  const SCEV *PtrSCEVB = SE.getSCEV(PtrB);
>>>> -  const SCEV *X = SE.getAddExpr(PtrSCEVA, BaseDelta);
>>>> -  return X == PtrSCEVB;
>>>> +  Optional<int> Diff =
>>>> +      getPointersDiff(PtrA, PtrB, DL, SE, /*StrictCheck=*/true, CheckType);
>>>> +  return Diff && *Diff == 1;
>>>>  }
>>>>
>>>> MemoryDepChecker::VectorizationSafetyStatus
>>>>
>>>> diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>> index 385b6f30dc0f..78d2ea0032db 100644
>>>> --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>> +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>> @@ -941,10 +941,16 @@ class BoUpSLP {
>>>>                                ScalarEvolution &SE) {
>>>>      auto *LI1 = dyn_cast<LoadInst>(V1);
>>>>      auto *LI2 = dyn_cast<LoadInst>(V2);
>>>> -    if (LI1 && LI2)
>>>> -      return isConsecutiveAccess(LI1, LI2, DL, SE)
>>>> -                 ? VLOperands::ScoreConsecutiveLoads
>>>> -                 : VLOperands::ScoreFail;
>>>> +    if (LI1 && LI2) {
>>>> +      if (LI1->getParent() != LI2->getParent())
>>>> +        return VLOperands::ScoreFail;
>>>> +
>>>> +      Optional<int> Dist =
>>>> +          getPointersDiff(LI1->getPointerOperand(), LI2->getPointerOperand(),
>>>> +                          DL, SE, /*StrictCheck=*/true);
>>>> +      return (Dist && *Dist == 1) ? VLOperands::ScoreConsecutiveLoads
>>>> +                                  : VLOperands::ScoreFail;
>>>> +    }
>>>>
>>>>      auto *C1 = dyn_cast<Constant>(V1);
>>>>      auto *C2 = dyn_cast<Constant>(V2);
>>>> @@ -2871,13 +2877,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
>>>>            Ptr0 = PointerOps[CurrentOrder.front()];
>>>>            PtrN = PointerOps[CurrentOrder.back()];
>>>>          }
>>>> -        const SCEV *Scev0 = SE->getSCEV(Ptr0);
>>>> -        const SCEV *ScevN = SE->getSCEV(PtrN);
>>>> -        const auto *Diff =
>>>> -            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
>>>> -        uint64_t Size = DL->getTypeAllocSize(ScalarTy);
>>>> +        Optional<int> Diff = getPointersDiff(Ptr0, PtrN, *DL, *SE);
>>>>          // Check that the sorted loads are consecutive.
>>>> -        if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {
>>>> +        if (static_cast<unsigned>(*Diff) == VL.size() - 1) {
>>>>            if (CurrentOrder.empty()) {
>>>>              // Original loads are consecutive and does not require reordering.
>>>>              ++NumOpsWantToKeepOriginalOrder;
>>>> @@ -3150,13 +3152,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
>>>>            Ptr0 = PointerOps[CurrentOrder.front()];
>>>>            PtrN = PointerOps[CurrentOrder.back()];
>>>>          }
>>>> -        const SCEV *Scev0 = SE->getSCEV(Ptr0);
>>>> -        const SCEV *ScevN = SE->getSCEV(PtrN);
>>>> -        const auto *Diff =
>>>> -            dyn_cast<SCEVConstant>(SE->getMinusSCEV(ScevN, Scev0));
>>>> -        uint64_t Size = DL->getTypeAllocSize(ScalarTy);
>>>> +        Optional<int> Dist = getPointersDiff(Ptr0, PtrN, *DL, *SE);
>>>>          // Check that the sorted pointer operands are consecutive.
>>>> -        if (Diff && Diff->getAPInt() == (VL.size() - 1) * Size) {
>>>> +        if (static_cast<unsigned>(*Dist) == VL.size() - 1) {
>>>>            if (CurrentOrder.empty()) {
>>>>              // Original stores are consecutive and does not require reordering.
>>>>              ++NumOpsWantToKeepOriginalOrder;
>>>> @@ -6107,20 +6105,41 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
>>>>
>>>>    int E = Stores.size();
>>>>    SmallBitVector Tails(E, false);
>>>> -  SmallVector<int, 16> ConsecutiveChain(E, E + 1);
>>>>    int MaxIter = MaxStoreLookup.getValue();
>>>> +  SmallVector<std::pair<int, int>, 16> ConsecutiveChain(
>>>> +      E, std::make_pair(E, INT_MAX));
>>>> +  SmallVector<SmallBitVector, 4> CheckedPairs(E, SmallBitVector(E, false));
>>>>    int IterCnt;
>>>>    auto &&FindConsecutiveAccess = [this, &Stores, &Tails, &IterCnt, MaxIter,
>>>> +                                  &CheckedPairs,
>>>>                                    &ConsecutiveChain](int K, int Idx) {
>>>>      if (IterCnt >= MaxIter)
>>>>        return true;
>>>> +    if (CheckedPairs[Idx].test(K))
>>>> +      return ConsecutiveChain[K].second == 1 &&
>>>> +             ConsecutiveChain[K].first == Idx;
>>>>      ++IterCnt;
>>>> -    if (!isConsecutiveAccess(Stores[K], Stores[Idx], *DL, *SE))
>>>> +    CheckedPairs[Idx].set(K);
>>>> +    CheckedPairs[K].set(Idx);
>>>> +    Optional<int> Diff = getPointersDiff(Stores[K]->getPointerOperand(),
>>>> +                                         Stores[Idx]->getPointerOperand(), *DL,
>>>> +                                         *SE, /*StrictCheck=*/true);
>>>> +    if (!Diff || *Diff == 0)
>>>> +      return false;
>>>> +    int Val = *Diff;
>>>> +    if (Val < 0) {
>>>> +      if (ConsecutiveChain[Idx].second > -Val) {
>>>> +        Tails.set(K);
>>>> +        ConsecutiveChain[Idx] = std::make_pair(K, -Val);
>>>> +      }
>>>> +      return false;
>>>> +    }
>>>> +    if (ConsecutiveChain[K].second <= Val)
>>>>        return false;
>>>>
>>>>      Tails.set(Idx);
>>>> -    ConsecutiveChain[K] = Idx;
>>>> -    return true;
>>>> +    ConsecutiveChain[K] = std::make_pair(Idx, Val);
>>>> +    return Val == 1;
>>>>    };
>>>>    // Do a quadratic search on all of the given stores in reverse order and find
>>>>    // all of the pairs of stores that follow each other.
>>>> @@ -6140,17 +6159,31 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
>>>>    // For stores that start but don't end a link in the chain:
>>>>    for (int Cnt = E; Cnt > 0; --Cnt) {
>>>>      int I = Cnt - 1;
>>>> -    if (ConsecutiveChain[I] == E + 1 || Tails.test(I))
>>>> +    if (ConsecutiveChain[I].first == E || Tails.test(I))
>>>>        continue;
>>>>      // We found a store instr that starts a chain. Now follow the chain and try
>>>>      // to vectorize it.
>>>>      BoUpSLP::ValueList Operands;
>>>>      // Collect the chain into a list.
>>>> -    while (I != E + 1 && !VectorizedStores.count(Stores[I])) {
>>>> +    while (I != E && !VectorizedStores.count(Stores[I])) {
>>>>        Operands.push_back(Stores[I]);
>>>> +      Tails.set(I);
>>>> +      if (ConsecutiveChain[I].second != 1) {
>>>> +        // Mark the new end in the chain and go back, if required. It might be
>>>> +        // required if the original stores come in reversed order, for example.
>>>> +        if (ConsecutiveChain[I].first != E &&
>>>> +            Tails.test(ConsecutiveChain[I].first) &&
>>>> +            !VectorizedStores.count(Stores[ConsecutiveChain[I].first])) {
>>>> +          Tails.reset(ConsecutiveChain[I].first);
>>>> +          if (Cnt < ConsecutiveChain[I].first + 2)
>>>> +            Cnt = ConsecutiveChain[I].first + 2;
>>>> +        }
>>>> +        break;
>>>> +      }
>>>>        // Move to the next value in the chain.
>>>> -      I = ConsecutiveChain[I];
>>>> +      I = ConsecutiveChain[I].first;
>>>>      }
>>>> +    assert(!Operands.empty() && "Expected non-empty list of stores.");
>>>>
>>>>      unsigned MaxVecRegSize = R.getMaxVecRegSize();
>>>>      unsigned EltSize = R.getVectorElementSize(Operands[0]);
>>>>
>>>> diff --git a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>> index 267cf1a02c29..e28362894910 100644
>>>> --- a/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>> +++ b/llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
>>>> @@ -1,7 +1,7 @@
>>>>  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
>>>> -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE
>>>> -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+avx -S | FileCheck %s --check-prefix=AVX
>>>> -; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -slp-vectorizer -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX
>>>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+sse2 -S | FileCheck %s --check-prefix=SSE
>>>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+avx -S | FileCheck %s --check-prefix=AVX
>>>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -slp-vectorizer -mattr=+avx2 -S | FileCheck %s --check-prefix=AVX
>>>>
>>>> %class.1 = type { %class.2 }
>>>> %class.2 = type { %"class.3" }
>>>> @@ -117,13 +117,10 @@ define void @pr35497() local_unnamed_addr #0 {
>>>>  ; AVX-NEXT:    [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
>>>>  ; AVX-NEXT:    [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
>>>>  ; AVX-NEXT:    store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
>>>> -; AVX-NEXT:    [[TMP11:%.*]] = extractelement <2 x i64> [[TMP4]], i32 0
>>>> -; AVX-NEXT:    [[TMP12:%.*]] = insertelement <2 x i64> poison, i64 [[TMP11]], i32 0
>>>> -; AVX-NEXT:    [[TMP13:%.*]] = insertelement <2 x i64> [[TMP12]], i64 [[TMP5]], i32 1
>>>> -; AVX-NEXT:    [[TMP14:%.*]] = lshr <2 x i64> [[TMP13]], <i64 6, i64 6>
>>>> -; AVX-NEXT:    [[TMP15:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP14]]
>>>> -; AVX-NEXT:    [[TMP16:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
>>>> -; AVX-NEXT:    store <2 x i64> [[TMP15]], <2 x i64>* [[TMP16]], align 1
>>>> +; AVX-NEXT:    [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
>>>> +; AVX-NEXT:    [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]
>>>> +; AVX-NEXT:    [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
>>>> +; AVX-NEXT:    store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
>>>>  ; AVX-NEXT:    ret void
>>>>  ;
>>>>  entry:
>>>>
>>>>
>>>>
>>>>
>>>