[llvm] r205018 - SLPVectorizer: Ignore users that are insertelements we can reschedule them

Arnold Schwaighofer aschwaighofer at apple.com
Mon Mar 31 16:54:32 PDT 2014


It looks like the SLP vectorizer should filter calls to intrinsics, making sure it “understands” each call (i.e., all of its parameters can be vectorized), as the loop vectorizer does. The loop vectorizer calls “getIntrinsicIDForCall” to do this; the SLP vectorizer should do the same.
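A minimal, self-contained C++ model of the suggested filter follows. It is not LLVM code: `IntrinsicID`, `ScalarCall`, `mustStayScalar` and `canVectorizeCallBundle` are hypothetical names, and the rule that only operand 1 of `llvm.ctlz` must stay scalar is an assumption made for illustration. A bundle of calls is accepted only if every lane calls the same recognized intrinsic and every must-stay-scalar operand (such as the `i1 false` flag in the failing `@llvm.ctlz.i8` call) is identical in all lanes, so that a single scalar value could be passed to the vector call.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for IR calls; not LLVM's classes.
enum class IntrinsicID { NotIntrinsic, Ctlz, Sqrt };

struct ScalarCall {
  IntrinsicID ID;                // result of an intrinsic lookup for the call
  std::vector<std::string> Args; // operand names, e.g. {"%1", "false"}
};

// Hypothetical table of operands that must remain scalar in the vector form.
// Assumption for this sketch: only llvm.ctlz's operand 1 (the i1 flag).
bool mustStayScalar(IntrinsicID ID, std::size_t OpIdx) {
  return ID == IntrinsicID::Ctlz && OpIdx == 1;
}

// Accepts the bundle only if every lane calls the same recognized intrinsic
// with the same operand count, and each must-stay-scalar operand is the same
// value in every lane.
bool canVectorizeCallBundle(const std::vector<ScalarCall> &Bundle) {
  if (Bundle.empty())
    return false;
  const ScalarCall &First = Bundle.front();
  if (First.ID == IntrinsicID::NotIntrinsic)
    return false; // a call we do not "understand": leave it scalar
  for (const ScalarCall &Call : Bundle) {
    if (Call.ID != First.ID || Call.Args.size() != First.Args.size())
      return false;
    for (std::size_t I = 0; I < Call.Args.size(); ++I)
      if (mustStayScalar(Call.ID, I) && Call.Args[I] != First.Args[I])
        return false; // lanes disagree on an operand that cannot be widened
  }
  return true;
}
```

Requiring the scalar operands to match, rather than rejecting such intrinsics outright, would still let a bundle of `ctlz(x, false)` calls be vectorized; in LLVM the recognition step would go through `getIntrinsicIDForCall`, as suggested above.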


On Mar 31, 2014, at 4:20 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> Reverted in r205260.
> 
> It looks like we fail when we vectorize the function call "%call.i4 = call i8 @llvm.ctlz.i8(i8 %1, i1 false)", probably because the second parameter has to stay scalar.
> 
> 
> 
> On Mar 31, 2014, at 10:51 AM, Tom Stellard <tom at stellard.net> wrote:
> 
>> On Fri, Mar 28, 2014 at 05:21:23PM -0000, Arnold Schwaighofer wrote:
>>> Author: arnolds
>>> Date: Fri Mar 28 12:21:22 2014
>>> New Revision: 205018
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=205018&view=rev
>>> Log:
>>> SLPVectorizer: Ignore users that are insertelements we can reschedule them
>>> 
>>> Patch by Arch D. Robison!
>>> 
>> 
>> This commit causes a crash in the libclc build.  To reproduce (see
>> attached test case):
>> 
>> opt -slp-vectorizer slp-regression.ll -o - -S
>> 
>> -Tom
>> 
>>> Modified:
>>>   llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>>   llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
>>> 
>>> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=205018&r1=205017&r2=205018&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Mar 28 12:21:22 2014
>>> @@ -365,13 +365,13 @@ public:
>>>  /// A negative number means that this is profitable.
>>>  int getTreeCost();
>>> 
>>> -  /// Construct a vectorizable tree that starts at \p Roots and is possibly
>>> -  /// used by a reduction of \p RdxOps.
>>> -  void buildTree(ArrayRef<Value *> Roots, ValueSet *RdxOps = 0);
>>> +  /// Construct a vectorizable tree that starts at \p Roots, ignoring users for
>>> +  /// the purpose of scheduling and extraction in the \p UserIgnoreLst.
>>> +  void buildTree(ArrayRef<Value *> Roots,
>>> +                 ArrayRef<Value *> UserIgnoreLst = None);
>>> 
>>>  /// Clear the internal data structures that are created by 'buildTree'.
>>>  void deleteTree() {
>>> -    RdxOps = 0;
>>>    VectorizableTree.clear();
>>>    ScalarToTreeEntry.clear();
>>>    MustGather.clear();
>>> @@ -527,8 +527,8 @@ private:
>>>  /// Numbers instructions in different blocks.
>>>  DenseMap<BasicBlock *, BlockNumbering> BlocksNumbers;
>>> 
>>> -  /// Reduction operators.
>>> -  ValueSet *RdxOps;
>>> +  /// List of users to ignore during scheduling and that don't need extracting.
>>> +  ArrayRef<Value *> UserIgnoreList;
>>> 
>>>  // Analysis and block reference.
>>>  Function *F;
>>> @@ -542,9 +542,10 @@ private:
>>>  IRBuilder<> Builder;
>>> };
>>> 
>>> -void BoUpSLP::buildTree(ArrayRef<Value *> Roots, ValueSet *Rdx) {
>>> +void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
>>> +                        ArrayRef<Value *> UserIgnoreLst) {
>>>  deleteTree();
>>> -  RdxOps = Rdx;
>>> +  UserIgnoreList = UserIgnoreLst;
>>>  if (!getSameType(Roots))
>>>    return;
>>>  buildTree_rec(Roots, 0);
>>> @@ -576,8 +577,9 @@ void BoUpSLP::buildTree(ArrayRef<Value *
>>>        if (!UserInst)
>>>          continue;
>>> 
>>> -        // Ignore uses that are part of the reduction.
>>> -        if (Rdx && std::find(Rdx->begin(), Rdx->end(), UserInst) != Rdx->end())
>>> +        // Ignore users in the user ignore list.
>>> +        if (std::find(UserIgnoreList.begin(), UserIgnoreList.end(), UserInst) !=
>>> +            UserIgnoreList.end())
>>>          continue;
>>> 
>>>        DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane " <<
>>> @@ -708,8 +710,9 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>>>        continue;
>>>      }
>>> 
>>> -      // This user is part of the reduction.
>>> -      if (RdxOps && RdxOps->count(UI))
>>> +      // Ignore users in the user ignore list.
>>> +      if (std::find(UserIgnoreList.begin(), UserIgnoreList.end(), UI) !=
>>> +          UserIgnoreList.end())
>>>        continue;
>>> 
>>>      // Make sure that we can schedule this unknown user.
>>> @@ -1737,8 +1740,9 @@ Value *BoUpSLP::vectorizeTree() {
>>>          DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");
>>> 
>>>          assert((ScalarToTreeEntry.count(U) ||
>>> -                  // It is legal to replace the reduction users by undef.
>>> -                  (RdxOps && RdxOps->count(U))) &&
>>> +                  // It is legal to replace users in the ignorelist by undef.
>>> +                  (std::find(UserIgnoreList.begin(), UserIgnoreList.end(), U) !=
>>> +                   UserIgnoreList.end())) &&
>>>                 "Replacing out-of-tree value with undef");
>>>        }
>>> #endif
>>> @@ -1942,8 +1946,11 @@ private:
>>>  bool tryToVectorizePair(Value *A, Value *B, BoUpSLP &R);
>>> 
>>>  /// \brief Try to vectorize a list of operands.
>>> +  /// \param BuildVector A list of users to ignore for the purpose of
>>> +  ///                     scheduling and that don't need extracting.
>>>  /// \returns true if a value was vectorized.
>>> -  bool tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R);
>>> +  bool tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R,
>>> +                          ArrayRef<Value *> BuildVector = None);
>>> 
>>>  /// \brief Try to vectorize a chain that may start at the operands of \V;
>>>  bool tryToVectorize(BinaryOperator *V, BoUpSLP &R);
>>> @@ -2116,7 +2123,8 @@ bool SLPVectorizer::tryToVectorizePair(V
>>>  return tryToVectorizeList(VL, R);
>>> }
>>> 
>>> -bool SLPVectorizer::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R) {
>>> +bool SLPVectorizer::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R,
>>> +                                       ArrayRef<Value *> BuildVector) {
>>>  if (VL.size() < 2)
>>>    return false;
>>> 
>>> @@ -2166,13 +2174,33 @@ bool SLPVectorizer::tryToVectorizeList(A
>>>                 << "\n");
>>>    ArrayRef<Value *> Ops = VL.slice(i, OpsWidth);
>>> 
>>> -    R.buildTree(Ops);
>>> +    ArrayRef<Value *> BuildVectorSlice;
>>> +    if (!BuildVector.empty())
>>> +      BuildVectorSlice = BuildVector.slice(i, OpsWidth);
>>> +
>>> +    R.buildTree(Ops, BuildVectorSlice);
>>>    int Cost = R.getTreeCost();
>>> 
>>>    if (Cost < -SLPCostThreshold) {
>>>      DEBUG(dbgs() << "SLP: Vectorizing pair at cost:" << Cost << ".\n");
>>> -      R.vectorizeTree();
>>> +      Value *VectorizedRoot = R.vectorizeTree();
>>> 
>>> +      // Reconstruct the build vector by extracting the vectorized root. This
>>> +      // way we handle the case where some elements of the vector are undefined.
>>> +      //  (return (insertelt <4 x i32> (insertelt undef (opd0) 0) (opd1) 2))
>>> +      if (!BuildVectorSlice.empty()) {
>>> +        Instruction *InsertAfter = cast<Instruction>(VectorizedRoot);
>>> +        for (auto &V : BuildVectorSlice) {
>>> +          InsertElementInst *IE = cast<InsertElementInst>(V);
>>> +          IRBuilder<> Builder(++BasicBlock::iterator(InsertAfter));
>>> +          Instruction *Extract = cast<Instruction>(
>>> +              Builder.CreateExtractElement(VectorizedRoot, IE->getOperand(2)));
>>> +          IE->setOperand(1, Extract);
>>> +          IE->removeFromParent();
>>> +          IE->insertAfter(Extract);
>>> +          InsertAfter = IE;
>>> +        }
>>> +      }
>>>      // Move to the next bundle.
>>>      i += VF - 1;
>>>      Changed = true;
>>> @@ -2281,7 +2309,7 @@ static Value *createRdxShuffleMask(unsig
>>> ///   *p =
>>> ///
>>> class HorizontalReduction {
>>> -  SmallPtrSet<Value *, 16> ReductionOps;
>>> +  SmallVector<Value *, 16> ReductionOps;
>>>  SmallVector<Value *, 32> ReducedVals;
>>> 
>>>  BinaryOperator *ReductionRoot;
>>> @@ -2375,7 +2403,7 @@ public:
>>>          // We need to be able to reassociate the adds.
>>>          if (!TreeN->isAssociative())
>>>            return false;
>>> -          ReductionOps.insert(TreeN);
>>> +          ReductionOps.push_back(TreeN);
>>>        }
>>>        // Retract.
>>>        Stack.pop_back();
>>> @@ -2412,7 +2440,7 @@ public:
>>> 
>>>    for (; i < NumReducedVals - ReduxWidth + 1; i += ReduxWidth) {
>>>      ArrayRef<Value *> ValsToReduce(&ReducedVals[i], ReduxWidth);
>>> -      V.buildTree(ValsToReduce, &ReductionOps);
>>> +      V.buildTree(ValsToReduce, ReductionOps);
>>> 
>>>      // Estimate cost.
>>>      int Cost = V.getTreeCost() + getReductionCost(TTI, ReducedVals[i]);
>>> @@ -2531,13 +2559,16 @@ private:
>>> ///
>>> /// Returns true if it matches
>>> ///
>>> -static bool findBuildVector(InsertElementInst *IE,
>>> -                            SmallVectorImpl<Value *> &Ops) {
>>> -  if (!isa<UndefValue>(IE->getOperand(0)))
>>> +static bool findBuildVector(InsertElementInst *FirstInsertElem,
>>> +                            SmallVectorImpl<Value *> &BuildVector,
>>> +                            SmallVectorImpl<Value *> &BuildVectorOpds) {
>>> +  if (!isa<UndefValue>(FirstInsertElem->getOperand(0)))
>>>    return false;
>>> 
>>> +  InsertElementInst *IE = FirstInsertElem;
>>>  while (true) {
>>> -    Ops.push_back(IE->getOperand(1));
>>> +    BuildVector.push_back(IE);
>>> +    BuildVectorOpds.push_back(IE->getOperand(1));
>>> 
>>>    if (IE->use_empty())
>>>      return false;
>>> @@ -2707,12 +2738,16 @@ bool SLPVectorizer::vectorizeChainsInBlo
>>>    }
>>> 
>>>    // Try to vectorize trees that start at insertelement instructions.
>>> -    if (InsertElementInst *IE = dyn_cast<InsertElementInst>(it)) {
>>> -      SmallVector<Value *, 8> Ops;
>>> -      if (!findBuildVector(IE, Ops))
>>> +    if (InsertElementInst *FirstInsertElem = dyn_cast<InsertElementInst>(it)) {
>>> +      SmallVector<Value *, 16> BuildVector;
>>> +      SmallVector<Value *, 16> BuildVectorOpds;
>>> +      if (!findBuildVector(FirstInsertElem, BuildVector, BuildVectorOpds))
>>>        continue;
>>> 
>>> -      if (tryToVectorizeList(Ops, R)) {
>>> +      // Vectorize starting with the build vector operands ignoring the
>>> +      // BuildVector instructions for the purpose of scheduling and user
>>> +      // extraction.
>>> +      if (tryToVectorizeList(BuildVectorOpds, R, BuildVector)) {
>>>        Changed = true;
>>>        it = BB->begin();
>>>        e = BB->end();
>>> 
>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll?rev=205018&r1=205017&r2=205018&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll (original)
>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll Fri Mar 28 12:21:22 2014
>>> @@ -194,4 +194,28 @@ define <4 x float> @simple_select_partia
>>>  ret <4 x float> %rb
>>> }
>>> 
>>> +; Make sure that vectorization happens even if insertelement operations
>>> +; must be rescheduled. The case here is from compiling Julia.
>>> +define <4 x float> @reschedule_extract(<4 x float> %a, <4 x float> %b) {
>>> +; CHECK-LABEL: @reschedule_extract(
>>> +; CHECK: %1 = fadd <4 x float> %a, %b
>>> +  %a0 = extractelement <4 x float> %a, i32 0
>>> +  %b0 = extractelement <4 x float> %b, i32 0
>>> +  %c0 = fadd float %a0, %b0
>>> +  %v0 = insertelement <4 x float> undef, float %c0, i32 0
>>> +  %a1 = extractelement <4 x float> %a, i32 1
>>> +  %b1 = extractelement <4 x float> %b, i32 1
>>> +  %c1 = fadd float %a1, %b1
>>> +  %v1 = insertelement <4 x float> %v0, float %c1, i32 1
>>> +  %a2 = extractelement <4 x float> %a, i32 2
>>> +  %b2 = extractelement <4 x float> %b, i32 2
>>> +  %c2 = fadd float %a2, %b2
>>> +  %v2 = insertelement <4 x float> %v1, float %c2, i32 2
>>> +  %a3 = extractelement <4 x float> %a, i32 3
>>> +  %b3 = extractelement <4 x float> %b, i32 3
>>> +  %c3 = fadd float %a3, %b3
>>> +  %v3 = insertelement <4 x float> %v2, float %c3, i32 3
>>> +  ret <4 x float> %v3
>>> +}
>>> +
>>> attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
>>> 
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> <slp-regression.ll>
> 
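The core of r205018 replaces the reduction-specific `ValueSet *RdxOps` with a generic `ArrayRef<Value *> UserIgnoreList`, consulted with `std::find` both when deciding which scalars need an `extractelement` and when checking users during scheduling; `HorizontalReduction::ReductionOps` becomes a `SmallVector` so it can be passed as that `ArrayRef`. The sketch below models only the extraction decision, with strings standing in for `Value *`; `TreeScalar`, `isIgnoredUser` and `collectExternalUses` are hypothetical names, not part of the patch.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one scalar in the vectorizable tree: its name, the
// lane it occupies in the vector, and the names of its users.
struct TreeScalar {
  std::string Name;
  unsigned Lane;
  std::vector<std::string> Users;
};

// Linear search, mirroring the std::find over UserIgnoreList in the patch.
bool isIgnoredUser(const std::vector<std::string> &UserIgnoreList,
                   const std::string &User) {
  return std::find(UserIgnoreList.begin(), UserIgnoreList.end(), User) !=
         UserIgnoreList.end();
}

// Returns one (scalar, lane) entry per user that is neither in the tree nor
// in the ignore list; each entry stands for an extractelement that would be
// needed to feed that out-of-tree user.
std::vector<std::pair<std::string, unsigned>>
collectExternalUses(const std::vector<TreeScalar> &Tree,
                    const std::vector<std::string> &UserIgnoreList) {
  auto InTree = [&Tree](const std::string &Name) {
    return std::any_of(Tree.begin(), Tree.end(),
                       [&Name](const TreeScalar &S) { return S.Name == Name; });
  };
  std::vector<std::pair<std::string, unsigned>> Extracts;
  for (const TreeScalar &S : Tree)
    for (const std::string &User : S.Users) {
      if (InTree(User) || isIgnoredUser(UserIgnoreList, User))
        continue; // in-tree or ignored users need no extract
      Extracts.emplace_back(S.Name, S.Lane);
    }
  return Extracts;
}
```

For the `reschedule_extract` test, passing the four `insertelement`s (`%v0` to `%v3`) as the ignore list leaves no extracts to cost, which is what lets the `fadd`s be vectorized. The linear `std::find` gives up the set lookup of the old `SmallPtrSet` in exchange for accepting any contiguous list as an `ArrayRef`, which is reasonable as long as the ignore lists stay short.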
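The rescheduling step added to `tryToVectorizeList` rewires each `insertelement` of the build vector to take an `extractelement` of the vectorized root and moves it after that extract, so lanes the original build vector never set stay undefined. Below is a value-level C++ model of that rebuild; `InsertElem` and `rebuildBuildVector` are hypothetical names, and `std::optional` stands in for an undef lane. The model reads lane K of the root for the K-th `insertelement`, because the root was built from the K-th build-vector operand. The quoted patch instead passes the `insertelement`'s own index (`IE->getOperand(2)`) to `CreateExtractElement`; the two agree when the build vector fills indices 0, 1, 2, ... in order, as in the `reschedule_extract` test, but differ for a partial build vector like the one in the patch comment.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical model of one insertelement: the index it writes in the result.
struct InsertElem {
  std::size_t Index;
};

// Rebuilds the build vector from the vectorized root. Lane K of the root holds
// the K-th build-vector operand, so the K-th insertelement receives
// "extractelement VectorizedRoot, K". Unwritten lanes stay std::nullopt.
std::vector<std::optional<int>>
rebuildBuildVector(const std::vector<int> &VectorizedRoot,
                   const std::vector<InsertElem> &BuildVectorSlice,
                   std::size_t ResultWidth) {
  if (BuildVectorSlice.size() != VectorizedRoot.size())
    throw std::invalid_argument("expected one insertelement per lane");
  std::vector<std::optional<int>> Result(ResultWidth); // all lanes "undef"
  for (std::size_t Lane = 0; Lane < BuildVectorSlice.size(); ++Lane) {
    int Extract = VectorizedRoot[Lane];                // extractelement
    Result.at(BuildVectorSlice[Lane].Index) = Extract; // insertelement
  }
  return Result;
}
```

For the partial case `(insertelt (insertelt undef opd0 0) opd1 2)` with a two-lane root, the model writes lane 0 to index 0 and lane 1 to index 2, leaving indices 1 and 3 undefined.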
