RFC: Enable vectorization of call instructions in the loop vectorizer

Sat Feb 28 13:14:41 PST 2015

On Fri, Feb 27, 2015 at 4:10 PM Michael Zolotukhin <mzolotukhin at apple.com>
wrote:

> Hi Arnold,
>
> These are sligthly updated patches 0007 and 0008 - I added some
> description for getCallInstrCost and refactored changes in loop vectorizer.
> I don’t know if these changes actually look better now, but at least we
> don’t have duplicated code :)
>
>
>
How'd you get the magic constants?

-eric

> What is the best way to proceed with this set of the patches? Should I
> create a separate phabricator review for each of them?
>
> Also, the 0002 patch is an independent bug fix - is it ok to commit it?
>
> Thanks,
> Michael
>
> On Feb 26, 2015, at 4:57 PM, Michael Zolotukhin <mzolotukhin at apple.com>
> wrote:
>
> Hi Arnold,
>
> On Feb 26, 2015, at 1:49 PM, Arnold Schwaighofer <aschwaighofer at apple.com>
> wrote:
>
> Hi Michael,
>
> thanks for following up on this. I think it is important to move forward
> with this.
>
> On Feb 26, 2015, at 12:01 PM, Michael Zolotukhin <mzolotukhin at apple.com>
> wrote:
>
> Hi,
>
> Some time ago James posted a patch for vectorization of calls [1]. Though
> it seems to be a missing feature in the vectorizer, it wasn’t committed, so
> I decided to revive the discussion.
>
> To begin with, I rebased the patch and split it to several semantically
> separated parts. The approach used here is pretty natural:
> TargetLibraryInfo has a mapping for a set of scalar functions to their
> vector counterparts, TargetTransformInfo consults with it when cost of call
> instruction is calculated, if the vector-version is cheaper, vectorizer
> emits vector-function.
>
> There are several question on this patch:
> 1. Is this approach still viable in general and worth pursuing?
>
>
> I believe this is still important. We want to be able to vectorize
> function calls if vectorized versions of math library functions are
> available.
>
> Thanks, I’ll continue digging this then:)
>
>
>
> 2. The cost model is changed. In theory it should become more accurate,
> but I haven’t measured performance impact of this change yet. Does these
> changes look good in general? If yes, do we need to tune some
> target-cost-model parameters, or those are the values that no one has been
> actually using yet?
>
>
> I think we will have to test performance impact of this change since we
> are most likely the first clients. We should test with/without the target
> tuple that enables the vectorized function calls. I think if we don’t see
> egregious regressions in the run that uses the modified cost model but is a
> run without vectorized versions of the math library being available we can
> commit this patch set such that it gets testing by the community.
>
> That sounds good.
>
>
>
> 3. What is the best way to populate the scalar to vector functions map?
> For now, I populate it when the target-triple contains ‘Accelerate’. If we
> have a lot of other cases, we might think of something different here.
>
>
> We use the environment substring of the target tuple (in this case
> quadruple). This is in spirit to what we are already to conditionally turn
> on/off features in TargetLibraryInfo. I think we should address the issue
> of scalability when we have a client where this gets to be an issue.
>
> Sounds good.
>
>
>
>
> Any feedback or ideas are appreciated!
>
> Thanks,
> Michael
>
> [1]: http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736
>
>
> <0001-Make-ToVectorTy-static.patch>
> <0002-Fix-a-copy-paste-bug.patch>
> <0003-TLI-Use-lambda.-NFC.patch>
> <0004-LoopVectorizer-Add-TargetTransformInfo.patch>
> <0005-TLI-Factor-out-sanitizeFunctionName.-NFC.patch>
> <0006-TLI-Add-interface-for-querying-whether-a-function-is.patch>
> <0007-TTI-Add-getCallInstrCost.patch>
> <0008-LoopVectorize-teach-loop-vectorizer-to-vectorize-cal.patch>
> <0009-TTI-Honour-cost-model-for-estimating-cost-of-vector-.patch>
> <0010-TLI-Add-several-entries-to-VectorizableFunctions-tab.patch>
>
>
> Some comments on the code:
>
>
> This function could use some documentation. What are Tys? (Argument types).
>
> I’ll add some, thanks.
>
>
> +  unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *>
> Tys,
> +                            const TargetLibraryInfo *TLI) {
> +
> +    // Scalar function calls are always expensive.
> +    if (!RetTy->isVectorTy())
> +      return 10;
> +
> +    // Functions with a vector form are no more expensive than a scalar
> call.
> +    if (TLI &&
> +        TLI->isFunctionVectorizable(F->getName(),
> +                                    RetTy->getVectorNumElements()))
> +      return 10;
> +
> +    // We have to scalarize this function call. Estimate the cost.
> +    unsigned ScalarizationCost = getScalarizationOverhead(RetTy, true,
> false);
> +    unsigned ScalarCalls = RetTy->getVectorNumElements();
> +    for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {
> +      if (Tys[i]->isVectorTy()) {
> +        ScalarizationCost += getScalarizationOverhead(Tys[i], false,
> true);
>
> This is not obvious. Why could we have more elements in the vector type of
> a function arguments than the vector return type when vectorizing a
> function such that we need more function calls when scalarizing?
>
> It’s an attempt to handle ‘unusual’ functions here. Possible examples
> would be converts, where input element size differs from output element
> size, or reductions.I think that for now we can just assert that we only
> deal with functions where operand types are the same as the result type.
>
>
> +        ScalarCalls = std::max(ScalarCalls,
> Tys[i]->getVectorNumElements());
> +      }
> +    }
> +
> +    return ScalarCalls * 10 + ScalarizationCost;
> +  }
>
>
> There seems to be some repetition in the code below. Can we refactor that?
>
> After re-examining this code I spotted strange part (I didn’t change it in
> the patches):
>           Type *Tys[] = {CI->getType()};
>           if (VF > 1)
>             Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);
> I assume that only works for instrinsics with one argument, and those are
> the only cases we handle now. But shouldn’t we guard it with an assert or
> try to handle general case here?
>
> Thanks,
> Michael
>
>
> +        default:
> +          bool HasScalarOpd = hasVectorInstrinsicScalarOpd(ID, 1);
> +          for (unsigned Part = 0; Part < UF; ++Part) {
> +            SmallVector<Value *, 4> Args;
> +            for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie;
> ++i) {
> +              if (HasScalarOpd && i == 1) {
> +                Args.push_back(CI->getArgOperand(i));
> +                continue;
> +              }
> +              VectorParts &Arg = getVectorValue(CI->getArgOperand(i));
> +              Args.push_back(Arg[Part]);
> +            }
> +            Type *Tys[] = {CI->getType()};
> +            if (VF > 1)
> +              Tys[0] = VectorType::get(CI->getType()->getScalarType(),
> VF);
> +
> +            Function *F = Intrinsic::getDeclaration(M, ID, Tys);
> +            Entry[Part] = Builder.CreateCall(F, Args);
> +          }
> +
> +          propagateMetadata(Entry, it);
> +          break;
> +        }
> +        // Otherwise, perform the lib call.
> +      } else if (TLI && TLI->isFunctionVectorizable(FnName, VF)) {
> +        // This is a function with a vector form.
> +        StringRef VFnName = TLI->getVectorizedFunction(FnName, VF);
> +        assert(!VFnName.empty());
> +
> +        Function *VectorF = M->getFunction(VFnName);
> +        if (!VectorF) {
> +          // Generate a declaration
> +          FunctionType *FTy = FunctionType::get(RetTy, Tys, false);
> +          VectorF =
> +              Function::Create(FTy, Function::ExternalLinkage, VFnName,
> M);
> +          assert(VectorF);
> +        }
> +
>          for (unsigned Part = 0; Part < UF; ++Part) {
>            SmallVector<Value *, 4> Args;
>            for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie;
> ++i) {
> -            if (HasScalarOpd && i == 1) {
> -              Args.push_back(CI->getArgOperand(i));
> -              continue;
> -            }
>              VectorParts &Arg = getVectorValue(CI->getArgOperand(i));
>              Args.push_back(Arg[Part]);
>            }
> -          Type *Tys[] = {CI->getType()};
> -          if (VF > 1)
> -            Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);
> -
> -          Function *F = Intrinsic::getDeclaration(M, ID, Tys);
> -          Entry[Part] = Builder.CreateCall(F, Args);
> +          Entry[Part] = Builder.CreateCall(VectorF, Args);
>          }
>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150228/1350e311/attachment.html>