RFC: Enable vectorization of call instructions in the loop vectorizer

Hal Finkel hfinkel at anl.gov
Wed Mar 4 10:22:17 PST 2015


Hi Michael,

Thanks for working on this. Regarding the TTI implementation, you have:

+  /// \returns The cost of a call instruction.
+  unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys,
+                            const TargetLibraryInfo *TLI) {
+
+    // Scalar function calls are always expensive.
+    if (!RetTy->isVectorTy())
+      return 10;
+
+    // Functions with a vector form are no more expensive than a scalar call.
+    if (TLI &&
+        TLI->isFunctionVectorizable(F->getName(),
+                                    RetTy->getVectorNumElements()))
+      return 10;

...

I'm a bit concerned about this '10'. The target needs to be able to override this without replacing this whole function. Maybe the function, when handling vector calls, should call itself with the scalar type to get the base cost? Then the target can override to provide the base scalar cost.

I see that this factor of 10 is also in the cost-model printing code, and we need to change it there too (I think that this value did not matter much previously, but it looks like it will matter much more with this change).

+  // The Accelerate library adds vectorizable variants of many
+  // standard library functions.
+  // FIXME: Make the following list complete.
+  if (T.getEnvironmentName() == "Accelerate") {
+    const VecDesc VecFuncs[] = {

Accelerate? In the triple?

 -Hal

----- Original Message -----
> From: "Michael Zolotukhin" <mzolotukhin at apple.com>
> To: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> Cc: "Commit Messages and Patches for LLVM" <llvm-commits at cs.uiuc.edu>, "James Molloy" <james at jamesmolloy.co.uk>,
> "Nadav Rotem" <nrotem at apple.com>, "Hal J. Finkel" <hfinkel at anl.gov>, "renato golin" <renato.golin at linaro.org>
> Sent: Monday, March 2, 2015 10:25:15 PM
> Subject: Re: RFC: Enable vectorization of call instructions in the loop vectorizer
> 
> 
> 
> Hi Arnold,
> 
> Thank you for the review! I’ve committed 0001, 0002, 0003 and 0005
> patches, which don’t add any new functionality.
> 
> I’ll proceed with the further patches later this week if no one
> else has objections and my testing goes fine. For reference, these
> are the patches (updated according to your remark and renumbered
> because some patches are already committed):
> 
> 
> 
> 
> 
> Thanks,
> Michael
> 
> > On Feb 28, 2015, at 7:47 AM, Arnold Schwaighofer
> > <aschwaighofer at apple.com> wrote:
> > 
> >> <0001-Make-ToVectorTy-static.patch>
> > 
> > LGTM.
> > 
> > 
> >> <0002-Fix-a-copy-paste-bug.patch>
> > 
> > LGTM.
> > 
> >> <0003-TLI-Use-lambda.-NFC.patch>
> > 
> > LGTM.
> > 
> >> <0004-LoopVectorizer-Add-TargetTransformInfo.patch>
> > 
> > LGTM.
> > 
> >> <0005-TLI-Factor-out-sanitizeFunctionName.-NFC.patch>
> > 
> > LGTM.
> > 
> >> <0006-TLI-Add-interface-for-querying-whether-a-function-is.patch>
> > 
> > LGTM with one little change. AFAICT, we can reuse one lambda
> > function (the second to last one) ;). Seriously though, I think
> > this would also be easier to read if we factored out the lambdas
> > into static utility functions:
> > 
> >   std::sort(VectorDescs.begin(), VectorDescs.end(), compareByScalarName);
> > 
> >   std::sort(ScalarDescs.begin(), ScalarDescs.end(), compareByVectorName);
> > 
> >   std::lower_bound(VectorDescs.begin(), VectorDescs.end(), FunctionName,
> >                    compareScalarNameWith);
> > 
> > etc.
> > 
> > +void
> > +TargetLibraryInfoImpl::addVectorizableFunctions(ArrayRef<VecDesc> Fns) {
> > +  VectorDescs.insert(VectorDescs.end(), Fns.begin(), Fns.end());
> > +  std::sort(VectorDescs.begin(), VectorDescs.end(),
> > +            [](const VecDesc &LHS, const VecDesc &RHS) {
> > +              return std::strncmp(LHS.ScalarFnName, RHS.ScalarFnName,
> > +                                  std::strlen(RHS.ScalarFnName)) < 0;
> > +            });
> > +
> > +  ScalarDescs.insert(ScalarDescs.end(), Fns.begin(), Fns.end());
> > +  std::sort(ScalarDescs.begin(), ScalarDescs.end(),
> > +            [](const VecDesc &LHS, const VecDesc &RHS) {
> > +              return std::strncmp(LHS.VectorFnName, RHS.VectorFnName,
> > +                                  std::strlen(RHS.VectorFnName)) < 0;
> > +            });
> > +}
> > +
> > +bool TargetLibraryInfoImpl::isFunctionVectorizable(StringRef funcName) const {
> > +  funcName = sanitizeFunctionName(funcName);
> > +  if (funcName.empty())
> > +    return false;
> > +
> > +  std::vector<VecDesc>::const_iterator I = std::lower_bound(
> > +      VectorDescs.begin(), VectorDescs.end(), funcName,
> > +      [](const VecDesc &LHS, StringRef S) {
> > +        return std::strncmp(LHS.ScalarFnName, S.data(), S.size()) < 0;
> > +      });
> > +  // lower_bound alone is not enough: check that the entry found
> > +  // actually matches the queried name.
> > +  return I != VectorDescs.end() && StringRef(I->ScalarFnName) == funcName;
> > +}
> > +
> > +StringRef TargetLibraryInfoImpl::getVectorizedFunction(StringRef F,
> > +                                                       unsigned VF) const {
> > +  F = sanitizeFunctionName(F);
> > +  if (F.empty())
> > +    return F;
> > +  std::vector<VecDesc>::const_iterator I = std::lower_bound(
> > +      VectorDescs.begin(), VectorDescs.end(), F,
> > +      [](const VecDesc &LHS, StringRef S) {
> > +        return std::strncmp(LHS.ScalarFnName, S.data(), S.size()) < 0;
> > +      });
> > +  while (I != VectorDescs.end() && StringRef(I->ScalarFnName) == F) {
> > +    if (I->VectorizationFactor == VF)
> > +      return I->VectorFnName;
> > +    ++I;
> > +  }
> > +  return StringRef();
> > +}
> > +
> > +StringRef TargetLibraryInfoImpl::getScalarizedFunction(StringRef F,
> > +                                                       unsigned &VF) const {
> > +  F = sanitizeFunctionName(F);
> > +  if (F.empty())
> > +    return F;
> > +
> > +  std::vector<VecDesc>::const_iterator I = std::lower_bound(
> > +      ScalarDescs.begin(), ScalarDescs.end(), F,
> > +      [](const VecDesc &LHS, StringRef S) {
> > +        return std::strncmp(LHS.VectorFnName, S.data(), S.size()) < 0;
> > +      });
> > +  // Bail out if no matching entry was found (note: ScalarDescs, not
> > +  // VectorDescs, is the table searched here).
> > +  if (I == ScalarDescs.end() || StringRef(I->VectorFnName) != F)
> > +    return StringRef();
> > +  VF = I->VectorizationFactor;
> > +  return I->ScalarFnName;
> > +}
> > +
> >> <0007-TTI-Add-getCallInstrCost.patch>
> > 
> > LGTM.
> >> <0008-LoopVectorize-teach-loop-vectorizer-to-vectorize-cal.patch>
> > LGTM.
> >> <0009-TTI-Honour-cost-model-for-estimating-cost-of-vector-.patch>
> > LGTM.
> >> <0010-TLI-Add-several-entries-to-VectorizableFunctions-tab.patch>
> > LGTM.
> > 
> > Thanks!
> > 
> > 
> > 
> > 
> >> On Feb 27, 2015, at 4:04 PM, Michael Zolotukhin
> >> <mzolotukhin at apple.com> wrote:
> >> 
> >> Hi Arnold,
> >> 
> >> These are slightly updated patches 0007 and 0008 - I added some
> >> description for getCallInstrCost and refactored the changes in the
> >> loop vectorizer. I don’t know if these changes actually look
> >> better now, but at least we don’t have duplicated code :)
> >> 
> >> <0007-TTI-Add-getCallInstrCost.patch>
> >> <0008-LoopVectorize-teach-loop-vectorizer-to-vectorize-cal.patch>
> >> 
> >> What is the best way to proceed with this set of patches?
> >> Should I create a separate Phabricator review for each of them?
> >> 
> >> Also, the 0002 patch is an independent bug fix - is it ok to
> >> commit it?
> >> 
> >> Thanks,
> >> Michael
> >> 
> >>> On Feb 26, 2015, at 4:57 PM, Michael Zolotukhin
> >>> <mzolotukhin at apple.com> wrote:
> >>> 
> >>> Hi Arnold,
> >>> 
> >>>> On Feb 26, 2015, at 1:49 PM, Arnold Schwaighofer
> >>>> <aschwaighofer at apple.com> wrote:
> >>>> 
> >>>> Hi Michael,
> >>>> 
> >>>> thanks for following up on this. I think it is important to move
> >>>> forward with this.
> >>>> 
> >>>>> On Feb 26, 2015, at 12:01 PM, Michael Zolotukhin
> >>>>> <mzolotukhin at apple.com> wrote:
> >>>>> 
> >>>>> Hi,
> >>>>> 
> >>>>> Some time ago James posted a patch for vectorization of calls
> >>>>> [1]. Though it seems to be a missing feature in the
> >>>>> vectorizer, it wasn’t committed, so I decided to revive the
> >>>>> discussion.
> >>>>> 
> >>>>> To begin with, I rebased the patch and split it into several
> >>>>> semantically separate parts. The approach used here is pretty
> >>>>> natural: TargetLibraryInfo has a mapping from a set of scalar
> >>>>> functions to their vector counterparts; TargetTransformInfo
> >>>>> consults it when the cost of a call instruction is calculated;
> >>>>> and if the vector version is cheaper, the vectorizer emits the
> >>>>> vector function.
> >>>>> 
> >>>>> There are several questions about this patch:
> >>>>> 1. Is this approach still viable in general and worth pursuing?
> >>>> 
> >>>> I believe this is still important. We want to be able to
> >>>> vectorize function calls if vectorized versions of math library
> >>>> functions are available.
> >>> Thanks, I’ll continue digging this then:)
> >>>> 
> >>>> 
> >>>>> 2. The cost model is changed. In theory it should become more
> >>>>> accurate, but I haven’t measured the performance impact of this
> >>>>> change yet. Do these changes look good in general? If yes, do
> >>>>> we need to tune some target-cost-model parameters, or are those
> >>>>> values that no one has actually been using yet?
> >>>> 
> >>>> I think we will have to test the performance impact of this
> >>>> change, since we are most likely the first clients. We should
> >>>> test with and without the target tuple that enables the
> >>>> vectorized function calls. If we don’t see egregious
> >>>> regressions in the run that uses the modified cost model but
> >>>> has no vectorized versions of the math library available, I
> >>>> think we can commit this patch set so that it gets testing by
> >>>> the community.
> >>> That sounds good.
> >>>> 
> >>>> 
> >>>>> 3. What is the best way to populate the scalar to vector
> >>>>> functions map? For now, I populate it when the target-triple
> >>>>> contains ‘Accelerate’. If we have a lot of other cases, we
> >>>>> might think of something different here.
> >>>> 
> >>>> We use the environment substring of the target tuple (in this
> >>>> case a quadruple). This is in the spirit of what we already do
> >>>> in TargetLibraryInfo to conditionally turn features on and off.
> >>>> I think we should address the issue of scalability when we have
> >>>> a client for which this becomes an issue.
> >>> Sounds good.
> >>>> 
> >>>> 
> >>>>> 
> >>>>> Any feedback or ideas are appreciated!
> >>>>> 
> >>>>> Thanks,
> >>>>> Michael
> >>>>> 
> >>>>> [1]:
> >>>>> http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736
> >>>>> 
> >>>>> 
> >>>>> <0001-Make-ToVectorTy-static.patch>
> >>>>> <0002-Fix-a-copy-paste-bug.patch>
> >>>>> <0003-TLI-Use-lambda.-NFC.patch>
> >>>>> <0004-LoopVectorizer-Add-TargetTransformInfo.patch>
> >>>>> <0005-TLI-Factor-out-sanitizeFunctionName.-NFC.patch>
> >>>>> <0006-TLI-Add-interface-for-querying-whether-a-function-is.patch>
> >>>>> <0007-TTI-Add-getCallInstrCost.patch>
> >>>>> <0008-LoopVectorize-teach-loop-vectorizer-to-vectorize-cal.patch>
> >>>>> <0009-TTI-Honour-cost-model-for-estimating-cost-of-vector-.patch>
> >>>>> <0010-TLI-Add-several-entries-to-VectorizableFunctions-tab.patch>
> >>>> 
> >>>> Some comments on the code:
> >>>> 
> >>>> 
> >>>> This function could use some documentation. What are Tys?
> >>>> (Argument types).
> >>> I’ll add some, thanks.
> >>>> 
> >>>> +  unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys,
> >>>> +                            const TargetLibraryInfo *TLI) {
> >>>> +
> >>>> +    // Scalar function calls are always expensive.
> >>>> +    if (!RetTy->isVectorTy())
> >>>> +      return 10;
> >>>> +
> >>>> +    // Functions with a vector form are no more expensive than a scalar call.
> >>>> +    if (TLI &&
> >>>> +        TLI->isFunctionVectorizable(F->getName(),
> >>>> +                                    RetTy->getVectorNumElements()))
> >>>> +      return 10;
> >>>> +
> >>>> +    // We have to scalarize this function call. Estimate the cost.
> >>>> +    unsigned ScalarizationCost = getScalarizationOverhead(RetTy, true, false);
> >>>> +    unsigned ScalarCalls = RetTy->getVectorNumElements();
> >>>> +    for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {
> >>>> +      if (Tys[i]->isVectorTy()) {
> >>>> +        ScalarizationCost += getScalarizationOverhead(Tys[i], false, true);
> >>>> 
> >>>> This is not obvious. Why could a function argument's vector
> >>>> type have more elements than the vector return type when
> >>>> vectorizing a function, such that scalarizing requires more
> >>>> function calls?
> >>> It’s an attempt to handle ‘unusual’ functions here. Possible
> >>> examples would be converts, where the input element size differs
> >>> from the output element size, or reductions. I think that for now
> >>> we can just assert that we only deal with functions where operand
> >>> types are the same as the result type.
> >>> 
> >>>> 
> >>>> +        ScalarCalls = std::max(ScalarCalls, Tys[i]->getVectorNumElements());
> >>>> +      }
> >>>> +    }
> >>>> +
> >>>> +    return ScalarCalls * 10 + ScalarizationCost;
> >>>> +  }
> >>>> 
> >>>> 
> >>>> There seems to be some repetition in the code below. Can we
> >>>> refactor that?
> >>> After re-examining this code I spotted a strange part (I didn’t
> >>> change it in the patches):
> >>>   Type *Tys[] = {CI->getType()};
> >>>   if (VF > 1)
> >>>     Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);
> >>> I assume that only works for intrinsics with one argument, and
> >>> those are the only cases we handle now. But shouldn’t we guard
> >>> it with an assert or try to handle the general case here?
> >>> 
> >>> Thanks,
> >>> Michael
> >>> 
> >>>> 
> >>>> +  default:
> >>>> +    bool HasScalarOpd = hasVectorInstrinsicScalarOpd(ID, 1);
> >>>> +    for (unsigned Part = 0; Part < UF; ++Part) {
> >>>> +      SmallVector<Value *, 4> Args;
> >>>> +      for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
> >>>> +        if (HasScalarOpd && i == 1) {
> >>>> +          Args.push_back(CI->getArgOperand(i));
> >>>> +          continue;
> >>>> +        }
> >>>> +        VectorParts &Arg = getVectorValue(CI->getArgOperand(i));
> >>>> +        Args.push_back(Arg[Part]);
> >>>> +      }
> >>>> +      Type *Tys[] = {CI->getType()};
> >>>> +      if (VF > 1)
> >>>> +        Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);
> >>>> +
> >>>> +      Function *F = Intrinsic::getDeclaration(M, ID, Tys);
> >>>> +      Entry[Part] = Builder.CreateCall(F, Args);
> >>>> +    }
> >>>> +
> >>>> +    propagateMetadata(Entry, it);
> >>>> +    break;
> >>>> +  }
> >>>> +  // Otherwise, perform the lib call.
> >>>> +  } else if (TLI && TLI->isFunctionVectorizable(FnName, VF)) {
> >>>> +    // This is a function with a vector form.
> >>>> +    StringRef VFnName = TLI->getVectorizedFunction(FnName, VF);
> >>>> +    assert(!VFnName.empty());
> >>>> +
> >>>> +    Function *VectorF = M->getFunction(VFnName);
> >>>> +    if (!VectorF) {
> >>>> +      // Generate a declaration.
> >>>> +      FunctionType *FTy = FunctionType::get(RetTy, Tys, false);
> >>>> +      VectorF =
> >>>> +          Function::Create(FTy, Function::ExternalLinkage, VFnName, M);
> >>>> +      assert(VectorF);
> >>>> +    }
> >>>> +
> >>>>      for (unsigned Part = 0; Part < UF; ++Part) {
> >>>>        SmallVector<Value *, 4> Args;
> >>>>        for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
> >>>> -        if (HasScalarOpd && i == 1) {
> >>>> -          Args.push_back(CI->getArgOperand(i));
> >>>> -          continue;
> >>>> -        }
> >>>>          VectorParts &Arg = getVectorValue(CI->getArgOperand(i));
> >>>>          Args.push_back(Arg[Part]);
> >>>>        }
> >>>> -      Type *Tys[] = {CI->getType()};
> >>>> -      if (VF > 1)
> >>>> -        Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);
> >>>> -
> >>>> -      Function *F = Intrinsic::getDeclaration(M, ID, Tys);
> >>>> -      Entry[Part] = Builder.CreateCall(F, Args);
> >>>> +      Entry[Part] = Builder.CreateCall(VectorF, Args);
> >>>>      }
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >> 
> > 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory



