RFC: Enable vectorization of call instructions in the loop vectorizer

Fri Jan 17 09:22:08 PST 2014

Awesome, thanks Arnold! Very clear now.

On 17 January 2014 16:45, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

>
> On Jan 17, 2014, at 2:59 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
>
> > Hi Arnold,
> >
> > > First, we are going to have the situation where there exists an
> > > intrinsic ID for a library function (many math library functions have
> > > an intrinsic version: expf -> llvm.exp.f32, for example). As a
> > > consequence, “getIntrinsicIDForCall” will return it. In this case we
> > > can have both a vectorized library function version and an intrinsic
> > > function that may be slower or faster. In such a case the cost model
> > > has to decide which one to pick. This means we have to query the cost
> > > model for which one is cheaper in two places: when we get the
> > > instruction cost and when we vectorize the call.
> >
> > Sure, I will address this.
> >
> > > Second, the way we test this. [snip]
> >
> > This is very sensible. The only reason I didn't go down this route to
> > start with was that I didn't know of an available library (like
> > Accelerate) and didn't want to add testing/dummy code in tree. Thanks
> > for pointing me at Accelerate - that'll give me a real library to
> > (semi) implement and test.
> >
> > > This brings me to issue three. You are currently using
> > > TTI->getCallCost() which is not meant to be used with the vectorizer.
> > > We should create a getCallInstrCost() function similar to the
> > > “getIntrinsicInstrCost” function we already have.
> > >
> > > BasicTTI::getCallInstrCost should query TLI->isFunctionVectorizable()
> > > and return a sensible value in this case (one that is lower than a
> > > scalarized intrinsic lowered as lib call).
> >
> > I don't understand the difference between getIntrinsicCost and
> > getIntrinsicInstrCost. They both take the same arguments (but return
> > different values), and the doxygen docstring does not describe the
> > action in enough detail to discern what the required behaviour is.
> >
> > Could you please tell me? (and I'll update the docstrings while I'm at
> > it).
>
> Sure, TargetTransformInfo is split into two “cost” metrics:
>
> * Generic target information which returns its cost in terms of
> “TargetCostConstants”:
>
>   /// \name Generic Target Information
>   /// @{
>
>   /// \brief Underlying constants for 'cost' values in this interface.
>   ///
>   /// Many APIs in this interface return a cost. This enum defines the
>   /// fundamental values that should be used to interpret (and produce)
>   /// those costs. The costs are returned as an unsigned rather than a
>   /// member of this enumeration because it is expected that the cost of
>   /// one IR instruction may have a multiplicative factor to it or
>   /// otherwise won't fit directly into the enum. Moreover, it is common
>   /// to sum or average costs which works better as simple integral
>   /// values. Thus this enum only provides constants.
>   …
>   /// @}
>
> This api is used by the inliner (getUserCost) to estimate the cost (size)
> of instructions.
>
> * Throughput estimate for the vectorizer. This api attempts to estimate
> (very crudely, on an instruction-by-instruction basis) the throughput of
> instructions. Since we automatically infer most values using
> TargetLoweringInfo, and we have to do this from IR, this is not going to
> be very accurate …
>
>  /// \name Vector Target Information
>  /// @{
>  ...
>  /// \return The expected cost of arithmetic ops, such as mul, xor, fsub,
>  /// etc.
>  virtual unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,
>  ...
>  /// @}
>
> At a high level, this api tries to answer the question: what does this
> instruction cost in scalar form (“expf”, f32), or in vectorized form
> (“expf”, <4 x float>)?
>
> BasicTTI::getIntrinsicInstrCost() assumes a cost of 1 for intrinsics that
> have a corresponding ISA instruction
> (TLoweringI->isOperationLegalOrPromote(ISD::FEXP) returns true) and a cost
> of 10 for the ones that don’t. On top of that we incorporate things like
> type legalization costs, and overhead if we vectorize.
>
> For the new BasicTTI::getCallInstrCost(Function, RetTy, ArgTys) we would
> also return 10 for scalar versions of the function (RetTy->isVectorTy() ==
> false).
> For vector queries (RetTy->isVectorTy() == true), if
> TLibInfo->isVectorizableFunction(Function->getCalledFunction->getName(),
> RetTy->getVectorNumElements()) returns true, we should also return 10.
> Otherwise, we estimate the cost of scalarization just like we do in
> getIntrinsicInstrCost. This will guarantee that the vectorized library
> function call (Cost = 10) will be chosen over the intrinsic lowered to a
> sequence of scalarized lib calls (Cost = 10 * VF * …).
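
A minimal sketch of the costing rule described above, written as a standalone function rather than against the real TTI interfaces. The names callInstrCost, LibCallCost and scalarizationOverhead are hypothetical, and the overhead formula is a placeholder assumption: the message only says the scalarized cost is 10 * VF times an unspecified factor, so the sketch just needs something that grows with VF.

```cpp
// Flat cost of one library call, as suggested in the message above.
constexpr unsigned LibCallCost = 10;

// ASSUMPTION: placeholder for the insert/extract overhead of scalarizing a
// vector call. The real value would come from the target, not this formula.
constexpr unsigned scalarizationOverhead(unsigned VF) { return 2 * VF; }

// Hypothetical model of the proposed getCallInstrCost() rule:
//  - scalar query                      -> one library call
//  - vector query, vector lib version  -> one (vector) library call
//  - vector query, no vector version   -> VF scalar calls plus overhead
constexpr unsigned callInstrCost(bool IsVector, unsigned VF,
                                 bool HasVectorLibVersion) {
  return (!IsVector || HasVectorLibVersion)
             ? LibCallCost
             : LibCallCost * VF + scalarizationOverhead(VF);
}

// With VF = 4 the vectorized library call is always the cheaper option.
static_assert(callInstrCost(true, 4, true) < callInstrCost(true, 4, false),
              "vector library call should beat scalarization");
```

Under these assumptions any VF greater than 1 makes the scalarized path more expensive than the vector library call, which is the ordering the proposal relies on.
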
>
> Then, in LoopVectorizationCostModel::getInstructionCost() you would query
> both apis (if getIntrinsicIDForCall returns an id) and return the smaller
> cost:
>
>  case Call: {
>    CallInst *CI = cast<CallInst>(I);
>
>    // Widen the return and argument types to the vectorization factor.
>    Type *RetTy = ToVectorTy(CI->getType(), VF);
>    SmallVector<Type*, 4> Tys;
>    for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i)
>      Tys.push_back(ToVectorTy(CI->getArgOperand(i)->getType(), VF));
>    unsigned LibFuncCallCost =
>        TTI.getCallInstrCost(CI->getCalledFunction(), RetTy, Tys);
>
>    // If an intrinsic version exists as well, pick the cheaper of the two.
>    if (unsigned ID = getIntrinsicIDForCall(CI, TLI))
>      return std::min(LibFuncCallCost,
>                      TTI.getIntrinsicInstrCost(ID, RetTy, Tys));
>    return LibFuncCallCost;
>  }
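
Since the same decision is needed in two places (when computing the instruction cost and when actually vectorizing the call), one option is a small shared helper that returns both the chosen lowering and its cost. This is only a sketch: chooseCallLowering, CallLowering, CallChoice and NotIntrinsic are hypothetical names, and the two costs are passed in as plain numbers instead of being queried from TTI.

```cpp
// ASSUMPTION: 0 stands for "no intrinsic ID for this call", matching the
// `if (unsigned ID = ...)` test in the snippet above.
constexpr unsigned NotIntrinsic = 0;

enum class CallLowering { LibraryCall, Intrinsic };

struct CallChoice {
  CallLowering Kind; // which lowering the cost model prefers
  unsigned Cost;     // cost of that lowering
};

// Hypothetical helper: prefer the intrinsic only if one exists and it is
// strictly cheaper; ties go to the library call, as std::min(a, b) would
// return its first argument.
constexpr CallChoice chooseCallLowering(unsigned IntrinsicID,
                                        unsigned LibFuncCallCost,
                                        unsigned IntrinsicCost) {
  return (IntrinsicID != NotIntrinsic && IntrinsicCost < LibFuncCallCost)
             ? CallChoice{CallLowering::Intrinsic, IntrinsicCost}
             : CallChoice{CallLowering::LibraryCall, LibFuncCallCost};
}
```

The cost model would read only the Cost field, while the code that widens the call would branch on Kind, so both places agree by construction.
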
>
>
> Thanks,
> Arnold
>
>