RFC: Enable vectorization of call instructions in the loop vectorizer

James Molloy james at jamesmolloy.co.uk
Fri Jan 17 02:59:41 PST 2014


Hi Arnold,

> First, we are going to have the situation where there exists an intrinsic
> ID for a library function (many math library functions have an intrinsic
> version: expf -> llvm.exp.f32, for example). As a consequence,
> “getIntrinsicIDForCall” will return it. In this case we can have both a
> vectorized library function version and an intrinsic version that may be
> slower or faster. In such a case the cost model has to decide which one to
> pick. This means we have to query the cost model about which one is cheaper
> in two places: when we get the instruction cost and when we vectorize the
> call.

Sure, I will address this.

> Second, the way we test this. [snip]

This is very sensible. The only reason I didn't go down this route to start
with was that I didn't know of an available library (like Accelerate) and
didn't want to add testing/dummy code in tree. Thanks for pointing me at
Accelerate - that'll give me a real library to (semi) implement and test.

> This brings me to issue three. You are currently using TTI->getCallCost(),
> which is not meant to be used with the vectorizer. We should create a
> getCallInstrCost() function similar to the “getIntrinsicInstrCost” function
> we already have.
>
> BasicTTI::getCallInstrCost should query TLI->isFunctionVectorizable() and
> return a sensible value in this case (one that is lower than a scalarized
> intrinsic lowered as a lib call).

I don't understand the difference between getIntrinsicCost and
getIntrinsicInstrCost. They both take the same arguments (but return
different values), and the Doxygen docstrings do not describe the behaviour
in enough detail to discern what is required.

Could you please explain the difference? (I'll update the docstrings while
I'm at it.)

Cheers,

James

On 16 January 2014 22:51, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> Hi James,
>
> overall I like this patch. Thanks for working on this! There three issues
> I would like to address:
>
> First, we are going to have the situation where there exists an intrinsic
> ID for a library function (many math library functions have an intrinsic
> version: expf -> llvm.exp.f32, for example). As a consequence,
> “getIntrinsicIDForCall” will return it. In this case we can have both a
> vectorized library function version and an intrinsic version that may be
> slower or faster. In such a case the cost model has to decide which one to
> pick. This means we have to query the cost model about which one is cheaper
> in two places: when we get the instruction cost and when we vectorize the
> call.
>
> Second, the way we test this. I understand that we currently don’t have
> anyone adding vectorizable function calls in tree. However, I would really
> prefer not to have to use unit tests to test this feature. How about we use
> the Environment component (the fourth component of the target triple) to
> specify the available library? Say, on Mac OS X you have the Accelerate
> library.
>
>      TLI.setUnavailable(LibFunc::statvfs64);
>      TLI.setUnavailable(LibFunc::tmpfile64);
>    }
> +
> +  // Make the vectorized versions available.
> +  if (T.getEnvironmentName() == "Accelerate") {
> +    const TargetLibraryInfo::VecDesc VecFuncs[] = {
> +      { "exp", "vexp", 2},
> +      { "expf", "vexpf", 4}
> +    };
> +    TLI.addVectorizableFunctions(VecFuncs);
> +  }
>  }
>
> Then, we can test this feature with
>
> target triple = "x86_64-apple-macos-Accelerate"
>
> define void @test(double* %d, double %t) {
>   ...
>   %1 = tail call double @llvm.exp.f64(double %0)
>
>
> We will also have to assign a lower cost to calls to “vexp” in the cost
> model than to the intrinsic version (in this example). This brings me to
> issue three. You are currently using TTI->getCallCost(), which is not meant
> to be used with the vectorizer. We should create a getCallInstrCost()
> function similar to the “getIntrinsicInstrCost” function we already have.
>
> BasicTTI::getCallInstrCost should query TLI->isFunctionVectorizable() and
> return a sensible value in this case (one that is lower than a scalarized
> intrinsic lowered as a lib call).
>
>
> Thanks,
> Arnold
>
> On Jan 15, 2014, at 11:22 AM, James Molloy <james at jamesmolloy.co.uk>
> wrote:
>
> > <vectorizer-tli.diff>
>
>