[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 10 06:37:20 PDT 2018


On Tue, 9 Oct 2018 at 22:45, Francesco Petrogalli
<Francesco.Petrogalli at arm.com> wrote:
> > I assume this is OMP's pragma SIMD's job, for now. We may want to work
> > that out automatically if we see vector functions being defined in
> > some header, for example.
>
> No, I meant IR attribute. There is an RFC submitted by Intel that describes such an attribute: http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html

Oh, the mangled names, right! I remember that RFC.


> What do you exactly mean with “OpenMP 5 for the rest”?

For the libraries that don't follow the mangling pattern above.

> If I got your question correctly, you are asking whether we can start using SLEEF before we set up this mechanism with OpenMP 5.0. If that is the question, the answer is yes.

That was the question, yes. Thanks! :)

> We could use SLEEF by adding a VECLIB option, as is done now for SVML in the TLI

I don't like any approach that needs specialised compiler support for
such a low-level library.

> or we could use SLEEF and support Intel and Arm by using the libmvec compatible version of the library, libsleefgnuabi.so - this is my favorite solution as it is based on the vector function ABI standards of Intel and Arm.

Agreed.

> There is no way to get the cost of the vector function, other than the wrong assumption that cost(vector version) = cost(scalar version), which is not the case. By the way, why do you think that the cost of the vector version is scalar cost / VF?

Sorry, cost after taking VF into account. The cost itself would be
(naively and wrongly) the same as scalar.

> We could argue that vectorizing a math function is always beneficial?

I'm (possibly wrongly) worried about two things:

1. Cost of prologue/epilogue/shuffles

Different architectures have different ABIs, and some are more
efficient than others when calling vector functions.

Also, some vector extensions have features others do not, for example
scatter/gather. Some libraries try to emulate those in vector code,
which is not always an obvious win.

If programmers who know what they're doing call these functions
explicitly, that's fine (you expect them to have benchmarked). But if
the compiler makes that choice on its own, we risk upsetting users.

It's one thing to produce slow code because you didn't do enough;
it's another because you did too much. People often understand the
former, rarely the latter. :)

2. Skipping direct codegen

This is a minor issue, but depending on how early this transformation
runs, it may hinder specialised, architecture-specific codegen that
could have been more efficient.

I don't have an example to hand, but imagine a machine with a scalar
sincos implementation that is faster than a 2-lane library call. The
compiler will unwittingly assume that the library call is better.

> Why do you say “pollute the IR”? The heuristics would not be added to the IR, they would be added in the code of the cost model. I am not sure I understand what you mean here.

Different libraries may have different "costs" for different
architectures. We can't possibly hold them in the cost model for all
known library implementations, past, present and future.

-- 
cheers,
--renato
