[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Wed Jul 4 05:50:24 PDT 2018

Hi,

On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki <hideki.saito at intel.com>;
> Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com
> Cc: dccitaliano at gmail.com; Masten, Matt <matt.masten at intel.com>
> Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
> Hi Hal,
>
> > __svml_sin8 (plus whatever shuffles are necessary).
> > The vectorizer should do this.
> > It should not generate calls to functions that don't exist.
>
> I'm not sure how vectorizer will do this, consider the case where
> "-vectorizer-maximize-bandwidth" option is enabled and vectorizer is
> forced to generate the wider VF, and hence it may generate a call to
> __svml_sin_* which may not exist.
>
> Are you expecting the vectorizer to lower the calls i.e. __svml_sin_8 to
> two __svml_sin_4 calls ?
>
> Regards,
> Ashutosh
>

If an accurate cost model were in place (there isn't one), an
"unsupported" vectorization factor would only be selected when it was
forced.  However, in this case __svml_sin_8 has the same cost as
__svml_sin_4, so the loop vectorizer selects a VF of 8 and generates a
call to a function which effectively doesn't exist.

The simplest fix is to populate the SVML vector library table with
__svml_sin_8 only when the target is AVX-512.  Alternatively,
TLI.isFunctionVectorizable() should check that the entry is available on
the target (this is more difficult as the type is not encoded).
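To make the first option concrete, here is a rough, self-contained sketch.
It does not use the real LLVM classes: VecDesc, TargetInfo, HasAVX512 and
buildSVMLTable are hypothetical stand-ins, the function names are spelled as
in this thread, and I'm assuming the 4-wide entry is always available.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of a vector library mapping table.
struct VecDesc {
  std::string ScalarName; // scalar function, e.g. "sin"
  std::string VectorName; // vector variant, spelled as in this thread
  unsigned VF;            // vectorization factor the variant implements
};

// Hypothetical target description; a real implementation would query the
// subtarget features instead of a plain flag.
struct TargetInfo {
  bool HasAVX512;
};

// Populate the table, adding the 8-wide entry only on AVX-512 targets.
std::vector<VecDesc> buildSVMLTable(const TargetInfo &TI) {
  std::vector<VecDesc> Table;
  Table.push_back({"sin", "__svml_sin_4", 4}); // assumed always available
  if (TI.HasAVX512)
    Table.push_back({"sin", "__svml_sin_8", 8});
  return Table;
}

// Simplified lookup: is there a vector variant of Scalar at this VF?
bool isFunctionVectorizable(const std::vector<VecDesc> &Table,
                            const std::string &Scalar, unsigned VF) {
  assert(VF > 1 && "a vector call needs at least two lanes");
  for (const VecDesc &D : Table)
    if (D.ScalarName == Scalar && D.VF == VF)
      return true;
  return false;
}
```

With the table built this way, a lookup for "sin" at VF=8 fails on a
non-AVX-512 target, so the vectorizer never sees the 8-wide variant as
an option there.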

I'm guessing that the cost model would then make VF=4 cheaper, and so
generate calls to __svml_sin_4 (I'm not at work so can't check).  If the
vectorization factor were forced to 8, we'd either get a call to the
intrinsic llvm.sin.v8f64 (if no-math-errno) or the vectorizer would
scalarize the call.  The vectorizer would not generate two calls to
__svml_sin_4, although this would be cheaper.
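The forced-VF behaviour described above can be summarised in a small
sketch.  This is only an illustration of the decision as I understand it,
not the vectorizer's actual code: CallLowering, lowerForcedVF and
numNarrowCalls are hypothetical names.

```cpp
#include <cassert>

// Hypothetical labels for the three outcomes discussed above.
enum class CallLowering { VectorLibCall, Intrinsic, Scalarize };

// Decide what happens to a math call at a forced VF.
// LibEntryAvailable: the vector library table has an entry at this VF.
// NoMathErrno: the call is known not to set errno (no-math-errno).
CallLowering lowerForcedVF(bool LibEntryAvailable, bool NoMathErrno) {
  if (LibEntryAvailable)
    return CallLowering::VectorLibCall; // e.g. __svml_sin_8 on AVX-512
  if (NoMathErrno)
    return CallLowering::Intrinsic;     // e.g. llvm.sin.v8f64
  return CallLowering::Scalarize;       // one scalar call per lane
}

// Number of narrower library calls that would cover the forced VF if the
// call were split (which the vectorizer does not do today).
unsigned numNarrowCalls(unsigned ForcedVF, unsigned LibVF) {
  assert(LibVF != 0 && ForcedVF % LibVF == 0 &&
         "forced VF must be a multiple of the library VF");
  return ForcedVF / LibVF;
}
```

For a forced VF of 8 with only the 4-wide entry available,
numNarrowCalls(8, 4) gives the two __svml_sin_4 calls mentioned above,
which is the split the vectorizer currently never considers.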

While this problem probably doesn't require the loop vectorizer to have
knowledge of the target ABI, other problems may.  I'm thinking specifically
of D48193:

https://reviews.llvm.org/D48193

In this case we have poor code generation due to the interleave count
selected by the loop vectorizer.  I can't see how this can be fixed later,
so we will need to expose details of the ABI to the loop vectorizer (see my
latest comment D48193#1149705).

Thanks,
Rob.