[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Wed Jul 4 09:58:58 PDT 2018

On 07/04/2018 07:50 AM, Robert Lougher wrote:
> Hi,
>
> On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     + llvm-dev
>
>     -----Original Message-----
>     From: Nema, Ashutosh
>     Sent: Wednesday, July 4, 2018 12:12 PM
>     To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki
>     <hideki.saito at intel.com>; Sanjay Patel
>     <spatel at rotateright.com>; mzolotukhin at apple.com
>     Cc: dccitaliano at gmail.com; Masten, Matt
>     <matt.masten at intel.com>
>     Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize
>     VECLIB calls?
>
>     Hi Hal,
>
>     > __svml_sin8 (plus whatever shuffles are necessary).
>     > The vectorizer should do this.
>     > It should not generate calls to functions that don't exist.
>
>     I'm not sure how the vectorizer would do this. Consider the case
>     where the "-vectorizer-maximize-bandwidth" option is enabled and
>     the vectorizer is forced to generate the wider VF; it may then
>     generate a call to __svml_sin_* that does not exist.
>
>     Are you expecting the vectorizer to lower the calls, i.e. to
>     split __svml_sin_8 into two __svml_sin_4 calls?
>
>     Regards,
>     Ashutosh
>
>
> If an accurate cost model were in place (which it isn't), then an
> "unsupported" vectorization factor would only be selected if it was
> forced.  However, in this case __svml_sin_8 has the same cost as
> __svml_sin_4, so the loop vectorizer will select a VF of 8 and
> generate a call to a function which effectively doesn't exist.

Would it actually be the same, or would there be extra shuffle costs
associated with the calls to __svml_sin_4?
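
To make the question concrete, here is a rough sketch of what splitting
one 8-wide call into two 4-wide calls involves. It is only an
illustration: sin4 is a hypothetical scalar stand-in for __svml_sin_4,
and the arrays model vector registers. The extract and concatenate loops
are where any extra shuffle cost would come from.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a 4-wide library routine such as
// __svml_sin_4; the real routine operates on vector registers.
std::array<double, 4> sin4(const std::array<double, 4> &X) {
  std::array<double, 4> R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = std::sin(X[I]);
  return R;
}

// Model an 8-wide call as two 4-wide calls plus the data movement
// needed to split the input and join the results.
std::array<double, 8> sin8ViaTwoSin4(const std::array<double, 8> &X) {
  std::array<double, 4> Lo{}, Hi{};
  for (std::size_t I = 0; I < 4; ++I) { // extract low and high halves
    Lo[I] = X[I];
    Hi[I] = X[I + 4];
  }
  const std::array<double, 4> RLo = sin4(Lo);
  const std::array<double, 4> RHi = sin4(Hi);
  std::array<double, 8> R{};
  for (std::size_t I = 0; I < 4; ++I) { // concatenate the two results
    R[I] = RLo[I];
    R[I + 4] = RHi[I];
  }
  return R;
}
```

Whether the extract and concatenate steps cost anything on a real target
depends on how the 8-wide value is already laid out in registers, which
this sketch does not model.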

>
> The simplest fix is to populate the SVML vector library table with
> __svml_sin_8 only when the target is AVX-512.

I believe that this is exactly what we should do. When not targeting
AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable
ABI via which we can call it), and so it should not appear in the
vectorizer's list of options at all.
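
As a minimal sketch of that idea (not LLVM's actual data structures):
VecLibEntry, buildVecLibTable, hasVectorVariant and the
RequiredVectorBits field are all hypothetical names, and the register
widths assume double-precision elements. The function names are spelled
as they appear in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical table entry; not LLVM's real representation.
struct VecLibEntry {
  std::string ScalarName;
  std::string VectorName;
  unsigned VF;                 // vectorization factor of the variant
  unsigned RequiredVectorBits; // register width its ABI needs (assumed)
};

// Build the table for a target, keeping only the variants whose
// required register width the target provides.
std::vector<VecLibEntry> buildVecLibTable(unsigned MaxVectorBits) {
  const std::vector<VecLibEntry> All = {
      {"sin", "__svml_sin_4", 4, 256},
      {"sin", "__svml_sin_8", 8, 512},
  };
  std::vector<VecLibEntry> Table;
  for (const VecLibEntry &E : All)
    if (E.RequiredVectorBits <= MaxVectorBits)
      Table.push_back(E);
  return Table;
}

// Query used by a vectorizer-like client: is there a library variant
// of Scalar at exactly this VF in the (already filtered) table?
bool hasVectorVariant(const std::vector<VecLibEntry> &Table,
                      const std::string &Scalar, unsigned VF) {
  for (const VecLibEntry &E : Table)
    if (E.ScalarName == Scalar && E.VF == VF)
      return true;
  return false;
}
```

With a 256-bit target, buildVecLibTable(256) yields a table in which
hasVectorVariant(Table, "sin", 8) is false, so the 8-wide variant is
never offered as an option in the first place.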

 -Hal

> Alternatively, TLI.isFunctionVectorizable() should check that the
> entry is available on the target (this is more difficult as the type
> is not encoded).
>
> I'm guessing that the cost model would then make VF=4 cheaper, so the
> vectorizer would generate calls to __svml_sin_4 (I'm not at work so
> can't check).  If the vectorization factor were forced to 8, we would
> either get a call to the intrinsic llvm.sin.v8f64 (if no-math-errno)
> or the vectorizer would scalarize the call.  The vectorizer would not
> generate two calls to __svml_sin_4, although this would be cheaper.
>
> While this problem probably doesn't require the loop vectorizer to
> have knowledge of the target ABI, others may do.  I'm thinking
> specifically of D48193:
>
> https://reviews.llvm.org/D48193
>
> In this case we have poor code generation due to the interleave count
> selected by the loop vectorizer.  I can't see how this can be fixed
> later, so we will need to expose details of the ABI to the loop
> vectorizer (see my latest comment D48193#1149705).
>
> Thanks,
> Rob.
>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
