[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Fri Jun 29 15:34:00 PDT 2018

Hi Hideki -

I hinted at this problem in the summary text of
https://reviews.llvm.org/D47610:
Why are we transforming from LLVM intrinsics to platform-specific
intrinsics in IR? I don't see the benefit.

I don't know if it solves all of the problems you're seeing, but it should
be a small change to transform to the platform-specific SVML or other
intrinsics in the DAG. We already do this for mathlib calls on Linux for
example when we can use the finite versions of the calls. Have a look in
SelectionDAGLegalize::ConvertNodeToLibcall():

    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32,
RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));

On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <hideki.saito at intel.com>
wrote:

>
>
> Ashutosh,
>
>
>
> Thanks for the repy.
>
>
>
> Related earlier topic on this appears in the review of the SVML patch
> (@mmasten). Adding few names from there.
>
> https://reviews.llvm.org/D19544
>
> There, I see Hal’s review comment “let’s start only with the
> directly-legal calls”. Apparently, what we have right now
>
> in the trunk is “not legal enough”. I’ll work on the patch to stop
> bleeding while we continue to discuss legalization topic.
>
>
>
> I suppose
>
> 1)      LV only solution (let LV emit already legalized VECLIB calls) is
> certainly not scalable. It won’t help if VECLIB calls
> are generated elsewhere. Also, keeping VF low enough to prevent the
> legalization problem is only a workaround,
> not a solution.
>
> 2)      Assuming that we have to go to IR to IR pass route, there are 3
> ways to think:
>
> a.       Go with very generic IR to IR legalization pass comparable to
> ISD level legalization. This is most general
> but I’d think this is the highest cost for development.
>
> b.      Go with Intrinsic-only legalization and then apply VECLIB
> afterwards. This requires all scalar functions
> with VECLIB mapping to be added to intrinsic.
>
> c.       Go with generic enough function call legalization, with the
> ability to add custom legalization for each VECLIB
> (and if needed each VECLIB or non-VECLIB entry).
>
>
>
> I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be more
> flexible. So, I guess we don’t really have to tie this
>
> discussion with “letting LV emit widened math call instead of VECLIB”,
> even though I strongly favor that than LV emitting
>
> VECLIB calls.
>
>
>
> @Davide, in D19544, @spatel thought LibCallSimplifier has relevance to
> this legalization topic. Do you know enough about
>
> LibCallSimiplifer to tell whether it can be extended to deal with 2.b) or
> 2.c)?
>
>
>
> If we think 2.b)/2.c) are right enough directions, I can clean up what we
> have and upload it to Phabricator as a starting point
>
> to get to 2.b)/2.c).
>
>
>
> Continue waiting for more feedback. I guess I shouldn’t expect a lot this
> week and next due to the big holiday in the U.S.
>
>
>
> Thanks,
>
> Hideki
>
>
>
> *From:* Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com]
> *Sent:* Thursday, June 28, 2018 11:37 PM
> *To:* Saito, Hideki <hideki.saito at intel.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* RE: [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
>
> Hi Saito,
>
>
>
> At AMD we have our own version of vector library and faced similar
> problems, we followed the SVML path and from vectorizer generated the
> respective vector calls. When vectorizer generates the respective calls i.e
> __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching
> to identify the vector lib call. I’m not sure it’s the proper way, may be
> instead of generating respective calls it’s better to generate some
> standard call (may be intrinsics) and lower it later. A late IR pass can be
> introduced to perform lowering, this will lower the intrinsic calls to
> specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be
> table driven to decide the action based on the vector library, function
> name, VF and target information, the action can be full-serialize,
> partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.
>
>
>
> Thanks,
>
> Ashutosh
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org
> <llvm-dev-bounces at lists.llvm.org>] *On Behalf Of *Saito, Hideki via
> llvm-dev
> *Sent:* Friday, June 29, 2018 7:41 AM
> *To:* 'Saito, Hideki via llvm-dev' <llvm-dev at lists.llvm.org>
> *Subject:* [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
>
>
>
> Illustrative Example:
>
>
>
> clang -fveclib=SVML -O3 svml.c -mavx
>
>
>
> #include <math.h>
>
> void foo(double *a, int N){
>
>   int i;
>
> #pragma clang loop vectorize_width(8)
>
>   for (i=0;i<N;i++){
>
>     a[i] = sin(i);
>
>   }
>
> }
>
>
>
> Currently, this results in a call to <8 x double> __svml_sin8(<8 x
> double>) after the vectorizer.
>
> This is 8-element SVML sin() called with 8-element argument. On the
> surface, this looks very good.
>
> Later on, standard vector type legalization kicks-in but only the argument
> and return data are legalized.
>
>         vmovaps %ymm0, %ymm1
>
>         vcvtdq2pd       %xmm1, %ymm0
>
>         vextractf128    $1, %ymm1, %xmm1
>
>         vcvtdq2pd       %xmm1, %ymm1
>
>         callq   __svml_sin8
>
>         vmovups %ymm1, 32(%r15,%r12,8)
>
>         vmovups %ymm0, (%r15,%r12,8)
>
> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It
> takes zmm0 and returns zmm0.
>
> i.e., not legal to use for AVX.
>
>
>
> What we need to see instead is two calls to __svml_sin4(), like below.
>
>         vmovaps %ymm0, %ymm1
>
>         vcvtdq2pd       %xmm1, %ymm0
>
>         vextractf128    $1, %ymm1, %xmm1
>
>         vcvtdq2pd       %xmm1, %ymm1
>
>         callq   __svml_sin4
>
>         vmovups %ymm0, 32(%r15,%r12,8)
>
>         vmovups %ymm1, ymm0
>
>         callq   __svml_sin4
>
>         vmovups %ymm0, (%r15,%r12,8)
>
>
>
> What would be the most acceptable way to make this happen? Anybody having
> had a similar need previously?
>
>
>
> Easiest workaround is to serialize the call above “type legal”
> vectorization factor. This can be done with a few lines of code,
>
> plus the code to recognize that the call is “SVML” (which is currently
> string match against “__svml” prefix in my local workspace).
>
> If higher VF is not forced, cost model will likely favor lower VF.
> Functionally correct, but obviously not an ideal solution.
>
>
>
> Here are a few ideas I thought about:
>
> 1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t
> seem to work. We could define a generic ISD::VECLIB
> and try to split into two or more VECLIB nodes, but at that moment we lost
> the information about which function to call.
> We can’t define ISD opcode per function. There will be too many libm
> entries to deal with. We need a scalable solution.
>
> 2)      We could write an IR to IR pass to perform IR level legalization.
> This is essentially duplicating the functionality of LegalizeVectorType()
> but we can make this available for other similar things that can’t use ISD
> level vector type legalization. This looks to be attractive enough
> from that perspective.
>
> 3)      We have implemented something similar to 2), but legalization
> code is specialized for SVML legalization. This was much quicker than
> trying to generalize the legalization scheme, but I’d imagine community
> won’t like it.
>
> 4)      Vectorizer emit legalized VECLIB calls. Since it can emit
> instructions in scalarized form, adding legalized call functionality is in
> some sense
> similar to that. Vectorizer can’t simply choose type legal function name
> with illegal vector ---- since LegalizeVectorType() will still
> end up using one call instead of two.
>
>
>
> Anything else?
>
>
>
> Also, doing any of this requires reverse mapping from VECLIB name to
> scalar function name. What’s the most recommended way to do so?
>
> Can we use TableGen to create a reverse map?
>
>
>
> Your input is greatly appreciated. Is there a real need/desire for 2)
> outside of VECLIB (or outside of SVML)?
>
>
>
> Thanks,
>
> Hideki Saito
>
> Intel Corporation
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180629/0133ab5f/attachment.html>