[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Mon Jul 2 11:48:44 PDT 2018

It may not be a full solution for the problems you're trying to solve, but
I don't see why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a
problem in itself. Certainly, it's a mess that could be better organized,
especially so that we're not repeating everything for each data type as we
do right now.

So yes, I think that would allow us to remove the VecLib mappings, because
we would always wait until codegen to translate generic IR into a
target-specific libcall. Or is there some reason that the vectorizer needs
to be aware of those libcalls?

On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <hideki.saito at intel.com>
wrote:

>
>
> Venkat, we did not invent LLVM's VecLib functionality. The original
> version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a
> separate pass to convert widened math library calls to SVML.
>
> Our preference for "vectorized sin()" is simply a widened sin() that is
> lowered to a specific library call at a later point (either IR to IR or in
> CodeGen). Matt tried to sell that idea and it didn't go through.
>
> Is anyone else willing to work with us to try it again? In my opinion,
> however, this is a related but different topic from the legalization issue.
>
>
>
> Sanjay, I think what you are suggesting would work better if we don't map
> math lib calls to VecLib. Otherwise, we'll have too many RTLIB::VECLIB_
> enums: one for each math function, multiplied by each vectorization
> factor, for each VecLib. That's way too many. If it's one enum per math
> function, I'd guess it's 100+: still a lot, but manageable. This requires
> those functions to be listed as intrinsics, right? That's another reason
> some people favor VecLib mapping in the vectorizer: those math functions
> don't have to be added to the intrinsics.
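To make the count concrete, a tiny C++ sketch with placeholder numbers: 100 functions stands in for the "100+" guess above, while 4 VFs and 3 libraries are assumptions chosen only for illustration, not figures from the thread.

```cpp
// Placeholder sizes; only the growth rate matters here.
constexpr unsigned kMathFunctions = 100;  // roughly the "100+" estimate
constexpr unsigned kVectorFactors = 4;    // assumed, e.g. VF 2, 4, 8, 16
constexpr unsigned kVecLibs = 3;          // assumed, e.g. SVML, libmvec, a vendor library

// One RTLIB-style enum per (function, VF, library) combination.
constexpr unsigned perCombinationEnums() {
  return kMathFunctions * kVectorFactors * kVecLibs;
}

// One enum per scalar math function; VF and library are resolved later.
constexpr unsigned perFunctionEnums() { return kMathFunctions; }

static_assert(perCombinationEnums() == 1200, "100 x 4 x 3");
```

With these placeholders the per-combination scheme needs 1200 enums against 100 for the per-function scheme, and the gap grows with every VF or library added.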
>
>
>
> I don't insist on IR-to-IR legalization. However, I'm also interested in
> being able to legalize calls to OpenMP declare simd functions (**). These
> are user functions, so we have no way to list them as intrinsics or to
> predefine RTLIB:: enums for them. For each target, the vector function ABI
> defines how the parameters must be passed, and the legalizer should be
> implemented based on that ABI, without knowing the details of what the
> user function does. A math-lib-only solution doesn't help legalization of
> OpenMP declare simd.
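To illustrate that ABI-only legalization is possible, here is a hypothetical C++ model (not an LLVM API; `splitCall`, `ParamDesc` and `PartArg` are made-up names) that splits one wide call of a declare simd function into narrower calls using only the parameter kinds: vector operands are sliced by lane, uniform operands are passed through unchanged, and each linear operand's base value is advanced by lane offset times step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parameter kinds from OpenMP "declare simd" clauses.
enum class Kind { Vector, Uniform, Linear };

struct ParamDesc {
  Kind kind;
  long step;  // linear step; ignored unless kind == Kind::Linear
};

// How one argument is formed for one of the narrow calls.
struct PartArg {
  Kind kind;
  std::size_t first_lane;  // Vector: first lane of the wide operand to pass
  long linear_delta;       // Linear: offset added to the wide call's base value
};

// Split a wide_vf-wide call into wide_vf / narrow_vf calls of width narrow_vf,
// using only the parameter kinds (never the function body).
std::vector<std::vector<PartArg>> splitCall(const std::vector<ParamDesc>& params,
                                            std::size_t wide_vf, std::size_t narrow_vf) {
  assert(narrow_vf != 0 && wide_vf % narrow_vf == 0);
  std::vector<std::vector<PartArg>> parts;
  for (std::size_t lane = 0; lane < wide_vf; lane += narrow_vf) {
    std::vector<PartArg> args;
    for (const ParamDesc& d : params) {
      PartArg a{d.kind, 0, 0};  // Uniform operands keep these zeros: passed unchanged
      if (d.kind == Kind::Vector) a.first_lane = lane;
      if (d.kind == Kind::Linear) a.linear_delta = static_cast<long>(lane) * d.step;
      args.push_back(a);
    }
    parts.push_back(args);
  }
  return parts;
}
```

For `foo(float *a, int i)` with `uniform(a)` and `linear(i)` (step 1), splitting VF 16 into four VF 4 calls would pass `a` unchanged each time and `i`, `i + 4`, `i + 8`, `i + 12` as the linear argument.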
>
>
>
> Thanks,
>
> Hideki
>
>
>
> --------------------------------
>
> (**)
>
> #pragma omp declare simd uniform(a), linear(i)
> void foo(float *a, int i);
>
> …
>
> #pragma omp simd
> for (i) {   // this loop could be vectorized with a VF that's wider than
>             // the widest available vector function for foo().
>     …
>     foo(a, i);
>     …
> }
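For the loop above, the legalizer must map the loop's VF onto whichever vector variants of `foo()` exist. One possible policy, sketched here as an assumption rather than anything proposed in the thread: use the widest available variant whose width divides the loop VF and call it VF / width times, falling back to scalar calls when no variant fits. `CallPlan` and `planCalls` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct CallPlan {
  unsigned width;  // VF of the variant to call; 1 means scalar calls
  unsigned calls;  // number of calls needed to cover the loop VF
};

// Hypothetical policy: the widest available variant that evenly divides
// loop_vf, otherwise scalar calls.
CallPlan planCalls(unsigned loop_vf, const std::vector<unsigned>& available_vfs) {
  assert(loop_vf > 0);
  unsigned best = 1;
  for (unsigned vf : available_vfs)
    if (vf > best && loop_vf % vf == 0) best = vf;  // vf > best also rules out vf == 0
  return {best, loop_vf / best};
}
```

With variants of width 4 and 8, a VF-16 loop would get two VF-8 calls per iteration; with only a VF-4 variant, a VF-6 loop would fall back to six scalar calls. A masked variant could do better in the latter case, which this sketch ignores.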
>
>
>
> *From:* Venkataramanan Kumar [mailto:venkataramanan.kumar.llvm at gmail.com]
> *Sent:* Sunday, July 01, 2018 11:38 PM
> *To:* Sanjay Patel <spatel at rotateright.com>
> *Cc:* Saito, Hideki <hideki.saito at intel.com>; llvm-dev at lists.llvm.org;
> Masten, Matt <matt.masten at intel.com>; dccitaliano at gmail.com
> *Subject:* Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB
> calls?
>
>
>
> Adding to Ashutosh's comments, we are also interested in making LLVM
> generate the vector math library calls that are available with glibc
> (version > 2.22).
>
>
>
> reference: https://sourceware.org/glibc/wiki/libmvec
>
>
>
> Using the example case given in the reference, we found there are two
> vector versions of "sin" for 4 x double (same VF): _ZGVcN4v_sin (AVX) and
> _ZGVdN4v_sin (AVX2). Following the SVML path, i.e. adding new VecDesc
> entries in TargetLibraryInfo.cpp, we can generate the vector version.
>
>
>
> However, we were unable to decide which version to emit in the vectorizer,
> since that needs target (ISA) information from TTI. It looks better to
> legalize or generate these calls later.
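The two names above follow the x86 vector function ABI mangling used by libmvec, `_ZGV<isa><mask><vlen><params>_<name>`, where the ISA letter is `b` (SSE), `c` (AVX), `d` (AVX2) or `e` (AVX-512) and `N` marks an unmasked variant. A sketch of how a later, target-aware step could pick among them; `mangleUnmasked`, `pickVariant` and the "most capable provided ISA the target supports" policy are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Ordered from least to most capable.
enum class Isa { SSE = 0, AVX = 1, AVX2 = 2, AVX512 = 3 };

// ISA letter used in the x86 vector function ABI name mangling.
char isaLetter(Isa isa) {
  switch (isa) {
    case Isa::SSE: return 'b';
    case Isa::AVX: return 'c';
    case Isa::AVX2: return 'd';
    case Isa::AVX512: return 'e';
  }
  assert(false && "unknown ISA");
  return '?';
}

// Name of an unmasked ('N') variant, e.g. ("sin", AVX2, 4, "v") -> "_ZGVdN4v_sin".
std::string mangleUnmasked(const std::string& name, Isa isa, unsigned vlen,
                           const std::string& params) {
  return "_ZGV" + std::string(1, isaLetter(isa)) + "N" + std::to_string(vlen) + params + "_" + name;
}

// Hypothetical policy: the most capable provided variant the target can run.
// Returns an empty string if no provided variant qualifies.
std::string pickVariant(const std::string& name, unsigned vlen, const std::string& params,
                        Isa target, const std::vector<Isa>& provided) {
  bool found = false;
  Isa best = Isa::SSE;
  for (Isa isa : provided)
    if (isa <= target && (!found || isa > best)) {
      best = isa;
      found = true;
    }
  return found ? mangleUnmasked(name, best, vlen, params) : std::string();
}
```

With both AVX and AVX2 variants provided, an AVX-only target would get `_ZGVcN4v_sin` and an AVX2 or AVX-512 target `_ZGVdN4v_sin`; this choice needs exactly the target information the vectorizer was missing.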
>
>
>
> regards,
>
> Venkat.
>
> On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi Hideki -
>
>
>
> I hinted at this problem in the summary text of
> https://reviews.llvm.org/D47610:
>
> Why are we transforming from LLVM intrinsics to platform-specific
> intrinsics in IR? I don't see the benefit.
>
>
>
> I don't know if it solves all of the problems you're seeing, but it should
> be a small change to transform to the platform-specific SVML or other
> intrinsics in the DAG. We already do this for mathlib calls on Linux, for
> example, when we can use the finite versions of the calls. Have a look in
> SelectionDAGLegalize::ConvertNodeToLibcall():
>
>
>
>     if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
>                                         RTLIB::LOG_FINITE_F64,
>                                         RTLIB::LOG_FINITE_F80,
>                                         RTLIB::LOG_FINITE_F128,
>                                         RTLIB::LOG_FINITE_PPCF128));
>     else
>     else
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
>                                         RTLIB::LOG_F80, RTLIB::LOG_F128,
>                                         RTLIB::LOG_PPCF128));
>
> On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <hideki.saito at intel.com>
> wrote:
>
>
>
> Ashutosh,
>
>
>
> Thanks for the reply.
>
>
>
> A related earlier discussion appears in the review of the SVML patch
> (@mmasten). Adding a few names from there.
>
> https://reviews.llvm.org/D19544
>
> There, I see Hal's review comment "let's start only with the
> directly-legal calls". Apparently, what we have in trunk right now is
> "not legal enough". I'll work on a patch to stop the bleeding while we
> continue to discuss the legalization topic.
>
>
>
> As I see it:
>
> 1) An LV-only solution (letting LV emit already-legalized VECLIB calls) is
>    certainly not scalable. It won't help if VECLIB calls are generated
>    elsewhere. Also, keeping VF low enough to avoid the legalization problem
>    is only a workaround, not a solution.
>
> 2) Assuming that we have to go the IR-to-IR pass route, there are three
>    ways to approach it:
>
>    a. Go with a very generic IR-to-IR legalization pass comparable to
>       ISD-level legalization. This is the most general, but I'd think it
>       has the highest development cost.
>
>    b. Go with intrinsic-only legalization and then apply VECLIB afterwards.
>       This requires all scalar functions with a VECLIB mapping to be added
>       as intrinsics.
>
>    c. Go with generic-enough function call legalization, with the ability
>       to add custom legalization for each VECLIB (and, if needed, for each
>       VECLIB or non-VECLIB entry).
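Option 2.c) could be organized as a default splitting rule plus per-entry overrides. A hypothetical C++ sketch of that dispatch; `CallLegalizer`, `defaultSplit` and the `Call` record are invented for illustration, and a call is identified here only by its scalar function name and VF, leaving the choice of the actual library symbol to a later step.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A call identified by its scalar function name and vectorization factor.
struct Call {
  std::string fn;
  unsigned vf;
};

// A custom legalizer rewrites one call into calls whose VF is legal.
using Legalizer = std::function<std::vector<Call>(const Call&, unsigned legal_vf)>;

// Default rule: split into vf / legal_vf calls of the same function.
std::vector<Call> defaultSplit(const Call& c, unsigned legal_vf) {
  assert(legal_vf != 0);
  if (c.vf <= legal_vf) return {c};
  assert(c.vf % legal_vf == 0 && "sketch assumes the VF divides evenly");
  return std::vector<Call>(c.vf / legal_vf, Call{c.fn, legal_vf});
}

class CallLegalizer {
 public:
  // Register an override for one function (or one VECLIB entry).
  void addCustom(const std::string& fn, Legalizer hook) { custom_[fn] = std::move(hook); }

  // Use the override if one is registered, otherwise the default split.
  std::vector<Call> legalize(const Call& c, unsigned legal_vf) const {
    auto it = custom_.find(c.fn);
    return it != custom_.end() ? it->second(c, legal_vf) : defaultSplit(c, legal_vf);
  }

 private:
  std::map<std::string, Legalizer> custom_;
};
```

For example, registering an override that maps a call to `c.vf` scalar calls would force full serialization for that one entry, while every other function keeps the default split.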
>
>
>
> I think the costs of 2.b) and 2.c) are similar, and 2.c) seems more
> flexible. So I guess we don't really have to tie this discussion to
> "letting LV emit widened math calls instead of VECLIB calls", even though
> I strongly favor that over LV emitting VECLIB calls.
>
>
>
> @Davide, in D19544, @spatel thought LibCallSimplifier is relevant to this
> legalization topic. Do you know enough about LibCallSimplifier to tell
> whether it can be extended to deal with 2.b) or 2.c)?
>
>
>
> If we think 2.b)/2.c) are reasonable directions, I can clean up what we
> have and upload it to Phabricator as a starting point toward 2.b)/2.c).
>
>
>
> I'll keep waiting for more feedback. I guess I shouldn't expect much this
> week and next due to the big holiday in the U.S.
>
>
>
> Thanks,
>
> Hideki
>
>
>
> *From:* Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com]
> *Sent:* Thursday, June 28, 2018 11:37 PM
> *To:* Saito, Hideki <hideki.saito at intel.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* RE: [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
>
> Hi Saito,
>
>
>
> At AMD we have our own vector library and faced similar problems. We
> followed the SVML path and generated the respective vector calls from the
> vectorizer. When the vectorizer generates these calls directly, i.e.
> __svml_sin_4 or __amdlibm_sin_4, later passes can only use string matching
> to identify a vector library call. I'm not sure that's the proper way;
> maybe instead of generating the library-specific calls it's better to
> generate some standard call (maybe intrinsics) and lower it later. A late
> IR pass could be introduced to perform that lowering, turning the
> intrinsic calls into specific library calls (__svml_sin_4 or
> __amdlibm_sin_4 or … ). This can be table driven, deciding the action based
> on the vector library, function name, VF and target information; the
> action can be full serialization, partial serialization (VF8 into 2 x
> VF4), or generating the library call with the same VF.
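The table-driven late lowering described above could look roughly like the following. The key (library, function, VF), the three actions, and the use of target information come from the description; the concrete table contents and the `lowerCall` helper are illustrative assumptions, not AMD's or LLVM's code.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <tuple>

enum class Action { SameVF, PartialSerialize, FullSerialize };

struct Lowering {
  Action action;
  std::string callee;  // vector routine to call; empty for FullSerialize
  unsigned call_vf;    // VF of each emitted call
  unsigned num_calls;  // how many calls cover the original VF
};

// (vector library, scalar function, VF)
using Key = std::tuple<std::string, std::string, unsigned>;

// Illustrative table contents only.
const std::map<Key, std::string> kTable = {
    {Key{"SVML", "sin", 4}, "__svml_sin4"},
    {Key{"SVML", "sin", 8}, "__svml_sin8"},
    {Key{"AMDLIBM", "sin", 4}, "__amdlibm_sin_4"},
};

// max_legal_vf: widest VF whose vector type is legal on the target.
Lowering lowerCall(const std::string& lib, const std::string& fn, unsigned vf,
                   unsigned max_legal_vf) {
  assert(vf != 0 && max_legal_vf != 0);
  auto same = kTable.find(Key{lib, fn, vf});
  if (same != kTable.end() && vf <= max_legal_vf)
    return {Action::SameVF, same->second, vf, 1};
  // Try narrower variants; halving assumes power-of-two VFs.
  for (unsigned part = std::min(vf, max_legal_vf); part >= 2; part /= 2) {
    if (vf % part != 0) continue;
    auto it = kTable.find(Key{lib, fn, part});
    if (it != kTable.end()) return {Action::PartialSerialize, it->second, part, vf / part};
  }
  return {Action::FullSerialize, "", 1, vf};
}
```

Under these assumptions a VF-8 SVML sin on an AVX target (4 doubles per legal vector) becomes two `__svml_sin4` calls, the same call on AVX-512 stays a single `__svml_sin8`, and a function missing from the table is fully serialized.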
>
>
>
> Thanks,
>
> Ashutosh
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org
> <llvm-dev-bounces at lists.llvm.org>] *On Behalf Of *Saito, Hideki via
> llvm-dev
> *Sent:* Friday, June 29, 2018 7:41 AM
> *To:* 'Saito, Hideki via llvm-dev' <llvm-dev at lists.llvm.org>
> *Subject:* [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
>
>
>
> Illustrative Example:
>
> clang -fveclib=SVML -O3 svml.c -mavx
>
> #include <math.h>
>
> void foo(double *a, int N){
>   int i;
> #pragma clang loop vectorize_width(8)
>   for (i=0;i<N;i++){
>     a[i] = sin(i);
>   }
> }
>
>
>
> Currently, this results in a call to
> <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
>
> This is the 8-element SVML sin() called with an 8-element argument. On the
> surface, this looks very good.
>
> Later on, standard vector type legalization kicks in, but only the
> argument and return data are legalized.
>
>         vmovaps %ymm0, %ymm1
>         vcvtdq2pd       %xmm1, %ymm0
>         vextractf128    $1, %ymm1, %xmm1
>         vcvtdq2pd       %xmm1, %ymm1
>         callq   __svml_sin8
>         vmovups %ymm1, 32(%r15,%r12,8)
>         vmovups %ymm0, (%r15,%r12,8)
>
> Unfortunately, __svml_sin8() doesn't use this form of input/output. It
> takes its argument in zmm0 and returns its result in zmm0, i.e., it is not
> legal to use with AVX.
>
>
>
> What we need to see instead is two calls to __svml_sin4(), like below.
>
>         vmovaps %ymm0, %ymm1
>         vcvtdq2pd       %xmm1, %ymm0
>         vextractf128    $1, %ymm1, %xmm1
>         vcvtdq2pd       %xmm1, %ymm1
>         callq   __svml_sin4
>         vmovups %ymm0, (%r15,%r12,8)
>         vmovaps %ymm1, %ymm0
>         callq   __svml_sin4
>         vmovups %ymm0, 32(%r15,%r12,8)
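At the value level, the legalized sequence splits the 8-lane operand into a low half (lanes 0-3) and a high half (lanes 4-7), calls the 4-lane routine on each half, and stores the low half's result at the lower address. A portable C++ model of that data flow; `sin4` is a hypothetical stand-in built on `std::sin`, not the real `__svml_sin4`, and `splitVectorCall` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for a 4-lane routine such as __svml_sin4 (not the real library).
std::vector<double> sin4(const std::vector<double>& x) {
  assert(x.size() == 4);
  std::vector<double> r(4);
  for (std::size_t i = 0; i < 4; ++i) r[i] = std::sin(x[i]);
  return r;
}

// Legalized wide call: one narrow call per chunk, results kept in lane order,
// so the low half's result lands at the lower address (offset 0).
template <typename NarrowFn>
std::vector<double> splitVectorCall(const std::vector<double>& x, std::size_t narrow, NarrowFn fn) {
  assert(narrow != 0 && x.size() % narrow == 0);
  std::vector<double> out;
  out.reserve(x.size());
  for (std::size_t lane = 0; lane < x.size(); lane += narrow) {
    std::vector<double> part(x.begin() + lane, x.begin() + lane + narrow);
    std::vector<double> r = fn(part);
    out.insert(out.end(), r.begin(), r.end());
  }
  return out;
}
```

Calling `splitVectorCall(values, 4, sin4)` on eight values models the two-call sequence: two narrow calls, with results concatenated in the original lane order.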
>
>
>
> What would be the most acceptable way to make this happen? Has anybody
> had a similar need before?
>
>
>
> The easiest workaround is to serialize the call when the vectorization
> factor exceeds the "type legal" one. This can be done with a few lines of
> code, plus code to recognize that the call is an SVML call (which is
> currently a string match against the "__svml" prefix in my local
> workspace). If a higher VF is not forced, the cost model will likely
> favor a lower VF. This is functionally correct, but obviously not an
> ideal solution.
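The workaround has two pieces: recognizing the call as SVML by its "__svml" prefix, as described above, and, when the VF exceeds the type-legal one, replacing the vector call with one scalar call per lane. A hedged C++ sketch; `isSvmlCall`, `shouldSerialize` and `serialize` are hypothetical names, and the scalar callee is passed in rather than derived from the SVML name (which is exactly the reverse-mapping problem raised later in the message).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Recognize an SVML call by its "__svml" prefix (a plain string match).
bool isSvmlCall(const std::string& callee) {
  return callee.compare(0, 6, "__svml") == 0;
}

// Serialize only SVML calls whose VF exceeds the type-legal VF.
bool shouldSerialize(const std::string& callee, unsigned vf, unsigned legal_vf) {
  return isSvmlCall(callee) && vf > legal_vf;
}

// Serialized form of a VF-wide call: one scalar call per lane.
template <typename ScalarFn>
std::vector<double> serialize(const std::vector<double>& x, unsigned vf, ScalarFn scalar) {
  assert(x.size() == vf && "operand must have VF lanes");
  std::vector<double> r;
  r.reserve(x.size());
  for (double v : x) r.push_back(scalar(v));
  return r;
}
```

On AVX (legal VF 4 for double), `__svml_sin8` would be serialized while `__svml_sin4` is left alone; the string match is the fragile part that the rest of the thread tries to avoid.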
>
>
>
> Here are a few ideas I thought about:
>
> 1) The standard LegalizeVectorType() in CodeGen/SelectionDAG doesn't seem
>    to work. We could define a generic ISD::VECLIB and try to split it into
>    two or more VECLIB nodes, but at that point we would have lost the
>    information about which function to call. We can't define an ISD opcode
>    per function; there are too many libm entries to deal with. We need a
>    scalable solution.
>
> 2) We could write an IR-to-IR pass to perform IR-level legalization. This
>    essentially duplicates the functionality of LegalizeVectorType(), but we
>    could make it available for other similar things that can't use
>    ISD-level vector type legalization. That makes it attractive enough
>    from this perspective.
>
> 3) We have implemented something similar to 2), but the legalization code
>    is specialized for SVML. This was much quicker than trying to
>    generalize the legalization scheme, but I'd imagine the community won't
>    like it.
>
> 4) The vectorizer emits legalized VECLIB calls. Since it can already emit
>    instructions in scalarized form, adding legalized-call functionality is
>    similar in some sense. The vectorizer can't simply choose the type-legal
>    function name with an illegal vector type, since LegalizeVectorType()
>    will still end up producing one call instead of two.
>
>
>
> Anything else?
>
>
>
> Also, doing any of this requires a reverse mapping from VECLIB name to
> scalar function name. What's the recommended way to do so?
>
> Can we use TableGen to create a reverse map?
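One way to get the reverse mapping without a second hand-maintained list is to derive it from the forward table itself. A sketch assuming a table shaped like the VecDesc entries in TargetLibraryInfo.cpp (scalar name, vector name, VF); the `VecEntry` type, the `buildReverseMap` helper and the table contents are illustrative, not copied from LLVM.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a VecLib table entry: scalar name, vector name, VF.
struct VecEntry {
  std::string scalar;
  std::string vector;
  unsigned vf;
};

// Illustrative entries only.
const std::vector<VecEntry> kVecLib = {
    {"sin", "__svml_sin4", 4},
    {"sin", "__svml_sin8", 8},
    {"cos", "__svml_cos4", 4},
};

// Reverse map: vector name -> (scalar name, VF), built once from the forward table.
std::map<std::string, std::pair<std::string, unsigned>> buildReverseMap(
    const std::vector<VecEntry>& table) {
  std::map<std::string, std::pair<std::string, unsigned>> rev;
  for (const VecEntry& e : table) {
    bool inserted = rev.emplace(e.vector, std::make_pair(e.scalar, e.vf)).second;
    assert(inserted && "vector names must be unique");
    (void)inserted;  // silence unused-variable warnings when NDEBUG is set
  }
  return rev;
}
```

Because both directions come from one table, adding a VECLIB entry cannot leave the reverse map out of date.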
>
>
>
> Your input is greatly appreciated. Is there a real need/desire for 2)
> outside of VECLIB (or outside of SVML)?
>
>
>
> Thanks,
>
> Hideki Saito
>
> Intel Corporation
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>