[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 2 15:58:59 PDT 2018
On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>
>
>
> >It may not be a full solution for the problems you're trying to solve
>
>
>
> If we are inventing a new solution, I’d like it to also solve the OpenMP
> declare simd legalization issue. If a small extension of the existing
> scheme works for mathlib only, I’m happy to take that and discuss the
> OpenMP declare simd issue separately.
>
I completely agree. We need a solution to handle 'declare simd' calls,
or to put it another way, arbitrary user-defined functions. To me, this
really looks like an ABI issue. If we have a function,
__foo__computeit8(<8 x float> %x), then if our lowering of <8 x float>
doesn't match the required register assignments, then we have the wrong
ABI. Will https://reviews.llvm.org/D47188 fix this?
-Hal
>
>
> >Or is there some reason that the vectorizer needs to be aware of
> those libcalls?
>
>
>
> I’m a strong believer in having CodeGen map (scalar and widened) mathlib
> calls to the actual library (or an inlined sequence).
>
> So, that question needs to be answered by someone else.
>
>
>
> Adding Michael and Hal.
>
>
>
>
>
> *From:*Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Monday, July 02, 2018 11:49 AM
> *To:* Saito, Hideki <hideki.saito at intel.com>
> *Cc:* Venkataramanan Kumar <venkataramanan.kumar.llvm at gmail.com>;
> llvm-dev at lists.llvm.org; Masten, Matt <matt.masten at intel.com>;
> dccitaliano at gmail.com
> *Subject:* Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB
> calls?
>
>
>
> It may not be a full solution for the problems you're trying to solve,
> but I don't know why adding to
> include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself.
> Certainly, it's a mess that could be organized, especially so we're
> not repeating everything for each data type as we do right now.
>
>
>
> So yes, I think that would allow us to remove the VecLib mappings
> because we are always waiting until codegen to make the translation
> from generic IR to target-specific libcall. Or is there some reason
> that the vectorizer needs to be aware of those libcalls?
>
>
>
> On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki
> <hideki.saito at intel.com> wrote:
>
>
>
> Venkat, we did not invent LLVM’s VecLib functionality. The
> original version of D19544
> (https://reviews.llvm.org/D19544?id=55036) was indeed a separate
> pass to convert widened math lib calls to SVML.
>
> Our preference for “vectorized sin()” is just a widened sin() that
> is lowered to a specific library call at a later point
> (either IR to IR or in CodeGen). Matt tried to sell that idea
> and it didn’t go through.
>
> Anyone else willing to work with us to try it again? In my
> opinion, however, this is a related but different topic from
> the legalization issue.
>
>
>
> Sanjay, I think what you are suggesting would work better if we
> don’t map math lib calls to VecLib. Otherwise, we’ll have too many
> RTLIB::VECLIB_ enums: one for each math function, multiplied by
> each vectorization factor, for each different VecLib. That’s way
> too many. If it’s one per math function, I’d guess it’s 100+.
> Still a lot, but manageable. This requires those functions to be
> listed in the intrinsics, right? That’s another reason some people
> favor VecLib mapping in the vectorizer: those math functions don’t
> have to be added to the intrinsics.
>
>
>
> I don’t insist on IR to IR legalization. However, I’m also
> interested in being able to legalize OpenMP declare simd function
> calls (**). These are user functions, so we have no way
> to list them as intrinsics or to predefine RTLIB:: enums for them.
> For each target, the vector function ABI defines how the parameters
> need to be passed, and the legalizer should be implemented based on
> that ABI, without knowing the details of what the user function
> does. A math-lib-only solution doesn’t help legalization of OpenMP
> declare simd.
>
>
>
> Thanks,
>
> Hideki
>
>
>
> --------------------------------
>
> (**)
>
> #pragma omp declare simd uniform(a), linear(i)
> void foo(float *a, int i);
>
> …
>
> // This loop could be vectorized with a VF that’s wider than the
> // widest available vector function for foo().
> #pragma omp simd
> for (i) {
>   …
>   foo(a, i);
>   …
> }
>
>
>
> *From:* Venkataramanan Kumar
> [mailto:venkataramanan.kumar.llvm at gmail.com]
> *Sent:* Sunday, July 01, 2018 11:38 PM
> *To:* Sanjay Patel <spatel at rotateright.com>
> *Cc:* Saito, Hideki <hideki.saito at intel.com>;
> llvm-dev at lists.llvm.org; Masten, Matt <matt.masten at intel.com>;
> dccitaliano at gmail.com
> *Subject:* Re: [llvm-dev] [RFC][VECLIB] how should we legalize
> VECLIB calls?
>
>
>
> Adding to Ashutosh's comments, we are also interested in making
> LLVM generate calls to the vector math library that ships with
> glibc (libmvec, version 2.22 and later).
>
>
>
> reference: https://sourceware.org/glibc/wiki/libmvec
>
>
>
> Using the example case given in the reference, we found that there
> are two vector versions of "sin" (4 x double) with the same VF,
> namely _ZGVcN4v_sin (the AVX version) and _ZGVdN4v_sin (the AVX2
> version). Following the SVML path and adding a new entry to the
> VecDesc structure in TargetLibraryInfo.cpp, we can generate the
> vector version.
>
>
>
> But we are unable to decide which version to emit in the
> vectorizer, because that requires the TTI information (the ISA).
> It looks better to legalize or generate them later.
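
To make the naming issue above concrete, here is a minimal C++ sketch of how the two libmvec names differ only in the ISA letter. The letters 'c' (AVX) and 'd' (AVX2) come from the names quoted above; 'b' (SSE) and 'e' (AVX-512) follow the same x86 vector function ABI. The enum and the helper vectorVariantName() are hypothetical names used for illustration, not LLVM or glibc APIs, and the helper only covers unmasked variants whose parameters are all plain vectors.

```cpp
#include <cassert>
#include <string>

// ISA classes of the x86 vector function ABI (illustrative enum).
enum class X86VectorISA { SSE, AVX, AVX2, AVX512 };

// Letter that encodes the ISA in a "_ZGV" mangled name.
char isaLetter(X86VectorISA isa) {
  switch (isa) {
  case X86VectorISA::SSE:    return 'b';
  case X86VectorISA::AVX:    return 'c';
  case X86VectorISA::AVX2:   return 'd';
  case X86VectorISA::AVX512: return 'e';
  }
  return '?';
}

// Hypothetical helper: builds the name of an unmasked ('N') variant with
// `vf` lanes whose `numParams` parameters are all plain vectors ('v').
std::string vectorVariantName(const std::string &scalarName, X86VectorISA isa,
                              unsigned vf, unsigned numParams) {
  assert(vf > 0 && numParams > 0 && "need at least one lane and one parameter");
  return std::string("_ZGV") + isaLetter(isa) + "N" + std::to_string(vf) +
         std::string(numParams, 'v') + "_" + scalarName;
}
```

Because the ISA letter is part of the name, the choice between the two 4-lane variants can only be made once the target ISA is known, which is the argument for deferring it past the vectorizer.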
>
>
>
> regards,
>
> Venkat.
>
>
>
>
>
> On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Hi Hideki -
>
>
>
> I hinted at this problem in the summary text of
> https://reviews.llvm.org/D47610:
>
> Why are we transforming from LLVM intrinsics to
> platform-specific intrinsics in IR? I don't see the benefit.
>
>
>
> I don't know if it solves all of the problems you're seeing,
> but it should be a small change to transform to the
> platform-specific SVML or other intrinsics in the DAG. We
> already do this for mathlib calls on Linux for example when we
> can use the finite versions of the calls. Have a look in
> SelectionDAGLegalize::ConvertNodeToLibcall():
>
>
>
> if (CanUseFiniteLibCall &&
>     DAG.getLibInfo().has(LibFunc_log_finite))
>   Results.push_back(ExpandFPLibCall(Node,
>                                     RTLIB::LOG_FINITE_F32,
>                                     RTLIB::LOG_FINITE_F64,
>                                     RTLIB::LOG_FINITE_F80,
>                                     RTLIB::LOG_FINITE_F128,
>                                     RTLIB::LOG_FINITE_PPCF128));
> else
>   Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32,
>                                     RTLIB::LOG_F64,
>                                     RTLIB::LOG_F80,
>                                     RTLIB::LOG_F128,
>                                     RTLIB::LOG_PPCF128));
>
>
>
>
>
>
>
>
>
> On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki
> <hideki.saito at intel.com> wrote:
>
>
>
> Ashutosh,
>
>
>
> Thanks for the reply.
>
>
>
> A related earlier discussion appears in the review of the
> SVML patch (@mmasten). Adding a few names from there.
>
> https://reviews.llvm.org/D19544
>
> There, I see Hal’s review comment “let’s start only with
> the directly-legal calls”. Apparently, what we have right now
> in the trunk is “not legal enough”. I’ll work on a patch
> to stop the bleeding while we continue to discuss the legalization
> topic.
>
>
>
> I suppose
>
> 1) An LV-only solution (let LV emit already-legalized VECLIB
>    calls) is certainly not scalable. It won’t help if VECLIB calls
>    are generated elsewhere. Also, keeping VF low enough to
>    prevent the legalization problem is only a workaround,
>    not a solution.
>
> 2) Assuming that we have to go the IR to IR pass route, there
>    are 3 ways to think:
>
>    a. Go with a very generic IR to IR legalization pass
>       comparable to ISD level legalization. This is the most
>       general, but I’d think it has the highest development cost.
>
>    b. Go with intrinsic-only legalization and then apply
>       VECLIB afterwards. This requires all scalar functions
>       with a VECLIB mapping to be added as intrinsics.
>
>    c. Go with generic enough function call legalization, with
>       the ability to add custom legalization for each VECLIB
>       (and, if needed, for each VECLIB or non-VECLIB entry).
>
>
>
> I think the costs of 2.b) and 2.c) are similar, and 2.c)
> seems to be more flexible. So, I guess we don’t really
> have to tie this discussion to “letting LV emit a widened math
> call instead of VECLIB”, even though I strongly favor that over
> LV emitting VECLIB calls.
>
> @Davide, in D19544, @spatel thought LibCallSimplifier has
> relevance to this legalization topic. Do you know enough about
> LibCallSimplifier to tell whether it can be extended to
> deal with 2.b) or 2.c)?
>
> If we think 2.b)/2.c) are right enough directions, I can
> clean up what we have and upload it to Phabricator as a
> starting point to get to 2.b)/2.c).
>
> I’ll continue waiting for more feedback. I guess I shouldn’t
> expect a lot this week and next due to the big holiday in
> the U.S.
>
>
>
> Thanks,
>
> Hideki
>
>
>
> *From:* Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com]
> *Sent:* Thursday, June 28, 2018 11:37 PM
> *To:* Saito, Hideki <hideki.saito at intel.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* RE: [RFC][VECLIB] how should we legalize VECLIB
> calls?
>
>
>
> Hi Saito,
>
>
>
> At AMD we have our own vector library and faced
> similar problems. We followed the SVML path and generated the
> respective vector calls from the vectorizer. When the
> vectorizer generates those calls directly, e.g. __svml_sin_4
> or __amdlibm_sin_4, later passes can only use string
> matching to identify the vector lib call. I’m not sure
> that’s the proper way. Maybe instead of generating the
> library-specific calls it’s better to generate some standard
> call (maybe intrinsics) and lower it later. A late IR
> pass can be introduced to perform the lowering; it would
> lower the intrinsic calls to specific lib
> calls (__svml_sin_4 or __amdlibm_sin_4 or …). This can be
> table driven, deciding the action based on the vector
> library, function name, VF and target information. The
> action can be full serialization, partial serialization (VF8
> to 2 x VF4), or generating the lib call with the same VF.
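
The table-driven decision described above could look roughly like the following C++ sketch. Everything here is an assumption made for illustration: the types VecLibEntry and Plan, the function planCall(), and the idea of recording a required register width per table row are hypothetical, not existing LLVM interfaces. The function picks the widest usable variant and reports whether the call keeps its VF, is split into several narrower calls, or is fully serialized to the scalar function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the lowering pass does with one widened call.
enum class Action { CallSameVF, PartialSerialize, FullSerialize };

// One row of the (illustrative) vector library table.
struct VecLibEntry {
  std::string scalarName; // e.g. "sin"
  std::string vectorName; // e.g. "__svml_sin4"
  unsigned vf;            // lanes handled per call
  unsigned vectorBits;    // register width the variant needs
};

struct Plan {
  Action action;
  std::string callee; // function to call
  unsigned calleeVF;  // lanes per call
  unsigned numCalls;  // calls needed to cover the requested VF
};

// Pick the widest variant that the target can call directly and that
// evenly divides the requested VF; fall back to scalar calls otherwise.
Plan planCall(const std::vector<VecLibEntry> &table,
              const std::string &scalarName, unsigned requestedVF,
              unsigned maxVectorBits) {
  assert(requestedVF > 0 && "VF must be positive");
  const VecLibEntry *best = nullptr;
  for (const VecLibEntry &e : table) {
    bool usable = e.scalarName == scalarName && e.vf <= requestedVF &&
                  requestedVF % e.vf == 0 && e.vectorBits <= maxVectorBits;
    if (usable && (!best || e.vf > best->vf))
      best = &e;
  }
  if (!best)
    return {Action::FullSerialize, scalarName, 1, requestedVF};
  if (best->vf == requestedVF)
    return {Action::CallSameVF, best->vectorName, best->vf, 1};
  return {Action::PartialSerialize, best->vectorName, best->vf,
          requestedVF / best->vf};
}
```

With a table holding 2-, 4- and 8-lane sin variants, a VF8 request on a 256-bit target yields two calls to the 4-lane variant, while the same request on a 512-bit target keeps the single 8-lane call.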
>
>
>
> Thanks,
>
> Ashutosh
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org]
> *On Behalf Of* Saito, Hideki via llvm-dev
> *Sent:* Friday, June 29, 2018 7:41 AM
> *To:* 'Saito, Hideki via llvm-dev' <llvm-dev at lists.llvm.org>
> *Subject:* [llvm-dev] [RFC][VECLIB] how should we legalize
> VECLIB calls?
>
>
>
>
>
> Illustrative Example:
>
>
>
> clang -fveclib=SVML -O3 svml.c -mavx
>
>
>
> #include <math.h>
>
> void foo(double *a, int N){
>
> int i;
>
> #pragma clang loop vectorize_width(8)
>
> for (i=0;i<N;i++){
>
> a[i] = sin(i);
>
> }
>
> }
>
>
>
> Currently, this results in a call to <8 x double>
> __svml_sin8(<8 x double>) after the vectorizer.
>
> This is the 8-element SVML sin() called with an 8-element
> argument. On the surface, this looks very good.
>
> Later on, standard vector type legalization kicks in, but
> only the argument and return data are legalized.
>
> vmovaps %ymm0, %ymm1
>
> vcvtdq2pd %xmm1, %ymm0
>
> vextractf128 $1, %ymm1, %xmm1
>
> vcvtdq2pd %xmm1, %ymm1
>
> callq __svml_sin8
>
> vmovups %ymm1, 32(%r15,%r12,8)
>
> vmovups %ymm0, (%r15,%r12,8)
>
> Unfortunately, __svml_sin8() doesn’t use this form of
> input/output. It takes its argument in zmm0 and returns its
> result in zmm0, i.e., it is not legal to use for AVX.
>
>
>
> What we need to see instead is two calls to __svml_sin4(),
> like below.
>
> vmovaps %ymm0, %ymm1
>
> vcvtdq2pd %xmm1, %ymm0
>
> vextractf128 $1, %ymm1, %xmm1
>
> vcvtdq2pd %xmm1, %ymm1
>
> callq __svml_sin4
>
> vmovups %ymm0, (%r15,%r12,8)
>
> vmovups %ymm1, %ymm0
>
> callq __svml_sin4
>
> vmovups %ymm0, 32(%r15,%r12,8)
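
The intended effect of the two-call sequence above can be stated at the source level with a small C++ sketch: split the 8-lane argument into halves, call a 4-lane routine on each, and concatenate the results. The function sin4() below is only a stand-in with a scalar loop body so the example is self-contained; it is an assumption for illustration and not the real __svml_sin4, and sin8ViaTwoSin4() is a hypothetical name.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<double, 4>;
using Vec8 = std::array<double, 8>;

// Stand-in for a 4-lane library routine such as __svml_sin4.
// The scalar loop is a placeholder body, not the library implementation.
Vec4 sin4(const Vec4 &x) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::sin(x[i]);
  return r;
}

// Legalized form of an 8-lane call on a target whose widest legal
// vector holds 4 doubles: split, call twice, concatenate.
Vec8 sin8ViaTwoSin4(const Vec8 &x) {
  Vec4 lo{}, hi{};
  for (std::size_t i = 0; i < 4; ++i) {
    lo[i] = x[i];     // lanes 0..3
    hi[i] = x[i + 4]; // lanes 4..7
  }
  const Vec4 rlo = sin4(lo);
  const Vec4 rhi = sin4(hi);
  Vec8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = rlo[i];     // low half goes back to lanes 0..3
    r[i + 4] = rhi[i]; // high half goes back to lanes 4..7
  }
  return r;
}
```

The point of the sketch is the lane bookkeeping: each half must be written back to the offset it was read from, which is what the stores after each call in the assembly are doing.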
>
>
>
> What would be the most acceptable way to make this happen?
> Anybody having had a similar need previously?
>
>
>
> The easiest workaround is to serialize the call when the
> vectorization factor is above the “type legal” one. This can be
> done with a few lines of code, plus the code to recognize that
> the call is “SVML” (which is currently a string match against
> the “__svml” prefix in my local workspace).
>
> If a higher VF is not forced, the cost model will likely favor
> a lower VF. Functionally correct, but obviously not an ideal
> solution.
>
>
>
> Here are a few ideas I thought about:
>
> 1) The standard LegalizeVectorType() in CodeGen/SelectionDAG
>    doesn’t seem to work. We could define a generic ISD::VECLIB
>    and try to split it into two or more VECLIB nodes, but at
>    that point we have lost the information about which function
>    to call. We can’t define an ISD opcode per function; there
>    would be too many libm entries to deal with. We need a
>    scalable solution.
>
> 2) We could write an IR to IR pass to perform IR level
>    legalization. This essentially duplicates the functionality
>    of LegalizeVectorType(), but we can make it available for
>    other similar things that can’t use ISD level vector type
>    legalization. This looks attractive enough from that
>    perspective.
>
> 3) We have implemented something similar to 2), but the
>    legalization code is specialized for SVML. This was much
>    quicker than trying to generalize the legalization scheme,
>    but I’d imagine the community won’t like it.
>
> 4) The vectorizer emits legalized VECLIB calls. Since it can
>    already emit instructions in scalarized form, adding
>    legalized call functionality is in some sense similar to
>    that. The vectorizer can’t simply choose a type-legal
>    function name with an illegal vector type, since
>    LegalizeVectorType() will still end up using one call
>    instead of two.
>
>
>
> Anything else?
>
>
>
> Also, doing any of this requires reverse mapping from
> VECLIB name to scalar function name. What’s the most
> recommended way to do so?
>
> Can we use TableGen to create a reverse map?
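
As one possible answer to the reverse-mapping question, the reverse map can be derived mechanically from the same forward table that maps scalar names to vector names, so the two directions cannot drift apart. The C++ sketch below assumes a simplified row type, VecDescEntry, loosely modelled on the VecDesc structure mentioned earlier in the thread; the type, buildReverseMap() and lookupScalar() are hypothetical names, not the TargetLibraryInfo API.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Simplified forward-table row (hypothetical, for illustration only).
struct VecDescEntry {
  std::string scalarName; // e.g. "sin"
  std::string vectorName; // e.g. "__svml_sin8"
  unsigned vf;            // vectorization factor of the vector variant
};

using ReverseMap = std::unordered_map<std::string, VecDescEntry>;

// Build the vector-name -> entry map from the forward table.
ReverseMap buildReverseMap(const std::vector<VecDescEntry> &table) {
  ReverseMap reverse;
  for (const VecDescEntry &e : table)
    reverse.emplace(e.vectorName, e);
  return reverse;
}

// Returns the entry for a vector library name, or nullptr if the name
// is not a known vector library function.
const VecDescEntry *lookupScalar(const ReverseMap &reverse,
                                 const std::string &vectorName) {
  auto it = reverse.find(vectorName);
  return it == reverse.end() ? nullptr : &it->second;
}
```

A lookup keyed on the full name also avoids relying on prefix string matching to recognize a vector library call. Whether the table itself is generated by TableGen or written by hand is independent of this derivation.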
>
>
>
> Your input is greatly appreciated. Is there a real
> need/desire for 2) outside of VECLIB (or outside of SVML)?
>
>
>
> Thanks,
>
> Hideki Saito
>
> Intel Corporation
>
>
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory