[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Mon Jul 2 15:58:59 PDT 2018

On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>
> >It may not be a full solution for the problems you're trying to solve
>
> If we are inventing a new solution, I’d like it also to solve the OpenMP
> declare simd legalization issue. If a small extension of the existing
> scheme works for mathlib only, I’m happy to take that and discuss the
> OpenMP declare simd issue separately.
>

I completely agree. We need a solution to handle 'declare simd' calls,
or to put it another way, arbitrary user-defined functions. To me, this
really looks like an ABI issue. If we have a function
__foo__computeit8(<8 x float> %x) and our lowering of <8 x float>
doesn't match the required register assignments, then we have the wrong
ABI. Will https://reviews.llvm.org/D47188 fix this?

 -Hal

>
> >Or is there some reason that the vectorizer needs to be aware of
> >those libcalls?
>
> I’m a strong believer in CodeGen mapping (scalar and widened) mathlib
> calls to the actual library (or an inlined sequence).
> So, that question needs to be answered by someone else.
>
> Adding Michael and Hal.
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Monday, July 02, 2018 11:49 AM
> *To:* Saito, Hideki <hideki.saito at intel.com>
> *Cc:* Venkataramanan Kumar <venkataramanan.kumar.llvm at gmail.com>;
> llvm-dev at lists.llvm.org; Masten, Matt <matt.masten at intel.com>;
> dccitaliano at gmail.com
> *Subject:* Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB
> calls?
>
> It may not be a full solution for the problems you're trying to solve,
> but I don't know why adding to
> include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself.
> Certainly, it's a mess that could be better organized, especially so
> we're not repeating everything for each data type as we do right now.
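The per-type repetition mentioned here can be illustrated with an X-macro, the include-with-macro pattern that .def files such as RuntimeLibcalls.def are built on. This is a minimal, self-contained C sketch, not that file's actual contents: `FP_LIBCALLS`, `ENUM_ENTRY`, `NAME_ENTRY`, and `libcall_name` are hypothetical names, and only F32/F64 are expanded. Each math function is listed once, and the expansion macros generate one enumerator and one symbol name per type.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list: one line per math function, not one per type. */
#define FP_LIBCALLS(X) \
  X(SIN, "sin")        \
  X(COS, "cos")        \
  X(LOG, "log")

/* Expand every function into an F32 and an F64 enumerator. */
#define ENUM_ENTRY(CODE, NAME) CODE##_F32, CODE##_F64,
enum Libcall { FP_LIBCALLS(ENUM_ENTRY) NUM_LIBCALLS };
#undef ENUM_ENTRY

/* Matching name table: C's "sinf" for F32 and "sin" for F64. */
#define NAME_ENTRY(CODE, NAME) NAME "f", NAME,
static const char *const LibcallNames[NUM_LIBCALLS] = {
    FP_LIBCALLS(NAME_ENTRY)};
#undef NAME_ENTRY

/* Look up the runtime symbol for a libcall enumerator. */
const char *libcall_name(enum Libcall lc) {
  assert((unsigned)lc < (unsigned)NUM_LIBCALLS);
  return LibcallNames[lc];
}
```

With this shape, adding a new type means editing the two expansion macros rather than every function's entry.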
>
> So yes, I think that would allow us to remove the VecLib mappings,
> because we would always wait until codegen to make the translation
> from generic IR to a target-specific libcall. Or is there some reason
> that the vectorizer needs to be aware of those libcalls?
>
> On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <hideki.saito at intel.com
> <mailto:hideki.saito at intel.com>> wrote:
>
>     Venkat, we did not invent LLVM’s VecLib functionality. The
>     original version of D19544
>     (https://reviews.llvm.org/D19544?id=55036) was indeed a separate
>     pass to convert widened math library calls to SVML.
>
>     Our preference for “vectorized sin()” is simply a widened sin()
>     that is lowered to a specific library call at a later point
>     (either IR to IR or in CodeGen). Matt tried to sell that idea,
>     and it didn’t go through.
>
>     Is anyone else willing to work with us to try it again? In my
>     opinion, however, this is a related but different topic from the
>     legalization issue.
>
>     Sanjay, I think what you are suggesting would work better if we
>     don’t map math lib calls to VecLib. Otherwise, we’ll have too many
>     RTLIB::VECLIB_ enums: one for each math function, multiplied by
>     each vectorization factor, for each VecLib. That’s way too many.
>     If it’s one per math function, I’d guess it’s 100+: still a lot,
>     but manageable. This requires those functions to be listed as
>     intrinsics, right? That’s another reason some people favor doing
>     the VecLib mapping in the vectorizer: those math functions don’t
>     have to be added as intrinsics.
>
>     I don’t insist on IR to IR legalization. However, I’m also
>     interested in being able to legalize OpenMP declare simd function
>     calls (**). These are user functions, so we have no way to list
>     them as intrinsics or to predefine RTLIB:: enums for them. For each
>     target, the vector function ABI defines how the parameters need to
>     be passed, and the legalizer should be implemented based on the
>     ABI, without knowing the details of what the user function does. A
>     math-lib-only solution doesn’t help legalization of OpenMP declare
>     simd.
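The claim that the legalizer only needs the vector function ABI, not the function's semantics, can be sketched in C by treating a W-lane variant as an opaque callback and splitting a wider call into W-lane chunks. This is a minimal sketch under stated assumptions: `VecFn`, `legalize_call`, and `double_it4` are hypothetical, arrays stand in for vector registers, and VF is assumed to be a multiple of W (a remainder would need masking or a scalar tail, which is not shown).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vector variant that processes exactly
   `width` lanes per call, as its vector ABI guarantees. */
typedef void (*VecFn)(const double *in, double *out);

/* Emulate one VF-wide call with the `width`-lane variant: call it once
   per chunk. Nothing here depends on what the function computes. */
void legalize_call(VecFn fn, size_t width, const double *in, double *out,
                   size_t vf) {
  assert(width > 0 && vf % width == 0);
  for (size_t base = 0; base < vf; base += width)
    fn(in + base, out + base);
}

/* Example 4-lane "user function" standing in for a declare simd variant. */
void double_it4(const double *in, double *out) {
  for (size_t i = 0; i < 4; i++)
    out[i] = 2.0 * in[i];
}
```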
>
>      
>
>     Thanks,
>
>     Hideki
>
>      
>
>     --------------------------------
>
>     (**)
>
>     #pragma omp declare simd uniform(a), linear(i)
>     void foo(float *a, int i);
>
>     …
>
>     #pragma omp simd
>     for(i) {   // this loop could be vectorized with a VF that’s wider
>                // than the widest available vector function for foo().
>         …
>         foo(a, i);
>         …
>     }
>
>      
>
>     *From:* Venkataramanan Kumar
>     [mailto:venkataramanan.kumar.llvm at gmail.com
>     <mailto:venkataramanan.kumar.llvm at gmail.com>]
>     *Sent:* Sunday, July 01, 2018 11:38 PM
>     *To:* Sanjay Patel <spatel at rotateright.com
>     <mailto:spatel at rotateright.com>>
>     *Cc:* Saito, Hideki <hideki.saito at intel.com
>     <mailto:hideki.saito at intel.com>>; llvm-dev at lists.llvm.org
>     <mailto:llvm-dev at lists.llvm.org>; Masten, Matt
>     <matt.masten at intel.com <mailto:matt.masten at intel.com>>;
>     dccitaliano at gmail.com <mailto:dccitaliano at gmail.com>
>     *Subject:* Re: [llvm-dev] [RFC][VECLIB] how should we legalize
>     VECLIB calls?
>
>     Adding to Ashutosh's comments, we are also interested in making
>     LLVM generate calls to the vector math library that is available
>     with glibc (version 2.22 and later).
>
>     Reference: https://sourceware.org/glibc/wiki/libmvec
>
>     Using the example case given in the reference, we found there are
>     two vector versions of "sin" (4 x double) with the same VF, namely
>     _ZGVcN4v_sin (AVX) and _ZGVdN4v_sin (AVX2). Following the SVML
>     path and adding a new entry to the VecDesc structure in
>     TargetLibraryInfo.cpp, we can generate the vector version.
>
>     But we were unable to decide which version to emit in the
>     vectorizer, since that needs TTI information (the ISA). It looks
>     better to legalize or generate them later.
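The two libmvec names differ only in the ISA letter of the x86 vector function ABI mangling (`c` for AVX, `d` for AVX2), which is why the choice depends on target information. A minimal C sketch, assuming the `_ZGV<isa><mask><vlen><params>_<name>` layout with the unmasked (`N`) variant; `vfabi_name` and `pick_isa` are hypothetical helpers, and the `has_avx2` flag stands in for a real TTI or subtarget query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format "_ZGV<isa>N<vlen><params>_<name>", e.g. "_ZGVdN4v_sin".
   isa: 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512.
   'N' selects the unmasked variant; params has one letter per
   parameter ('v' = vector). Returns the snprintf result. */
int vfabi_name(char *buf, size_t size, char isa, unsigned vlen,
               const char *params, const char *scalar_name) {
  assert(vlen > 0);
  return snprintf(buf, size, "_ZGV%cN%u%s_%s", isa, vlen, params,
                  scalar_name);
}

/* Choose the ISA letter late, once target features are known.
   has_avx2 stands in for a TTI/subtarget query. */
char pick_isa(bool has_avx2) { return has_avx2 ? 'd' : 'c'; }
```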
>
>      
>
>     regards,
>
>     Venkat.
>
>
>     On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev
>     <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>         Hi Hideki -
>
>          
>
>         I hinted at this problem in the summary text of
>         https://reviews.llvm.org/D47610:
>
>         Why are we transforming from LLVM intrinsics to
>         platform-specific intrinsics in IR? I don't see the benefit.
>
>          
>
>         I don't know if it solves all of the problems you're seeing,
>         but it should be a small change to transform to the
>         platform-specific SVML or other intrinsics in the DAG. We
>         already do this for mathlib calls on Linux, for example, when
>         we can use the finite versions of the calls. Have a look at
>         SelectionDAGLegalize::ConvertNodeToLibcall():
>
>             if (CanUseFiniteLibCall &&
>                 DAG.getLibInfo().has(LibFunc_log_finite))
>               Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
>                                                 RTLIB::LOG_FINITE_F64,
>                                                 RTLIB::LOG_FINITE_F80,
>                                                 RTLIB::LOG_FINITE_F128,
>                                                 RTLIB::LOG_FINITE_PPCF128));
>             else
>               Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32,
>                                                 RTLIB::LOG_F64,
>                                                 RTLIB::LOG_F80,
>                                                 RTLIB::LOG_F128,
>                                                 RTLIB::LOG_PPCF128));
>
>         On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki
>         <hideki.saito at intel.com <mailto:hideki.saito at intel.com>> wrote:
>
>              
>
>             Ashutosh,
>
>              
>
>             Thanks for the reply.
>
>              
>
>             A related earlier discussion appears in the review of the
>             SVML patch (@mmasten). Adding a few names from there.
>
>             https://reviews.llvm.org/D19544
>
>             There, I see Hal’s review comment “let’s start only with
>             the directly-legal calls”. Apparently, what we have right
>             now in trunk is “not legal enough”. I’ll work on a patch
>             to stop the bleeding while we continue to discuss the
>             legalization topic.
>
>              
>
>             I suppose
>
>             1)      An LV-only solution (let LV emit already-legalized
>             VECLIB calls) is certainly not scalable. It won’t help if
>             VECLIB calls are generated elsewhere. Also, keeping VF low
>             enough to avoid the legalization problem is only a
>             workaround, not a solution.
>
>             2)      Assuming that we have to go the IR-to-IR pass
>             route, there are three ways to think about it:
>
>             a.      Go with a very generic IR-to-IR legalization pass
>             comparable to ISD-level legalization. This is the most
>             general, but I’d think it has the highest development cost.
>
>             b.      Go with intrinsic-only legalization and then apply
>             VECLIB afterwards. This requires all scalar functions
>             with a VECLIB mapping to be added as intrinsics.
>
>             c.      Go with generic-enough function call
>             legalization, with the ability to add custom legalization
>             for each VECLIB (and, if needed, each VECLIB or non-VECLIB
>             entry).
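Option 2.c) can be pictured as a generic default plus an optional per-entry override. In this minimal C sketch, each hypothetical `VecLibEntry` records its native width and an optional `LegalizeHook`; with no hook, the generic rule splits a VF-wide call into ceil(VF / native width) calls, and a hook can substitute a different plan. The struct, the hook signature, and the idea of returning a call count are illustrative assumptions, not an existing LLVM interface.

```c
#include <assert.h>
#include <stddef.h>

/* A custom hook returns how many calls it would emit for `vf` lanes. */
typedef unsigned (*LegalizeHook)(unsigned vf);

typedef struct {
  const char *name;     /* vector library entry, e.g. "__svml_sin4" */
  unsigned native_vf;   /* lanes handled per call */
  LegalizeHook custom;  /* NULL means: use the generic split */
} VecLibEntry;

/* Generic rule: split into ceil(vf / native_vf) native-width calls. */
static unsigned generic_split(unsigned vf, unsigned native_vf) {
  assert(native_vf > 0);
  return (vf + native_vf - 1) / native_vf;
}

/* Per-entry dispatch: custom legalization if registered, else generic. */
unsigned legalize_entry(const VecLibEntry *e, unsigned vf) {
  return e->custom ? e->custom(vf) : generic_split(vf, e->native_vf);
}

/* Example override: an entry that must be fully serialized. */
unsigned serialize_hook(unsigned vf) { return vf; }
```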
>
>              
>
>             I think the costs of 2.b) and 2.c) are similar, and 2.c)
>             seems to be more flexible. So, I guess we don’t really
>             have to tie this discussion to “letting LV emit widened
>             math calls instead of VECLIB calls”, even though I strongly
>             favor that over LV emitting VECLIB calls.
>
>              
>
>             @Davide, in D19544, @spatel thought LibCallSimplifier was
>             relevant to this legalization topic. Do you know enough
>             about LibCallSimplifier to tell whether it can be extended
>             to deal with 2.b) or 2.c)?
>
>              
>
>             If we think 2.b)/2.c) are the right direction, I can
>             clean up what we have and upload it to Phabricator as a
>             starting point for getting to 2.b)/2.c).
>
>              
>
>             I’ll continue waiting for more feedback. I guess I shouldn’t
>             expect a lot this week and next due to the big holiday in
>             the U.S.
>
>              
>
>             Thanks,
>
>             Hideki
>
>              
>
>             *From:* Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com
>             <mailto:Ashutosh.Nema at amd.com>]
>             *Sent:* Thursday, June 28, 2018 11:37 PM
>             *To:* Saito, Hideki <hideki.saito at intel.com
>             <mailto:hideki.saito at intel.com>>
>             *Cc:* llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>             *Subject:* RE: [RFC][VECLIB] how should we legalize VECLIB
>             calls?
>
>              
>
>             Hi Saito,
>
>              
>
>             At AMD we have our own version of a vector library and faced
>             similar problems. We followed the SVML path and generated
>             the respective vector calls from the vectorizer. When the
>             vectorizer generates the respective calls, i.e. __svml_sin_4
>             or __amdlibm_sin_4, later passes can only use string
>             matching to identify the vector lib call. I’m not sure
>             that’s the proper way; maybe instead of generating the
>             respective calls, it’s better to generate some standard
>             call (maybe intrinsics) and lower it later. A late IR
>             pass can be introduced to perform the lowering: it would
>             lower the intrinsic calls to specific lib calls
>             (__svml_sin_4 or __amdlibm_sin_4 or … ). This can be
>             table driven, deciding the action based on the vector
>             library, function name, VF and target information; the
>             action can be full serialization, partial serialization
>             (VF8 to 2 x VF4), or generating the lib call with the same VF.
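The table-driven lowering described here can be sketched as a lookup keyed by library, scalar function, and VF that returns one of the three actions listed. A minimal C sketch, assuming the caller supplies the target's widest legal VF and that a narrower variant is only usable when it divides VF evenly; the rows reuse names from this thread for illustration only, and `choose_action`, `VecLibRow`, and the enum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { SAME_VF, PARTIAL_SERIALIZE, FULL_SERIALIZE } LowerAction;

typedef struct {
  const char *lib;     /* vector library */
  const char *scalar;  /* scalar function name */
  unsigned vf;         /* width of an available vector variant */
  const char *callee;  /* vector variant to call */
} VecLibRow;

/* Illustrative rows only; names taken from the discussion. */
static const VecLibRow Table[] = {
    {"svml", "sin", 4, "__svml_sin4"},
    {"svml", "sin", 8, "__svml_sin8"},
    {"amdlibm", "sin", 4, "__amdlibm_sin_4"},
};

/* Decide how to lower a VF-wide call, given the widest legal VF on the
   target. *callee gets the variant to call, or NULL for full serialization. */
LowerAction choose_action(const char *lib, const char *scalar, unsigned vf,
                          unsigned max_legal_vf, const char **callee) {
  assert(callee != NULL);
  const VecLibRow *best = NULL;
  for (size_t i = 0; i < sizeof Table / sizeof Table[0]; i++) {
    const VecLibRow *r = &Table[i];
    if (strcmp(r->lib, lib) != 0 || strcmp(r->scalar, scalar) != 0)
      continue;
    if (r->vf > max_legal_vf || vf % r->vf != 0)
      continue;  /* variant not usable for this VF on this target */
    if (best == NULL || r->vf > best->vf)
      best = r;  /* prefer the widest usable variant */
  }
  *callee = best ? best->callee : NULL;
  if (best == NULL)
    return FULL_SERIALIZE;
  return best->vf == vf ? SAME_VF : PARTIAL_SERIALIZE;
}
```

Preferring the widest usable variant keeps the number of calls minimal; the full-serialization case leaves `*callee` NULL so the caller falls back to per-lane scalar calls.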
>
>              
>
>             Thanks,
>
>             Ashutosh
>
>              
>
>             *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org]
>             *On Behalf Of *Saito, Hideki via llvm-dev
>             *Sent:* Friday, June 29, 2018 7:41 AM
>             *To:* 'Saito, Hideki via llvm-dev'
>             <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
>             *Subject:* [llvm-dev] [RFC][VECLIB] how should we legalize
>             VECLIB calls?
>
>              
>
>              
>
>             Illustrative Example:
>
>              
>
>             clang -fveclib=SVML -O3 svml.c -mavx
>
>             #include <math.h>
>
>             void foo(double *a, int N){
>               int i;
>             #pragma clang loop vectorize_width(8)
>               for (i=0;i<N;i++){
>                 a[i] = sin(i);
>               }
>             }
>
>              
>
>             Currently, this results in a call to <8 x double>
>             __svml_sin8(<8 x double>) after the vectorizer.
>
>             This is the 8-element SVML sin() called with an 8-element
>             argument. On the surface, this looks very good.
>
>             Later on, standard vector type legalization kicks in, but
>             only the argument and return values are legalized.
>
>                     vmovaps %ymm0, %ymm1
>                     vcvtdq2pd       %xmm1, %ymm0
>                     vextractf128    $1, %ymm1, %xmm1
>                     vcvtdq2pd       %xmm1, %ymm1
>                     callq   __svml_sin8
>                     vmovups %ymm1, 32(%r15,%r12,8)
>                     vmovups %ymm0, (%r15,%r12,8)
>
>             Unfortunately, __svml_sin8() doesn’t use this form of
>             input/output. It takes its argument in zmm0 and returns its
>             result in zmm0, i.e., it is not legal to use for AVX.
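The arithmetic behind "not legal to use for AVX", assuming 64-bit doubles: __svml_sin8 takes 8 × 64 = 512 bits in a single zmm register, while AVX provides 256-bit ymm registers, so the widest legal VF and the number of calls needed are:

```latex
\[
\mathrm{VF}_{\mathrm{legal}} = \frac{256\ \text{bits per ymm}}{64\ \text{bits per double}} = 4,
\qquad
\text{calls} = \left\lceil \frac{\mathrm{VF}}{\mathrm{VF}_{\mathrm{legal}}} \right\rceil
             = \left\lceil \frac{8}{4} \right\rceil = 2
\]
```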
>
>              
>
>             What we need to see instead is two calls to __svml_sin4(),
>             like below.
>
>                     vmovaps %ymm0, %ymm1
>                     vcvtdq2pd       %xmm1, %ymm0
>                     vextractf128    $1, %ymm1, %xmm1
>                     vcvtdq2pd       %xmm1, %ymm1
>                     callq   __svml_sin4
>                     vmovups %ymm0, (%r15,%r12,8)
>                     vmovups %ymm1, %ymm0
>                     callq   __svml_sin4
>                     vmovups %ymm0, 32(%r15,%r12,8)
>
>              
>
>             What would be the most acceptable way to make this happen?
>             Has anybody had a similar need previously?
>
>              
>
>             The easiest workaround is to serialize the call above the
>             “type legal” vectorization factor. This can be done with a
>             few lines of code, plus code to recognize that the call is
>             “SVML” (which is currently a string match against the
>             “__svml” prefix in my local workspace).
>
>             If a higher VF is not forced, the cost model will likely
>             favor a lower VF. Functionally correct, but obviously not
>             an ideal solution.
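The workaround above has two parts: recognize an SVML callee by its `__svml` prefix, and serialize the call only when the VF exceeds the type-legal VF. A minimal C sketch with hypothetical helpers (`is_svml_callee`, `must_serialize`, `serialize_call`, and the example `square`); arrays stand in for vector values, and the prefix test is exactly the kind of string matching the thread considers fragile.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: string match against the "__svml" prefix. */
bool is_svml_callee(const char *name) {
  assert(name != NULL);
  return strncmp(name, "__svml", strlen("__svml")) == 0;
}

/* Serialize only SVML calls whose VF exceeds the type-legal VF. */
bool must_serialize(const char *callee, unsigned vf, unsigned legal_vf) {
  return is_svml_callee(callee) && vf > legal_vf;
}

/* Serialized form: one scalar call per lane. */
void serialize_call(double (*scalar_fn)(double), const double *in,
                    double *out, unsigned vf) {
  for (unsigned i = 0; i < vf; i++)
    out[i] = scalar_fn(in[i]);
}

/* Example scalar function used in place of sin() for illustration. */
double square(double x) { return x * x; }
```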
>
>              
>
>             Here are a few ideas I thought about:
>
>             1)      Standard LegalizeVectorType() in
>             CodeGen/SelectionDAG doesn’t seem to work. We could define
>             a generic ISD::VECLIB and try to split it into two or more
>             VECLIB nodes, but at that point we have lost the information
>             about which function to call. We can’t define an ISD opcode
>             per function; there would be too many libm entries to deal
>             with. We need a scalable solution.
>
>             2)      We could write an IR-to-IR pass to perform IR-level
>             legalization. This essentially duplicates the functionality
>             of LegalizeVectorType(), but we can make it available for
>             other similar things that can’t use ISD-level vector type
>             legalization. This looks attractive enough from that
>             perspective.
>
>             3)      We have implemented something similar to 2), but
>             the legalization code is specialized for SVML. This was
>             much quicker than trying to generalize the legalization
>             scheme, but I’d imagine the community won’t like it.
>
>             4)      Have the vectorizer emit legalized VECLIB calls.
>             Since it can already emit instructions in scalarized form,
>             adding legalized-call functionality is in some sense
>             similar. The vectorizer can’t simply choose a type-legal
>             function name with an illegal vector type, since
>             LegalizeVectorType() will still end up using one call
>             instead of two.
>
>              
>
>             Anything else?
>
>              
>
>             Also, doing any of this requires a reverse mapping from
>             VECLIB name to scalar function name. What’s the recommended
>             way to do so?
>
>             Can we use TableGen to create a reverse map?
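A reverse map can be as simple as scanning the same scalar/vector/VF triples that the forward VecDesc mapping in TargetLibraryInfo already holds, which also yields the narrower variant to split into. A minimal C sketch: the two rows, `VecDescEntry`, `reverse_lookup`, and `variant_with_vf` are illustrative assumptions, and a generated table (for example via TableGen, as asked) or a hash map would replace the linear scan at scale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative entry shape: scalar name, vector name, VF. */
typedef struct {
  const char *scalar;
  const char *vector;
  unsigned vf;
} VecDescEntry;

/* Forward table as the vectorizer would use it (scalar -> vector). */
static const VecDescEntry Descs[] = {
    {"sin", "__svml_sin4", 4},
    {"sin", "__svml_sin8", 8},
};
static const size_t NumDescs = sizeof Descs / sizeof Descs[0];

/* Reverse lookup: vector name -> entry (scalar name and VF), or NULL. */
const VecDescEntry *reverse_lookup(const char *vector_name) {
  assert(vector_name != NULL);
  for (size_t i = 0; i < NumDescs; i++)
    if (strcmp(Descs[i].vector, vector_name) == 0)
      return &Descs[i];
  return NULL;
}

/* Same scalar function, different VF: the variant to split into. */
const char *variant_with_vf(const char *vector_name, unsigned vf) {
  const VecDescEntry *e = reverse_lookup(vector_name);
  if (e == NULL)
    return NULL;
  for (size_t i = 0; i < NumDescs; i++)
    if (strcmp(Descs[i].scalar, e->scalar) == 0 && Descs[i].vf == vf)
      return Descs[i].vector;
  return NULL;
}
```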
>
>              
>
>             Your input is greatly appreciated. Is there a real
>             need/desire for 2) outside of VECLIB (or outside of SVML)?
>
>              
>
>             Thanks,
>
>             Hideki Saito
>
>             Intel Corporation
>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
