RFC: Enable vectorization of call instructions in the loop vectorizer

James Molloy james at jamesmolloy.co.uk
Wed Dec 25 09:03:58 PST 2013


Hi Arnold,

Thanks for the reply over the Christmas season :)

Yep, your proposed scheme would work for me and I will implement that.
Given that I'm ditching uniforms (at least for now), I don't think a
library function is really required as the generation of the vectorized
function call is trivial.

Merry Christmas,

James


On 24 December 2013 22:43, Arnold <aschwaighofer at apple.com> wrote:

>
>
> Sent from my iPhone
>
> On Dec 22, 2013, at 4:04 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
>
> Hi Arnold,
>
> No worries; it's the Christmas season, so I expect long delays between
> replies (hence the day's delay with this reply!)
>
> > I don't think that TargetLibraryInfo should do the transformation from
> > scalar to vectorized call instruction itself.
> > I think TargetLibraryInfo should just provide a mapping from function
> > name to function name (including the vector factor, etc.). There should
> > be a function that lives in lib/transforms that does the transformation.
> > I don’t think IR type transformations should go into lib/Target.
> >
> > Do we need to reuse the LibFunc enum to identify these functions?
>
> OK. I tried to go via LibFuncs purely so that TLI exposed a more coherent
> interface (call getLibFunc on a function name, get back an identifier for
> that function which you can then use to query the rest of TLI). I'm very
> happy with going direct from fn name to fn name, and leaving the creation
> of the CallInst to something else (probably LoopVectorizer itself).
>
>
> We will want to vectorize function calls in both vectorizers, so a library
> function would be best.
>
>
> Regarding uniforms - I think the best way to handle these is to ignore
> them for the moment. At least in OpenCL, any function that can take a
> scalar uniform can also take a vector non-uniform. So mapping from all
> arguments scalar to all arguments vector is always valid, and will simplify
> the logic a lot. Then a later pass could, if it wanted, re-identify
> uniforms and change which function is called.
>
>
> Okay.
>
>
> > I don't understand why the function calls need to be virtual. The
> > mapping from function name to family of functions should capture
> > everything we need?
> >  (Fun name, VF) -> {#num_entries, vector fun, {vector fun with uniform
> > params, 1}, …}
>
>
> As Nadav mentioned, the entire suite of OpenCL built-in functions is
> massive - hundreds of functions. Pre-seeding a map with all of these
> functions is something I'd quite like to avoid. There are two options -
> parameterise TLI with some map, like you said, or make TLI's functionality
> overridable. I prefer the latter, because it allows an implementation to do
> a lot of work lazily, which is important for compile-time performance (see
> the performance improvements I had to implement in the LLVM bitcode linker
> as an example of how much lazy link performance matters - small kernel, big
> library).
>
> Making TLI's functionality overridable is, however, a soft requirement and
> I'm keen to satisfy reviewers. So if this use case doesn't wash with you,
> let me know and I'll make it take a map and sort out laziness my own way
> downstream.
>
>
>
> I don’t think we need to use virtual function calls to lazily setup the
> table.
>
> One of the reasons why I don’t like subclassing as an extension mechanism
> is that it does not compose. Say we had VecLibA and VecLibB (and at other
> times just one of them); that is hard to express with subclassing.
>
> How about a scheme similar to this:
>
> VecDesc veclibtbl[] = {{"cos", "cos2", 2}, {"cos", "cos4", 4}, {"sin",
> "sin4", 4}, ...};
> SmallVector<VecDesc *, 4> VectorTables;
> mutable StringMap<const char *> VecToScalar;
>
> During initialization we push VecDesc tables onto the vector.
>
> initialize(TargetLibraryInfo *tli) {
>   if (datalayout.contains("OpenCL"))
>     tli->addVectorTable(openclvectbl);
>   if (datalayout.contains("veclib"))
>     tli->addVectorTable(veclibtbl);
>   ...
> }
>
> When we query TLI for a vector variant we binary search each VecDesc table
> for the function (similar to how we do for LibFuncs entries now).
>
> The VecToScalar map can be lazily initialized the first time it is queried.
>
>
> Would this work for you?
>
>
> > Can you infer the vectorized function name from the scalar function name
> in a predictable way (but why use an enum then)? I don’t understand the use
> case that requires virtual functions.
>
> Alas no: CL (my main use case) functions are overloaded and thus are
> name-mangled, and I don't want to put that logic *anywhere*.
>
>
> On 20 December 2013 18:45, Arnold <aschwaighofer at apple.com> wrote:
>
>> Hi James,
>>
>> Again thank you for moving this forward! Sorry for not chiming in
>> earlier, I am in the midst of a move.
>>
>> I don't think that TargetLibraryInfo should do the transformation from
>> scalar to vectorized call instruction itself.
>> I think TargetLibraryInfo should just provide a mapping from function
>> name to function name (including the vector factor, etc). There should be a
>> function that lives in lib/transforms that does the transformation. I don’t
>> think IR type transformations should go into lib/Target.
>>
>> I don't understand why the function calls need to be virtual. The mapping
>> from function name to family of functions should capture everything we
>> need?
>>  (Fun name, VF) -> {#num_entries, vector fun, {vector fun with uniform
>> params, 1}, …}
>> Can you infer the vectorized function name from the scalar function name
>> in a predictable way (but why use an enum then)? I don’t understand the use
>> case that requires virtual functions.
>>
>> We can then initialize this mapping in a static function, as we do now,
>> or at a later point have it additionally initialized from Module
>> metadata.
>>
>> Do we need to reuse the LibFunc enum to identify these functions? Do you
>> want to add (before-patch) TargetLibraryInfo::LibFunc style optimizations
>> to the optimizer? (This would go beyond just enabling vectorization of
>> call instructions.)
>> It seems to me you are solving two separate things with this patch: one
>> is vectorization of call instructions and the second one is adding target
>> (really "target language") defined “LibFunc” functions?
>>
>> For the former, I think a map of function names should be enough if we
>> just want to know scalar and vector variants of functions? Otherwise, we
>> would have to go through a map of function name string to LibFunc to
>> vectorized function name. I think we can omit the intermediate step. If we
>> had a string mapping, this would also readily support Module metadata.
>>
>> The latter I think is out of scope for this patch and something that
>> needs to be discussed separately.
>>
>> Thanks,
>> Arnold
>>
>>
>> > On Dec 19, 2013, at 5:43 AM, James Molloy <James.Molloy at arm.com> wrote:
>> >
>> > Hi all,
>> >
>> > Attached is the first patch in a sequence to implement this behaviour
>> in the way described in this thread.
>> >
>> > This patch simply:
>> > * Changes most methods on TargetLibraryInfo to be virtual, to allow
>> clients to override them.
>> > * Adds three new functions for querying the vectorizability (and
>> scalarizability) of library functions. The default implementation returns
>> failure for all of these queries.
>> >
>> > Please review!
>> >
>> > James
>> >
>> >> -----Original Message-----
>> >> From: Hal Finkel [mailto:hfinkel at anl.gov]
>> >> Sent: 16 December 2013 21:10
>> >> To: Arnold Schwaighofer
>> >> Cc: llvm-commits; James Molloy
>> >> Subject: Re: RFC: Enable vectorization of call instructions in the loop
>> >> vectorizer
>> >>
>> >> ----- Original Message -----
>> >>> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
>> >>> To: "Hal Finkel" <hfinkel at anl.gov>
>> >>> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>, "James Molloy"
>> >> <James.Molloy at arm.com>
>> >>> Sent: Monday, December 16, 2013 3:08:13 PM
>> >>> Subject: Re: RFC: Enable vectorization of call instructions in the
>> loop
>> >> vectorizer
>> >>>
>> >>>
>> >>>> On Dec 16, 2013, at 2:59 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >>>>
>> >>>> ----- Original Message -----
>> >>>>> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
>> >>>>> To: "James Molloy" <James.Molloy at arm.com>
>> >>>>> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>
>> >>>>> Sent: Monday, December 16, 2013 12:03:02 PM
>> >>>>> Subject: Re: RFC: Enable vectorization of call instructions in the
>> >>>>> loop     vectorizer
>> >>>>>
>> >>>>>
>> >>>>> On Dec 16, 2013, at 11:08 AM, James Molloy <James.Molloy at arm.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hi Renato, Nadav,
>> >>>>>>
>> >>>>>> Attached is a proof of concept[1] patch for adding the ability to
>> >>>>>> vectorize calls. The intended use case for this is in domain
>> >>>>>> specific languages such as OpenCL, where tuned implementations of
>> >>>>>> functions for differing vector widths exist and can be guaranteed
>> >>>>>> to be semantically the same as the scalar version.
>> >>>>>>
>> >>>>>> I’ve considered two approaches to this. The first was to create a
>> >>>>>> set of hooks that allow the LoopVectorizer to interrogate its
>> >>>>>> client as to whether calls are vectorizable and if so, how.
>> >>>>>> Renato
>> >>>>>> argued that this was suboptimal as it required a client to invoke
>> >>>>>> the LoopVectorizer manually and couldn’t be tested through opt. I
>> >>>>>> agree.
>> >>>>>
>> >>>>> I don’t understand this argument.
>> >>>>>
>> >>>>> We could extend target library info with additional api calls to
>> >>>>> query whether a function is vectorizable at a vector factor.
>> >>>>> This can be tested by providing the target triple string (e.g
>> >>>>> “target
>> >>>>> triple = x86_64-gnu-linux-with_opencl_vector_lib") in the .ll file
>> >>>>> that informs the optimizer that a set of vector library calls is
>> >>>>> available.
>> >>>>>
>> >>>>> The patch seems to restrict legal vector widths dependent on
>> >>>>> available vectorizable function calls. I don’t think it should
>> >>>>> work like this.
>> >>>>> I believe, there should be an api on TargetTransformInfo for
>> >>>>> library
>> >>>>> function calls. The vectorizer chooses the cheapest of either an
>> >>>>> intrinsic call or a library function call.
>> >>>>> The overall cost model determines which VF will be chosen.
>> >>>>
>> >>>> We don't have a good model currently for non-intrinsic function
>> >>>> calls. Once we do, we'll want to know how expensive the vectorized
>> >>>> versions are compared to the scalar version. Short of that, I
>> >>>> think that a reasonable approximation is that any function calls
>> >>>> will be the most expensive things in a loop, and the ability to
>> >>>> vectorize them will be the most important factor in determining
>> >>>> the vectorization factor.
>> >>>
>> >>> Yes, and we can easily model this in the cost model by asking for the
>> >>> cost of a (library) function call (vectorized or not) and having it
>> >>> return a reasonably high value.
>> >>
>> >> Sounds good to me.
>> >>
>> >> -Hal
>> >>
>> >>>
>> >>>
>> >>>>
>> >>>> -Hal
>> >>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Arnold
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> llvm-commits mailing list
>> >>>>> llvm-commits at cs.uiuc.edu
>> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >>>>
>> >>>> --
>> >>>> Hal Finkel
>> >>>> Assistant Computational Scientist
>> >>>> Leadership Computing Facility
>> >>>> Argonne National Laboratory
>> >>
>> >
>> >
>> > <vectorizer-tli.diff>
>>
>>
>
>