RFC: Enable vectorization of call instructions in the loop vectorizer

James Molloy james at jamesmolloy.co.uk
Sun Dec 22 04:04:19 PST 2013


Hi Arnold,

No worries, it's the Christmas season so I expect long delays between replies
(hence the day's delay with this one!)

> I don't think that TargetLibraryInfo should do the transformation from
> scalar to vectorized call instruction itself.
> I think, TargetLibraryInfo should just provide a mapping from function
> name to function name (including the vector factor, etc). There should be
> a function that lives in lib/transforms that does the transformation. I
> don’t think IR type transformations should go into lib/Target.

> Do we need to reuse the LibFunc enum to identify these functions?

OK. I tried to go via LibFuncs purely so that TLI exposed a more coherent
interface (call getLibFunc on a function name and get back an identifier for
that function, which you can then use to query the rest of TLI). I'm very
happy to go directly from function name to function name and leave the
creation of the CallInst to something else (probably the LoopVectorizer
itself).
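
To make that concrete, here's a minimal sketch of the kind of name-to-name
query I mean. It's purely illustrative: the class name, method names, and
the choice of a std::map are placeholders, not a proposed final interface.

  // Illustrative sketch only: map (scalar function name, VF) to the name
  // of a vector variant. All names here are placeholders.
  #include "llvm/ADT/StringRef.h"

  #include <map>
  #include <string>
  #include <utility>

  class VectorFunctionNameMap {
    // (scalar name, vectorization factor) -> vector variant name.
    std::map<std::pair<std::string, unsigned>, std::string> Variants;

  public:
    void addVectorizableFunction(llvm::StringRef Scalar, unsigned VF,
                                 llvm::StringRef Vector) {
      Variants[std::make_pair(Scalar.str(), VF)] = Vector.str();
    }

    // Returns the vector variant's name, or "" if this call cannot be
    // vectorized at this VF.
    std::string getVectorizedFunction(llvm::StringRef Scalar,
                                      unsigned VF) const {
      std::map<std::pair<std::string, unsigned>, std::string>::const_iterator
          I = Variants.find(std::make_pair(Scalar.str(), VF));
      return I == Variants.end() ? std::string() : I->second;
    }
  };

The vectorizer (or whatever ends up living in lib/Transforms) would ask
getVectorizedFunction and, on a non-empty answer, build the wide CallInst
itself.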

Regarding uniforms: I think the best way to handle these is to ignore them
for the moment. At least in OpenCL, any function that can take a scalar
uniform argument can also take a non-uniform vector one, so mapping from
all-scalar arguments to all-vector arguments is always valid and simplifies
the logic a lot. A later pass could, if it wanted, re-identify uniforms and
change which function is called.
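
A rough sketch of what the LoopVectorizer-side rewrite could then look like
under that all-vector assumption (illustrative only; WidenOperand stands in
for the vectorizer's existing broadcast/widen logic and isn't a real API):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Module.h"
  using namespace llvm;

  // Hypothetical helper, standing in for the vectorizer's existing
  // broadcast/widen logic: turn one scalar operand into a VF-wide value.
  Value *WidenOperand(Value *ScalarOp, unsigned VF);

  // Illustrative sketch: replace a scalar call with a call to the named
  // vector variant, widening every operand (uniforms included). Assumes a
  // non-void scalar return type.
  static CallInst *vectorizeCall(CallInst *Scalar, unsigned VF,
                                 StringRef VecName, IRBuilder<> &Builder,
                                 Module &M) {
    SmallVector<Value *, 4> WideArgs;
    SmallVector<Type *, 4> WideTys;
    for (unsigned i = 0, e = Scalar->getNumArgOperands(); i != e; ++i) {
      Value *Wide = WidenOperand(Scalar->getArgOperand(i), VF);
      WideArgs.push_back(Wide);
      WideTys.push_back(Wide->getType());
    }
    Type *WideRetTy = VectorType::get(Scalar->getType(), VF);
    FunctionType *FTy = FunctionType::get(WideRetTy, WideTys, false);
    Constant *VecFn = M.getOrInsertFunction(VecName, FTy);
    return Builder.CreateCall(VecFn, WideArgs);
  }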

> I don't understand why the function calls need to be virtual. The mapping
> function name to family of functions should capture everything we need?
>  (Fun name, VF) -> {#num_entries, vector fun, {vector fun with uniform
> params, 1}, …}


As Nadav mentioned, the entire suite of OpenCL built-in functions is
massive - hundreds of functions. Pre-seeding a map with all of them is
something I'd quite like to avoid. There are two options: parameterise TLI
with some map, as you suggest, or make TLI's functionality overridable. I
prefer the latter because it allows an implementation to do a lot of work
lazily, which is important for compile-time performance (see the
performance improvements I had to make in the LLVM bitcode linker as an
example of how much lazy linking performance matters - small kernel, big
library).

Making TLI's functionality overridable is, however, a soft requirement, and
I'm keen to satisfy reviewers. If this use case doesn't wash with you, let
me know and I'll make it take a map and sort out laziness my own way
downstream.
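
For what it's worth, the overridable variant I'm picturing is roughly the
following. A sketch only: the class and method names are hypothetical, and
lookupBuiltinVariant stands in for whatever lazy, cached lookup into the
builtins library a downstream toolchain would actually do.

  #include "llvm/ADT/StringRef.h"
  #include <string>

  // Hypothetical base: by default nothing is known to be vectorizable.
  class VectorLibraryInfoBase {
  public:
    virtual ~VectorLibraryInfoBase() {}

    virtual bool isFunctionVectorizable(llvm::StringRef Name,
                                        unsigned VF) const {
      return false;
    }
    virtual std::string getVectorizedFunction(llvm::StringRef Name,
                                              unsigned VF) const {
      return std::string();
    }
  };

  // Hypothetical downstream override: consult the builtins library on
  // demand instead of pre-seeding hundreds of entries at startup.
  class OpenCLVectorLibraryInfo : public VectorLibraryInfoBase {
    // Stand-in for a lazy, cached lookup of a mangled vector variant;
    // returns the empty string if there is none.
    std::string lookupBuiltinVariant(llvm::StringRef Name, unsigned VF) const;

  public:
    virtual bool isFunctionVectorizable(llvm::StringRef Name,
                                        unsigned VF) const {
      return !lookupBuiltinVariant(Name, VF).empty();
    }
    virtual std::string getVectorizedFunction(llvm::StringRef Name,
                                              unsigned VF) const {
      return lookupBuiltinVariant(Name, VF);
    }
  };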

> Can you infer the vectorized function name from the scalar function name
> in a predictable way (but why use an enum then)? I don’t understand the use
> case that requires virtual functions.

Alas no, CL (my main use case) functions are overloaded and thus
name-mangled, so the vector name can't be derived from the scalar name
without knowing the mangling scheme. I don't want to put name-mangling
logic in LLVM *anywhere*.


On 20 December 2013 18:45, Arnold <aschwaighofer at apple.com> wrote:

> Hi James,
>
> Again thank you for moving this forward! Sorry for not chiming in earlier,
> I am in the midst of a move.
>
> I don't think that TargetLibraryInfo should do the transformation from
> scalar to vectorized call instruction itself.
> I think, TargetLibraryInfo should just provide a mapping from function
> name to function name (including the vector factor, etc). There should be a
> function that lives in lib/transforms that does the transformation. I don’t
> think IR type transformations should go into lib/Target.
>
> I don't understand why the function calls need to be virtual. The mapping
> function name to family of functions should capture everything we need?
>  (Fun name, VF) -> {#num_entries, vector fun, {vector fun with uniform
> params, 1}, …}
> Can you infer the vectorized function name from the scalar function name
> in a predictable way (but why use an enum then)? I don’t understand the use
> case that requires virtual functions.
>
> We can then initialize this mapping in a static function like we do now or
> at later point have this mapping additionally initialized by Module
> metadata.
>
> Do we need to reuse the LibFunc enum to identify these functions? Do you
> want to add (before-patch) TargetLibraryInfo::LibFunc style optimizations
> to the optimizer? (This would go beyond just enabling vectorization of call
> instruction).
> It seems to me you are solving two separate things with this patch: one is
> vectorization of call instructions and the second one is adding target
> (really "target language") defined “LibFunc” functions?
>
> For the former, I think a map of function names should be enough if we
> just want to know scalar and vector variants of functions? Otherwise, we
> would have to go through a map of function name string to LibFunc to
> vectorized function name. I think we can omit the intermediate step. If we
> had a string mapping, this would also readily support Module metadata.
>
> The latter I think is out of scope for this patch and something that needs
> to be discussed separately.
>
> Thanks,
> Arnold
>
>
> > On Dec 19, 2013, at 5:43 AM, James Molloy <James.Molloy at arm.com> wrote:
> >
> > Hi all,
> >
> > Attached is the first patch in a sequence to implement this behaviour in
> the way described in this thread.
> >
> > This patch simply:
> > * Changes most methods on TargetLibraryInfo to be virtual, to allow
> clients to override them.
> > * Adds three new functions for querying the vectorizability (and
> scalarizability) of library functions. The default implementation returns
> failure for all of these queries.
> >
> > Please review!
> >
> > James
> >
> >> -----Original Message-----
> >> From: Hal Finkel [mailto:hfinkel at anl.gov]
> >> Sent: 16 December 2013 21:10
> >> To: Arnold Schwaighofer
> >> Cc: llvm-commits; James Molloy
> >> Subject: Re: RFC: Enable vectorization of call instructions in the loop
> >> vectorizer
> >>
> >> ----- Original Message -----
> >>> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> >>> To: "Hal Finkel" <hfinkel at anl.gov>
> >>> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>, "James Molloy"
> >> <James.Molloy at arm.com>
> >>> Sent: Monday, December 16, 2013 3:08:13 PM
> >>> Subject: Re: RFC: Enable vectorization of call instructions in the loop
> >> vectorizer
> >>>
> >>>
> >>>> On Dec 16, 2013, at 2:59 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> >>>>> To: "James Molloy" <James.Molloy at arm.com>
> >>>>> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>
> >>>>> Sent: Monday, December 16, 2013 12:03:02 PM
> >>>>> Subject: Re: RFC: Enable vectorization of call instructions in the
> >>>>> loop     vectorizer
> >>>>>
> >>>>>
> >>>>> On Dec 16, 2013, at 11:08 AM, James Molloy <James.Molloy at arm.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Renato, Nadav,
> >>>>>>
> >>>>>> Attached is a proof of concept[1] patch for adding the ability to
> >>>>>> vectorize calls. The intended use case for this is in domain
> >>>>>> specific languages such as OpenCL where tuned implementation of
> >>>>>> functions for differing vector widths exist and can be guaranteed
> >>>>>> to be semantically the same as the scalar version.
> >>>>>>
> >>>>>> I’ve considered two approaches to this. The first was to create a
> >>>>>> set of hooks that allow the LoopVectorizer to interrogate its
> >>>>>> client as to whether calls are vectorizable and if so, how.
> >>>>>> Renato
> >>>>>> argued that this was suboptimal as it required a client to invoke
> >>>>>> the LoopVectorizer manually and couldn’t be tested through opt. I
> >>>>>> agree.
> >>>>>
> >>>>> I don’t understand this argument.
> >>>>>
> >>>>> We could extend target library info with additional api calls to
> >>>>> query whether a function is vectorizable at a vector factor.
> >>>>> This can be tested by providing the target triple string (e.g
> >>>>> “target
> >>>>> triple = x86_64-gnu-linux-with_opencl_vector_lib") in the .ll file
> >>>>> that informs the optimizer that a set of vector library calls is
> >>>>> available.
> >>>>>
> >>>>> The patch seems to restrict legal vector widths dependent on
> >>>>> available vectorizable function calls. I don’t think this should
> >>>>> work like this.
> >>>>> I believe, there should be an api on TargetTransformInfo for
> >>>>> library
> >>>>> function calls. The vectorizer chooses the cheapest of either an
> >>>>> intrinsic call or a library function call.
> >>>>> The overall cost model determines which VF will be chosen.
> >>>>
> >>>> We don't have a good model currently for non-intrinsic function
> >>>> calls. Once we do, we'll want to know how expensive the vectorized
> >>>> versions are compared to the scalar version. Short of that, I
> >>>> think that a reasonable approximation is that any function calls
> >>>> will be the most expensive things in a loop, and the ability to
> >>>> vectorize them will be the most important factor in determining
> >>>> the vectorization factor.
> >>>
> >>> Yes and we can easily model this in the cost model by asking what is
> >>> the cost of a (library) function call (vectorized or not) and have
> >>> this return a reasonably high value.
> >>
> >> Sounds good to me.
> >>
> >> -Hal
> >>
> >>>
> >>>
> >>>>
> >>>> -Hal
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Arnold
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Hal Finkel
> >>>> Assistant Computational Scientist
> >>>> Leadership Computing Facility
> >>>> Argonne National Laboratory
> >>
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >
> >
> > <vectorizer-tli.diff>
>