RFC: Enable vectorization of call instructions in the loop vectorizer

Arnold aschwaighofer at apple.com
Fri Dec 20 10:45:25 PST 2013


Hi James,

Again, thank you for moving this forward! Sorry for not chiming in earlier; I am in the midst of a move.

I don't think that TargetLibraryInfo should do the transformation from scalar to vectorized call instruction itself.
TargetLibraryInfo should just provide a mapping from function name to function name (including the vector factor, etc.). The function that performs the actual transformation should live in lib/Transforms; I don't think IR transformations belong in lib/Target.

I don't understand why the methods need to be virtual. A mapping from scalar function name to a family of vector functions should capture everything we need:
 (Fun name, VF) -> {#num_entries, vector fun, {vector fun with uniform params, 1}, ...}
If the vectorized function name can be inferred from the scalar function name in a predictable way, why use an enum at all? I don't see the use case that requires virtual functions.
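To make the shape of that mapping concrete, here is a minimal sketch of such a table in C++. Everything here is made up for illustration — the struct, the function, and the table entries (VectorVariant, getVectorVariant, vsinf4, vpowf4_uniform_exp) are not part of any existing LLVM API:

```cpp
// Hypothetical sketch: a static table mapping (scalar function name, VF)
// to a vector library function, instead of virtual hooks on TargetLibraryInfo.
#include <cassert>
#include <map>
#include <string>
#include <utility>

struct VectorVariant {
  std::string VectorFnName; // e.g. a 4-wide variant of the scalar function
  bool HasUniformParams;    // whether some parameters stay scalar (uniform)
};

// Keyed on (scalar name, vectorization factor); entries are invented examples.
static const std::map<std::pair<std::string, unsigned>, VectorVariant>
    VectorizableFns = {
        {{"sinf", 4}, {"vsinf4", false}},
        {{"powf", 4}, {"vpowf4_uniform_exp", true}},
    };

// Lookup the vectorizer would use; returns nullptr if no variant exists.
const VectorVariant *getVectorVariant(const std::string &Scalar, unsigned VF) {
  auto It = VectorizableFns.find({Scalar, VF});
  return It == VectorizableFns.end() ? nullptr : &It->second;
}
```

Because the table is keyed on plain strings, it can be filled from a static initializer or from any other source without touching an enum.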

We can then initialize this mapping in a static function, as we do now, or at a later point have it additionally initialized from Module metadata.
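As a sketch of what a metadata-initialized mapping might look like in a .ll file — the metadata key name and node layout below are invented for illustration, nothing like this exists in LLVM today:

```llvm
; Purely illustrative: a possible module-metadata encoding of the mapping.
; Each node: scalar name, vectorization factor, vector function name.
!vectorizable.functions = !{!0, !1}
!0 = metadata !{metadata !"sinf", i32 4, metadata !"vsinf4"}
!1 = metadata !{metadata !"powf", i32 4, metadata !"vpowf4"}
```

A front end that knows about a vector math library could then emit this metadata itself, with no target-specific code in the optimizer.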

Do we need to reuse the LibFunc enum to identify these functions? Do you want to add TargetLibraryInfo::LibFunc-style optimizations to the optimizer before this patch? (That would go beyond just enabling vectorization of call instructions.)
It seems to me you are solving two separate problems with this patch: one is vectorization of call instructions, and the other is adding target-defined (really "target language"-defined) "LibFunc" functions.

For the former, a map of function names should be enough if we just want to know the scalar and vector variants of functions. Otherwise, we would have to go from a function name string to a LibFunc and then to a vectorized function name; I think we can omit the intermediate step. A string mapping would also readily support Module metadata.

The latter I think is out of scope for this patch and something that needs to be discussed separately.

Thanks,
Arnold


> On Dec 19, 2013, at 5:43 AM, James Molloy <James.Molloy at arm.com> wrote:
> 
> Hi all,
> 
> Attached is the first patch in a sequence to implement this behaviour in the way described in this thread.
> 
> This patch simply:
> * Changes most methods on TargetLibraryInfo to be virtual, to allow clients to override them.
> * Adds three new functions for querying the vectorizability (and scalarizability) of library functions. The default implementation returns failure for all of these queries.
> 
> Please review!
> 
> James
> 
>> -----Original Message-----
>> From: Hal Finkel [mailto:hfinkel at anl.gov]
>> Sent: 16 December 2013 21:10
>> To: Arnold Schwaighofer
>> Cc: llvm-commits; James Molloy
>> Subject: Re: RFC: Enable vectorization of call instructions in the loop
>> vectorizer
>> 
>> ----- Original Message -----
>>> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
>>> To: "Hal Finkel" <hfinkel at anl.gov>
>>> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>, "James Molloy"
>> <James.Molloy at arm.com>
>>> Sent: Monday, December 16, 2013 3:08:13 PM
>>> Subject: Re: RFC: Enable vectorization of call instructions in the loop
>> vectorizer
>>> 
>>> 
>>>> On Dec 16, 2013, at 2:59 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>> 
>>>> ----- Original Message -----
>>>>> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
>>>>> To: "James Molloy" <James.Molloy at arm.com>
>>>>> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>
>>>>> Sent: Monday, December 16, 2013 12:03:02 PM
>>>>> Subject: Re: RFC: Enable vectorization of call instructions in the
>>>>> loop     vectorizer
>>>>> 
>>>>> 
>>>>> On Dec 16, 2013, at 11:08 AM, James Molloy <James.Molloy at arm.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi Renato, Nadav,
>>>>>> 
>>>>>> Attached is a proof of concept[1] patch for adding the ability to
>>>>>> vectorize calls. The intended use case for this is in domain
>>>>>> specific languages such as OpenCL where tuned implementation of
>>>>>> functions for differing vector widths exist and can be guaranteed
>>>>>> to be semantically the same as the scalar version.
>>>>>> 
>>>>>> I’ve considered two approaches to this. The first was to create a
>>>>>> set of hooks that allow the LoopVectorizer to interrogate its
>>>>>> client as to whether calls are vectorizable and if so, how.
>>>>>> Renato
>>>>>> argued that this was suboptimal as it required a client to invoke
>>>>>> the LoopVectorizer manually and couldn’t be tested through opt. I
>>>>>> agree.
>>>>> 
>>>>> I don’t understand this argument.
>>>>> 
>>>>> We could extend target library info with additional api calls to
>>>>> query whether a function is vectorizable at a vector factor.
>>>>> This can be tested by providing the target triple string (e.g.
>>>>> "target
>>>>> triple = x86_64-gnu-linux-with_opencl_vector_lib") in the .ll file
>>>>> that informs the optimizer that a set of vector library calls is
>>>>> available.
>>>>> 
>>>>> The patch seems to restrict legal vector widths dependent on
>>>>> available vectorizable function calls. I don’t think this should
>>>>> work like this.
>>>>> I believe there should be an API on TargetTransformInfo for
>>>>> library
>>>>> function calls. The vectorizer chooses the cheapest of either an
>>>>> intrinsic call or a library function call.
>>>>> The overall cost model determines which VF will be chosen.
>>>> 
>>>> We don't have a good model currently for non-intrinsic function
>>>> calls. Once we do, we'll want to know how expensive the vectorized
>>>> versions are compared to the scalar version. Short of that, I
>>>> think that a reasonable approximation is that any function calls
>>>> will be the most expensive things in a loop, and the ability to
>>>> vectorize them will be the most important factor in determining
>>>> the vectorization factor.
>>> 
>>> Yes and we can easily model this in the cost model by asking what is
>>> the cost of a (library) function call (vectorized or not) and have
>>> this return a reasonably high value.
>> 
>> Sounds good to me.
>> 
>> -Hal
>> 
>>> 
>>> 
>>>> 
>>>> -Hal
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Arnold
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> 
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
> 
> 
> <vectorizer-tli.diff>




More information about the llvm-commits mailing list