RFC: Enable vectorization of call instructions in the loop vectorizer

Mon Dec 16 10:03:02 PST 2013

On Dec 16, 2013, at 11:08 AM, James Molloy <James.Molloy at arm.com> wrote:

> Hi Renato, Nadav,
>  
> Attached is a proof of concept[1] patch for adding the ability to vectorize calls. The intended use case for this is in domain-specific languages such as OpenCL, where tuned implementations of functions for differing vector widths exist and can be guaranteed to be semantically the same as the scalar version.
>  
> I’ve considered two approaches to this. The first was to create a set of hooks that allow the LoopVectorizer to interrogate its client as to whether calls are vectorizable and if so, how. Renato argued that this was suboptimal as it required a client to invoke the LoopVectorizer manually and couldn’t be tested through opt. I agree.

I don’t understand this argument.

We could extend TargetLibraryInfo with additional API calls to query whether a function is vectorizable at a given vectorization factor.
This can be tested by providing a target triple string (e.g. “target triple = x86_64-gnu-linux-with_opencl_vector_lib”) in the .ll file, which informs the optimizer that a set of vector library calls is available.
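
To make the kind of query I have in mind concrete, here is a minimal, self-contained sketch. It is not LLVM's actual interface: the class VectorLibInfo, its method names, and the vector function name used in the usage note are all hypothetical, and it needs C++17 for std::optional.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a TargetLibraryInfo-style table.
// It records, per (scalar function, vectorization factor) pair,
// the name of an equivalent vector library function.
class VectorLibInfo {
  std::map<std::pair<std::string, unsigned>, std::string> Table;

public:
  // Register a vector variant of Scalar for the given VF.
  void addVectorizableFunction(const std::string &Scalar, unsigned VF,
                               const std::string &Vector) {
    Table[std::make_pair(Scalar, VF)] = Vector;
  }

  // Query: is there a vector variant of Scalar at this VF?
  bool isFunctionVectorizable(const std::string &Scalar, unsigned VF) const {
    return Table.count(std::make_pair(Scalar, VF)) != 0;
  }

  // Query: name of the vector variant, or nullopt if none is registered.
  std::optional<std::string> getVectorizedFunction(const std::string &Scalar,
                                                   unsigned VF) const {
    auto It = Table.find(std::make_pair(Scalar, VF));
    if (It == Table.end())
      return std::nullopt;
    return It->second;
  }
};
```

A client selected by the triple would populate the table once, e.g. addVectorizableFunction("sinf", 4, "vec_sinf4"), and the vectorizer would only ever ask the two query functions.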

The patch seems to restrict the legal vector widths depending on which vectorizable function calls are available. I don’t think it should work this way.
I believe there should be an API on TargetTransformInfo for library function calls. The vectorizer then chooses the cheaper of an intrinsic call and a library function call.
The overall cost model determines which VF is chosen.
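
As a rough illustration of that selection, here is a self-contained sketch. It does not use TargetTransformInfo: all types, function names, and cost numbers are hypothetical, and the scalarized fallback cost is my assumption about what happens when neither an intrinsic nor a library call exists at a given VF. It needs C++17.

```cpp
#include <algorithm>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical per-VF costs of one call site.
struct CallCosts {
  std::optional<unsigned> IntrinsicCost; // nullopt: no intrinsic at this VF
  std::optional<unsigned> LibCallCost;   // nullopt: no library call at this VF
  unsigned ScalarizedCost;               // assumed fallback: VF scalar calls
};

// Cheapest way to emit the call at this VF.
unsigned getVectorCallCost(const CallCosts &C) {
  unsigned Best = C.ScalarizedCost;
  if (C.IntrinsicCost)
    Best = std::min(Best, *C.IntrinsicCost);
  if (C.LibCallCost)
    Best = std::min(Best, *C.LibCallCost);
  return Best;
}

// One candidate vectorization factor for a loop.
struct Candidate {
  unsigned VF;        // vectorization factor (1 = scalar loop)
  unsigned OtherCost; // cost of the rest of the loop body at this VF
  CallCosts Call;
};

// Pick the VF with the lowest total cost per scalar iteration.
// No VF is ruled out for lacking a vector call; it only costs more.
unsigned selectVF(const std::vector<Candidate> &Candidates) {
  unsigned BestVF = 1;
  double BestPerLane = std::numeric_limits<double>::infinity();
  for (const Candidate &C : Candidates) {
    double PerLane =
        static_cast<double>(C.OtherCost + getVectorCallCost(C.Call)) / C.VF;
    if (PerLane < BestPerLane) {
      BestPerLane = PerLane;
      BestVF = C.VF;
    }
  }
  return BestVF;
}
```

The point of the sketch is that the availability of a vector function feeds into the cost of a VF and does not decide its legality.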

Thanks,
Arnold