RFC: Enable vectorization of call instructions in the loop vectorizer

Mon Dec 16 12:59:40 PST 2013

----- Original Message -----
> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> To: "James Molloy" <James.Molloy at arm.com>
> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>
> Sent: Monday, December 16, 2013 12:03:02 PM
> Subject: Re: RFC: Enable vectorization of call instructions in the loop vectorizer
> 
> 
> On Dec 16, 2013, at 11:08 AM, James Molloy <James.Molloy at arm.com>
> wrote:
> 
> > Hi Renato, Nadav,
> >  
> > Attached is a proof of concept[1] patch for adding the ability to
> > vectorize calls. The intended use case for this is in domain
> > specific languages such as OpenCL where tuned implementation of
> > functions for differing vector widths exist and can be guaranteed
> > to be semantically the same as the scalar version.
> >  
> > I’ve considered two approaches to this. The first was to create a
> > set of hooks that allow the LoopVectorizer to interrogate its
> > client as to whether calls are vectorizable and if so, how. Renato
> > argued that this was suboptimal as it required a client to invoke
> > the LoopVectorizer manually and couldn’t be tested through opt. I
> > agree.
> 
> I don’t understand this argument.
> 
> We could extend target library info with additional API calls to
> query whether a function is vectorizable at a given vectorization
> factor. This can be tested by providing a target triple string
> (e.g. "target triple = x86_64-gnu-linux-with_opencl_vector_lib") in
> the .ll file that informs the optimizer that a set of vector library
> calls is available.
> 
> The patch seems to restrict the legal vector widths depending on
> which vectorizable function calls are available. I don’t think it
> should work like this.
> I believe there should be an API on TargetTransformInfo for library
> function calls. The vectorizer would choose the cheaper of an
> intrinsic call and a library function call.
> The overall cost model then determines which VF is chosen.

We don't currently have a good cost model for non-intrinsic function calls. Once we do, we'll want to know how expensive the vectorized versions are compared to the scalar version. Until then, I think a reasonable approximation is that any function call will be the most expensive thing in the loop, and that the ability to vectorize it will be the most important factor in determining the vectorization factor.
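
To make the trade-off concrete, here is a rough standalone sketch of how a cost-driven choice of VF could look when calls dominate the loop. It is not the patch under review and not an existing LLVM interface: VectorLibInfo, callCost, chooseVF and all cost numbers are hypothetical names and values made up for illustration. It assumes a call with no vector variant at a given width is scalarized into VF scalar calls.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table of available vector library variants:
// (scalar function name, VF) -> cost of one call to the vector variant.
struct VectorLibInfo {
  std::map<std::pair<std::string, unsigned>, unsigned> VariantCost;
};

// Cost of one call at width VF. If a vector variant exists, take the cheaper
// of the variant and scalarization; otherwise assume VF scalar calls.
unsigned callCost(const VectorLibInfo &Info, const std::string &Fn,
                  unsigned VF, unsigned ScalarCallCost) {
  assert(VF > 0 && "vectorization factor must be positive");
  if (VF == 1)
    return ScalarCallCost;
  unsigned Scalarized = VF * ScalarCallCost;
  auto It = Info.VariantCost.find({Fn, VF});
  if (It == Info.VariantCost.end())
    return Scalarized;
  return std::min(It->second, Scalarized);
}

// Pick the VF with the lowest cost per scalar iteration. OtherCost is the
// (assumed width-independent) cost of the rest of the loop body.
// Returns 1 when no candidate beats the scalar loop.
unsigned chooseVF(const VectorLibInfo &Info, const std::string &Fn,
                  const std::vector<unsigned> &CandidateVFs,
                  unsigned ScalarCallCost, unsigned OtherCost) {
  unsigned BestVF = 1;
  double BestPerLane = static_cast<double>(ScalarCallCost + OtherCost);
  for (unsigned VF : CandidateVFs) {
    double PerLane =
        static_cast<double>(callCost(Info, Fn, VF, ScalarCallCost) + OtherCost) / VF;
    if (PerLane < BestPerLane) {
      BestPerLane = PerLane;
      BestVF = VF;
    }
  }
  return BestVF;
}
```

With these made-up numbers (a 4-wide variant costing 12, a scalar call costing 10, other loop work costing 2), the widths 2 and 8 fall back to scalarized calls and the sketch settles on VF 4. So the availability of a vector variant ends up deciding the width through the cost model, without the widths being restricted up front.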

 -Hal

> 
> Thanks,
> Arnold
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory