RFC: Enable vectorization of call instructions in the loop vectorizer

Mon Dec 16 09:26:31 PST 2013

Hi James!

Thanks for working on this. 

> Attached is a proof of concept[1] patch for adding the ability to vectorize calls. The intended use case for this is in domain specific languages such as OpenCL, where tuned implementations of functions for different vector widths exist and can be guaranteed to be semantically equivalent to the scalar version.

Excellent!

> I’ve considered two approaches to this. The first was to create a set of hooks that allow the LoopVectorizer to interrogate its client as to whether calls are vectorizable and if so, how. Renato argued that this was suboptimal as it required a client to invoke the LoopVectorizer manually and couldn’t be tested through opt. I agree.
>  
> So the version attached reads metadata attached to CallInsts. The schema for the metadata is detailed in the proposed LangRef addition, but basically it describes a list of potential vectorization candidates. Each candidate has a vector width, an llvm::Function* (or MDString) giving the target function, and a string describing how the function arguments need to be handled.
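To make the proposed schema concrete, here is a minimal C++ sketch of how a vectorizer might consume the candidate list once it has been read out of the metadata. The struct `VectorCandidate`, its fields, and `findCandidate` are hypothetical names for illustration only. They are not taken from the patch, and real code would build the list from the call's metadata rather than from a hand-written `std::vector`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry in the proposed metadata list.
// Field names are illustrative; they are not taken from the patch.
struct VectorCandidate {
  unsigned Width;        // vectorization factor this variant handles
  std::string Function;  // name of the vector variant, e.g. "foo4"
  std::string ArgSpec;   // how each argument is passed (format defined by the patch)
};

// Return the candidate whose width matches the vectorization factor
// chosen by the loop vectorizer, if the call site offers one.
std::optional<VectorCandidate>
findCandidate(const std::vector<VectorCandidate> &Candidates, unsigned VF) {
  for (const VectorCandidate &C : Candidates)
    if (C.Width == VF)
      return C;
  return std::nullopt;  // no variant is offered for this VF
}
```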

I think this kind of logic belongs in TargetLibraryInfo rather than in the vectorizers. The vectorizers should use an API that translates, for example, cos into cos4. I don’t like the vectorizer.call metadata because it does not solve the general problem: it will allow the vectorization of some OpenCL functions, but it will not help vectorize calls to other math functions in regular C loops.
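A rough sketch of the alternative suggested here, assuming a simple lookup keyed on the scalar function name and the vectorization factor. `VectorFunctionTable` and its methods are hypothetical and are not the actual TargetLibraryInfo interface. The point is only that the vectorizer asks a target-provided table instead of reading per-call metadata, so a plain C loop calling `cos` is handled the same way as an OpenCL call.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical, target-provided table in the spirit of the suggestion above:
// it maps (scalar function name, vectorization factor) to the name of a
// vector variant. This is NOT the real TargetLibraryInfo interface.
class VectorFunctionTable {
public:
  void addMapping(const std::string &Scalar, unsigned VF,
                  const std::string &Vector) {
    Table[{Scalar, VF}] = Vector;
  }

  // Returns the vector variant's name, or an empty string if the target
  // library provides none for this VF.
  std::string getVectorName(const std::string &Scalar, unsigned VF) const {
    auto It = Table.find({Scalar, VF});
    return It == Table.end() ? std::string() : It->second;
  }

  bool isVectorizable(const std::string &Scalar, unsigned VF) const {
    return Table.count({Scalar, VF}) != 0;
  }

private:
  std::map<std::pair<std::string, unsigned>, std::string> Table;
};
```

Because the table would be populated per target, an OpenCL runtime and a vendor math library could each register their own mappings without the vectorizer needing to know which one it is querying.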

>  
> The mangled function arguments string allows us to handle vectorizations beyond the pure “vectorize every argument” scenario. Consider for example the statement “a = clamp(b, 2.0f);”. OpenCL provides two forms of “clamp2”: one with the second argument a vector and one with the second argument a scalar. It is quite possible that the scalar form is more efficient, and it should be selected if the second argument is uniform.
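The uniform-argument case can be framed as a selection problem. The sketch below assumes a hypothetical one-character-per-argument encoding ('v' for a per-lane vector argument, 's' for a scalar shared by all lanes), which is not the string format defined in the patch. Under that assumption, the vectorizer picks the compatible variant that passes the most uniform arguments as scalars. `Variant`, `selectVariant`, and the names `clamp_vv`/`clamp_vs` are illustrative only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical argument-spec encoding (one character per argument):
//   'v' - the variant takes a vector with one lane per iteration,
//   's' - the variant takes a single scalar shared by all lanes.
// This encoding is illustrative and is not the string format in the patch.
struct Variant {
  std::string Name;
  std::string ArgSpec;
};

// Checks whether a variant can be called given which arguments are
// uniform (loop-invariant). An 's' slot requires a uniform argument; a 'v'
// slot accepts anything, since a uniform value can be broadcast.
static bool isCompatible(const Variant &V, const std::vector<bool> &IsUniform) {
  if (V.ArgSpec.size() != IsUniform.size())
    return false;
  for (std::size_t I = 0; I < IsUniform.size(); ++I)
    if (V.ArgSpec[I] == 's' && !IsUniform[I])
      return false;
  return true;
}

// Picks the compatible variant with the most scalar slots, on the
// assumption that passing uniform values as scalars is cheaper.
// Returns nullptr if no variant fits.
const Variant *selectVariant(const std::vector<Variant> &Variants,
                             const std::vector<bool> &IsUniform) {
  const Variant *Best = nullptr;
  long BestScalars = -1;
  for (const Variant &V : Variants) {
    if (!isCompatible(V, IsUniform))
      continue;
    long Scalars = 0;
    for (char C : V.ArgSpec)
      Scalars += (C == 's');
    if (Scalars > BestScalars) {
      Best = &V;
      BestScalars = Scalars;
    }
  }
  return Best;
}
```

In the quoted `clamp(b, 2.0f)` example, `b` varies per iteration while `2.0f` is uniform, so passing `{false, true}` selects the scalar-operand form; if both arguments vary, only the all-vector form is compatible.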

I understand this problem, but I don’t want to add OpenCL-specific knowledge to the loop vectorizer. One possible solution would be to work around this problem by extending your OpenCL library and introducing OpenCL-specific passes that detect these kinds of patterns and optimize your vectorized code using OpenCL-specific knowledge.
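One way to read this suggestion: let the loop vectorizer emit the generic all-vector call, and have a separate OpenCL-specific pass rewrite it afterwards when an operand turns out to be uniform. The sketch below uses a toy call representation instead of LLVM IR and only recognizes constant splats; `VecOperand`, `VecCall`, `rewriteUniformClamp`, and the callee names are all hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy model of a vectorized call, standing in for LLVM IR: each operand is
// a vector value, and a constant operand records its lane values.
struct VecOperand {
  std::string Name;
  std::vector<float> ConstLanes;  // empty if the operand is not a constant
};

struct VecCall {
  std::string Callee;
  std::vector<VecOperand> Args;
};

// True if the operand is a constant whose lanes all hold the same value.
static bool isConstantSplat(const VecOperand &Op) {
  if (Op.ConstLanes.empty())
    return false;
  return std::all_of(Op.ConstLanes.begin(), Op.ConstLanes.end(),
                     [&](float F) { return F == Op.ConstLanes.front(); });
}

// Hypothetical OpenCL-specific cleanup: if the second operand of the
// all-vector form is a splat, switch to the scalar-operand overload.
// The callee names are illustrative, not real OpenCL library symbols.
bool rewriteUniformClamp(VecCall &Call) {
  if (Call.Callee != "clamp_vv" || Call.Args.size() != 2)
    return false;
  if (!isConstantSplat(Call.Args[1]))
    return false;
  Call.Callee = "clamp_vs";
  // The scalar overload receives lane 0 as its single scalar argument.
  Call.Args[1].ConstLanes.resize(1);
  return true;
}
```

Keeping this rewrite in a separate pass means the loop vectorizer itself never needs to know that `clamp` has a scalar-operand overload.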

Thanks,
Nadav