RFC: Enable vectorization of call instructions in the loop vectorizer

Mon Dec 16 09:33:55 PST 2013

Hi Nadav,

Thanks for the quick reply!

> I don't like the vectorizer.call metadata because it does not solve the
> general problem. Yes, it will allow the vectorization of some OpenCL
> functions but it will not help the vectorization of other math functions in
> regular C loops.

The difficulty I had is how to effectively model the transforms in a way
that works for OpenCL (which, I admit, is the only case I care about at the
moment) but also for whatever others may reasonably want to use it for. Is
your argument that the constraints on whether vectorization of general math
functions is possible cannot be expressed in metadata? I'm inclined to
agree, which is why I initially pushed for the querying hook mechanism.

> I think that this kind of logic should go into TargetLibraryInfo, and not
> as part of the vectorizers.

This sounds similar to my original approach of adding hooks. And indeed,
TargetLibraryInfo certainly sounds like the best place for them.

> I understand this problem but I don't want to add OpenCL-specific
> knowledge into the loop-vectorizer.

I do understand your concern and I agree, but I don't see how the solution I
just proposed is OpenCL-specific. Is it that the solution is just not
applicable to other uses (as I asked in my first paragraph)?

> One possible solution would be to work around this problem by extending
> your OpenCL library and by introducing OpenCL-specific passes that will
> detect these kinds of patterns and optimize your vectorized code using the
> OpenCL knowledge.

Totally doable. I'm very keen to add the "right" support upstream - I don't
expect upstream to solve my problem completely. I'm happy to do whatever
cleanup or modifications are needed downstream to keep upstream uncluttered.

If I've understood you correctly (please let me know if I haven't!) then
I'll get on creating a prototype using TLI hooks.
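
To make the idea concrete, here is a minimal sketch of what such a query
hook might look like. It is plain C++ rather than real LLVM code:
VectorLibraryInfo, addMapping and getVectorizedFunction are hypothetical
names, not existing TargetLibraryInfo API, and the "cos" to "cos4" mapping
simply follows the example in the quoted message below.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a TargetLibraryInfo-style query interface.
// None of these names exist in LLVM; they only illustrate the shape of a hook.
class VectorLibraryInfo {
public:
  // Record that scalarName has an equivalent implementation, vectorName,
  // that operates on `width` lanes at once.
  void addMapping(const std::string &scalarName, unsigned width,
                  const std::string &vectorName) {
    table_[std::make_pair(scalarName, width)] = vectorName;
  }

  // Return the name of the vector function for (scalarName, width), or an
  // empty string if no vector variant is known for that width.
  std::string getVectorizedFunction(const std::string &scalarName,
                                    unsigned width) const {
    auto it = table_.find(std::make_pair(scalarName, width));
    if (it == table_.end())
      return std::string();
    return it->second;
  }

private:
  std::map<std::pair<std::string, unsigned>, std::string> table_;
};
```

With a table like this, the vectorizer would only need to ask "is there a
4-wide version of cos?" and would not need to know where the answer came
from, whether that is a target library, OpenCL, or something else.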

Cheers,

James

From: Nadav Rotem [mailto:nrotem at apple.com] 
Sent: 16 December 2013 17:27
To: James Molloy
Cc: llvm-commits at cs.uiuc.edu for LLVM; Renato Golin
Subject: Re: RFC: Enable vectorization of call instructions in the loop
vectorizer

Hi James!

Thanks for working on this. 

Attached is a proof of concept[1] patch for adding the ability to vectorize
calls. The intended use case for this is in domain-specific languages such
as OpenCL, where tuned implementations of functions for differing vector
widths exist and can be guaranteed to be semantically the same as the scalar
version.

Excellent!

I've considered two approaches to this. The first was to create a set of
hooks that allow the LoopVectorizer to interrogate its client as to whether
calls are vectorizable and if so, how. Renato argued that this was
suboptimal as it required a client to invoke the LoopVectorizer manually and
couldn't be tested through opt. I agree.

So the version attached reads metadata attached to CallInsts. The schema for
the metadata is detailed in the proposed LangRef addition, but basically it
describes a list of potential vectorization candidates. Each candidate has a
vector width, a llvm::Function* (or MDString) giving the target function and
a string describing how the function arguments need to be handled.

I think that this kind of logic should go into TargetLibraryInfo, and not as
part of the vectorizers. The vectorizers should use some kind of API that
will translate cos into cos4. I don't like the vectorizer.call metadata
because it does not solve the general problem. Yes, it will allow the
vectorization of some OpenCL functions but it will not help the
vectorization of other math functions in regular C loops. 

The mangled function arguments string allows us to handle vectorizations
beyond just the pure "vectorize every argument" scenario. Consider for
example the statement "a = clamp(b, 2.0f);". OpenCL provides two forms of
"clamp2" - one with the second argument a vector and one with the second
argument a scalar. It is quite possible that the scalar form is more
optimal, and should be selected if the second argument is uniform.
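
The selection described above can be sketched as follows. This is only an
illustration under stated assumptions, not the code from the attached patch:
the Candidate struct, the ArgKind enum, pickCandidate and the names
"clamp2_vv" and "clamp2_vs" are all hypothetical. Each candidate lists, per
argument, whether it expects a vector or a scalar; a candidate is usable
only if every argument it takes as a scalar is uniform across the loop, and
among usable candidates the one with the most scalar arguments is preferred.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// How a vectorized candidate expects one argument to be passed.
enum class ArgKind { Vector, Scalar };

// One hypothetical vectorization candidate for a scalar call.
struct Candidate {
  unsigned width;            // vector width the candidate operates on
  std::string name;          // name of the vector function (illustrative)
  std::vector<ArgKind> args; // per-argument handling
};

// Pick the candidate of the requested width whose scalar arguments are all
// uniform, preferring the one with the most scalar arguments. Returns
// nullptr if no candidate is usable.
const Candidate *pickCandidate(const std::vector<Candidate> &candidates,
                               unsigned width,
                               const std::vector<bool> &isUniform) {
  const Candidate *best = nullptr;
  std::size_t bestScalars = 0;
  for (const Candidate &c : candidates) {
    if (c.width != width || c.args.size() != isUniform.size())
      continue;
    bool usable = true;
    std::size_t scalars = 0;
    for (std::size_t i = 0; i < c.args.size(); ++i) {
      if (c.args[i] == ArgKind::Scalar) {
        if (!isUniform[i]) { // a varying value cannot be passed as a scalar
          usable = false;
          break;
        }
        ++scalars;
      }
    }
    if (usable && (best == nullptr || scalars > bestScalars)) {
      best = &c;
      bestScalars = scalars;
    }
  }
  return best;
}
```

For "a = clamp(b, 2.0f);" at width 2, the second argument is uniform, so the
vector/scalar form would be chosen; if it varied per iteration, only the
all-vector form would remain usable.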

I understand this problem but I don't want to add OpenCL-specific knowledge
into the loop-vectorizer. One possible solution would be to work around
this problem by extending your OpenCL library and by introducing
OpenCL-specific passes that will detect these kinds of patterns and optimize
your vectorized code using the OpenCL knowledge.

Thanks,

Nadav