[LLVMdev] Extending vector operations

Wed Jul 23 09:15:06 PDT 2008

On Monday 21 July 2008 15:21, Stefanus Du Toit wrote:

> We would like to extend the vector operations in llvm a bit. We're
> hoping to get some feedback on the right way to go, or some starting
> points. I had previously had some discussion on this list about a
> subset of the changes we have in mind.

Woohoo!  We've been interested in talking about this for some time.

> All of these changes are intended to make target-independent IR (i.e.
> IR without machine specific intrinsics) generate better code or be
> easier to generate from a frontend with vector support (whether from
> manual or autovectorization).

Very excellent.

> ===
> 1) Vector shl, lshr, ashr
>
> I think these are no-brainers. We would like to extend the semantics
> of the shifting instructions to naturally apply to vectors as well.
> One issue is that these operations often only support a single shift
> amount for an entire vector. I assume it should be fairly

So you're assuming a shift of a vector by a scalar?  What about the
general vector-by-vector version?

> straightforward to select on this pattern, and scalarize the general
> case as necessary.

Yep.
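
To make the two cases concrete, here is a minimal Python sketch of the
semantics (a model, not LLVM code). It shows a lane-wise shl on fixed-width
integer lanes, plus a check for a uniform ("splat") shift amount, which is the
pattern a target with a single shift count per vector could select on. The
names vec_shl and is_splat and the 32-bit default lane width are assumptions
made for illustration.

```python
def vec_shl(values, amounts, bits=32):
    """Lane-wise shift left; each lane wraps to `bits` bits.

    Lanes are modelled as unsigned bit patterns. This sketch assumes every
    shift amount is in the range [0, bits).
    """
    if len(values) != len(amounts):
        raise ValueError("operand vectors must have the same length")
    lane_mask = (1 << bits) - 1
    return [(v << a) & lane_mask for v, a in zip(values, amounts)]


def is_splat(amounts):
    """True when every lane carries the same shift amount."""
    return len(set(amounts)) == 1


# Uniform amount: could map onto a single vector-by-scalar shift.
uniform = vec_shl([1, 2, 3, 4], [1, 1, 1, 1])
# General vector-by-vector case: may need to be scalarized on some targets.
general = vec_shl([1, 1, 1, 1], [0, 1, 2, 31])
```

A selector following this idea would test is_splat on the amount operand first
and fall back to per-lane shifts otherwise.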

> 2) Vector trunc, sext, zext, fptrunc and fpext
>
> Again, I think these are hopefully straightforward. Please let me know
> if you expect any issues with vector operations that change element
> sizes from the RHS to the LHS, e.g. around legalization.

Is the assumption that all elements are changed in the same way?
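
For reference, the uniform interpretation can be sketched in Python as
follows: one source width and one destination width apply to the whole vector,
and every lane is converted the same way. Lanes are modelled as unsigned bit
patterns, and the function names are assumptions for illustration only.

```python
def vec_trunc(values, to_bits):
    """Keep the low `to_bits` bits of every lane."""
    lane_mask = (1 << to_bits) - 1
    return [v & lane_mask for v in values]


def vec_zext(values, from_bits):
    """Zero-extend: the unsigned value of each lane is unchanged."""
    lane_mask = (1 << from_bits) - 1
    return [v & lane_mask for v in values]


def vec_sext(values, from_bits, to_bits):
    """Sign-extend every lane from `from_bits` to `to_bits` bits."""
    sign_bit = 1 << (from_bits - 1)
    in_mask = (1 << from_bits) - 1
    out_mask = (1 << to_bits) - 1
    result = []
    for v in values:
        v &= in_mask
        # Reinterpret the lane as signed, then re-encode at the wider width.
        signed = v - (1 << from_bits) if v & sign_bit else v
        result.append(signed & out_mask)
    return result
```

The element count never changes here; only the element width does, which is
where legalization has to split or widen the result type.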

> 4) Vector select
>
> We consider a vector select extremely important for a number of
> operations. This would be an extension of select to support an <N x
> i1> vector mask to select between elements of <N x T> vectors for some
> basic type T. Vector min, max, sign, etc. can be built on top of this
> operation.

Yes, merge/blend is a very important operation.  Also, it would be nice to
think about generalizing this to apply masks to all vector operations,
particularly loads and stores.
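
As a rough model of what is meant, here is a Python sketch of a lane-wise
select driven by a boolean mask, and of a masked store that leaves memory
untouched wherever the mask is false. Both helpers are hypothetical and
describe only the intended semantics, not any existing LLVM operation.

```python
def vec_select(mask, if_true, if_false):
    """Pick if_true[i] where mask[i] is set, otherwise if_false[i]."""
    if not (len(mask) == len(if_true) == len(if_false)):
        raise ValueError("mask and operands must have the same length")
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def masked_store(memory, base, values, mask):
    """Write values[i] to memory[base + i] only where mask[i] is set."""
    for i, (v, m) in enumerate(zip(values, mask)):
        if m:
            memory[base + i] = v


blended = vec_select([True, False, True, False],
                     [1, 2, 3, 4],
                     [10, 20, 30, 40])

mem = [0] * 6
masked_store(mem, 1, [7, 8, 9], [True, False, True])
```

The point of the masked store is that inactive lanes are never written, which
is different from a select followed by a full-width store.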

> 5) Vector comparisons that return <N x i1>
>
> This is maybe not a must-have, and perhaps more a question of
> preference. I understand the current vfcmp/vicmp semantics, returning
> a vector of iK where K matches the bitwidth of the operands being
> compared with the high bit set or not, are there for pragmatic
> reasons, and that these functions exist to aid with code emitted that
> uses machine-specific intrinsics.

As long as we have some way of representing the result of a vector
compare, I'm not too worried.  As you say, target-specific code can
convert these to masks or whatever form the target architecture has.

I don't have any experience trying to do such a conversion, though, so
there may be gotchas we're not aware of.

> For code that does not use machine intrinsics, I believe it would be
> cleaner, simpler, and potentially more efficient, to have a vector
> compare that returns <N x i1> instead. For example, in conjunction
> with the above-mentioned vector select, this would allow a max to be
> expressed simply as a sequence of compare and select.

True enough.  It also leads naturally to generalized masked vector operations,
which are _extremely_ handy.
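
The compare-and-select formulation of max can be sketched in Python as below.
The compare yields one boolean per lane, standing in for an <N x i1> result,
and the select consumes it directly. The names vec_cmp_gt and vec_max are
assumptions for illustration.

```python
def vec_cmp_gt(a, b):
    """Lane-wise greater-than; one boolean per lane (models <N x i1>)."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    return [x > y for x, y in zip(a, b)]


def vec_max(a, b):
    """max(a, b) expressed as a compare followed by a select."""
    mask = vec_cmp_gt(a, b)
    return [x if m else y for m, x, y in zip(mask, a, b)]


lhs = [1.0, 5.0, -2.0, 0.0]
rhs = [3.0, 4.0, -1.0, 0.0]
mask = vec_cmp_gt(lhs, rhs)
largest = vec_max(lhs, rhs)
```

Because the mask is just a per-lane predicate, the same value could in
principle guard any other operation, not only a select.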

> Vector bitshifts would actually help with the amount of code generated
> for something like a vectorized max, but make the patterns for
> recognizing these a lot longer.

Right.

> I realize this is probably the most controversial change amongst
> these. I gather there is some concern about representing "variable
> width" i1s, but I would contend that that's the case even for i1s
> which are not vectors.

What do you mean by "variable width"?

> In addition to the above suggestions, I'd also like to hear what
> others think about handling vector operations that aren't powers of
> two in size, e.g. <3 x float> operations. I gather the status quo is

It's goodness.

> that only POT sizes are expected to work (although we've found some
> bugs for things like <2 x float> that we're submitting). Ideally
> things like <3 x float> operands would usually be rounded up to the
> size supported by the machine directly. We can try to do this in the

You might need mask support as well, especially if the operation can trap.
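
Here is a small Python sketch of why the mask matters. A 3-element vector is
widened to 4 lanes, and the padding lane holds a value that would trap under
integer division. With the mask, the padding lane is never evaluated. The
helper names, the zero fill value, and the use of None for an undefined lane
are all assumptions for illustration.

```python
def widen(values, width, fill=0):
    """Pad `values` to `width` lanes; return the lanes and an active mask."""
    n = len(values)
    if n > width:
        raise ValueError("vector is wider than the target width")
    lanes = list(values) + [fill] * (width - n)
    mask = [True] * n + [False] * (width - n)
    return lanes, mask


def masked_udiv(a, b, mask):
    """Divide only the active lanes; inactive lanes yield None (undefined)."""
    return [x // y if m else None for x, y, m in zip(a, b, mask)]


a, active = widen([6, 8, 9], 4)
b, _ = widen([3, 2, 3], 4)
quotient = masked_udiv(a, b, active)
```

Running the same division with all four lanes active would divide by the zero
in the padding lane, which is the trap case mentioned above.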

> frontend, but it would of course be ideal if these just worked. I'm
> curious if anyone else out there has dealt with this already and has
> some suggestions.

It's something we've thought about but not really delved into yet.

> Please let me know what you think,

Let's connect at the dev meeting along with others interested in this stuff
and start thinking about how to proceed.

                                                   -Dave