[LLVMdev] Extending vector operations
Nate Begeman
natebegeman at mac.com
Mon Jul 21 16:46:38 PDT 2008
On Jul 21, 2008, at 1:21 PM, Stefanus Du Toit wrote:
> Hi,
>
> We would like to extend the vector operations in llvm a bit. We're
> hoping to get some feedback on the right way to go, or some starting
> points. I had previously had some discussion on this list about a
> subset of the changes we have in mind.
>
> All of these changes are intended to make target-independent IR (i.e.
> IR without machine specific intrinsics) generate better code or be
> easier to generate from a frontend with vector support (whether from
> manual or autovectorization).
>
> If you have any insight into how to best get started with any of these
> changes, and whether they are feasible and sensible, please let me
> know. We're mostly interested in x86 as a target in the short term,
> but obviously want these to apply to other LLVM targets as well. We're
> prepared to implement these changes, but would like to hear any
> suggestions and objections you might have.
>
> Below are the specific additions we have in mind.
>
> ===
> 1) Vector shl, lshr, ashr
>
> I think these are no-brainers. We would like to extend the semantics
> of the shifting instructions to naturally apply to vectors as well.
> One issue is that these operations often only support a single shift
> amount for an entire vector. I assume it should be fairly
> straightforward to select on this pattern, and scalarize the general
> case as necessary.
That seems reasonable.
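To make the two cases concrete, here is a minimal C++ sketch of the lane-wise semantics. It is only a model of the idea, not LLVM code; the names `V4`, `isUniformShift`, and `shlLanes` are made up for illustration. A backend could use a check like `isUniformShift` to pick a single-amount shift instruction and fall back to the per-lane loop otherwise.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> value.
using V4 = std::array<std::uint32_t, 4>;

// True when every lane shifts by the same amount, i.e. the case that
// hardware with one shift count per vector can handle directly.
bool isUniformShift(const V4 &amt) {
  for (std::size_t i = 1; i < amt.size(); ++i)
    if (amt[i] != amt[0])
      return false;
  return true;
}

// General (scalarized) case: each lane is shifted by its own amount.
// Amounts must be less than 32; larger shifts are undefined in C++.
V4 shlLanes(const V4 &v, const V4 &amt) {
  V4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = v[i] << amt[i];
  return r;
}
```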
> 2) Vector trunc, sext, zext, fptrunc and fpext
>
> Again, I think these are hopefully straightforward. Please let me know
> if you expect any issues with vector operations that change element
> sizes from the RHS to the LHS, e.g. around legalization.
Are the proposed semantics here that the number of elements stays the
same, and the overall vector width changes?
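To spell out the reading I am asking about, here is a small C++ sketch, assuming the answer is yes. The type and function names are hypothetical. Four 16-bit lanes go in and four 32-bit lanes come out, so the lane count is unchanged while the total width doubles from 64 to 128 bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for <4 x i16> and <4 x i32>.
using V4i16 = std::array<std::int16_t, 4>;
using V4i32 = std::array<std::int32_t, 4>;

// Lane-wise sign extension: same number of elements, wider elements.
V4i32 sextLanes(const V4i16 &v) {
  V4i32 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<std::int32_t>(v[i]);
  return r;
}

// Total payload width in bits for a vector of `lanes` elements of
// `elemBits` each (ignores any padding a real target might add).
constexpr std::size_t vectorBits(std::size_t lanes, std::size_t elemBits) {
  return lanes * elemBits;
}
```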
> 3) Vector intrinsics for floor, ceil, round, frac/modf
>
> These are operations that are not trivially specified in terms of
> simpler operations. It would be nice to have these as overloaded,
> target-independent intrinsics, in the same way as llvm.cos etc. are
> supported now.
It seems like these could be handled through intrinsics in the LLVM
IR, though their support in the SelectionDAG could use general
improvement.
> 4) Vector select
>
> We consider a vector select extremely important for a number of
> operations. This would be an extension of select to support an <N x
> i1> vector mask to select between elements of <N x T> vectors for some
> basic type T. Vector min, max, sign, etc. can be built on top of this
> operation.
How is this anything other than AND/ANDN/OR on any integer vector
type? I don't see what adding this to the IR gets you for vectors,
since "vector of i1" doesn't mean "vector of bits" necessarily.
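For reference, the AND/ANDN/OR form I mean looks like the following C++ sketch. It assumes each mask lane is either all ones or all zeros; `V4` and `selectByMask` are illustrative names only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> value.
using V4 = std::array<std::uint32_t, 4>;

// Lane-wise select built from AND, ANDN and OR.
// Assumes every mask lane is 0xFFFFFFFF (take a) or 0 (take b).
V4 selectByMask(const V4 &mask, const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
  return r;
}
```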
> 5) Vector comparisons that return <N x i1>
>
> This is maybe not a must-have, and perhaps more a question of
> preference. I understand the current vfcmp/vicmp semantics, returning
> a vector of iK where K matches the bitwidth of the operands being
> compared with the high bit set or not, are there for pragmatic
> reasons, and that these functions exist to aid with code emitted that
> uses machine-specific intrinsics.
I totally disagree with this approach; a vector of i1 doesn't actually
match what you want to do with the hardware, unless you had, say, 128 x
i1 for SSE, and it's strange when you have to spill and reload it.
The current VICMP and VFCMP instructions do not exist for use with
machine intrinsics; they exist to allow code written using C-style
comparison operators to generate efficient code on a wide range of
both scalar and vector hardware.
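As a rough illustration of why a full-width result is convenient, here is a C++ sketch of a compare whose result has the same lane width as its operands, feeding straight into bitwise ops to form a max. It assumes a true lane is all ones, which is a modeling choice for this sketch; the names `cmpUgtMask` and `maxLanes` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> value.
using V4 = std::array<std::uint32_t, 4>;

// Unsigned greater-than compare that yields a mask with the same
// element width as the operands (assumed all ones for true, 0 for false).
V4 cmpUgtMask(const V4 &a, const V4 &b) {
  V4 m{};
  for (std::size_t i = 0; i < m.size(); ++i)
    m[i] = (a[i] > b[i]) ? 0xFFFFFFFFu : 0u;
  return m;
}

// Lane-wise unsigned max: the mask is consumed directly by bitwise ops,
// with no conversion between an i1 vector and a data-width vector.
V4 maxLanes(const V4 &a, const V4 &b) {
  const V4 m = cmpUgtMask(a, b);
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (m[i] & a[i]) | (~m[i] & b[i]);
  return r;
}
```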
> For code that does not use machine intrinsics, I believe it would be
> cleaner, simpler, and potentially more efficient, to have a vector
> compare that returns <N x i1> instead. For example, in conjunction
> with the above-mentioned vector select, this would allow a max to be
> expressed simply as a sequence of compare and select.
Having gone down this path, I'd have to disagree with you.
>
> In addition to the above suggestions, I'd also like to hear what
> others think about handling vector operations that aren't powers of
> two in size, e.g. <3 x float> operations. I gather the status quo is
> that only POT sizes are expected to work (although we've found some
> bugs for things like <2 x float> that we're submitting). Ideally
> things like <3 x float> operands would usually be rounded up to the
> size supported by the machine directly. We can try to do this in the
> frontend, but it would of course be ideal if these just worked. I'm
> curious if anyone else out there has dealt with this already and has
> some suggestions.
Ideally, the code generator would handle NPOT vectors itself; I
know some people are working on widening the operations to a wider
legal vector type, and scalarizing is always a possibility as well.
The main problem here is what to do with address calculations and
alignment.
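A small C++ sketch of the widening idea and of where the address problem shows up, under the assumption that <3 x float> is widened to four lanes. All names are hypothetical. The arithmetic itself is easy; the catch is that a packed array of <3 x float> uses a 12-byte stride, while a widened one uses 16 bytes, so offsets and alignment differ between the two layouts.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for <3 x float> and the widened <4 x float>.
using V3 = std::array<float, 3>;
using V4 = std::array<float, 4>;

// Widen by padding the extra lane with zero; narrow by dropping it.
V4 widen(const V3 &v) { return V4{v[0], v[1], v[2], 0.0f}; }
V3 narrow(const V4 &v) { return V3{v[0], v[1], v[2]}; }

// Add two 3-lane vectors using a 4-lane operation.
V3 addWidened(const V3 &a, const V3 &b) {
  const V4 wa = widen(a), wb = widen(b);
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = wa[i] + wb[i];
  return narrow(r);
}

// Byte offset of element i in an array of vectors, for each layout.
constexpr std::size_t packedOffset(std::size_t i) { return i * 3 * sizeof(float); }
constexpr std::size_t paddedOffset(std::size_t i) { return i * 4 * sizeof(float); }
```

With 4-byte floats, element 1 sits at byte 12 in the packed layout, which is not 16-byte aligned, and at byte 16 in the padded layout, which is.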
Nate