[LLVMdev] Extending vector operations

Wed Jul 23 09:34:51 PDT 2008

On Monday 21 July 2008 18:46, Nate Begeman wrote:

> > 4) Vector select
> >
> > We consider a vector select extremely important for a number of
> > operations. This would be an extension of select to support an <N x
> > i1> vector mask to select between elements of <N x T> vectors for some
> > basic type T. Vector min, max, sign, etc. can be built on top of this
> > operation.
>
> How is this anything other than AND/ANDN/OR on any integer vector
> type?  I don't see what adding this to the IR gets you for vectors,
> since "vector of i1" doesn't mean "vector of bits" necessarily.

Yes, you can implement the operation with bit twiddling, but it's less
convenient.  Since we have a scalar select, it seems only natural to
provide a vector version.  Some vector hardware has direct support
for this operation.  AVX is just one example.
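
To make the comparison concrete, here is a minimal C sketch (plain C, not
LLVM IR) of an element-wise select, the equivalent AND/ANDN/OR form, and a
max built from compare plus select.  The lane count, the uint32_t element
type, and the names vselect, vselect_bitwise and vmax are illustrative
assumptions, not anything proposed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4  /* illustrative vector length (assumption) */

/* Element-wise select: out[i] = mask[i] ? a[i] : b[i].
   Models a select driven by an <N x i1> mask. */
static void vselect(const bool mask[LANES], const uint32_t a[LANES],
                    const uint32_t b[LANES], uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] ? a[i] : b[i];
}

/* The same result via AND/ANDN/OR: each mask bit is first widened
   to an all-ones or all-zeros lane. */
static void vselect_bitwise(const bool mask[LANES], const uint32_t a[LANES],
                            const uint32_t b[LANES], uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t m = mask[i] ? 0xFFFFFFFFu : 0u;
        out[i] = (a[i] & m) | (b[i] & ~m);
    }
}

/* Unsigned max expressed as a compare followed by a select. */
static void vmax(const uint32_t a[LANES], const uint32_t b[LANES],
                 uint32_t out[LANES])
{
    bool gt[LANES];
    for (int i = 0; i < LANES; i++)
        gt[i] = a[i] > b[i];
    vselect(gt, a, b, out);
}
```

Both select forms compute the same values; the difference is only in how
directly the intent is expressed.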

> > 5) Vector comparisons that return <N x i1>
> >
> > This is maybe not a must-have, and perhaps more a question of
> > preference. I understand the current vfcmp/vicmp semantics, returning
> > a vector of iK where K matches the bitwidth of the operands being
> > compared with the high bit set or not, are there for pragmatic
> > reasons, and that these functions exist to aid with code emitted that
> > uses machine-specific intrinsics.
>
> I totally disagree with this approach; A vector of i1 doesn't actually
> match what you want to do with the hardware, unless you had say, 128 x
> i1 for SSE, and it's strange when you have to spill and reload it.

I don't follow.  "What you want to do with the hardware" depends on what
hardware you have.  Don't restrict your thinking to SSE-style hardware.  And
for spill-reload on SSE, why couldn't you bitcast to some SSE-supported
type and just dump it to memory?
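
As a rough illustration of the spill/reload idea, the C sketch below packs
a per-lane boolean mask into a single byte that can be stored and later
unpacked.  This is only one possible reading of "bitcast and dump to
memory"; the packed uint8_t layout and the names mask_pack and mask_unpack
are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4  /* illustrative vector length (assumption) */

/* Spill: pack the mask so that bit i of the result holds mask[i]. */
static uint8_t mask_pack(const bool mask[LANES])
{
    uint8_t bits = 0;
    for (int i = 0; i < LANES; i++)
        if (mask[i])
            bits |= (uint8_t)(1u << i);
    return bits;
}

/* Reload: recover the per-lane mask from the packed bits. */
static void mask_unpack(uint8_t bits, bool mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        mask[i] = ((bits >> i) & 1u) != 0;
}
```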

> The current VICMP and VFCMP instructions do not exist for use with
> machine intrinsics; they exist to allow code written using C-style
> comparison operators to generate efficient code on a wide range of
> both scalar and vector hardware.

Well, either you convert the current model to a bitvector for some hardware
or you convert a bitvector to the current model for other hardware.  Perhaps
this can all be done by isel without too much trouble.  I honestly don't know.
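
The two conversions can be sketched in C as follows.  The "current model"
is taken here to be one 32-bit lane per element with the high bit
signalling true, as described in the quoted text; producing all-ones lanes
in the reverse direction, the 4-lane width, and the function names are
assumptions for illustration only.

```c
#include <stdint.h>

#define LANES 4  /* illustrative vector length (assumption) */

/* Current model -> bitvector: bit i is the high bit of lane i. */
static uint8_t lanes_to_bits(const uint32_t lanes[LANES])
{
    uint8_t bits = 0;
    for (int i = 0; i < LANES; i++)
        bits |= (uint8_t)((lanes[i] >> 31) << i);
    return bits;
}

/* Bitvector -> current model: a set bit becomes an all-ones lane
   (so its high bit is set), a clear bit becomes an all-zeros lane. */
static void bits_to_lanes(uint8_t bits, uint32_t lanes[LANES])
{
    for (int i = 0; i < LANES; i++)
        lanes[i] = ((bits >> i) & 1u) ? 0xFFFFFFFFu : 0u;
}
```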

> > For code that does not use machine intrinsics, I believe it would be
> > cleaner, simpler, and potentially more efficient, to have a vector
> > compare that returns <N x i1> instead. For example, in conjunction
> > with the above-mentioned vector select, this would allow a max to be
> > expressed simply as a sequence of compare and select.
>
> Having gone down this path, I'd have to disagree with you.

Can you elaborate?  What were your experiences?  What architecture were
you targeting?  Real-world experience is valuable.

> Handling NPOT vectors in the code generator ideally would be great; I
> know some people are working on widening the operations to a wider
> legal vector type, and scalarizing is always a possibility as well.
> The main problem here is what to do with address calculations, and
> alignment.

A vector select could potentially be used to insert "safe" values
into the unaffected vector elements on fixed-VL architectures like SSE.
This would allow us to do NPOT operations on vector hardware that
requires POT vector lengths without scalarizing.
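
A small C sketch of the idea, assuming a 3-element divide widened to a
4-lane operation: the unused lane of the divisor gets a "safe" value of
1.0 through a select, so the full-width divide cannot divide by zero
there.  The choice of division, the value 1.0, and the name div3_on_4 are
assumptions picked for illustration.

```c
#include <stdbool.h>

#define HW_LANES 4  /* power-of-two hardware width (assumption) */

/* Divide two 3-element vectors using a 4-lane operation. */
static void div3_on_4(const float a[3], const float b[3], float out[3])
{
    /* Lanes 0..2 carry real data; lane 3 is padding. */
    static const bool live[HW_LANES] = {true, true, true, false};
    float wa[HW_LANES] = {a[0], a[1], a[2], 0.0f};
    float wb[HW_LANES] = {b[0], b[1], b[2], 0.0f};
    float safe_b[HW_LANES];
    float q[HW_LANES];

    /* Vector select: keep live lanes, put 1.0 in the padding lane. */
    for (int i = 0; i < HW_LANES; i++)
        safe_b[i] = live[i] ? wb[i] : 1.0f;

    /* Full-width divide; the padding lane computes 0.0 / 1.0. */
    for (int i = 0; i < HW_LANES; i++)
        q[i] = wa[i] / safe_b[i];

    /* Only the live lanes are written back. */
    for (int i = 0; i < 3; i++)
        out[i] = q[i];
}
```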

                                                -Dave