[LLVMdev] Extending vector operations

Wed Jul 23 12:26:51 PDT 2008

On Wednesday 23 July 2008 11:48, Stefanus Du Toit wrote:

> No, I am proposing a general vector-by-vector version, just pointing
> out that many architectures do not support this natively, but that
> this should be a simple matter of legalization.

Cool.

It would be handy to have LLVM IR handle vector/scalar generally as well.
For example, multiply a vector by a scalar, add a scalar to a vector, etc.
isel/codegen would have to expand the scalar on architectures that
don't have hardware to do these operations.
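
As a rough sketch of that expansion, with Python standing in for executable
pseudocode: the names `splat`, `vec_mul` and `vec_scalar_mul` are hypothetical
and are not LLVM APIs. The assumption is that a target without native
vector/scalar operations would broadcast the scalar and then use the ordinary
vector-by-vector operation.

```python
def splat(scalar, n):
    """Broadcast a scalar into an n-element vector (hypothetical helper)."""
    return [scalar] * n


def vec_mul(a, b):
    """Element-wise multiply of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    return [x * y for x, y in zip(a, b)]


def vec_scalar_mul(v, s):
    """Vector-by-scalar multiply, expanded as splat + vector multiply."""
    return vec_mul(v, splat(s, len(v)))
```

A target with native vector/scalar hardware would skip the splat and use the
scalar operand directly.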

> >> 2) Vector trunc, sext, zext, fptrunc and fpext

> The intention is that for any of these conversion ops,
>
>    %A = op <N x S> %B to <N x T>
>
> is semantically equivalent to this pseudo-IR:
>
>    %A = undef <N x T>
>    for i = 0 .. N - 1:
>      %t1 = extractelement <N x S> %B, i
>      %t2 = op S %t1 to T
>      %A = insertelement <N x T> %A, %t2, i

Yep, that's what I would expect.
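
The quoted pseudo-IR can be modelled directly. This is only a sketch in
Python: `convert_vector`, `trunc_to_i8` and `sext_i8` are made-up names, and
the two scalar ops are simplified integer models, not the real IR semantics
for every type.

```python
def convert_vector(op, b):
    """Apply a scalar conversion `op` lane by lane, as in the pseudo-IR."""
    a = [None] * len(b)          # stands in for 'undef <N x T>'
    for i in range(len(b)):
        t1 = b[i]                # extractelement
        t2 = op(t1)              # scalar conversion S -> T
        a[i] = t2                # insertelement
    return a


def trunc_to_i8(x):
    """Model of 'trunc ... to i8': keep the low 8 bits."""
    return x & 0xFF


def sext_i8(x):
    """Model of 'sext i8 ... to a wider type': reinterpret the low 8 bits as signed."""
    x &= 0xFF
    return x - 0x100 if x >= 0x80 else x
```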

> > Yes, merge/blend is a very important operation.  Also, it would be
> > nice to think about generalizing this to apply masks to all vector
> > operations, particularly loads and stores.
>
> Can you elaborate about what you mean by this? I've built vector IRs
> in the past that have writemasking effectively built into all IR ops,
> and I don't know if I would recommend this approach generally. But I'm
> not sure that's what you mean.

That's basically what I mean.  What about this gives you pause?  Generating
code for an architecture without these mask operations might get interesting
(you'd have to do reverse if-conversion).  But right now, there's no way to 
effectively support machines that _do_ have this hardware.

The if-conversion could be done depending on whether or not the vector
hardware supports generalized masked operations.  Right now the only
way to get this support is to define a whole set of MachineInstructions
that describe these operations and write the if-conversion passes to make
the transformation on machine IR.  It would be nice to provide one model that
can be reused.
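
To make the two forms concrete, here is a small Python sketch. The function
names and the `passthru` operand are assumptions made for illustration, not
an existing LLVM construct. The first function models a masked add as
hardware with mask support might execute it; the second is the same operation
after reverse if-conversion, with an explicit branch per lane for a machine
without masks.

```python
def masked_add(mask, a, b, passthru):
    """Masked add: lanes whose mask bit is off keep the passthru value."""
    return [x + y if m else p for m, x, y, p in zip(mask, a, b, passthru)]


def masked_add_branchy(mask, a, b, passthru):
    """Same result after reverse if-conversion: control flow replaces the mask."""
    out = list(passthru)
    for i in range(len(mask)):
        if mask[i]:              # branch replaces the per-lane mask bit
            out[i] = a[i] + b[i]
    return out
```

Both forms leave disabled lanes untouched, which matters most for operations
that can trap or have side effects, such as loads and stores.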

Obviously, a lot more discussion about this needs to happen.

> >> I realize this is probably the most controversial change amongst
> >> these. I gather there is some concern about representing "variable
> >> width" i1s, but I would contend that that's the case even for i1s
> >> which are not vectors.
> >
> > What do you mean by "variable width?"
>
> At some point the i1 vectors may need to be converted to masks. The
> sizes for these masks will often depend on how they're used. On ISAs
> like SSE and CellSPU, a comparison generates a mask corresponding to
> the size of the operands being compared, and a selection requires a
> mask corresponding to the size of the operands being selected.

Right.

> The "simple" way to deal with this is to insert appropriate conversion
> code at uses of i1s, and pick a representation for a given i1 based on
> its SSA birthpoint. It's a little more ambiguous when you start adding
> phis into the mix, e.g.:
>
> a:
> 	%a1 = fcmp olt <2 x float> %f1, %f2 ; yields <2 x i1>
> 	br label %c
>
> b:
> 	%a2 = fcmp olt <2 x double> %d1, %d2 ; yields <2 x i1>
> 	br label %c
>
> c:
> 	%a3 = phi <2 x i1> [%a1, %a], [%a2, %b]
> 	select <2 x i1> %a3, %s1, %s2 ; where s1, s2 are <2 x i16>
> 	select <2 x i1> %a3, %c1, %c2 ; where c1, c2 are <2 x i8>
>
> The representation for %a1 may be <2 x i32>, for %a2 <2 x i64>, but
> for %a3 it's less clear.

As I said in a previous message, I'm not sure that code like this makes
sense even without i1.  I would assume you'd just generate two different
masks, one for <2 x double> and one for <2 x float>.  If the masks can
later be combined into one register, great!  But that's a machine-dependent
decision.

I'm trying to think of what kind of source code would produce this.  The
code I can imagine would either generate two vector loops with different
masks (one checking doubles, one checking floats), or generate one loop
that sets a vector of integers based on either doubles or floats and then
another loop that uses that vector of integers to mask operations.

In either case, I don't imagine a phi like this would be generated.  Because
vector masks generally result from if-conversion, it's strange to even see a
phi here.

Perhaps I'm thinking small.  Can you give a source code example you
could see mapping to this?

My gut tells me we should declare this kind of thing illegal.

> I don't think this is a huge problem, but it's something to be aware
> of. Note the same issue can affect even scalar i1 values. In fact,
> Scott Michel posted to llvm-dev recently about dealing with this issue
> for setcc in the CellSPU backend.

Thanks for the pointer.  I'll go read that thread.

I should think conversion at the point of use as you describe would be
workable.  But I still really question whether we have to worry about this
at all.
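
For reference, conversion at the point of use might look like the following
Python sketch. It assumes an SSE-style representation where each mask lane is
all ones or all zeros at the width of the compared operands. The names
`cmp_lt_mask`, `resize_mask` and `select` are hypothetical.

```python
def cmp_lt_mask(a, b, bits):
    """Compare lanes; each result lane is all ones or zero at `bits` width."""
    ones = (1 << bits) - 1
    return [ones if x < y else 0 for x, y in zip(a, b)]


def resize_mask(mask, to_bits):
    """Convert a mask to the lane width required by its user."""
    ones = (1 << to_bits) - 1
    return [ones if lane != 0 else 0 for lane in mask]


def select(mask, a, b, bits):
    """Bitwise blend: take a where the mask is set, b elsewhere."""
    ones = (1 << bits) - 1
    return [(m & x) | (~m & ones & y) for m, x, y in zip(mask, a, b)]


# A float compare produces 32-bit lanes; an i16 select needs 16-bit lanes.
m32 = cmp_lt_mask([1.0, 5.0], [2.0, 3.0], 32)
m16 = resize_mask(m32, 16)
picked = select(m16, [0x1111, 0x2222], [0x3333, 0x4444], 16)
```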

> >> that only POT sizes are expected to work (although we've found some
> >> bugs for things like <2 x float> that we're submitting). Ideally
> >> things like <3 x float> operands would usually be rounded up to the
> >> size supported by the machine directly. We can try to do this in the
> >
> > You might need mask support as well, especially if the operation can
> > trap.
>
> Hmm. Yes, divisions by zero etc. are something we should think about.

You can always scalarize, but that kinda defeats the point of non-POT.
Still, for vector hardware that supports non-POT, this support is a win.
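
One way to picture the trap problem when rounding <3 x float> up to four
lanes is sketched below in Python. It assumes the padding lane of the divisor
can be filled with a harmless value, which is only one possible strategy (a
mask would be another). `widen` and `div_v3_on_v4` are made-up names.

```python
def widen(v, width, fill):
    """Pad a vector up to `width` lanes using `fill` for the extra lanes."""
    return list(v) + [fill] * (width - len(v))


def div_v3_on_v4(a, b):
    """Divide two 3-lane vectors on hypothetical 4-lane hardware."""
    wa = widen(a, 4, 0.0)
    wb = widen(b, 4, 1.0)        # safe divisor: the padding lane cannot trap
    wide = [x / y for x, y in zip(wa, wb)]
    return wide[:3]              # drop the padding lane again
```

Padding the divisor with zero or garbage instead could raise a divide-by-zero
in a lane the program never asked for.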

                                                        -Dave