[LLVMdev] Extending vector operations
David Greene
dag at cray.com
Wed Jul 23 12:26:51 PDT 2008
On Wednesday 23 July 2008 11:48, Stefanus Du Toit wrote:
> No, I am proposing a general vector-by-vector version, just pointing
> out that many architectures do not support this natively, but that
> this should be a simple matter of legalization.
Cool.
It would be handy to have LLVM IR handle vector/scalar generally as well.
For example, multiply a vector by a scalar, add a scalar to a vector, etc.
isel/codegen would have to expand the scalar on architectures that
don't have hardware to do these operations.
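A minimal sketch of that expansion, modelling vectors as Python lists. The names splat and legalize_vector_scalar_mul are hypothetical and are not LLVM APIs; this only illustrates the idea of broadcasting the scalar and then using the ordinary vector-by-vector operation.

```python
def splat(scalar, n):
    """Broadcast a scalar into an n-element vector (modelled as a list)."""
    return [scalar] * n


def legalize_vector_scalar_mul(vec, scalar):
    """Expand the scalar operand, then do a plain vector-by-vector multiply.

    This is roughly what a target without native vector/scalar
    instructions would have to do during legalization.
    """
    expanded = splat(scalar, len(vec))
    return [a * b for a, b in zip(vec, expanded)]


result = legalize_vector_scalar_mul([1.0, 2.0, 3.0, 4.0], 0.5)
```

A target that does have vector/scalar hardware would skip the splat and select the native instruction directly.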
> >> 2) Vector trunc, sext, zext, fptrunc and fpext
> The intention is that for any of these conversion ops,
>
> %A = op <N x S> %B to <N x T>
>
> is semantically equivalent to this pseudo-IR:
>
> %A = undef <N x T>
> for i = 0 .. N - 1:
> %t1 = extractelement <N x S> %B, i
> %t2 = op S %t1 to T
> %A = insertelement <N x T> %A, %t2, i
Yep, that's what I would expect.
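The quoted pseudo-IR can be modelled directly, assuming vectors are Python lists and each scalar conversion is a plain function. The helpers trunc_to_i8 and sext_i8_to_i32 are illustrative stand-ins for the scalar ops, not LLVM code.

```python
def convert_vector(op, vec):
    """Apply a scalar conversion to every element, mirroring the pseudo-IR."""
    n = len(vec)
    result = [None] * n        # %A = undef <N x T>
    for i in range(n):
        t1 = vec[i]            # %t1 = extractelement <N x S> %B, i
        t2 = op(t1)            # %t2 = op S %t1 to T
        result[i] = t2         # %A = insertelement <N x T> %A, %t2, i
    return result


def trunc_to_i8(x):
    """Keep only the low 8 bits, like a truncation to i8."""
    return x & 0xFF


def sext_i8_to_i32(x):
    """Interpret the low 8 bits as a signed value and return it as an int."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


truncated = convert_vector(trunc_to_i8, [0x1FF, 0x100, 5])
extended = convert_vector(sext_i8_to_i32, [0xFF, 0x7F])
```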
> > Yes, merge/blend is a very important operation. Also, it would be nice
> > to think about generalizing this to apply masks to all vector operations,
> > particularly loads and stores.
>
> Can you elaborate about what you mean by this? I've built vector IRs
> in the past that have writemasking effectively built into all IR ops,
> and I don't know if I would recommend this approach generally. But I'm
> not sure that's what you mean.
That's basically what I mean. What about this gives you pause? Generating
code for an architecture without these mask operations might get interesting
(you'd have to do reverse if-conversion). But right now, there's no way to
effectively support machines that _do_ have this hardware.
The if-conversion could be done depending on whether or not the vector
hardware supports generalized masked operations. Right now the only
way to get this support is to define a whole set of MachineInstructions
that describe these operations and write the if-conversion passes to make
the transformation on machine IR. It would be nice to provide one model that
can be reused.
Obviously, a lot more discussion about this needs to happen.
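To make the masked-store case concrete, here is a rough sketch with memory modelled as a Python list. Both function names are hypothetical. The first is the semantics a masked store would have (and, with its per-lane branch, roughly what reverse if-conversion would produce). The second is a different fallback, load/blend/store, which only matches when reading and rewriting the unmasked lanes is harmless (no faults, no concurrent writers).

```python
def masked_store(memory, base, values, mask):
    """Write only the lanes whose mask entry is true."""
    for i, (v, m) in enumerate(zip(values, mask)):
        if m:  # one branch per lane
            memory[base + i] = v


def emulated_masked_store(memory, base, values, mask):
    """Load the old contents, blend with the new values, store full width."""
    n = len(values)
    old = memory[base:base + n]                      # full-width load
    blended = [v if m else o                         # blend/merge by mask
               for v, o, m in zip(values, old, mask)]
    memory[base:base + n] = blended                  # full-width store


mem_a = list(range(8))
mem_b = list(range(8))
masked_store(mem_a, 2, [90, 91, 92, 93], [True, False, False, True])
emulated_masked_store(mem_b, 2, [90, 91, 92, 93], [True, False, False, True])
```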
> >> I realize this is probably the most controversial change amongst
> >> these. I gather there is some concern about representing "variable
> >> width" i1s, but I would contend that that's the case even for i1s
> >> which are not vectors.
> >
> > What do you mean by "variable width?"
>
> At some point the i1 vectors may need to be converted to masks. The
> sizes for these masks will often depend on how they're used. On ISAs
> like SSE and CellSPU, a comparison generates a mask corresponding to
> the size of the operands being compared, and a selection requires a
> mask corresponding to the size of the operands being selected.
Right.
> The "simple" way to deal with this is to insert appropriate conversion
> code at uses of i1s, and pick a representation for a given i1 based on
> its SSA birthpoint. It's a little more ambiguous when you start adding
> phis into the mix, e.g.:
>
> a:
> %a1 = fcmp olt <2 x float> %f1, %f2 ; yields <2 x i1>
> br label %c
>
> b:
> %a2 = fcmp olt <2 x double> %d1, %d2 ; yields <2 x i1>
> br label %c
>
> c:
> %a3 = phi <2 x i1> [%a1, %a], [%a2, %b]
> select <2 x i1> %a3, %s1, %s2 ; where s1, s2 are <2 x i16>
> select <2 x i1> %a3, %c1, %c2 ; where c1, c2 are <2 x i8>
>
> The representation for %a1 may be <2 x i32>, for %a2 <2 x i64>, but
> for %a3 it's less clear.
As I said in a previous message, I'm not sure that code like this makes
sense even without i1. I would assume you'd just generate two different
masks, one for <2 x double> and one for <2 x float>. If the masks can
later be combined into one register, great! But that's a machine-dependent
decision.
I'm trying to think of what kind of source code would produce this. The
codes I imagine would either generate two vector loops with different
masks corresponding to checking doubles or floats or would generate
a loop that sets a vector of integers based on either doubles or floats
and then another loop that uses the vector of integers to mask operations.
In either case, I don't imagine a phi like this would be generated. Because
vector masks generally result from if-conversion, it's strange to even see a
phi here.
Perhaps I'm thinking small. Can you give a source code example you
could see mapping to this?
My gut tells me we should declare this kind of thing illegal.
> I don't think this is a huge problem, but it's something to be aware
> of. Note the same issue can affect even scalar i1 values. In fact,
> Scott Michel posted to llvm-dev recently about dealing with this issue
> for setcc in the CellSPU backend.
Thanks for the pointer. I'll go read that thread.
I should think conversion at the point of use as you describe would be
workable. But I still really question whether we have to worry about this
at all.
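A small sketch of conversion at the point of use, assuming an SSE-like representation where each mask lane is all ones or all zeros at the operand width. The functions compare_mask, convert_mask and select are hypothetical models, not backend code.

```python
def compare_mask(a, b, lane_bits):
    """Less-than compare producing an all-ones/all-zeros lane of lane_bits."""
    ones = (1 << lane_bits) - 1
    return [ones if x < y else 0 for x, y in zip(a, b)]


def convert_mask(mask, to_bits):
    """Re-size each lane of a mask to to_bits, preserving true/false."""
    ones = (1 << to_bits) - 1
    return [ones if lane else 0 for lane in mask]


def select(mask, if_true, if_false, lane_bits):
    """Bitwise select; the mask lanes must already be lane_bits wide."""
    ones = (1 << lane_bits) - 1
    return [(m & t) | ((~m & ones) & f)
            for m, t, f in zip(mask, if_true, if_false)]


# Mask born from a <2 x float> compare, so its lanes are 32 bits wide.
m32 = compare_mask([1.0, 5.0], [2.0, 3.0], 32)
# Converted at each use to the width of the selected operands.
m16 = convert_mask(m32, 16)
m8 = convert_mask(m32, 8)
picked16 = select(m16, [10, 20], [30, 40], 16)
picked8 = select(m8, [1, 2], [3, 4], 8)
```

The conversion cost is paid per use, which is why the representation chosen at the definition matters less than it might first appear.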
> >> that only POT sizes are expected to work (although we've found some
> >> bugs for things like <2 x float> that we're submitting). Ideally
> >> things like <3 x float> operands would usually be rounded up to the
> >> size supported by the machine directly. We can try to do this in the
> >
> > You might need mask support as well, especially if the operation can
> > trap.
>
> Hmm. Yes, divisions by zero etc. are something we should think about.
You can always scalarize, but that kinda defeats the point of non-POT.
Still, for vector hardware that supports non-POT, this support is a win.
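As a rough illustration of why the mask matters when rounding <3 x float> up to four lanes: the padding lane holds arbitrary data (modelled here as 0.0), so a full-width divide can trap. Python's ZeroDivisionError stands in for the hardware trap, and all function names are hypothetical.

```python
def widen(vec, width, fill):
    """Pad a vector up to the machine width with a fill value."""
    return vec + [fill] * (width - len(vec))


def naive_div(a, b, width=4):
    """Divide at full machine width with no mask; padding lanes may trap."""
    wa = widen(a, width, 0.0)
    wb = widen(b, width, 0.0)
    return [x / y for x, y in zip(wa, wb)][:len(a)]


def masked_div(a, b, width=4):
    """Use a lane mask to put a harmless divisor into the padding lanes."""
    n = len(a)
    mask = [True] * n + [False] * (width - n)
    wa = widen(a, width, 0.0)
    wb = widen(b, width, 0.0)
    safe_b = [y if m else 1.0 for y, m in zip(wb, mask)]
    return [x / y for x, y in zip(wa, safe_b)][:n]


quotient = masked_div([2.0, 8.0, 9.0], [2.0, 4.0, 3.0])
```

A zero in one of the three real lanes still traps in masked_div, which is the behavior the original <3 x float> operation would have had.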
-Dave