[LLVMdev] Modeling GPU vector registers, again (with my implementation)
David Greene
dag at cray.com
Wed Feb 18 17:19:20 PST 2009
On Friday 13 February 2009 11:47, Alex wrote:
> It seems to me that LLVM's sub-register support does not fit the following
> hardware architecture.
>
> All instructions of this hardware are vector instructions. Each register
> contains four 32-bit FP sub-registers, called r0.x, r0.y, r0.z, r0.w.
>
> Most instructions write more than one element, in this way:
>
> mul r0.xyw, r1, r2
> add r0.z, r3, r4
> sub r5, r0, r1
>
> Notice that the four elements of r0 are written by two different
> instructions.
>
> My question is how I should model these sub-registers. If I treat each
> component as a register and do the register allocation individually, it
> seems very difficult to merge the scalar operations back into one vector
> operation.
This is a very good use case for vector masks in LLVM. You could express this
as two masked operations and a merge:
*** Warning, pseudo-LLVM code ***
  mul r0, r1, r2, [1101]     ; [xy_w]
  add r6, r3, r4, [0010]     ; [__z_]
  ; The assumption here is that masked elements are undefined, so we need a
  ; merge.
  select r0, r0, r6, [1101]  ; Select 1's from r0, 0's from r6, merge
  sub r5, r0, r1, [1111]     ; Or have no mask == full mask
The registers are just vector registers then. They don't have component
pieces. Regalloc will have no problem with them.
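
To make the masked semantics concrete, here is a minimal Python sketch that models the sequence above on plain lists. It is only an illustration, not LLVM code: the helper names (masked_op, select, UNDEF) and the input values are made up for the example, and None stands in for an undefined lane.

```python
import operator

UNDEF = None  # assumption: lanes masked off by an operation are undefined


def masked_op(fn, a, b, mask):
    """Apply fn lane-wise where mask is 1; masked-off lanes become UNDEF."""
    return [fn(x, y) if m else UNDEF for x, y, m in zip(a, b, mask)]


def select(a, b, mask):
    """Take lanes from a where mask is 1 and lanes from b where mask is 0."""
    return [x if m else y for x, y, m in zip(a, b, mask)]


# Arbitrary example inputs, lanes ordered x, y, z, w.
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [2.0, 2.0, 2.0, 2.0]
r3 = [10.0, 20.0, 30.0, 40.0]
r4 = [1.0, 1.0, 1.0, 1.0]

r0 = masked_op(operator.mul, r1, r2, [1, 1, 0, 1])  # mul r0, r1, r2, [1101]
r6 = masked_op(operator.add, r3, r4, [0, 0, 1, 0])  # add r6, r3, r4, [0010]
r0 = select(r0, r6, [1, 1, 0, 1])                   # select r0, r0, r6, [1101]
r5 = masked_op(operator.sub, r0, r1, [1, 1, 1, 1])  # sub r5, r0, r1, [1111]
```

After the select, every lane of r0 is defined, which is why the final sub can use a full mask.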
The MachineInstrs for your architecture would have to preserve the mask
semantics. In the AsmPrinter for your architecture, it would be a simple
matter to dump out the mask as the field specifier on a register name.
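
As a rough sketch of that printing step (plain Python rather than a real AsmPrinter; masked_reg_name and emit are hypothetical helpers), the mask can be turned into a field specifier like this, assuming a full mask prints as the bare register name:

```python
COMPONENTS = "xyzw"


def masked_reg_name(reg, mask):
    """Render a destination register with its write mask as a field specifier."""
    if len(mask) != 4 or not any(mask):
        raise ValueError("expected a 4-lane mask with at least one lane set")
    if all(mask):
        return reg  # assumption: a full mask needs no specifier
    fields = "".join(c for c, m in zip(COMPONENTS, mask) if m)
    return f"{reg}.{fields}"


def emit(opcode, dst, srcs, mask=(1, 1, 1, 1)):
    """Format one instruction in the syntax used in the original question."""
    return f"{opcode} {masked_reg_name(dst, mask)}, {', '.join(srcs)}"


line1 = emit("mul", "r0", ["r1", "r2"], (1, 1, 0, 1))
line2 = emit("add", "r0", ["r3", "r4"], (0, 0, 1, 0))
line3 = emit("sub", "r5", ["r0", "r1"])
```

With these inputs the three lines match the assembly from the original question.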
The masks would allow you to get rid of the shufflevector stuff. Since you
don't have a hardware merge instruction, you have two options. You could keep
your pre- and post-regalloc passes to rewrite things. Or a very simple
post-regalloc peephole pass could examine the masks of the merge and rewrite
the registers in the defs, with no pre-regalloc pass needed to remember
things.
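
Here is one way such a peephole could look, sketched in Python over a toy instruction list rather than real MachineInstrs. Inst, invert and fold_merges are hypothetical names. The sketch assumes straight-line code, that each merged temporary is used only by the select, and that nothing else reads or writes the destination between the defs and the select; a real pass would have to check those conditions.

```python
from dataclasses import dataclass, replace

FULL = (1, 1, 1, 1)


@dataclass(frozen=True)
class Inst:
    op: str
    dst: str
    srcs: tuple
    mask: tuple = FULL


def invert(mask):
    """Return the complementary lane mask."""
    return tuple(1 - m for m in mask)


def fold_merges(insts):
    """Fold `select d, a, b, m` into the defs of a and b.

    The fold fires only when a has a single earlier def with mask m and
    b has a single earlier def with the complementary mask. Both defs
    are retargeted to d and the select is dropped.
    """
    out = list(insts)
    i = 0
    while i < len(out):
        sel = out[i]
        if sel.op == "select":
            a, b = sel.srcs
            defs_a = [j for j in range(i) if out[j].dst == a]
            defs_b = [j for j in range(i) if out[j].dst == b]
            if (len(defs_a) == 1 and len(defs_b) == 1
                    and out[defs_a[0]].mask == sel.mask
                    and out[defs_b[0]].mask == invert(sel.mask)):
                out[defs_a[0]] = replace(out[defs_a[0]], dst=sel.dst)
                out[defs_b[0]] = replace(out[defs_b[0]], dst=sel.dst)
                del out[i]  # the merge is now implicit in the two defs
                continue
        i += 1
    return out


program = [
    Inst("mul", "r0", ("r1", "r2"), (1, 1, 0, 1)),
    Inst("add", "r6", ("r3", "r4"), (0, 0, 1, 0)),
    Inst("select", "r0", ("r0", "r6"), (1, 1, 0, 1)),
    Inst("sub", "r5", ("r0", "r1")),
]
folded = fold_merges(program)
```

On the example sequence this leaves the mul and add both writing r0 with disjoint masks, followed by the sub, which is the shape of the original hardware code.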
Alas, we do not have masks in LLVM just yet. But I'm getting to the point
where I'm ready to restart that discussion. :)
This also won't directly handle the more general case of swizzling:
r0.wyzx = ...
But a "regular" masked operation followed by a shufflevector should do it.
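
A small Python sketch of that idea follows. The semantics of the destination swizzle are an assumption here: the component named at position i of "wyzx" is taken to receive lane i of the computed value. swizzle_to_shuffle and shuffle are hypothetical helpers, not LLVM APIs.

```python
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle_to_shuffle(swizzle):
    """Convert a destination swizzle such as "wyzx" into shuffle indices.

    indices[d] is the source lane that ends up in destination lane d.
    """
    if sorted(swizzle) != sorted("xyzw"):
        raise ValueError("expected a permutation of xyzw")
    indices = [None] * 4
    for src_lane, comp in enumerate(swizzle):
        indices[LANES[comp]] = src_lane
    return indices


def shuffle(vec, indices):
    """Single-input shuffle: result lane d takes vec[indices[d]]."""
    return [vec[i] for i in indices]


value = [10, 20, 30, 40]              # result of the regular (masked) operation
indices = swizzle_to_shuffle("wyzx")
r0 = shuffle(value, indices)          # r0.wyzx = value
```

Under that assumption r0.w holds lane 0 of the value and r0.x holds lane 3.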
-Dave