[LLVMdev] Modeling GPU vector registers, again (with my implementation)

Wed Feb 18 17:19:20 PST 2009

On Friday 13 February 2009 11:47, Alex wrote:
> It seems to me that LLVM sub-register is not for the following hardware
> architecture.
>
> All instructions of the hardware are vector instructions. All registers
> contain four 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w.
>
> Most instructions write more than one element, in this way:
>
>   mul r0.xyw, r1, r2
>   add r0.z, r3, r4
>   sub r5, r0, r1
>
> Notice that the four elements of r0 are written by two different
> instructions.
>
> My question is how I should model these sub-registers. If I treat each
> component as a register and do the register allocation individually, it
> seems very difficult to merge the scalar operations back into one vector
> operation.

This is a very good use case for vector masks in LLVM.  Your example can be
expressed as two masked operations and a merge:

*** Warning: pseudo-LLVM code ***
mul r0, r1, r2, [1101]    ; [xy_w]
add r6, r3, r4, [0010]    ; [__z_]
; The assumption here is that masked-off elements are undefined, so we need
; a merge.
select r0, r0, r6, [1101] ; Select 1's from r0, 0's from r6, merge
sub r5, r0, r1, [1111]    ; Or have no mask == full mask

The registers are just vector registers then.  They don't have component 
pieces.  Regalloc will have no problem with them.
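
To make the intended semantics of the pseudo-code above concrete, here is a
minimal Python sketch.  It is only a model, not LLVM code: the helper names
masked_op and select are hypothetical, lanes are ordered x, y, z, w, None
stands in for an undefined lane, and the input values are made up.

```python
import operator

def masked_op(fn, a, b, mask):
    # Apply fn lane by lane; lanes whose mask bit is 0 are left undefined (None).
    return [fn(p, q) if m else None for p, q, m in zip(a, b, mask)]

def select(a, b, mask):
    # Take lanes marked 1 from a and lanes marked 0 from b.
    return [p if m else q for p, q, m in zip(a, b, mask)]

# Made-up input registers, lanes ordered x, y, z, w.
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [2.0, 2.0, 2.0, 2.0]
r3 = [10.0, 20.0, 30.0, 40.0]
r4 = [1.0, 1.0, 1.0, 1.0]

r0 = masked_op(operator.mul, r1, r2, [1, 1, 0, 1])  # x, y, w defined
r6 = masked_op(operator.add, r3, r4, [0, 0, 1, 0])  # only z defined
r0 = select(r0, r6, [1, 1, 0, 1])                   # merge: all lanes defined
r5 = masked_op(operator.sub, r0, r1, [1, 1, 1, 1])  # full mask
```

After the select, every lane of r0 is defined, which is why the final sub can
safely use a full mask.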

The MachineInstrs for your architecture would have to preserve the mask 
semantics.  In the AsmPrinter for your architecture, it would be a simple 
matter to dump out the mask as the field specifier on a register name.
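
As a rough illustration of that printing step, the Python sketch below turns a
write mask into a field specifier.  The function names mask_suffix and
print_operand are hypothetical and are not part of LLVM's AsmPrinter; the
choice to print a full mask as a bare register name is an assumption taken
from Alex's "sub r5, r0, r1" example.

```python
def mask_suffix(mask, lanes="xyzw"):
    # Build the ".xyw"-style field specifier for a write mask.
    if not any(mask):
        raise ValueError("empty write mask")
    if all(mask):
        return ""  # assumption: a full mask prints as the bare register
    return "." + "".join(name for name, m in zip(lanes, mask) if m)

def print_operand(reg, mask):
    # Register name followed by its field specifier, if any.
    return reg + mask_suffix(mask)

dst_mul = print_operand("r0", [1, 1, 0, 1])  # "r0.xyw"
dst_add = print_operand("r0", [0, 0, 1, 0])  # "r0.z"
dst_sub = print_operand("r5", [1, 1, 1, 1])  # "r5"
```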

The masks would allow you to get rid of the shufflevector stuff.  Since you
don't have a hardware merge instruction, you have two options.  You could keep
your pre- and post-regalloc passes to rewrite things.  Alternatively, a very
simple post-regalloc peephole pass could examine the masks of each merge and
rewrite the registers in the defs, with no pre-regalloc pass needed to
remember things.
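
A peephole of that kind might look roughly like the Python sketch below.  It
is a toy model and not LLVM's MachineInstr API: the Inst class and fold_merges
are hypothetical names.  It assumes the hardware leaves unwritten lanes of the
destination untouched (as in Alex's mul/add example), and it is deliberately
conservative: it only folds a select whose two inputs are defined by the two
immediately preceding masked instructions, with complementary masks, and only
when the merged-in register has no other reader.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Inst:
    op: str
    dst: str
    srcs: tuple
    mask: tuple = (1, 1, 1, 1)  # lanes x, y, z, w; default is a full mask

def invert(mask):
    return tuple(0 if m else 1 for m in mask)

def fold_merges(insts):
    # Drop "select" merges by retargeting the second def at the merged register.
    out = []
    for inst in insts:
        if inst.op == "select" and len(out) >= 2:
            keep, other = inst.srcs
            first, second = out[-2], out[-1]
            # Count every read of `other` in the whole program.
            reads_of_other = sum(i.srcs.count(other) for i in insts)
            if (inst.dst == keep
                    and "select" not in (first.op, second.op)
                    and first.dst == keep and first.mask == inst.mask
                    and second.dst == other
                    and second.mask == invert(inst.mask)
                    and reads_of_other == 1):
                # Write the second def straight into `keep`; the select goes away.
                out[-1] = replace(second, dst=keep)
                continue
        out.append(inst)
    return out

prog = [
    Inst("mul", "r0", ("r1", "r2"), (1, 1, 0, 1)),
    Inst("add", "r6", ("r3", "r4"), (0, 0, 1, 0)),
    Inst("select", "r0", ("r0", "r6"), (1, 1, 0, 1)),
    Inst("sub", "r5", ("r0", "r1")),
]
folded = fold_merges(prog)  # mul r0.xyw / add r0.z / sub r5
```

A real pass would need proper liveness information rather than the
adjacent-instruction restriction used here.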

Alas, we do not have masks in LLVM just yet.  But I'm getting to the point 
where I'm ready to restart that discussion.  :)

This also won't directly handle the more general case of swizzling:

r0.wyzx = ...

But a "regular" masked operation followed by a shufflevector should do it.
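
A small Python sketch of that idea follows.  The reading of a destination
swizzle is an assumption on my part: "r0.wyzx = v" is taken to mean that lane
w of r0 receives element 0 of v, lane y receives element 1, and so on.  The
helper names are hypothetical; shufflevector here only mimics the
single-input form of the LLVM instruction.

```python
def shufflevector(v, indices):
    # Result lane i takes element indices[i] of v.
    return [v[i] for i in indices]

def dest_swizzle_to_indices(swizzle, lanes="xyzw"):
    # Convert a destination swizzle such as "wyzx" into shuffle indices.
    # Assumption: the k-th letter names the lane that receives element k.
    indices = [None] * len(lanes)
    for k, name in enumerate(swizzle):
        indices[lanes.index(name)] = k
    return indices

value = [10.0, 20.0, 30.0, 40.0]           # result of the "regular" operation
indices = dest_swizzle_to_indices("wyzx")  # [3, 1, 2, 0]
r0 = shufflevector(value, indices)         # lanes x, y, z, w of r0
```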

                                      -Dave