[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Alex
alex.lavoro.propio at gmail.com
Fri Feb 13 09:47:52 PST 2009
It seems to me that LLVM's sub-register support is not suited to the following
hardware architecture.
All instructions of the hardware are vector instructions. Every register
contains four 32-bit FP sub-registers, called r0.x, r0.y, r0.z, r0.w.
Most instructions write more than one element, like this:
mul r0.xyw, r1, r2
add r0.z, r3, r4
sub r5, r0, r1
Notice that the four elements of r0 are written by two different
instructions.
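To make these write semantics concrete, here is a minimal C++ model. It assumes each register is four floats and that a writemask is a 4-bit set with bit 0 = .x through bit 3 = .w (the bit layout is an assumption; it agrees with the 0xF "all elements" default used later). A masked write updates only the selected lanes, so after the two instructions above every lane of r0 is defined even though no single instruction wrote all four.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of one 4 x 32-bit FP register: lanes x, y, z, w.
using Vec4 = std::array<float, 4>;

// Writemask bits (assumed layout): bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w.
constexpr uint8_t X = 1, Y = 2, Z = 4, W = 8;

// Component-wise operations, as the hardware performs them on all four lanes.
Vec4 vmul(const Vec4& a, const Vec4& b) {
  return {a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}
Vec4 vadd(const Vec4& a, const Vec4& b) {
  return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// Masked write: only lanes whose bit is set in `mask` are updated.
void writeMasked(Vec4& dst, const Vec4& value, uint8_t mask) {
  assert(mask <= 0xF && "only four lanes exist");
  for (int lane = 0; lane < 4; ++lane)
    if (mask & (1u << lane)) dst[lane] = value[lane];
}

// mul r0.xyw, r1, r2
// add r0.z, r3, r4
Vec4 runExample(const Vec4& r1, const Vec4& r2, const Vec4& r3, const Vec4& r4) {
  Vec4 r0{0.f, 0.f, 0.f, 0.f};
  writeMasked(r0, vmul(r1, r2), X | Y | W);
  writeMasked(r0, vadd(r3, r4), Z);
  return r0;
}
```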
My question is how I should model these sub-registers. If I treat each
component as a separate register and allocate them individually, it seems
very difficult to merge the scalar operations back into one vector operation.
// each %reg is a sub-register
// r1, r2, r3, r4 here are virtual register numbers
mul %reg1024, r1, r2 // x
mul %reg1025, r1, r2 // y
mul %reg1026, r1, r2 // w
add %reg1027, r3, r4 // z
sub %reg1028, %reg1024, r1
sub %reg1029, %reg1025, r1
sub %reg1030, %reg1026, r1
sub %reg1031, %reg1027, r1
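To see why merging back is awkward, consider a naive regrouping sketch (hypothetical data structures, not an LLVM API): scalar operations can be fused into one vector instruction only if they share the opcode and source registers and their lanes were all assigned to the same destination register. Allocating each component independently gives no guarantee of that last condition.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One scalar operation after per-component register allocation (hypothetical).
struct ScalarOp {
  std::string opcode;  // e.g. "mul"
  int src0, src1;      // source vector registers
  int dstReg;          // vector register this lane was assigned to
  int lane;            // 0 = x, 1 = y, 2 = z, 3 = w
};

// A re-formed vector instruction with a writemask (bit i = lane i).
struct VectorOp {
  std::string opcode;
  int src0, src1, dstReg;
  uint8_t writemask;
};

// Group scalar ops that agree on opcode, sources and destination register.
// Lanes that the allocator placed in different registers stay separate.
std::vector<VectorOp> regroup(const std::vector<ScalarOp>& ops) {
  std::map<std::tuple<std::string, int, int, int>, uint8_t> groups;
  for (const ScalarOp& op : ops) {
    assert(op.lane >= 0 && op.lane < 4 && "lane must be x, y, z or w");
    groups[{op.opcode, op.src0, op.src1, op.dstReg}] |= uint8_t(1u << op.lane);
  }
  std::vector<VectorOp> out;
  for (const auto& [key, mask] : groups)
    out.push_back({std::get<0>(key), std::get<1>(key), std::get<2>(key),
                   std::get<3>(key), mask});
  return out;
}
```

If the allocator happens to put lanes x, y and w of the mul in one register, they regroup into a single mul with mask 0xB; if it splits them, the result stays split.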
So I decided to model each 4-element register as one Register in the *.td
file. Here are the details.
Since all four elements of a vector register live in the same 'alloca', while
converting the shader assembly to LLVM IR I check whether a vector register is
written (in different elements) by different instructions. When the second
write happens, I generate a shufflevector that multiplexes the existing value
and the new value, and I store the result of the shufflevector.
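The shufflevector mask follows directly from the writemask of the second write: in LLVM's shufflevector, indices 0-3 pick lanes of the first operand (the existing value) and 4-7 pick lanes of the second (the new value). A minimal sketch, assuming a bit-per-lane writemask with bit 0 = x; `shuffleMaskFor` is a hypothetical helper name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build the <4 x i32> shufflevector mask that keeps the lanes of the existing
// value (operand 0) not covered by `writemask`, and takes the covered lanes
// from the newly computed value (operand 1, indices 4..7).
std::array<int, 4> shuffleMaskFor(uint8_t writemask) {
  assert(writemask <= 0xF && "only four lanes exist");
  std::array<int, 4> mask{};
  for (int lane = 0; lane < 4; ++lane)
    mask[lane] = (writemask & (1u << lane)) ? 4 + lane : lane;
  return mask;
}
```

For a ".zw" write the writemask is 0xC, which yields < i32 0, i32 1, i32 6, i32 7 >, the same mask as in the IR that follows.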
Input assembly language:
mul r0.xy, r1, r2
add r0.zw, r3, r4
sub r5, r0, r1
is converted to LLVM IR:
%r0 = alloca <4 x float>
%mul_1 = mul <4 x float> %r1, %r2
store <4 x float> %mul_1, <4 x float>* %r0
...
%add_1 = add <4 x float> %r3, %r4
; a store does not immediately happen here
%load_1 = load <4 x float>* %r0
; select the first two elements from the existing value,
; the last two elements from the newly generated value
%merge_1 = shufflevector <4 x float> %load_1,
<4 x float> %add_1,
<4 x i32> < i32 0, i32 1, i32 6, i32 7 >
; store the multiplexed value
store <4 x float> %merge_1, <4 x float>* %r0
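As a quick check of the mask semantics, the sketch below evaluates a two-operand, 4-lane shufflevector on concrete values. It is a plain C++ model of the IR instruction's behavior (undef mask entries are not modeled), not LLVM code: an index i < 4 selects lane i of the first operand, and i >= 4 selects lane i - 4 of the second.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Model of `shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> mask`:
// result lane i is element mask[i] of the concatenation a ++ b.
Vec4 shuffle(const Vec4& a, const Vec4& b, const std::array<int, 4>& mask) {
  Vec4 result{};
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] >= 0 && mask[i] < 8 && "index must address a or b");
    result[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
  }
  return result;
}
```

With %load_1 = {1, 2, 3, 4} and %add_1 = {10, 20, 30, 40}, the mask <0, 1, 6, 7> produces {1, 2, 30, 40}: x and y survive from the old value, z and w come from the add.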
After mem2reg:
%mul_1 = mul <4 x float> %r1, %r2
%add_1 = add <4 x float> %r3, %r4
%merge_1 = shufflevector <4 x float> %mul_1,
<4 x float> %add_1,
<4 x i32> < i32 0, i32 1, i32 6, i32 7 >
After instruction selection:
MUL %reg1024, %reg1025, %reg1026
ADD %reg1027, %reg1028, %reg1029
MERGE %reg1030, %reg1024, "xy", %reg1027, "zw"
The 'shufflevector' is selected to a MERGE instruction by the default LLVM
instruction selector, but the hardware doesn't have this instruction. So I have
a *pre*-register-allocation FunctionPass that records the following:
the physical register allocated to the destination of MERGE (%reg1030)
should replace the physical registers allocated to the destinations of
MUL (%reg1024) and ADD (%reg1027).
In this way I ensure that MUL and ADD write to the same physical register. The
replacement itself is done in a second FunctionPass *after* register allocation.
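The register bookkeeping of the two passes can be sketched on a toy instruction list (hypothetical `Inst` type, not LLVM's MachineInstr API; writemasks are left out of this sketch). Before allocation, record for each MERGE which virtual registers feed it; after allocation, look up the physical register assigned to the MERGE result and use it as the new destination of each feeding instruction.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy pre-RA instruction on virtual registers (hypothetical, not LLVM's API).
struct Inst {
  std::string opcode;     // "MUL", "ADD", "MERGE", ...
  int def;                // destination virtual register
  std::vector<int> uses;  // source virtual registers
};

// Pre-RA pass: for every MERGE, remember that each of its two register inputs
// must end up in the MERGE's destination. Key: input vreg, value: MERGE vreg.
std::map<int, int> recordMergeTargets(const std::vector<Inst>& insts) {
  std::map<int, int> target;
  for (const Inst& mi : insts) {
    if (mi.opcode != "MERGE") continue;
    assert(mi.uses.size() == 2 && "MERGE takes two register inputs");
    for (int src : mi.uses) target[src] = mi.def;
  }
  return target;
}

// Post-RA pass: using the allocator's vreg -> physreg assignment, compute the
// physical register each MERGE input's defining instruction must now write.
std::map<int, int> newDestinations(const std::map<int, int>& target,
                                   const std::map<int, int>& vregToPhys) {
  std::map<int, int> dest;  // input vreg -> physreg of the MERGE result
  for (const auto& [src, mergeVreg] : target)
    dest[src] = vregToPhys.at(mergeVreg);
  return dest;
}
```

For the example above, %reg1024 and %reg1027 both map to %reg1030, so if the allocator assigns %reg1030 to R6, both MUL and ADD are redirected to R6.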
MUL and ADD have an 'OptionalDefOperand' writemask. By default the writemask
is "xyzw" (all elements are written).
// 0xF == all elements are written by default
def WRITEMASK : OptionalDefOperand<OtherVT, (ops i32imm), (ops (i32 0xF))>
{...}
def MUL : MyInst<(outs REG4X32:$dst),
(ins REG4X32:$src0, REG4X32:$src1, WRITEMASK:$wm),
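The default `(i32 0xF)` implies a 4-bit writemask encoding. The sketch below assumes bit 0 = x through bit 3 = w (the definition above only fixes that 0xF means all four elements) and converts between that encoding and the "xyzw"-style suffix used in the assembly; `parseWritemask` and `printWritemask` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed encoding: bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w; 0xF = "xyzw".
uint8_t parseWritemask(const std::string& s) {
  static const std::string lanes = "xyzw";
  uint8_t mask = 0;
  for (char c : s) {
    const std::string::size_type lane = lanes.find(c);
    if (lane == std::string::npos)
      throw std::invalid_argument("writemask letters must be x, y, z or w");
    mask |= uint8_t(1u << lane);
  }
  return mask;
}

// Print a writemask in canonical x, y, z, w order.
std::string printWritemask(uint8_t mask) {
  assert(mask <= 0xF && "only four lanes exist");
  static const char lanes[] = "xyzw";
  std::string s;
  for (int lane = 0; lane < 4; ++lane)
    if (mask & (1u << lane)) s += lanes[lane];
  return s;
}
```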
In that post-register-allocation FunctionPass, in addition to replacing the
destination registers as described above, I also replace the writemask ($wm)
of each instruction with the corresponding writemask operand of MERGE. So:
MUL %R0, %R1, %R2, "xyzw"
ADD %R5, %R3, %R4, "xyzw"
MERGE %R6, %R0, "xy", %R5, "zw"
==>
MUL %R6, %R1, %R2, "xy" // "xy" comes from MERGE operand 2
ADD %R6, %R3, %R4, "zw"
// MERGE %R6, %R0, "xy", %R5, "zw" <== REMOVED
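A sketch of this post-allocation rewrite on a toy instruction list (hypothetical `MInst` type on physical registers, not LLVM's MachineInstr API). It assumes, as in the example, that each MERGE input has no users other than the MERGE; the post does not say how other uses are handled, so the sketch only checks that both inputs are defined by a preceding non-MERGE instruction and otherwise leaves the MERGE in place.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy post-RA instruction (hypothetical). MUL/ADD/SUB: masks = {writemask}.
// MERGE: srcs = {inA, inB}, masks = {maskA, maskB}.
struct MInst {
  std::string opcode;
  int dst;
  std::vector<int> srcs;
  std::vector<uint8_t> masks;
};

// Fold each MERGE into the instructions defining its inputs: retarget their
// destination to the MERGE's register, narrow their writemask to the matching
// MERGE operand mask, and drop the MERGE.
// Assumption: each MERGE input is used only by that MERGE.
std::vector<MInst> foldMerges(const std::vector<MInst>& code) {
  std::vector<MInst> out;
  for (const MInst& mi : code) {
    if (mi.opcode != "MERGE") {
      out.push_back(mi);
      continue;
    }
    assert(mi.srcs.size() == 2 && mi.masks.size() == 2);
    long defIdx[2] = {-1, -1};
    for (int k = 0; k < 2; ++k)
      for (long i = static_cast<long>(out.size()) - 1; i >= 0; --i)
        if (out[i].dst == mi.srcs[k]) {  // nearest preceding definition
          if (out[i].opcode != "MERGE") defIdx[k] = i;
          break;
        }
    if (defIdx[0] < 0 || defIdx[1] < 0) {
      out.push_back(mi);  // cannot fold: keep the MERGE
      continue;
    }
    for (int k = 0; k < 2; ++k) {
      out[defIdx[k]].dst = mi.dst;
      out[defIdx[k]].masks = {mi.masks[k]};
    }
  }
  return out;
}
```

Applied to MUL R0, ADD R5, MERGE R6 (R0 "xy", R5 "zw") followed by SUB R8, R6, R1, this yields MUL R6 "xy", ADD R6 "zw", SUB R8.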
Final machine code:
MUL r6.xy, r1, r2
ADD r6.zw, r3, r4
SUB r8, r6, r1
I don't feel very comfortable with these two very ad-hoc FunctionPasses.
Alex.