[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Villmow, Micah
Micah.Villmow at amd.com
Mon Feb 16 09:24:18 PST 2009
Alex,
From my experience working with GPU vector registers, there is no
support for swizzles in the manner that you would normally code them,
and in my case I have 6^4 permutations on src registers and 24
combinations in the dst registers. The way that I ended up handling this
was to have different register classes for 1, 2, 3 and 4 component
vectors. This made the generic cases very simple but still made
swizzling fairly difficult.
In order to get swizzling to work you only need to handle three
SDNodes (insert_vector_elt, extract_vector_elt and build_vector) while
expanding the rest. I custom lowered those three nodes to a
target-specific node with an extra integer constant per register that
encodes the swizzle mask in 32 bits. The correct swizzles can then
be generated in the asm printer by decoding the integer constant. This
does require having extra moves, but your example below would end up
being something like the following:
dp4 r100, r1, r2
mov r0.x, r100 (float4 => float1 extract_vector_elt)
dp4 r101, r4, r5
mov r3.x, r101 (float4 => float1 extract_vector_elt)
iadd r6.xy__, r0.x000, r3.0x00 (float1 + float1 => float2 build_vector)
dp4 r7.x, r8, r9
<as above>
dp4 r10.x, r11, r12
<as above>
iadd r13.xy__, r7.x000, r10.0x00 (float1 + float1 => float2 build_vector)
iadd r14, r13.xy00, r6.00xy (float2 + float2 => float4 build_vector)
sub r15, r14, r9
It's not as compact and neat, but it works, and the move instructions will
get optimized away by the lower-level GPU compiler.
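To make the "integer constant that encodes the swizzle mask" idea concrete,
here is a minimal C++ sketch. The encoding is an assumption for illustration
only (3 bits per lane, selectors x, y, z, w, 0 and 1, packed into the low 12
bits of a 32-bit immediate); the names Sel, encodeSwizzle and decodeSwizzle
are hypothetical and are not taken from any actual backend.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical selector set: the four components plus the constants 0 and 1.
// Six choices per lane gives the 6^4 source combinations mentioned above.
enum Sel : uint32_t { SelX = 0, SelY = 1, SelZ = 2, SelW = 3, Sel0 = 4, Sel1 = 5 };

constexpr unsigned BitsPerLane = 3;                     // assumed field width
constexpr uint32_t LaneMask = (1u << BitsPerLane) - 1;  // 0b111

// Pack four lane selectors into one 32-bit immediate (lane 0 in the low bits).
// This is the value a custom-lowered node could carry as an extra operand.
inline uint32_t encodeSwizzle(Sel a, Sel b, Sel c, Sel d) {
  return uint32_t(a) |
         (uint32_t(b) << (1 * BitsPerLane)) |
         (uint32_t(c) << (2 * BitsPerLane)) |
         (uint32_t(d) << (3 * BitsPerLane));
}

// What an asm printer could do with that immediate: turn it back into
// the textual suffix, e.g. ".x000" or ".00xy".
inline std::string decodeSwizzle(uint32_t imm) {
  static const char Names[] = "xyzw01";
  std::string out = ".";
  for (unsigned lane = 0; lane < 4; ++lane) {
    uint32_t sel = (imm >> (lane * BitsPerLane)) & LaneMask;
    assert(sel < 6 && "selector outside the assumed x/y/z/w/0/1 set");
    out += Names[sel];
  }
  return out;
}
```

With this layout, encodeSwizzle(SelX, Sel0, Sel0, Sel0) decodes to ".x000" and
encodeSwizzle(Sel0, Sel0, SelX, SelY) decodes to ".00xy", matching the source
operands in the listing above. A destination write mask such as ".xy__" would
need its own selector or a separate field, which this sketch leaves out.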
Hope this helps,
Micah
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
On Behalf Of [Alex]
Sent: Monday, February 16, 2009 2:33 AM
To: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Modeling GPU vector registers, again (with my
implementation)
Evan Cheng-2 wrote:
>
> Well, how many possible permutations are there? Is it possible to
> model each case as a separate physical register?
>
> Evan
>
I don't think so. There are 4x4x4x4 = 256 permutations. For example:
* xyzw: default
* zxyw
* yyyy: splat
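As a quick check of the 4x4x4x4 count, the following C++ sketch enumerates
every source swizzle over x, y, z and w. The function name allXYZWSwizzles is
hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Enumerate every 4-lane swizzle where each lane independently selects
// one of x, y, z, w. Four choices in each of four lanes gives 4^4 entries.
inline std::vector<std::string> allXYZWSwizzles() {
  static const char Comp[] = "xyzw";
  std::vector<std::string> out;
  for (int a = 0; a < 4; ++a)
    for (int b = 0; b < 4; ++b)
      for (int c = 0; c < 4; ++c)
        for (int d = 0; d < 4; ++d)
          out.push_back(std::string{Comp[a], Comp[b], Comp[c], Comp[d]});
  assert(out.size() == 256 && "4 * 4 * 4 * 4 swizzles expected");
  return out;
}
```

The list starts at "xxxx", ends at "wwww", and includes the default "xyzw",
the reordering "zxyw" and the splat "yyyy" from the examples above.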
Even if I can model each of these 256 cases as a separate physical
register, how can I model the use of r0.xyzw in the following example:
// dp4 = dot product 4-element
dp4 r0.x, r1, r2
dp4 r0.y, r3, r4
dp4 r0.z, r5, r6
dp4 r0.w, r7, r8
sub r5, r0.xyzw, r6
--
View this message in context:
http://www.nabble.com/Modeling-GPU-vector-registers%2C-again-%28with-my-implementation%29-tp22001613p22034856.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev