[LLVMdev] RE: LLVM extension v.s. DirectX Shaders

Sat Apr 8 12:39:39 PDT 2006

Way back on Wed Dec 14, 2005, Tzu-Chien Chiu wrote:

> To write a compiler for Microsoft Direct3D shaders for our hardware,
> I have a program which translates the Direct3D shader assembly to LLVM
> assembly. I added several intrinsics for this purpose.
> It's a vector ISA and has some special instructions like:
> * rcp (reciprocal)
>  ...
> These operations are very specific to 3D graphics and missing from the
> LLVM instructions.

In case you haven't already noticed, mainline CVS has significantly better
support for adding target-specific intrinsics like this, and lots of
examples.  Mainline CVS now supports all of the AltiVec intrinsics, and a
big chunk of the SSE ones (still in progress).  Adding support for
Direct3D shaders should be straightforward.

Take a look at llvm/include/llvm/IntrinsicsPowerPC.td for examples. 
Adding an intrinsic to LLVM now is just a matter of adding it to the 
include/llvm/Intrinsics*.td file and adding a line to your code generator 
.td file.

> DSP and other scientific programs do not permute vectors as
> frequently as 3D programs do. Almost every 3D instruction needs to
> permute its operands. For example:
>
>  // Each register is a 4-component vector
>  // the names of the components are x, y, z, w
>  add r0.xy, r1.zxyw, r2.yyyy
>
> The components of r1 and r2 are permuted before the addition, but the
> permutation result is _not_ written back to r1 and r2. 'zxyw' and
> 'yyyy' are the permutation patterns (they are called 'swizzles').

> 'xy' is called the write mask. The result is written only to the x and
> y components of r0. z and w are left untouched.

To support this, and other things, LLVM now has a new shufflevector
instruction. In particular, you can write the add above as something
like this:

   %r0.1 = ...
   %r1.1 = ...
   %r2.1 = ...
   ; Swizzle the inputs: r1.zxyw and r2.yyyy (x=0, y=1, z=2, w=3)
   %tmp1 = shufflevector <4 x float> %r1.1, <4 x float> undef,
                         <4 x uint> <uint 2, uint 0, uint 1, uint 3>
   %tmp2 = shufflevector <4 x float> %r2.1, <4 x float> undef,
                         <4 x uint> <uint 1, uint 1, uint 1, uint 1>
   ; do the add
   %tmp3 = add <4 x float> %tmp1, %tmp2
   ; insert the values according to the write mask.
   %r0.2 = shufflevector <4 x float> %r0.1, <4 x float> %tmp3,
                         <4 x uint> <uint ...>

If you are using a SelectionDAG-based code generator, pattern matching
this as an add with two shuffle inputs and a shuffle result should be
straightforward.
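
If it helps, here is a rough Python sketch of how a translator might
derive the shuffle masks for the quoted instruction
"add r0.xy, r1.zxyw, r2.yyyy" and check them against a small model of
shufflevector. The function names (swizzle_mask, write_mask, d3d_add)
are made up for illustration and are not part of LLVM. It assumes full
4-component swizzles only, and relies on shufflevector's rule that
mask indices 0..3 pick lanes of the first operand and 4..7 pick lanes
of the second.

```python
# Component names of a 4-wide Direct3D register, in lane order.
COMPONENTS = "xyzw"


def swizzle_mask(swizzle):
    """Shuffle mask for a source swizzle such as 'zxyw'.

    Result lane i takes the source lane named by swizzle[i].
    Assumption: the swizzle always names exactly four components.
    """
    if len(swizzle) != 4:
        raise ValueError("expected a 4-component swizzle")
    return [COMPONENTS.index(c) for c in swizzle]


def write_mask(mask):
    """Shuffle mask that merges (old destination, new result).

    Lanes named in the write mask select from the second operand
    (indices 4..7); all other lanes keep the first operand's lane.
    """
    return [i + 4 if c in mask else i for i, c in enumerate(COMPONENTS)]


def shufflevector(a, b, mask):
    """Model of shufflevector: index n selects from a followed by b."""
    both = list(a) + list(b)
    return [both[n] for n in mask]


def d3d_add(r0, r1, r2, dst_mask, swz1, swz2):
    """Hypothetical model of 'add r0.<dst_mask>, r1.<swz1>, r2.<swz2>'."""
    undef = [None] * 4  # stands in for the undef operand; never selected
    t1 = shufflevector(r1, undef, swizzle_mask(swz1))
    t2 = shufflevector(r2, undef, swizzle_mask(swz2))
    t3 = [p + q for p, q in zip(t1, t2)]
    return shufflevector(r0, t3, write_mask(dst_mask))
```

With r0 = [0, 0, 9, 9], r1 = [1, 2, 3, 4] and r2 = [10, 20, 30, 40],
d3d_add(r0, r1, r2, "xy", "zxyw", "yyyy") gives [23, 21, 9, 9]: x and y
come from the add, z and w keep the old r0 values.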

Hopefully this helps!

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/