[LLVMdev] Vector swizzling and write masks code generation

Thu Sep 27 08:16:40 PDT 2007

Hi Zack,

On Sep 27, 2007, at 09:54, Zack Rusin wrote:

> as some of you may know, we're in the process of experimenting with
> LLVM in Gallium3D (Mesa's new driver model), where LLVM would be used
> both in the software-only case (by just JIT-executing shaders) and in
> the hardware case (drivers will implement LLVM code generators).

Neat.

> That is, graphics hardware (basically every single programmable GPU)
> has instruction-level support for vector swizzling and write masks.
>
> For example, the following represents a valid GPU shader instruction:
> ADD dst.xyz   src1.yxzw  src2.zwxy
> which performs an addition that stores the result to the dst
> operand (each operand is a vector type of four data elements).
> The instruction uses source swizzle modifiers and a destination mask
> modifier.
>
> So if a language is capable of expressing such constructs (as GLSL,
> HLSL and a few others are), I'd like to make sure that the code
> generator is actually capable of generating instructions with
> exactly those semantics.
>
> Right now vector operations utilizing swizzling and write masks in
> LLVM IR have to be expressed with a series of
> load/extractelement/insertelement/store constructs. As in
>
> vec2 = vec4.xy
>
> would end up being:
> %tmp = load <4 x float>* @vec4
> %tmp1 = extractelement <4 x float> %tmp, i32 0
> %tmp2 = insertelement <2 x float> undef, float %tmp1, i32 0
> %tmp3 = extractelement <4 x float> %tmp, i32 1
> %tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
> store <2 x float> %tmp4, <2 x float>* @vec2
> or the like.

Loads and stores are always explicit; the code generator will fold  
them into the machine instructions if possible.

You may be able to take advantage of the shufflevector instruction,
although its result will be a <4 x float>, the same type as the source
vector, rather than a <2 x float>. So you'll need to find a way to write
"extract subvector" that codegens well. Perhaps this will work:

%shufvec = shufflevector <4 x float> ...
%src1 = extractelement <4 x float> %shufvec, i32 0
%src2 = extractelement <4 x float> %shufvec, i32 1
%tmp = insertelement <2 x float> undef, float %src1, i32 0
%res = insertelement <2 x float> %tmp, float %src2, i32 1
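
To make the intent of that sequence concrete, here is a rough Python
model of "shuffle, then keep the first two lanes". It is only a sketch
of the semantics, not of how the code generator works; the helper names
(shufflevector, extract_subvector) are made up for illustration, and
undef lanes are modelled as None.

```python
def shufflevector(a, b, mask):
    """Model of a shuffle: pick lanes from the concatenation of a and b."""
    lanes = list(a) + list(b)
    return [lanes[i] for i in mask]


def extract_subvector(vec, count):
    """Hypothetical 'extract subvector': keep the first `count` lanes."""
    return list(vec[:count])


vec4 = [1.0, 2.0, 3.0, 4.0]
undef = [None] * 4  # stand-in for an undef second operand

# vec2 = vec4.yx, written as a full-width shuffle plus a two-lane extract.
shufvec = shufflevector(vec4, undef, [1, 0, 2, 3])
vec2 = extract_subvector(shufvec, 2)
print(vec2)  # [2.0, 1.0]
```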

If that's no good, then you might want to add intrinsics to do the  
job. You can then easily pattern match on (llvm.extractvector  
(shufflevector ...), which) where llvm.extractvector is your intrinsic.
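
As a toy illustration of that kind of match, the Python sketch below
folds an extract-of-shuffle into a single swizzled operand. The tuple
encoding of nodes, the "swizzle" result node, and the reading of the
last operand as "number of leading lanes to keep" are all assumptions
made for the example; real pattern matching would be done in the
target's instruction selector.

```python
def fold_extract_of_shuffle(node):
    """Fold ("extractvector", ("shufflevector", src, mask), count)
    into ("swizzle", src, first `count` mask entries); else return node."""
    if isinstance(node, tuple) and len(node) == 3 and node[0] == "extractvector":
        _, inner, count = node
        if isinstance(inner, tuple) and len(inner) == 3 and inner[0] == "shufflevector":
            _, src, mask = inner
            return ("swizzle", src, tuple(mask[:count]))
    return node  # no match: leave the expression untouched


expr = ("extractvector", ("shufflevector", "%vec4", (1, 0, 2, 3)), 2)
folded = fold_extract_of_shuffle(expr)
print(folded)  # ('swizzle', '%vec4', (1, 0))
```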

> So I think my options come down to:
>
> 1) figure out a way of having the code generator actually be able to
> combine all those IR instructions back into
> OP dst.writemask src1.swizzle1 src2.swizzle2
>
> 2) have some kind of instruction level support for it in LLVM IR

It's much easier to define intrinsic functions than instructions, so  
start there if you need to go that route.
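
For reference, whichever route you take has to preserve roughly the
following semantics for the ADD example quoted above. This is a Python
sketch under my own assumptions: the helper names are invented, every
source swizzle names all four components, and lanes outside the write
mask keep the old value of dst.

```python
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(vec, pattern):
    """Reorder lanes, e.g. swizzle(v, "yxzw") swaps the first two."""
    return [vec[LANES[c]] for c in pattern]


def masked_add(dst, mask, src1, sw1, src2, sw2):
    """Model of: ADD dst.mask src1.sw1 src2.sw2 (four-component swizzles)."""
    a = swizzle(src1, sw1)
    b = swizzle(src2, sw2)
    out = list(dst)
    for c in mask:  # only lanes named in the write mask are updated
        i = LANES[c]
        out[i] = a[i] + b[i]
    return out


dst = [0.0, 0.0, 0.0, 9.0]
src1 = [1.0, 2.0, 3.0, 4.0]
src2 = [10.0, 20.0, 30.0, 40.0]

# ADD dst.xyz src1.yxzw src2.zwxy
result = masked_add(dst, "xyz", src1, "yxzw", src2, "zwxy")
print(result)  # [32.0, 41.0, 13.0, 9.0]
```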

— Gordon
