[LLVMdev] Vector swizzling and write masks code generation

Thu Sep 27 06:54:10 PDT 2007

Hey,

As some of you may know, we're in the process of experimenting with LLVM in 
Gallium3D (Mesa's new driver model), where LLVM would be used both in the 
software-only case (by just JIT-executing shaders) and in the hardware case 
(drivers will implement LLVM code generators).

While the software-only case is pretty straightforward, I just realized I 
missed something in my initial evaluation.

Namely, graphics hardware (basically every single programmable GPU) has 
instruction-level support for vector swizzling and write masks.

For example, the following is a valid GPU shader instruction:
ADD dst.xyz   src1.yxzw  src2.zwxy
It performs an addition and stores the result to the dst operand (each 
operand is a vector of four data elements). The instruction uses source 
swizzle modifiers and a destination write-mask modifier.
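
To make the intended semantics concrete, here is a minimal Python sketch of that 
one instruction. It assumes the usual shader-assembly convention that lanes not 
named in the write mask keep their previous value in dst; the helper names 
(swizzle, add_masked) are made up for illustration and are not part of any API.

```python
LANES = "xyzw"

def swizzle(vec, pattern):
    """Pick lanes of a 4-element vector in the order given by pattern."""
    return [vec[LANES.index(c)] for c in pattern]

def add_masked(dst, src1, swz1, src2, swz2, mask):
    """Model of: ADD dst.<mask> src1.<swz1> src2.<swz2>"""
    a = swizzle(src1, swz1)
    b = swizzle(src2, swz2)
    out = list(dst)                 # assumption: unmasked lanes are preserved
    for c in mask:
        i = LANES.index(c)
        out[i] = a[i] + b[i]        # only lanes named in the mask are written
    return out

dst = [0.0, 0.0, 0.0, 9.0]
src1 = [1.0, 2.0, 3.0, 4.0]
src2 = [10.0, 20.0, 30.0, 40.0]

# ADD dst.xyz src1.yxzw src2.zwxy
result = add_masked(dst, src1, "yxzw", src2, "zwxy", "xyz")
```

Here the w lane of dst is left alone because the write mask is only .xyz.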

So if a language is capable of expressing such constructs (as GLSL, HLSL and 
a few others are), I'd like to make sure that the code generator is actually 
capable of generating instructions with exactly those semantics.

Right now, vector operations that use swizzling and write masks have to be 
expressed in LLVM IR with a series of load/extractelement/insertelement/store 
instructions. For example,

vec2 = vec4.xy 

would end up being:
%tmp = load <4 x float>* @vec4
%tmp1 = extractelement <4 x float> %tmp, i32 0
%tmp2 = insertelement <2 x float> undef, float %tmp1, i32 0
%tmp3 = extractelement <4 x float> %tmp, i32 1
%tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
store <2 x float> %tmp4, <2 x float>* @vec2
or the like.
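
The same expansion applies to any swizzle: every result lane costs one 
extractelement/insertelement pair. As a rough illustration, the hypothetical 
Python helper below (expand_swizzle is my own name, not an LLVM facility) prints 
that sequence for an arbitrary pattern, using the same IR spelling as above and 
assuming a float vector read from one global and stored to another.

```python
LANES = "xyzw"

def expand_swizzle(src_global, dst_global, pattern, src_width=4):
    """Return the IR lines for: dst_global = src_global.<pattern>"""
    src_ty = f"<{src_width} x float>"
    dst_ty = f"<{len(pattern)} x float>"
    lines = [f"%tmp = load {src_ty}* @{src_global}"]
    prev = "undef"      # the partially built result vector so far
    counter = 1
    for dst_lane, c in enumerate(pattern):
        src_lane = LANES.index(c)
        # Pull one scalar out of the source vector...
        elt = f"%tmp{counter}"
        lines.append(f"{elt} = extractelement {src_ty} %tmp, i32 {src_lane}")
        counter += 1
        # ...and place it into the next lane of the result.
        vec = f"%tmp{counter}"
        lines.append(
            f"{vec} = insertelement {dst_ty} {prev}, float {elt}, i32 {dst_lane}"
        )
        counter += 1
        prev = vec
    lines.append(f"store {dst_ty} {prev}, {dst_ty}* @{dst_global}")
    return lines

xy_lines = expand_swizzle("vec4", "vec2", "xy")
wzyx_lines = expand_swizzle("vec4", "out", "wzyx")
print("\n".join(xy_lines))
```

A two-lane swizzle takes six IR instructions this way and a full four-lane one 
takes ten, where the hardware needs only an operand modifier.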

So I think my options come down to:

1) figure out a way for the code generator to combine all those IR 
instructions back into 
OP dst.writemask src1.swizzle1 src2.swizzle2

2) have some kind of instruction-level support for it in LLVM IR
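
To show what the recovery step in option 1 would have to do, here is a toy 
Python sketch that walks an insertelement chain backwards and reconstructs the 
swizzle. It works on a made-up tuple representation, not on real LLVM data 
structures, and says nothing about how an actual LLVM code generator would do 
the matching; it only illustrates that the information is recoverable in the 
simple single-source case.

```python
LANES = "xyzw"

def recover_swizzle(chain):
    """chain holds ("extract", result, src_vec, lane) and
    ("insert", result, base_vec, scalar, lane) tuples in program order.
    Returns (src_vec, swizzle) or None if the pattern does not match."""
    extracts = {}   # scalar name -> (source vector, source lane)
    inserts = {}    # vector name -> (base vector, scalar, destination lane)
    last = None
    for op in chain:
        if op[0] == "extract":
            _, res, src, lane = op
            extracts[res] = (src, lane)
        elif op[0] == "insert":
            _, res, base, elt, lane = op
            inserts[res] = (base, elt, lane)
            last = res
        else:
            return None
    if last is None:
        return None

    lanes = {}      # destination lane -> source lane
    source = None
    cur = last
    while cur != "undef":
        if cur not in inserts:
            return None
        base, elt, lane = inserts[cur]
        if elt not in extracts:
            return None
        src, src_lane = extracts[elt]
        if source is None:
            source = src
        elif src != source:
            return None             # toy: only single-source swizzles
        if not 0 <= src_lane < len(LANES):
            return None
        lanes.setdefault(lane, src_lane)  # walking backwards: latest insert wins
        cur = base
    if sorted(lanes) != list(range(len(lanes))):
        return None                 # some result lane was never written
    return source, "".join(LANES[lanes[i]] for i in range(len(lanes)))

# The vec2 = vec4.xy sequence, in the toy representation.
chain = [
    ("extract", "%tmp1", "%tmp", 0),
    ("insert", "%tmp2", "undef", "%tmp1", 0),
    ("extract", "%tmp3", "%tmp", 1),
    ("insert", "%tmp4", "%tmp2", "%tmp3", 1),
]
recovered = recover_swizzle(chain)
```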

With my limited knowledge of code generators in LLVM, I don't see a way of 
doing #1, and I'm afraid #2 might be the only option.
I'd appreciate any ideas and/or comments that could help solve this problem.

z