[LLVMdev] Vector swizzling and write masks code generation

Chris Lattner sabre at nondot.org
Thu Sep 27 10:30:55 PDT 2007


On Thu, 27 Sep 2007, Zack Rusin wrote:
> as some of you may know, we're in the process of experimenting with LLVM in
> Gallium3D (Mesa's new driver model), where LLVM would be used both in the
> software-only case (by JIT-executing shaders) and in the hardware case (drivers
> will implement LLVM code generators).

Yep, nifty!

> That is, graphics hardware (basically every programmable GPU) has
> instruction-level support for vector swizzling and write masks.

ok

> For example, the following is a valid GPU shader instruction:
> ADD dst.xyz   src1.yxzw  src2.zwxy
> It performs an addition and stores the result in the dst operand (each
> operand is a vector of four data elements). The instruction uses source
> swizzle modifiers and a destination write-mask modifier.
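To make the semantics of that instruction concrete, here is a minimal Python model. It assumes the usual convention that a source swizzle selects which source component feeds each lane, and that the write mask selects which lanes of the result are stored to dst (unmasked lanes keep their old value). The function names are hypothetical and do not correspond to any real driver API.

```python
# Hypothetical model of GPU-style source swizzles and destination write
# masks on 4-component vectors; not the API of any real driver or ISA.
COMPONENTS = "xyzw"


def swizzle(vec, pattern):
    """Return a 4-vector whose i-th element is vec[pattern[i]]."""
    return [vec[COMPONENTS.index(c)] for c in pattern]


def masked_write(dst, result, mask):
    """Copy result[i] into dst[i] only for the components named in mask."""
    out = list(dst)
    for c in mask:
        i = COMPONENTS.index(c)
        out[i] = result[i]
    return out


def add(dst, mask, src1, swz1, src2, swz2):
    """Model of: ADD dst.mask src1.swz1 src2.swz2"""
    a = swizzle(src1, swz1)
    b = swizzle(src2, swz2)
    result = [p + q for p, q in zip(a, b)]
    return masked_write(dst, result, mask)


# ADD dst.xyz src1.yxzw src2.zwxy
dst = [0.0, 0.0, 0.0, 0.0]
src1 = [1.0, 2.0, 3.0, 4.0]
src2 = [10.0, 20.0, 30.0, 40.0]
print(add(dst, "xyz", src1, "yxzw", src2, "zwxy"))  # [32.0, 41.0, 13.0, 0.0]
```

The w lane of dst stays 0.0 because the write mask names only x, y and z.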

Right.

> So if a language is capable of expressing such constructs (as GLSL, HLSL and
> a few others are), I'd like to make sure that the code generator is actually
> capable of generating instructions with exactly those semantics.

Ok.  Are you planning to use the LLVM code generator, or roll your own?

> Right now, vector operations utilizing swizzling and write masks in LLVM IR
> have to be expressed with a series of load/extractelement/insertelement/store
> constructs. As in
>
> vec2 = vec4.xy
>
> would end up being:
> %tmp = load <4 x float>* @vec4
> %tmp1 = extractelement <4 x float> %tmp, i32 0
> %tmp2 = insertelement <2 x float> undef, float %tmp1, i32 0
> %tmp3 = extractelement <4 x float> %tmp, i32 1
> %tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
> store <2 x float> %tmp4, <2 x float>* @vec2
> or the like.
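The pattern generalizes to any narrowing swizzle: one extractelement/insertelement pair per destination lane. The hypothetical Python helper below emits that chain as text, in the same typed-pointer IR syntax as the quoted example. It is only an illustration of the shape of the IR, not part of any LLVM API.

```python
# Hypothetical helper: emit the extract/insert chain for dst = src.swz,
# using the same (typed-pointer) IR syntax as the quoted example.
COMPONENTS = "xyzw"


def lower_swizzle(src, src_width, swz, dst):
    n = len(swz)
    lines = [f"%t0 = load <{src_width} x float>* {src}"]
    vec = "undef"  # the partially built result vector
    t = 1
    for lane, c in enumerate(swz):
        elt = f"%t{t}"
        # Pull the selected component out of the source vector...
        lines.append(f"{elt} = extractelement <{src_width} x float> %t0, "
                     f"i32 {COMPONENTS.index(c)}")
        # ...and place it in the next lane of the narrower result.
        ins = f"%t{t + 1}"
        lines.append(f"{ins} = insertelement <{n} x float> {vec}, "
                     f"float {elt}, i32 {lane}")
        vec = ins
        t += 2
    lines.append(f"store <{n} x float> {vec}, <{n} x float>* {dst}")
    return lines


for line in lower_swizzle("@vec4", 4, "xy", "@vec2"):
    print(line)
```

For `vec2 = vec4.xy` this prints six lines with the same structure as the hand-written sequence above, differing only in value names.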

Yes, you're right.  If you are staying within the same width of operand 
(e.g. vec4 -> vec4) you can use the shufflevector instruction, but if not, 
you have to use insert/extract.
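A minimal sketch of the same-width case, following the constraint described in this message (result width equal to operand width): a four-letter swizzle maps directly to a shufflevector mask of component indices. The helper name is hypothetical; it returns None for a narrowing swizzle, where the insert/extract chain is still needed.

```python
# Hypothetical helper: turn a 4-wide -> 4-wide swizzle into a single
# shufflevector line; narrowing swizzles are rejected (return None).
COMPONENTS = "xyzw"


def lower_same_width_swizzle(src, swz):
    if len(swz) != 4:
        return None  # widths differ: fall back to extract/insert
    mask = ", ".join(f"i32 {COMPONENTS.index(c)}" for c in swz)
    return (f"%r = shufflevector <4 x float> {src}, <4 x float> undef, "
            f"<4 x i32> <{mask}>")


print(lower_same_width_swizzle("%v", "yxzw"))
print(lower_same_width_swizzle("%v", "xy"))  # None
```

The second operand is undef because a single-source swizzle only reads lanes 0 to 3 of the first operand.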

> So I think my options come down to:
>
> 1) figure out a way for the code generator to combine all
> those IR instructions back into
> OP dst.writemask src1.swizzle1 src2.swizzle2

Yep.  If you're using the LLVM code generator, it makes it reasonably easy 
to pattern match on this sort of thing and/or introduce machine specific 
abstractions to describe them.
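As a rough illustration of the matching step in option 1, the sketch below assumes the code generator has already collected an insert/extract chain into (insert lane, extract lane) pairs for a 4-wide destination. From those pairs it recovers a write mask (the lanes written) and a swizzle (where each lane came from, with identity filled in for unwritten lanes). The function name and input format are hypothetical; an actual LLVM backend would express such matching through its SelectionDAG/TableGen pattern machinery, not Python.

```python
# Hypothetical matcher: recover (writemask, swizzle) from an
# insertelement/extractelement chain into a 4-wide destination.
COMPONENTS = "xyzw"


def match_swizzle_chain(pairs):
    """pairs: (insert_lane, extract_lane) in chain order."""
    written = {}
    for dst_lane, src_lane in pairs:
        # Later inserts into the same lane overwrite earlier ones,
        # matching insertelement semantics.
        written[dst_lane] = src_lane
    mask = "".join(COMPONENTS[i] for i in sorted(written))
    # Unwritten lanes are masked off, so any source component will do;
    # use the identity component for them.
    swz = "".join(COMPONENTS[written.get(i, i)] for i in range(4))
    return mask, swz


# Lanes x, y, z filled from source lanes y, x, z respectively.
print(match_swizzle_chain([(0, 1), (1, 0), (2, 2)]))  # ('xyz', 'yxzw')
```

The result corresponds to the `dst.xyz` / `src1.yxzw` modifiers from the ADD example earlier in the thread.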

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/


