[LLVMdev] Vector swizzling and write masks code generation
Chris Lattner
sabre at nondot.org
Thu Sep 27 10:30:55 PDT 2007
On Thu, 27 Sep 2007, Zack Rusin wrote:
> as some of you may know, we're in the process of experimenting with LLVM in
> Gallium3D (Mesa's new driver model), where LLVM would be used both in the
> software-only (by just JIT executing shaders) and hardware (drivers will
> implement LLVM code-generators) cases.
Yep, nifty!
> That is, graphics hardware (basically every single programmable GPU) has
> instruction-level support for vector swizzling and write masks.
ok
> For example, the following represents a valid GPU shader instruction:
> ADD dst.xyz src1.yxzw src2.zwxy
> which performs an addition that stores the result to the dst operand (each
> operand is a vector type of four data elements). The instruction uses source
> swizzle modifiers and a destination mask modifier.
Right.
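
To make the semantics of the quoted instruction concrete, here is a minimal Python sketch. The helper names `swizzle` and `masked_add` are hypothetical, and it assumes the usual shader convention that components x, y, z, w map to lanes 0 to 3 and that lanes not named in the write mask keep their previous value.

```python
# Component names map to vector lanes (assumed convention: x, y, z, w -> 0..3).
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(vec, pattern):
    """Return a new vector whose lanes are picked from vec by pattern, e.g. 'yxzw'."""
    return [vec[LANES[c]] for c in pattern]

def masked_add(dst, mask, src1, swz1, src2, swz2):
    """Model 'ADD dst.mask src1.swz1 src2.swz2' on 4-element vectors."""
    a = swizzle(src1, swz1)
    b = swizzle(src2, swz2)
    out = list(dst)              # lanes outside the mask are left untouched
    for c in mask:
        i = LANES[c]
        out[i] = a[i] + b[i]
    return out

dst = [0.0, 0.0, 0.0, 9.0]
src1 = [1.0, 2.0, 3.0, 4.0]
src2 = [10.0, 20.0, 30.0, 40.0]

# ADD dst.xyz src1.yxzw src2.zwxy
result = masked_add(dst, "xyz", src1, "yxzw", src2, "zwxy")
```

In this sketch the w lane of `dst` survives the operation because `w` is absent from the mask `xyz`.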
> So if a language is capable of expressing such constructs (as GLSL, HLSL and
> a few others are), I'd like to make sure that the code generator is actually
> capable of generating instructions with exactly those semantics.
Ok. Are you planning to use the LLVM code generator, or roll your own?
> Right now vector operations utilizing swizzling and write masks in LLVM IR
> have to be expressed with a series of load/extractelement/insertelement/store
> constructs. As in
>
> vec2 = vec4.xy
>
> would end up being:
> %tmp = load <4 x float>* @vec4
> %tmp1 = extractelement <4 x float> %tmp, i32 0
> %tmp2 = insertelement <2 x float> undef, float %tmp1, i32 0
> %tmp3 = extractelement <4 x float> %tmp, i32 1
> %tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
> store <2 x float> %tmp4, <2 x float>* @vec2
> or the like.
Yes, you're right. If you are staying within the same width of operand
(e.g. vec4 -> vec4) you can use the shufflevector instruction, but if not,
you have to use insert/extract.
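
The difference between the two cases can be modelled roughly in Python. This is only a sketch of the semantics, not LLVM code: `shufflevector` here is a hypothetical Python function that mimics the instruction by selecting lanes from the concatenation of two equal-width inputs, and `extract_insert` mimics the per-lane extractelement/insertelement chain needed when the width changes.

```python
def shufflevector(v1, v2, mask):
    """Pick lanes from v1 followed by v2; the mask has the inputs' width (same-width case)."""
    assert len(v1) == len(v2) == len(mask)
    both = list(v1) + list(v2)
    return [both[i] for i in mask]

def extract_insert(src, indices):
    """Width-changing case (e.g. vec2 = vec4.xy): one extract/insert pair per lane."""
    out = [None] * len(indices)  # None stands in for 'undef'
    for lane, idx in enumerate(indices):
        elem = src[idx]          # extractelement
        out[lane] = elem         # insertelement
    return out

vec4 = [1.0, 2.0, 3.0, 4.0]

# vec4 -> vec4 swizzle (.yxzw): a single shuffle suffices.
yxzw = shufflevector(vec4, vec4, [1, 0, 2, 3])

# vec4 -> vec2 (.xy): falls back to the extract/insert chain.
xy = extract_insert(vec4, [0, 1])
```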
> So I think my options come down to:
>
> 1) figure out a way of having the code generator actually be able to combine
> all those IR instructions back into
> OP dst.writemask src1.swizzle1 src2.swizzle2
Yep. If you're using the LLVM code generator, it makes it reasonably easy
to pattern match on this sort of thing and/or introduce machine specific
abstractions to describe them.
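
As a rough illustration of the kind of matching involved, the following Python sketch folds an extractelement/insertelement chain back into a single swizzle. It uses no LLVM API and is not how the LLVM code generator is implemented; the tuple encoding of instructions and the function `match_swizzle` are assumptions made up for this example.

```python
def match_swizzle(insts, result):
    """Walk an insert chain ending at `result`; return (source, lane indices) or None.

    insts maps a value name to a tuple (toy encoding, not real LLVM IR):
      ("extract", src_vector, lane)
      ("insert", base_vector_or_"undef", scalar, lane)
    """
    lanes = {}
    source = None
    cur = result
    while cur != "undef":
        inst = insts.get(cur)
        if inst is None or inst[0] != "insert":
            return None
        _, base, scalar, lane = inst
        ext = insts.get(scalar)
        if ext is None or ext[0] != "extract":
            return None
        _, src, src_lane = ext
        if source is None:
            source = src
        elif source != src:
            return None                   # lanes come from different vectors
        lanes.setdefault(lane, src_lane)  # walking backwards: latest insert wins
        cur = base
    if not lanes or sorted(lanes) != list(range(len(lanes))):
        return None                       # some destination lane was never written
    return source, [lanes[i] for i in range(len(lanes))]

# The vec2 = vec4.xy sequence from the quoted message, in the toy encoding.
insts = {
    "%tmp1": ("extract", "%tmp", 0),
    "%tmp2": ("insert", "undef", "%tmp1", 0),
    "%tmp3": ("extract", "%tmp", 1),
    "%tmp4": ("insert", "%tmp2", "%tmp3", 1),
}
matched = match_swizzle(insts, "%tmp4")
```

Here `matched` names the source vector `%tmp` and the lane list `[0, 1]`, which corresponds to the `.xy` swizzle.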
-Chris
--
http://nondot.org/sabre/
http://llvm.org/