[LLVMdev] [Mesa3d-dev] Folding vector instructions

Tue Dec 30 14:52:00 PST 2008

On Tue, Dec 30, 2008 at 21:30, Chris Lattner <clattner at apple.com> wrote:
> On Dec 30, 2008, at 6:39 AM, Corbin Simpson wrote:
>>> However, the special instructions cannot directly be mapped to LLVM
>>> IR, like
>>> "min", the conversion involves in 'extract' the vector, create
>>> less-than-compare, create 'select' instruction, and create 'insert-
>>> element'
>>> instruction.
>
> Using scalar operations obviously works, but will probably produce
> very inefficient code.  One positive thing is that all target-specific
> operations of supported vector ISAs (Altivec and SSE[1-4] currently)
> are exposed either through LLVM IR ops or through target-specific
> builtins/intrinsics.  This means that you can get access to all the
> crazy SSE instructions, but it means that your codegen would have to
> handle this target-specific code generation.

Well, scalar is surely an option we're aiming at. NV50 or even your
regular FPU are examples of fully scalar architectures. As for SSE
generation, it was solved by using horizontal parallelism (i.e.
processing four fragments or vertices at once) instead of vertical
parallelism. Sadly this doesn't work with GPUs.
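
To make "horizontal" concrete, here is a rough sketch in plain C of what
I mean by processing four fragments at once. The frag4 layout and the
frag4_min name are made up for illustration and are not taken from Mesa
or LLVM. Each array holds one component of four independent fragments,
so a single scalar shader op becomes one 4-wide operation per component.

```c
#include <stddef.h>

/* Hypothetical structure-of-arrays layout: each array holds the same
   component of four different fragments, so one array corresponds to
   one 4-wide register on an SSE-like target. */
typedef struct {
    float x[4];
    float y[4];
    float z[4];
    float w[4];
} frag4;

/* Component-wise min over four fragments at once. Lane i only ever
   touches fragment i, so no shuffles between lanes are needed. */
static void frag4_min(const frag4 *a, const frag4 *b, frag4 *out)
{
    for (size_t i = 0; i < 4; ++i) {
        out->x[i] = a->x[i] < b->x[i] ? a->x[i] : b->x[i];
        out->y[i] = a->y[i] < b->y[i] ? a->y[i] : b->y[i];
        out->z[i] = a->z[i] < b->z[i] ? a->z[i] : b->z[i];
        out->w[i] = a->w[i] < b->w[i] ? a->w[i] : b->w[i];
    }
}
```

The shader itself stays scalar per fragment; the width comes from
batching fragments, not from the xyzw components of a single one.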

So what remains are chips that are natively vector GPUs. The question
is more whether we'll be able to have LLVM build up vector
instructions from scalar ones. In my limited testing with SSE
and simple test programs it seemed to work, so I suppose the same can
be achieved for GPU targets.
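
For reference, the scalar pattern in question looks roughly like this
when written in C. It is only a sketch: vec4 and vec4_min_scalar are
names I made up, and whether a given backend folds the loop back into a
single vector min depends on the target and the compiler version, so
take it as the input shape rather than a promise about the output.

```c
/* Hypothetical 4-component vector, standing in for a shader register. */
typedef struct {
    float v[4];
} vec4;

/* "min" expanded into per-element scalar steps, mirroring the
   extract / compare / select / insert sequence described above. */
static vec4 vec4_min_scalar(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i) {
        float ea = a.v[i];       /* extract element i from a */
        float eb = b.v[i];       /* extract element i from b */
        int lt = ea < eb;        /* less-than compare */
        r.v[i] = lt ? ea : eb;   /* select, then insert into result */
    }
    return r;
}
```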

Stephane