[LLVMdev] llvm, gpu execution environments
keith at tungstengraphics.com
Fri May 18 03:23:24 PDT 2007
I'm interested in understanding the extent of the assumptions which llvm
makes about the types of hardware it is capable of targeting.
In particular, I'm investigating a proposal by Zack Rusin to use llvm as
the shader compilation engine within Mesa, targeting GPU backends.
I'm aware of the Apple GLSL compiler, and also I've seen the Vector LLVA
paper. However, I'm not sure that either of these quite bridges the gap
to the execution environment provided by modern GPUs.
Though there are a couple of question marks, I'll pick the most obvious one:
It seems that LLVA, and by extension Vector-LLVA, assume that looping and
branching control flow can be expressed in terms of a simple "br" branch
instruction.
Typically, GPU environments cannot provide such a facility, as they tend
to run 16, 32 or 64 SIMD threads all with the same program counter.
Though this is a wide vector environment, each of the threads is
typically a scalar program and at any branch point, some of those
threads may take the branch and some not. So, to provide dynamic
branching facilities in this environment, you end up with per-channel
execution masks, and opcodes like "IF", "THEN", and "ELSE" which
manipulate those per-channel masks, and use stack semantics for pushing
and popping masks to emulate nested control structures.
This is probably all very familiar to anybody who's thought about SIMD
program execution. But it means that GPUs, and low-level GPU
abstractions, tend not to have branch instructions.
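To make the mask-stack scheme described above concrete, here is a minimal
sketch in Python. It is a toy model only: the class SimdGroup and the
op_if/op_else/op_endif names are hypothetical, chosen for illustration, and
do not correspond to any real GPU ISA or to anything in LLVM. It assumes
each channel is a scalar program and that writes are suppressed on inactive
channels.

```python
# Toy model of a SIMD thread group sharing one program counter.
# All names here are hypothetical; this is not a real GPU instruction set.

class SimdGroup:
    def __init__(self, width):
        self.width = width
        self.mask = [True] * width   # current per-channel execution mask
        self.stack = []              # saved (enclosing mask, condition) pairs

    def op_if(self, cond):
        # Push the enclosing mask, then keep only channels that were
        # already active and whose condition is true.
        self.stack.append((self.mask, list(cond)))
        self.mask = [m and c for m, c in zip(self.mask, cond)]

    def op_else(self):
        # Switch to channels active in the enclosing scope whose
        # condition was false.
        outer, cond = self.stack[-1]
        self.mask = [m and not c for m, c in zip(outer, cond)]

    def op_endif(self):
        # Pop back to the enclosing mask.
        outer, _ = self.stack.pop()
        self.mask = outer

    def assign(self, dest, values):
        # Every channel "executes" the instruction, but only active
        # channels actually write their result.
        for i in range(self.width):
            if self.mask[i]:
                dest[i] = values[i]


def abs_diff(a, b):
    """Per channel: if a > b then r = a - b else r = b - a."""
    g = SimdGroup(len(a))
    r = [0] * len(a)
    g.op_if([x > y for x, y in zip(a, b)])
    g.assign(r, [x - y for x, y in zip(a, b)])
    g.op_else()
    g.assign(r, [y - x for x, y in zip(a, b)])
    g.op_endif()
    return r
```

Note that both arms of the conditional are issued for the whole group; the
mask alone decides which channels see the effect, and nesting works because
op_if always narrows whatever mask is currently in force.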
The question, then, is to what extent it is possible to target this type
of execution environment with LLVM and the LLVA/Vector-LLVA ISAs?
Is it necessary (or feasible) to try to analyse LLVA programs and
extract IF/THEN/ELSE semantics from a set of arbitrary branch instructions?
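As a rough illustration of what such an analysis might look like in the
simplest case, the sketch below recognises a plain "diamond" in a control
flow graph: a block with two successors, each of which has that block as its
only predecessor and falls through to a common join block. The graph
representation (a dict mapping every block name to its list of successors)
and the function names are assumptions made for this example, not an LLVM
API. Arbitrary branch structures (loops, early exits, irreducible flow) are
not handled.

```python
# Hypothetical sketch: match a simple if/then/else diamond in a CFG.
# `cfg` maps every block name to a list of its successor block names.

def predecessors(cfg):
    # Build the reverse edge map: block -> list of predecessor blocks.
    preds = {block: [] for block in cfg}
    for block, succs in cfg.items():
        for s in succs:
            preds[s].append(block)
    return preds


def match_if_else(cfg, head):
    """Return (then_block, else_block, join_block) if `head` starts a
    simple diamond, otherwise None."""
    succs = cfg[head]
    if len(succs) != 2:
        return None              # not a two-way conditional branch
    then_b, else_b = succs
    preds = predecessors(cfg)
    for arm in (then_b, else_b):
        # Each arm must be entered only from `head` and leave through
        # exactly one edge.
        if preds[arm] != [head] or len(cfg[arm]) != 1:
            return None
    if cfg[then_b][0] != cfg[else_b][0]:
        return None              # the arms do not rejoin at one block
    return then_b, else_b, cfg[then_b][0]
```

A matched diamond could then be emitted as IF / ELSE / ENDIF around the two
arms; anything that fails to match would need a more general restructuring
step, which is the hard part of the question.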
Is it possible to extend LLVA with these 'high level' control flow
instructions and end up generating those instead of branches, and if so
how does that affect the rest of LLVM?
Is it for some reason just not feasible at all?