[LLVMdev] llvm, gpu execution environments
sabre at nondot.org
Sat May 19 01:01:57 PDT 2007
On Fri, 18 May 2007, Keith Whitwell wrote:
> I'm interested in understanding the extent of the assumptions which llvm
> makes about the types of hardware it is capable of targeting.
Different pieces of the compiler make different assumptions. In
particular, the code generator we ship is good for targeting certain
classes of devices, but isn't fully general (it doesn't help if you're
synthesizing a netlist from llvm, for example).
> In particular, I'm investigating a proposal by Zack Rusin to use llvm as
> the shader compilation engine within Mesa, targeting GPU backends.
> It seems that LLVA and by extension Vector-LLVA assumes that looping and
> branching control flow can be expressed in terms of a simple "br" branch instruction.
LLVA is not a part of LLVM, so I won't answer for it.
> Typically GPU environments cannot provide such a facility as they tend
> to run 16, 32 or 64 simd threads all with the same program counter.
> Though this is a wide vector environment, each of the threads is
> typically a scalar program and at any branch point, some of those
> threads may take the branch and some not. So, to provide dynamic
> branching facilities in this environment, you end up with per-channel
> execution masks, and opcodes like "IF", "THEN", and "ELSE" which
> manipulate those per-channel masks, and use stack semantics for pushing
> and popping masks to emulate nested control structures.
Right, it's basically a form of predication.
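To make the mask-stack scheme concrete, here is a minimal Python simulation of one SIMD group running an IF/ELSE/ENDIF construct. It is a sketch under stated assumptions: `MaskStack`, `if_`, `else_`, `endif` and `assign` are hypothetical names chosen for illustration, not a real GPU or LLVM API. Every lane executes every instruction, but a write only takes effect in lanes whose mask bit is set. That is what makes this a form of predication.

```python
class MaskStack:
    """Hypothetical per-channel execution-mask stack for one SIMD group."""

    def __init__(self, width):
        self.mask = [True] * width  # lanes currently enabled
        self.stack = []             # saved (enclosing mask, condition) pairs

    def if_(self, cond):
        # Push the enclosing mask, then keep only lanes where cond holds.
        self.stack.append((self.mask, cond))
        self.mask = [m and c for m, c in zip(self.mask, cond)]

    def else_(self):
        # Switch to lanes that were enabled outside the IF but failed cond.
        outer, cond = self.stack[-1]
        self.mask = [m and not c for m, c in zip(outer, cond)]

    def endif(self):
        # Pop back to the enclosing mask (this is what makes nesting work).
        self.mask, _ = self.stack.pop()

    def assign(self, dst, values):
        # A write lands only in enabled lanes; disabled lanes keep old values.
        return [v if m else d for m, d, v in zip(self.mask, dst, values)]


# Each lane runs the scalar program: if (x > 0) y = 2*x; else y = -x;
xs = [3, -1, 4, -5]
ms = MaskStack(len(xs))
y = [0] * len(xs)
ms.if_([x > 0 for x in xs])
y = ms.assign(y, [x * 2 for x in xs])
ms.else_()
y = ms.assign(y, [-x for x in xs])
ms.endif()
print(y)  # [6, 1, 8, 5]
```

Note that both arms are issued for the whole group even though each lane only keeps one result, so divergent branches cost the sum of both paths.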
> This is probably all very familiar to anybody who's thought about simd
> program execution. But it means that GPUs, and low-level GPU
> abstractions tend not to have branch instructions.
> The question then, is to what extent it is possible to target this type
> of execution environment with LLVM and the LLVA/Vector-LLVA ISAs???
> Is it necessary (or feasible) to try to analyse LLVA programs and
> extract IF/THEN/ELSE semantics from a set of arbitrary branch instructions?
> Is it possible to extend LLVA with these 'high level' control flow
> instructions and end up generating those instead of branches, and if so
> how does that affect the rest of LLVM?
The code generator and llvm should be able to handle this just fine, with
only minimal extensions.
Basically, you want to model this as predicated execution, and you want
the code generator to predicate away as many branches etc. as possible.
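"Predicating away" a branch is usually done by if-conversion: both arms are computed unconditionally and a per-lane predicate selects the result, so no jump remains. The Python sketch below illustrates the idea on a single diamond-shaped branch; the function names are hypothetical, and the conditional expression in the final line stands in for a select operation (LLVM IR has a `select` instruction with this meaning).

```python
def shade_branchy(x):
    # Original scalar form: a data-dependent branch.
    if x > 0:
        return x * 2
    return -x


def shade_predicated(xs):
    # If-converted form for a whole SIMD group: evaluate the predicate,
    # compute both arms for every lane, then select per lane.
    pred = [x > 0 for x in xs]
    then_vals = [x * 2 for x in xs]
    else_vals = [-x for x in xs]
    # "t if p else e" models a branch-free select, not a jump.
    return [t if p else e for p, t, e in zip(pred, then_vals, else_vals)]


xs = [3, -1, 4, -5]
print(shade_predicated(xs))          # [6, 1, 8, 5]
print([shade_branchy(x) for x in xs])  # [6, 1, 8, 5]
```

The trade-off is that both arms always execute; this is only safe when neither arm has side effects that must not happen in the untaken lanes, which is why a real pass has to check for things like stores and traps before flattening.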
One observation can be made though: there will always be some programs
that you can't map onto the hardware. For example, if you don't have
branches, you can't implement loops that execute a variable number of iterations.
As such, I'd structure the compiler as a typical code generator with an
early predication pass that flattens branches. If you get to the end of
the codegen and have some dynamic branches left, you detect the error
condition and reject the shader from the hardware path (so you have to
emulate it in software).
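The final check can be sketched as a pass over the control-flow graph after predication: acyclic (forward) branches are assumed to have been flattened, and any remaining back edge means a loop whose trip count may be dynamic, so the shader is routed to the software path. This is a conservative, hypothetical sketch: the CFG is a plain dict of block name to successor names, `find_back_edges` and `choose_path` are illustrative names, and a real compiler would likely try to fully unroll loops with a constant trip count before giving up.

```python
def find_back_edges(cfg, entry):
    """Return edges (u, v) where v is still on the DFS stack, i.e. loops."""
    back = []
    state = {}  # block name -> "active" (on DFS stack) or "done"

    def dfs(u):
        state[u] = "active"
        for v in cfg[u]:
            if state.get(v) == "active":
                back.append((u, v))  # edge returns to an ancestor: a loop
            elif v not in state:
                dfs(v)
        state[u] = "done"

    dfs(entry)
    return back


def choose_path(cfg, entry):
    # Forward branches are assumed already if-converted; a surviving
    # back edge cannot be expressed on branchless hardware.
    return "software" if find_back_edges(cfg, entry) else "hardware"


diamond = {"entry": ["then", "else"], "then": ["join"],
           "else": ["join"], "join": []}
loop = {"entry": ["header"], "header": ["body", "exit"],
        "body": ["header"], "exit": []}
print(choose_path(diamond, "entry"))  # hardware
print(choose_path(loop, "entry"))     # software
```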
Does this make sense?