[LLVMdev] llvm, gpu execution environments

Fri May 18 03:23:24 PDT 2007

I'm interested in understanding what assumptions LLVM makes about the 
kinds of hardware it is capable of targeting.

In particular, I'm investigating a proposal by Zack Rusin to use LLVM as 
the shader compilation engine within Mesa, targeting GPU backends.

I'm aware of the Apple GLSL compiler, and I've also seen the Vector LLVA 
paper.  However, I'm not sure that either of these quite bridges the gap 
to the execution environment provided by modern GPUs.

There are a couple of open questions, but I'll start with the most 
obvious one:

It seems that LLVA, and by extension Vector-LLVA, assume that looping and 
branching control flow can be expressed in terms of a simple "br" branch 
operation.

Typically GPU environments cannot provide such a facility, as they tend 
to run 16, 32 or 64 SIMD threads all with the same program counter. 
Though this is a wide vector environment, each of the threads is 
typically a scalar program, and at any branch point some of those 
threads may take the branch and some may not.  So, to provide dynamic 
branching in this environment, you end up with per-channel execution 
masks, opcodes like "IF", "THEN", and "ELSE" which manipulate those 
masks, and a stack onto which masks are pushed and popped to emulate 
nested control structures.
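
To make the mask mechanism concrete, here is a minimal sketch in Python 
of the scheme described above.  It is an illustration only, not any real 
GPU's instruction set: the class MaskedSimd, its if_/else_/endif methods 
and the abs_diff example are all hypothetical names invented for this 
sketch.  Every channel executes both sides of the conditional, and the 
mask decides which channels actually write their results.

```python
class MaskedSimd:
    """Hypothetical toy SIMD machine with a per-channel execution mask."""

    def __init__(self, width):
        self.width = width
        self.mask = [True] * width   # channels currently enabled
        self.stack = []              # saved (outer_mask, condition) pairs

    def if_(self, cond):
        # Push the enclosing mask, then keep only channels where cond holds.
        self.stack.append((self.mask, list(cond)))
        self.mask = [m and c for m, c in zip(self.mask, cond)]

    def else_(self):
        # Enable the channels of the enclosing mask where cond was false.
        outer, cond = self.stack[-1]
        self.mask = [m and not c for m, c in zip(outer, cond)]

    def endif(self):
        # Pop back to the mask that was active before the matching if_.
        outer, _ = self.stack.pop()
        self.mask = outer

    def assign(self, dst, values):
        # A masked write: only enabled channels store their value.
        for i in range(self.width):
            if self.mask[i]:
                dst[i] = values[i]


def abs_diff(a, b):
    """Per channel: if a > b then a - b else b - a, with no branch per channel."""
    m = MaskedSimd(len(a))
    out = [0] * len(a)
    m.if_([x > y for x, y in zip(a, b)])
    m.assign(out, [x - y for x, y in zip(a, b)])   # "then" side
    m.else_()
    m.assign(out, [y - x for x, y in zip(a, b)])   # "else" side
    m.endif()
    return out
```

Nesting works because if_ always combines the new condition with the 
mask that is already active, so a channel disabled by an outer IF stays 
disabled inside any inner IF or ELSE.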

This is probably all very familiar to anybody who's thought about SIMD 
program execution.  But it means that GPUs and low-level GPU 
abstractions tend not to have branch instructions.

The question, then, is to what extent it is possible to target this type 
of execution environment with LLVM and the LLVA/Vector-LLVA ISAs?

Is it necessary (or feasible) to try to analyse LLVA programs and 
extract IF/THEN/ELSE semantics from a set of arbitrary branch instructions?
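
As a rough idea of what such an analysis might look like in the simplest 
case, here is a sketch in Python that recognises a single "diamond" in a 
control flow graph: a block with two successors which each fall through 
to the same join block.  The graph representation (a dict from block 
name to successor names) and the functions predecessors and find_diamond 
are assumptions made up for this sketch; they are not part of LLVM, and 
loops or irreducible control flow would need considerably more work.

```python
def predecessors(cfg, block):
    """Return the blocks that branch to `block`."""
    return [b for b, succs in cfg.items() if block in succs]


def find_diamond(cfg, head):
    """Match head -> {then, else} -> join.

    cfg maps each block name to its list of successor names.
    Returns (then_block, else_block, join_block) when `head` ends in a
    two-way branch whose targets are reached only from `head` and both
    fall through to one common block; otherwise returns None.
    """
    succs = cfg.get(head, [])
    if len(succs) != 2:
        return None
    then_b, else_b = succs
    if then_b == else_b:
        return None
    # Each arm must be entered only from the head block.
    if predecessors(cfg, then_b) != [head]:
        return None
    if predecessors(cfg, else_b) != [head]:
        return None
    then_s = cfg.get(then_b, [])
    else_s = cfg.get(else_b, [])
    # Both arms must end in an unconditional branch to the same join block.
    if len(then_s) == 1 and then_s == else_s:
        return then_b, else_b, then_s[0]
    return None
```

A match could then be emitted as IF / then-block / ELSE / else-block / 
ENDIF, followed by the join block; anything that does not match would 
have to be handled some other way.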

Is it possible to extend LLVA with these 'high level' control flow 
instructions and end up generating those instead of branches, and if so 
how does that affect the rest of LLVM?

Is it for some reason just not feasible at all?

Keith