[LLVMdev] llvm, gpu execution environments

Wed May 23 00:58:11 PDT 2007

Chris Lattner wrote:
> On Sat, 19 May 2007, Keith Whitwell wrote:
>>>> It seems that LLVA and by extension Vector-LLVA assumes that looping and
>>>> branching control flow can be expressed in terms of a simple "br" branch
>>>> operation.
>>> LLVA is not a part of LLVM, so I won't answer for it.
>> OK, I guess I misunderstood the papers I pulled down - my impression was
>> that at some stage programs in llvm would be represented in LLVA.
>>
>> What, out of interest, is the relationship between LLVM and LLVA?
> 
> Vikram already answered this, but LLVA is a research project that uses 
> LLVM.  LLVM does already have vector support in place.
> 
>>> Basically, you want to model this as predicated execution, and you want
>>> the code generator to predicate away as many branches etc as possible.
>>>
>>> One observation can be made though: there will always be some programs
>>> that you can't map onto the hardware.  For example, if you don't have
>>> branches, you can't do loops that execute for a variable number of
>>> iterations.
>> Actually, you can - there is a program counter, the loop keeps executing
>> until the execution mask reaches zero.  Likewise branches are dynamic.
> 
> Ok, but can you support 4-level-deep loops with arbitrary indexed loads 
> in them?

Yes, absolutely.

There may be limits to the depth of the stacks that underlie the 
predication mechanism, and there may be special actions required to be 
taken when those stacks are exhausted, but this type of detail varies 
from GPU to GPU.

The hardware we have access to (Intel i965) just needs some hand-holding 
when the predication stacks max out.

Some GPUs may use an entirely different internal architecture, and 
implementation details like stacks and predication may not apply.  But 
the evidence from benchmarks of dynamic branching behaviour, etc., on 
those GPUs suggests things must be pretty similar.
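To make the execution-mask behaviour concrete, here is a minimal Python sketch. It is a toy model built on stated assumptions, not a description of how the i965 or any other GPU actually works. In this model a group of lanes shares one program counter. A loop keeps iterating until the execution mask is all zeros. Each LOOP/ENDLOOP pair saves and restores the mask on a stack. The hypothetical `hw_depth` parameter stands in for the limited predication-stack depth, and entries that overflow it are spilled to memory as a stand-in for the "hand-holding" mentioned above. The names `MaskStack` and `simd_nested_loops` are invented for illustration.

```python
from typing import List, Tuple


class MaskStack:
    """Stack of saved execution masks with a limited number of hardware slots.

    hw_depth is a made-up parameter; real GPUs differ. When every slot is in
    use, the bottom-most entry is spilled to memory, standing in for the
    software "hand-holding" needed once the predication stack maxes out.
    """

    def __init__(self, hw_depth: int) -> None:
        self.hw_depth = hw_depth
        self.hw: List[List[bool]] = []       # on-chip slots; top is hw[-1]
        self.spilled: List[List[bool]] = []  # older entries moved to memory
        self.spills = 0

    def push(self, mask: List[bool]) -> None:
        if len(self.hw) == self.hw_depth:
            self.spilled.append(self.hw.pop(0))  # evict the bottom-most slot
            self.spills += 1
        self.hw.append(list(mask))

    def pop(self) -> List[bool]:
        if not self.hw:
            self.hw.append(self.spilled.pop())  # reload the next entry down
        return self.hw.pop()


def simd_nested_loops(
    trips: List[List[int]], table: List[int], hw_depth: int = 2
) -> Tuple[List[int], int]:
    """Run len(trips[0]) nested loops on all lanes in lock-step.

    trips[lane][level] is that lane's trip count at nesting `level`, so lanes
    leave each loop at different times (divergent control flow). The innermost
    body performs a data-dependent indexed load from `table`.
    Returns each lane's accumulator and the number of mask-stack spills.
    """
    lanes, depth = len(trips), len(trips[0])
    acc = [0] * lanes
    stack = MaskStack(hw_depth)
    exec_mask = [True] * lanes

    def loop(level: int) -> None:
        nonlocal exec_mask
        if level == depth:
            # Loop body: only lanes whose mask bit is set have any effect.
            for i in range(lanes):
                if exec_mask[i]:
                    acc[i] += table[(acc[i] + i) % len(table)]
            return
        stack.push(exec_mask)                    # LOOP: save enclosing mask
        left = [trips[i][level] if exec_mask[i] else 0 for i in range(lanes)]
        while True:
            exec_mask = [n > 0 for n in left]    # lanes still iterating
            if not any(exec_mask):               # mask is zero: leave loop
                break
            loop(level + 1)
            left = [max(n - 1, 0) for n in left]
        exec_mask = stack.pop()                  # ENDLOOP: restore mask

    loop(0)
    return acc, stack.spills
```

Setting `hw_depth` below the nesting depth forces spills, but it does not change the per-lane results. Each lane still ends up with what it would compute running the same loops on its own. In this model, running out of stack costs extra work rather than correctness.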

>>> As such, I'd structure the compiler as a typical code generator with an
>>> early predication pass that flattens branches.  If you get to the end of
>>> the codegen and have some dynamic branches left, you detect the error
>>> condition and reject the shader from the hardware path (so you have to
>>> emulate it in software).
>> The hardware *does* support dynamic branching, and looping, provided it
>> is expressed in IF/THEN/ELSE, LOOP/BREAK/CONTINUE/ENDLOOP type
>> instructions.  Even CALL/RETURN.  The only thing it can't do is execute
>> something like "GOTO" or "BRANCH" dynamically.
> 
> Ahh, ok.  Very interesting.
> 
>>> Does this make sense?
>> Yes, but at slight cross-purposes.
>>
>> There are no cases where compilation should fail to produce a hardware
>> executable result, within the constraints of the high-level language we
>> are compiling.  Dynamic branches and looping are entirely within the
>> capability of the hardware, provided they are expressed in terms of the
>> hardware IF/THEN/ELSE, LOOP/ENDLOOP, etc, opcodes.
> 
> Okay.
> 
>> But it seems like my initial understanding of the intermediate
>> representation within llvm is incorrect & I probably should just dive
>> into the source to figure out what's going on.
> 
> Always good :)
> 
>> My concern was that llva throws away the information that I'd probably 
>> need to reconstruct these high-level opcodes required by the hardware - 
>> if the code generator can come in at a higher level while that 
>> information still exists, then a lot of things get easier.
> 
> s/llva/llvm/  But yes, you're right.  Reconstructing loops etc from LLVM 
> is actually really easy, but we don't have good support for it in the code 
> generator yet.  This is a desired area of extension that we'd like to do 
> at some point, see http://llvm.org/PR1353
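As a rough illustration of why reconstructing loops from a branch-based IR is considered easy, the sketch below uses the textbook approach in three steps. First, compute dominators over the control-flow graph. Second, treat any edge whose target dominates its source as a back edge. Third, collect the natural loop for each back edge by walking predecessors back to the header. It works on a plain Python dictionary CFG rather than on LLVM IR, so it is not LLVM's code or API. The function names and block names are hypothetical. It also assumes a reducible CFG, since an irreducible cycle has no single dominating header and would not be found this way.

```python
from typing import Dict, List, Set, Tuple

# Block name -> successor block names; every successor must also be a key.
CFG = Dict[str, List[str]]


def predecessors(cfg: CFG) -> Dict[str, List[str]]:
    preds: Dict[str, List[str]] = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)
    return preds


def dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: dom(b) = {b} | intersection of dom(p) over preds p."""
    preds = predecessors(cfg)
    dom = {b: set(cfg) for b in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            pred_doms = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def natural_loops(cfg: CFG, entry: str) -> List[Tuple[str, Set[str]]]:
    """Return (header, body blocks) for every back edge tail -> header."""
    dom = dominators(cfg, entry)
    preds = predecessors(cfg)
    loops = []
    for tail, succs in cfg.items():
        for header in succs:
            if header not in dom[tail]:
                continue  # not a back edge
            # Walk predecessors backwards from the tail; the header stops the walk.
            body = {header, tail}
            work = [tail] if tail != header else []
            while work:
                n = work.pop()
                for p in preds[n]:
                    if p not in body:
                        body.add(p)
                        work.append(p)
            loops.append((header, body))
    return loops


# Hypothetical CFG for two nested loops (block names are made up).
EXAMPLE: CFG = {
    "entry": ["outer"],
    "outer": ["inner", "exit"],
    "inner": ["inner_body"],
    "inner_body": ["inner", "outer_latch"],
    "outer_latch": ["outer"],
    "exit": [],
}

for header, body in natural_loops(EXAMPLE, "entry"):
    print(header, sorted(body))
```

For the example graph this reports a loop headed by `inner` containing `inner` and `inner_body`. It also reports a loop headed by `outer` containing `outer`, `inner`, `inner_body` and `outer_latch`. Because the inner loop's block set is a proper subset of the outer one, that containment gives the nesting needed to emit properly nested LOOP/ENDLOOP pairs.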

OK, I'll take a look.

Thanks,

Keith