[PATCH] D18162: AMDGPU: Add SIWholeQuadMode pass

Mon Mar 21 11:14:08 PDT 2016

nhaehnle added a comment.

For kill: Okay, having side effects does not imply convergent, but the kill can only be moved in a way that preserves its execution. So IR-level optimizations are allowed to move a kill into both the if and the else branch of a subsequent if-else block, and this still results in correct code: each branch may mask away some bits of EXEC, but since the EXEC masks of the two branches are joined again with a bitwise OR where control flow reconverges, that's okay.
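
To make the EXEC argument concrete, here is a toy model in Python. It is only a sketch: EXEC is modelled as a plain integer bitmask with one bit per thread, and the function names and masks are made up for illustration; none of this is LLVM or hardware code.

```python
# Toy model: EXEC is a bitmask with one bit per thread (lane).
# All names here are hypothetical and only illustrate the argument.

def kill(exec_mask, kill_mask):
    """Clear the EXEC bits of the lanes that request a kill."""
    return exec_mask & ~kill_mask

def kill_before_branch(exec_mask, kill_mask, cond_mask):
    """Original order: kill first, then run the if-else and rejoin."""
    live = kill(exec_mask, kill_mask)
    then_exec = live & cond_mask       # lanes taking the "if" side
    else_exec = live & ~cond_mask      # lanes taking the "else" side
    return then_exec | else_exec       # join with a bitwise OR

def kill_in_both_branches(exec_mask, kill_mask, cond_mask):
    """Kill sunk into both sides of the if-else, then rejoin."""
    then_exec = kill(exec_mask & cond_mask, kill_mask)
    else_exec = kill(exec_mask & ~cond_mask, kill_mask)
    return then_exec | else_exec       # join with a bitwise OR
```

In this model both orderings leave the same EXEC mask after the join for every combination of masks, which is the sense in which the moved kill is still correct.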

In http://reviews.llvm.org/D18162#379477, @arsenm wrote:

> In http://reviews.llvm.org/D18162#378752, @nhaehnle wrote:
>
> > The derivative taken by the llvm.SI.image.sample is undefined in GLSL if the control-flow is dynamically non-uniform, so it is perfectly legal to sink the llvm.amdgcn.image.load into the IF block (and the same applies to any other computation that leads to a derivative).
>
> This is the same situation as barriers, so I think this should still be convergent. The problem it is solving is if LLVM introduces uses that do not hit uniform control flow, like introducing a call in either side of an if/then block

I do not think this is the same situation as barriers. Barriers need to be convergent because all threads have to execute the same barrier instruction (at the same program counter) for the semantics to remain correct.

If (plain) loads are pushed down into either side of an if/else block, that's inefficient but correct (of course assuming no stores in between etc.). There is no synchronization or data exchange between threads, so all that matters is that the right value is loaded into the right register on a per-thread basis; which instructions do the job at which program counter value is not important.
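
A toy Python sketch of what I mean, assuming a flat list as "memory" and one address per thread (the function names and the arithmetic in each branch are invented for illustration):

```python
# Toy model of a plain per-thread load; all names are hypothetical.

def load_before_branch(memory, addrs, cond):
    """Every thread loads up front, then the if-else uses the value."""
    regs = [memory[a] for a in addrs]
    return [v + 1 if c else v * 2 for v, c in zip(regs, cond)]

def load_in_both_branches(memory, addrs, cond):
    """The same load pushed down into both sides of the if-else."""
    out = []
    for a, c in zip(addrs, cond):
        if c:
            v = memory[a]      # copy of the load on the "if" side
            out.append(v + 1)
        else:
            v = memory[a]      # copy of the load on the "else" side
            out.append(v * 2)
    return out
```

Each thread ends up with the same value either way, because the load only depends on that thread's own address. Nothing in the model reads another thread's data, which is exactly what distinguishes this from a barrier.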

If I am still misunderstanding you, perhaps an example would be helpful.

http://reviews.llvm.org/D18162