[PATCH] D31762: AMDGPU: Add new amdgcn.init.exec intrinsics

Mon Apr 24 10:13:50 PDT 2017

nhaehnle added a comment.

In https://reviews.llvm.org/D31762#735528, @mareko wrote:

> In https://reviews.llvm.org/D31762#735505, @nhaehnle wrote:
>
> > I don't see anything wrong with the code.
> >
> > I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.
> >
> > Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.
>
>
> Merged monolithic shaders set exec = -1 and then they use "if (tid < thread_count) ...". I think that's the only way to jump over the conditional code right now if the wave has thread_count == 0. If we don't want v_mbcnt+v_cmp, we could do something like "if (amdgcn.set.thread_count(n)) ..." that sets EXEC regardless of current EXEC and skips the conditional for thread_count == 0. The performance of that solution is unlikely to justify the implementation effort.

Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.

Yeah, I looked at the monolithic shader stuff briefly. I think the LLVM IR is fine. Adding another intrinsic for switching EXEC in the middle of a shader is bound to run into lots of problems in CodeGen around scheduling and such.

The LLVM CodeGen could generally grow some more smarts around EXEC and perhaps even pattern-match the v_mbcnt+v_cmp. I also think it's pretty low priority, though.
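
To make the v_mbcnt+v_cmp point concrete, here is a rough Python model of a 64-lane wave. It is not LLVM code: the function names are made up for illustration, and mbcnt() is only a simplified stand-in for the v_mbcnt_lo/v_mbcnt_hi pair. It shows that with exec = -1, the "tid < thread_count" compare yields a mask that depends only on thread_count, which is what a pattern-match could in principle exploit.

```python
WAVE_SIZE = 64                       # assumed wave size for this sketch
FULL_EXEC = (1 << WAVE_SIZE) - 1     # exec = -1, i.e. all lanes enabled


def mbcnt(mask, lane):
    """Count the set bits of `mask` below `lane` (simplified v_mbcnt model)."""
    return bin(mask & ((1 << lane) - 1)).count("1")


def exec_from_compare(thread_count):
    """EXEC mask produced by 'tid < thread_count' with all lanes enabled."""
    exec_mask = 0
    for lane in range(WAVE_SIZE):
        tid = mbcnt(FULL_EXEC, lane)     # equals the lane index when exec = -1
        if tid < thread_count:           # the v_cmp step
            exec_mask |= 1 << lane
    return exec_mask


def exec_from_count(thread_count):
    """The same mask computed directly from the count (hypothetical shortcut)."""
    return (1 << min(thread_count, WAVE_SIZE)) - 1
```

For thread_count == 0 both functions return an empty mask, which is the case where the conditional code has to be jumped over entirely.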

https://reviews.llvm.org/D31762