[PATCH] D31762: AMDGPU: Add new amdgcn.init.exec intrinsics

Mon Apr 24 10:16:39 PDT 2017

mareko added a comment.

In https://reviews.llvm.org/D31762#735585, @nhaehnle wrote:

> In https://reviews.llvm.org/D31762#735528, @mareko wrote:
>
> > In https://reviews.llvm.org/D31762#735505, @nhaehnle wrote:
> >
> > > I don't see anything wrong with the code.
> > >
> > > I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.
> > >
> > > Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.
> >
> >
> > Merged monolithic shaders set exec = -1 and then they use "if (tid < thread_count) ...". I think that's the only way to jump over the conditional code right now if the wave has thread_count == 0. If we don't want v_mbcnt+v_cmp, we could do something like "if (amdgcn.set.thread_count(n)) ..." that sets EXEC regardless of current EXEC and skips the conditional for thread_count == 0. The performance of that solution is unlikely to justify the implementation effort.
>
>
> Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.

Yes, thread_count == 0 is possible, and it's explained here: https://patchwork.freedesktop.org/patch/152356/
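
For readers following the thread, the 'exec = -1 then "if (tid < thread_count)"' pattern quoted above can be modeled roughly as below. This is only an illustrative Python sketch, not compiler or driver code: the 64-lane wave size and the names mbcnt, exec_mask_for and run_part are assumptions made up for the example. mbcnt stands in for the v_mbcnt pair and the comparison for v_cmp.

```python
WAVE_SIZE = 64  # assumption: one wave has 64 lanes


def mbcnt(mask: int, lane: int) -> int:
    """Count the bits set in `mask` strictly below `lane` (models v_mbcnt)."""
    return bin(mask & ((1 << lane) - 1)).count("1")


def exec_mask_for(thread_count: int) -> int:
    """Model 'exec = -1' followed by the per-lane test 'tid < thread_count'."""
    full_exec = (1 << WAVE_SIZE) - 1  # exec = -1: every lane enabled
    mask = 0
    for lane in range(WAVE_SIZE):
        tid = mbcnt(full_exec, lane)  # with a full mask this is the lane index
        if tid < thread_count:        # models v_cmp
            mask |= 1 << lane
    return mask


def run_part(thread_count: int, body) -> bool:
    """Run `body(lane)` on the active lanes; return False if the part was skipped."""
    mask = exec_mask_for(thread_count)
    if mask == 0:
        # No lane is active (thread_count == 0): jump over the conditional code.
        return False
    for lane in range(WAVE_SIZE):
        if mask & (1 << lane):
            body(lane)
    return True


if __name__ == "__main__":
    seen = []
    print(run_part(3, seen.append), seen)
    print(run_part(0, seen.append), seen)
```

In this model a part with thread_count == 0 ends up with an empty mask and is skipped as a whole, which is the case the skip-jump handles.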

https://reviews.llvm.org/D31762