[PATCH] D31762: AMDGPU: Add new amdgcn.init.exec intrinsics

Marek Olšák via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 24 10:35:59 PDT 2017


mareko added a comment.

In https://reviews.llvm.org/D31762#735630, @nhaehnle wrote:

> In https://reviews.llvm.org/D31762#735590, @mareko wrote:
>
> > In https://reviews.llvm.org/D31762#735585, @nhaehnle wrote:
> >
> > > Is it at all possible to get merged shaders where either part has thread_count == 0? We might want a way to annotate branches so that the skip-jump for EXEC=0 is not introduced.
> >
> >
> > Yes, thread_count == 0 is possible, and it's explained here: https://patchwork.freedesktop.org/patch/152356/
>
>
> Ah, so that's why the barrier instruction is needed between shader parts? Interesting.


There are two cases where the barrier isn't needed: 1) When GS is processing points without any amplification. 2) When HS has input control points == output control points, and no HS thread accesses other threads' inputs. In both cases, the barrier and the LDS traffic can be removed, and the previous shader can pass its outputs in VGPRs, which gives a fully merged shader.
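
To make the two cases concrete, here is a rough sketch of the check as a predicate. All names here (SecondStage, MergedShaderInfo, needsBarrierAndLDS) are made up for illustration and are not part of LLVM or Mesa. It also assumes that "amplification" means the GS emits more than one vertex per input point, and that the HS condition compares control point counts.

```cpp
#include <cstdint>

// Hypothetical description of the second half of a merged shader.
// These names are illustrative only; they do not exist in LLVM or Mesa.
enum class SecondStage { GS, HS };

struct MergedShaderInfo {
  SecondStage stage;
  // Used when stage == GS.
  bool gsInputIsPoints;           // GS input primitive type is points
  bool gsAmplifies;               // assumed: more than one vertex out per point
  // Used when stage == HS.
  uint32_t hsInputControlPoints;  // assumed: control point counts
  uint32_t hsOutputControlPoints;
  bool hsReadsOtherThreadsInputs; // any HS thread reads another thread's inputs
};

// Returns true if the barrier and the LDS round trip between the two
// shader parts must be kept, false if outputs could stay in VGPRs.
inline bool needsBarrierAndLDS(const MergedShaderInfo &info) {
  switch (info.stage) {
  case SecondStage::GS:
    // Case 1: points without amplification need no barrier.
    return !(info.gsInputIsPoints && !info.gsAmplifies);
  case SecondStage::HS:
    // Case 2: same number of input and output control points, and
    // every thread only touches its own inputs.
    return !(info.hsInputControlPoints == info.hsOutputControlPoints &&
             !info.hsReadsOtherThreadsInputs);
  }
  return true; // conservative default: keep the barrier
}
```

The conservative default keeps the barrier for anything that doesn't match one of the two cases.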


https://reviews.llvm.org/D31762




