[PATCH] D31762: AMDGPU: Add new amdgcn.init.exec intrinsics

Marek Olšák via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 24 09:42:25 PDT 2017


mareko added a comment.

In https://reviews.llvm.org/D31762#735505, @nhaehnle wrote:

> I don't see anything wrong with the code.
>
> I agree that the design is a bit iffy. It's almost like these intrinsics are something that is part of the calling convention. But even these intrinsics cannot quite lead to optimal code for merged monolithic shaders, because there's an unnecessary initialization of EXEC in the first part of the shader.
>
> Since what we need to do here in general really doesn't fit well into LLVM IR semantics, I suspect that no matter what we come up with, it's bound to be ugly. So we might as well go with this particular solution here.


Merged monolithic shaders set exec = -1 and then use "if (tid < thread_count) ...". I think that's currently the only way to jump over the conditional code when the wave has thread_count == 0. If we don't want v_mbcnt+v_cmp, we could do something like "if (amdgcn.set.thread_count(n)) ...", which would set EXEC regardless of the current EXEC and skip the conditional for thread_count == 0. The performance gain from that solution is unlikely to justify the implementation effort.
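
To make the comparison concrete, here is a rough host-side model of the two approaches in Python. It is only a sketch of the mask arithmetic, not generated code. It assumes a 64-lane wave. The names mask_from_compare and set_thread_count are made up for illustration. amdgcn.set.thread_count is only the idea floated above and does not exist as an intrinsic, and clamping n to the wave size is an extra assumption.

```python
WAVE_SIZE = 64                      # assumption: wave64
FULL_EXEC = (1 << WAVE_SIZE) - 1    # models "exec = -1"


def mask_from_compare(thread_count):
    """Current approach: exec = -1, then every lane evaluates tid < thread_count."""
    exec_mask = FULL_EXEC
    cond_mask = 0
    for tid in range(WAVE_SIZE):            # tid stands in for the v_mbcnt result
        lane_active = (exec_mask >> tid) & 1
        if lane_active and tid < thread_count:   # stands in for v_cmp
            cond_mask |= 1 << tid
    return cond_mask                        # lanes that enter the conditional


def set_thread_count(thread_count):
    """Hypothetical amdgcn.set.thread_count(n), as floated above.

    Returns (new_exec, enter_branch). The mask is built directly from n,
    ignoring whatever EXEC held before, and the branch is skipped when n == 0.
    """
    n = min(thread_count, WAVE_SIZE)        # assumption: clamp to the wave size
    new_exec = (1 << n) - 1
    return new_exec, new_exec != 0
```

Both paths end up with the same set of active lanes for any thread_count. The only difference is that the second one gets there without the per-lane compare and yields the "skip everything" condition directly.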


https://reviews.llvm.org/D31762




