[PATCH] D70781: AMDGPU: Fix handling of infinite loops in fragment shaders

Mon Dec 2 08:52:52 PST 2019

cwabbott added a comment.

In D70781#1765237 <https://reviews.llvm.org/D70781#1765237>, @arsenm wrote:

> This seems fine to me, although I know nothing about exports, and I would prefer if we could eventually fix the kill intrinsic design

That would be nice. I've thought about it a little, and it's tricky. Your first instinct might be to try to model the actual thread-level semantics, and have this new kill intrinsic do something like "jump here (which is an empty block with just a return) if this thread is killed but some threads are still live, otherwise jump to a block that just does a null export and then returns." So you'd have three possible exits, the normal one, the "I'm killed but some threads are still live" one, and "all threads are killed." However that's problematic for a number of reasons:

1. You'd have to either worry about handling cases where it doesn't jump to a block containing just a return or add some validation that this is always the case. The former case adds unnecessary complexity, the latter case makes the intrinsic still quite "magic," just in a different way.
2. The exec mask has to be zero when executing the null export in the all-threads-are-killed case. This sort of breaks the idea that this is "just a normal jump".
3. radeonsi actually relies on the exec mask not containing killed threads when returning, because the exports actually happen in an epilog which is a separate function whose code is pasted after the main shader. In other words, for threads that are killed, reconvergance actually doesn't conceptually happen until *after* the epilog. So the semantics here are still misleading, and you have to rely on the "optimization" of making it just turn off the bits in the exec mask for correctness.

I think that more doable would be an intrinsic that would do something like "jump to this block that does a null export if all threads are killed, otherwise jump to this block with the killed threads turned off in exec," and then in the frontend you'd make it jump to the next "normal" block in the case where not everything is killed. So now there are only two branch destinations. This is a little less honest, since if a thread is killed but some remain, then the thread that's killed doesn't actually continue on to the next normal block. However it has less problems.

One remaining problem, which is also a problem with the other idea, is that now you somehow have to get a mask of threads that haven't been killed or returned yet, and computing it is going to involve inserting instructions that are largely redundant with the control-flow instructions inserted by `SILowerControlFlow`, but there's currently no good way to get what we want directly from `SILowerControlFlow`. With some work you'll probably be able to write optimizations to remove these instructions in the simplest of cases, but that still leaves more complex cases with redundant instructions and no easy way to fix it. And of course, because the final null-export block has to be done with an exec mask of 0, you still can't fit it directly into the control-flow machinery without some special cases.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70781/new/

https://reviews.llvm.org/D70781