[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics

Thu Aug 13 04:02:00 PDT 2020

nhaehnle added inline comments.

================
Comment at: llvm/docs/ConvergentOperations.rst:174-175
+
+The compiler frontend can emit IR that expresses the convergence constraints
+as follows:
+
----------------
sameerds wrote:
> But this use of the intrinsics does not add any new constraints, right? This specific optimization is already sufficiently constrained by control dependence.
It doesn't add any constraints for existing generic transforms in LLVM that I'm aware of, but there's still a bit of non-trivial content to it at least in theory. Whether it matters in practice depends on the backend.

E.g., it doesn't matter for AMDGPU, but modern versions of CUDA say that some sort of divergence can basically happen at any point in the program. If you wanted to take code that uses the convergent operations and translate it to CUDA builtins, the control intrinsics make a difference. In that case, you'd want the uniform threadmask to replace the entry intrinsic. If it was an anchor somewhere instead, you'd want to replace the anchor by `__activemask()` and then use its return value. In both cases, you'd possibly modify the mask somehow to account for additional control dependencies between the anchor and its use. This "modify the mask somehow" hides a lot of complexity, but thinking about it quite a bit I believe it's a similar amount of complexity to what we have in the AMDGPU backend to make things work, probably less because more of the burden is shouldered by hardware in the end.

Plus there's the composability aspect of it if we're talking about functions that aren't kernel entry points and might be inlined.

================
Comment at: llvm/docs/ConvergentOperations.rst:743-744
+
+This problem goes away by inserting a loop heart intrinsic, which establishes
+a relationship between loop iterations across threads.
+
----------------
sameerds wrote:
> The exhausted reader just begs to see the //corrected// version at this point! :) 
The exhausted author is taking a note and will get around to it soon ;)

================
Comment at: llvm/docs/ConvergentOperations.rst:807-809
+There is a cycle (closed walk in the CFG) that goes through both loop heart
+intrinsics using ``%anchor`` but not through the definition of ``%anchor``,
+so this code is invalid.
----------------
sameerds wrote:
> Following the structure of previous examples, it would be good to have a demonstration of how this can result in misinterpreted convergence. That would explain why this example //should be// illegal. This paragraph directly applies the rules to show how the example is //recognized// as illegal.
Isn't it just the same as in the example directly above? You'd expand C / E to a longer sequence of what happens in those inner loops, but the essentially difficulty is the same.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85603/new/

https://reviews.llvm.org/D85603