[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics

Fri Apr 9 00:53:23 PDT 2021

sameerds added a comment.

In D85603#2676631 <https://reviews.llvm.org/D85603#2676631>, @Anastasia wrote:

> Regarding the HLL and frontend side, I believe this could be represented fairly similarly in different C/C++-based languages - considering that we already follow the same implementation for existing convergent semantics at least between CUDA and OpenCL. However, it isn't yet in its optimal state and perhaps we can attempt to refine this topic holistically for example also addressing the following rework that removes the need to make everything convergent: https://reviews.llvm.org/D69498. Otherwise, we will likely have to generate the convergent intrinsics absolutely everywhere, which is not ideal!

As far as I could skim through the specs, OpenCL requires that all threads in a workgroup or subgroup encounter a convergent operation. On the other hand, SPIRV and CUDA allow a more general "non-uniform" version that are executed by "currently active" threads. This proposal is general enough to cover both cases (independent of the newer CUDA primitives that take explicit masks).

Also, this new proposal supersedes https://reviews.llvm.org/D69498. In fact it presents a generalization that can even entirely eliminate the need for the `convergent` attribute. (See Nicolai's older comment about keeping `convergent` for now ... it can be used by frontends who elect to keep the current broken-but-well-known formalism it represents.

> Looking at the wording in some parts of your convergent semantics definition there might be options resulting in some tradeoff between tooling complexity and optimization opportunities:
>
>   +The
>   +:ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
>   +intrinsic is typically expected to appear in the header of a natural loop.
>   +However, it can also appear in non-header blocks of a loop. In that case, the
>   +loop can generally not be unrolled.

I believe this is meant to say that the formalism //does not forbid// putting the loop intrinsic in a non-header block, but that is not expected in most known cases. It is not an optimization choice that every flow must make.

> I understand this is not in the scope of this work. And I think it is perfectly reasonable to provide experimental support that could help with further evaluation and productization too. However, it would be good to make some preliminary assessment for the frontend support rather soon. What I think could speed up the progress on the frontend/HLL is some sort of description about the conditions where the new intrinsics have to be inserted. My understanding is that the plan is not to expose them to the application code that would require educating the application developers about all the low-level details? Looking at your transformation pass in https://reviews.llvm.org/D69498 it seems that adding those automatically should somehow be possible and you already have some rules defined where and how those can be added? But there are certain things that can be done in IR that are very constrained in AST as it makes Parsing more complicated.

This other review request is likely to demonstrate what you are asking for:
https://reviews.llvm.org/D85609

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85603/new/

https://reviews.llvm.org/D85603