[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics

Fri Apr 9 05:29:18 PDT 2021

Anastasia added a comment.

In D85603#2678742 <https://reviews.llvm.org/D85603#2678742>, @sameerds wrote:

> In D85603#2676631 <https://reviews.llvm.org/D85603#2676631>, @Anastasia wrote:
>
>> Regarding the HLL and frontend side, I believe this could be represented fairly similarly in different C/C++-based languages - considering that we already follow the same implementation for existing convergent semantics at least between CUDA and OpenCL. However, it isn't yet in its optimal state and perhaps we can attempt to refine this topic holistically for example also addressing the following rework that removes the need to make everything convergent: https://reviews.llvm.org/D69498. Otherwise, we will likely have to generate the convergent intrinsics absolutely everywhere, which is not ideal!
>
> As far as I could skim through the specs, OpenCL requires that all threads in a workgroup or subgroup encounter a convergent operation. On the other hand, SPIRV and CUDA allow a more general "non-uniform" version that are executed by "currently active" threads. This proposal is general enough to cover both cases (independent of the newer CUDA primitives that take explicit masks).

We do have a new functionality in OpenCL that requires supporting convergent operations in non-uniform CF too:
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#_extended_subgroup_functions
https://llvm.org/PR46199

> Also, this new proposal supersedes https://reviews.llvm.org/D69498. In fact it presents a generalization that can even entirely eliminate the need for the `convergent` attribute. (See Nicolai's older comment about keeping `convergent` for now ... it can be used by frontends who elect to keep the current broken-but-well-known formalism it represents.

Sorry for not being clear - I was talking about two separate threads here (1) generalizing convergent attribute to non-uniform CF that is addressed by this patch and (2) inverting convergent attribute that is addressed in https://reviews.llvm.org/D69498. Just to provide more details regarding (2) - right now in clang we have a logic that adds convergent to every single function because when we parse the function we don't know whether it will call any function in a call tree that would use convergent operations. Therefore we need to be conservative to prevent incorrect optimizations but this is not ideal for multiple reasons. The optimiser can undo all or some of those convergent decorations if it can prove they are not needed. And for the uniform CF convergent operations this was the only "broken" functionality to my memory.

To address this there was an attempt to invert the behavior of convergent attribute in this patch (https://reviews.llvm.org/D69498) then the frontend wouldn't need to generate the attribute everywhere and the optimizer wouldn't need to undo what frontend does. The change in this review doesn't address (2) as far as I can see - it seems it only generalized old convergent semantics to cover the cases with non-uniform CF. I am not clear yet about the details of how and what frontend should generate in IR for this new logic but it looks more complex than before. And if we have to stick to the conservative approach of assuming everything is convergent as it is now this might complicate and slow down the parsing. So I am just checking whether addressing (2) is still feasible with the new approach or it is not a direction we can/should go?

>> Looking at the wording in some parts of your convergent semantics definition there might be options resulting in some tradeoff between tooling complexity and optimization opportunities:
>>
>>   +The
>>   +:ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
>>   +intrinsic is typically expected to appear in the header of a natural loop.
>>   +However, it can also appear in non-header blocks of a loop. In that case, the
>>   +loop can generally not be unrolled.
>
> I believe this is meant to say that the formalism //does not forbid// putting the loop intrinsic in a non-header block, but that is not expected in most known cases. It is not an optimization choice that every flow must make.
>
>> I understand this is not in the scope of this work. And I think it is perfectly reasonable to provide experimental support that could help with further evaluation and productization too. However, it would be good to make some preliminary assessment for the frontend support rather soon. What I think could speed up the progress on the frontend/HLL is some sort of description about the conditions where the new intrinsics have to be inserted. My understanding is that the plan is not to expose them to the application code that would require educating the application developers about all the low-level details? Looking at your transformation pass in https://reviews.llvm.org/D69498 it seems that adding those automatically should somehow be possible and you already have some rules defined where and how those can be added? But there are certain things that can be done in IR that are very constrained in AST as it makes Parsing more complicated.
>
> This other review request is likely to demonstrate what you are asking for:
> https://reviews.llvm.org/D85609

Thanks, I was referring to this review indeed. Perhaps it is easier if I spawn a separate discussion there to see whether and how we can apply the same logic to the frontend and also how it combines with the conservative approach of generating convergent attribute everywhere that we have right now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85603/new/

https://reviews.llvm.org/D85603