[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics

Thu Apr 15 05:09:05 PDT 2021

Anastasia added a comment.

>> To address this there was an attempt to invert the behavior of convergent attribute in this patch (https://reviews.llvm.org/D69498) then the frontend wouldn't need to generate the attribute everywhere and the optimizer wouldn't need to undo what frontend does. The change in this review doesn't address (2) as far as I can see - it seems it only generalized old convergent semantics to cover the cases with non-uniform CF. I am not clear yet about the details of how and what frontend should generate in IR for this new logic but it looks more complex than before. And if we have to stick to the conservative approach of assuming everything is convergent as it is now this might complicate and slow down the parsing. So I am just checking whether addressing (2) is still feasible with the new approach or it is not a direction we can/should go?
>
> To be honest, I was not aware of this other effort, and even after you pointed it out, I wasn't paying attention to the words that I was reading. It seems like the current spec has so far focussed on demonstrating the soundness of the formalism. But I think it is possible to cover (2), which is to make the default setting conservative. This will need a bit of a rewording. In particular, this definition from the spec:
>
>   The convergence control intrinsics described in this document and convergent
>   operations that have a ``convergencectrl`` operand bundle are considered
>   *controlled* convergent operations.
>   
>   Other convergent operations are *uncontrolled*.
>
> This needs to be inverted in the spirit of D69498 <https://reviews.llvm.org/D69498>. I would propose the following tweak:
>
> 1. By default, every call has an implicit `convergencectrl` bundle with a token returned by the `@llvm.experimental.convergence.entry` intrinsic from the entry block of the caller. This default is the most conservative setting within the semantics defined here.
> 2. A more informed frontend or a suitable transformation can replace this conservative token with one of the following:
>   1. A token returned by any of the other intrinsics, which provides more specific information about convergence at this callsite.
>   2. A predefined constant token (say `none`), which indicates complete freedom. This would be equivalent to the `noconvergent` attribute proposed in D69498 <https://reviews.llvm.org/D69498>.
>
> Such a rewording would invert how we approach the spec. Instead of a representation that explicitly talks about special intrinsics that "need" convergence, the new semantics applies to all function calls. The redefined default is conservative instead of free, and the presence of the bundles relaxes the default instead of adding constraints.

Sounds good. If the wider community finds this acceptable, it could simplify the frontend design, improve the user interface, and make the interfaces within the compiler stack more coherent.

FYI, if we forced early inlining in the LLVM stack, the frontend would not need to conservatively mark every function as convergent. However, in compute scenarios we occasionally have very large functions that, when inlined, result in huge binaries and longer compilation times. We also have extern functions about which we have no information during compilation. So this doesn't seem like a route we can safely take, at least not for all languages.

If we invert the convergent logic, we can add a **noconvergent** attribute, or even a pragma directive, that lets application developers indicate which code contains no cross-thread operations and can therefore be optimized more aggressively.
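
As a rough illustration of the inverted default, here is a toy model of the decision an optimizer would make. It is only a sketch, not Clang or LLVM code: `Callee`, `markedNoConvergent`, `isConvergent` and `mayAddControlDependence` are hypothetical names invented for this example, and the exact spelling of the user-facing annotation is left open.

```cpp
// Toy model only: not Clang or LLVM code. All names here are hypothetical.
struct Callee {
  const char *name;
  bool markedNoConvergent;  // true only if the user explicitly annotated it
};

// Inverted default: without an explicit annotation, the callee is
// conservatively treated as convergent.
constexpr bool isConvergent(Callee callee) {
  return !callee.markedNoConvergent;
}

// A convergent call must not gain new control dependencies, so only calls
// known to be non-convergent may be moved under a divergent branch.
constexpr bool mayAddControlDependence(Callee callee) {
  return !isConvergent(callee);
}
```

With this model, an extern function that the compiler knows nothing about stays convergent without the frontend emitting anything, and only code the developer has explicitly annotated is opened up to the more aggressive transformations.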

> Also, answering one of your comments in the other review (D85609#inline-943432 <https://reviews.llvm.org/D85609#inline-943432>) about the relevance of the `llvm.experimental.convergence.anchor`, this intrinsic cannot be inferred by the frontend. It represents a new ability to represent optimization opportunities like the one demonstrated in the "opportunistic convergence" example. The intrinsic says that the call that uses this token doesn't depend on any specific set of threads, but merely marks the threads that do reach it. This is most useful when multiple calls agree on the same set of threads. Identifying such sets of operations will need help from the user (or more realistically, a library writer). Something like the following might work, where the actual value of `group` doesn't really matter beyond relating the various calls to each other.
>
>   auto group = non_uniform_group_active_workitems();
>   op1(group);
>   if (C)
>      op2(group);
>   op3(group);

Ok, this makes sense. Thanks for the clarifications.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85603/new/

https://reviews.llvm.org/D85603