[llvm-dev] RFC: make calls "convergent" by default

Tue Jun 1 04:58:33 PDT 2021

TL;DR
=====

We propose the following changes to LLVM IR in order to better support
operations that are sensitive to the set of threads that execute them
together:

- Redefine "convergent" in terms of thread divergence in a
  multi-threaded execution.
- Fix all optimizations that examine the "convergent" attribute to also
  depend on divergence analysis. This avoids any impact on CPU
  compilation since control flow is always uniform on CPUs.
- Make all function calls "convergent" by default (D69498). Introduce a
  new "noconvergent" attribute, and make "convergent" a no-op.
- Update the "convergence tokens" proposal to take into account this new
  default property (D85603).

Motivation
==========

This effort is necessary because the current "convergent" attribute is
considered under-defined and sorely needs replacement.

1. On GPU targets, the "convergent" attribute is required for
   correctness. This is unlike other attributes, which are only used
   as optimization hints. A missing attribute should not result in a
   miscompilation.

2. The current definition of "convergent" attribute does not precisely
   represent the constraints on the compiler for a GPU target. The
   actual implementation in LLVM sources is far more conservative than
   what the definition says.

3. Due to the same lack of precision, the attribute cannot properly
   represent the side effects of jump threading on a GPU program.

Background
==========

This RFC is a continuation of a discussion split across the following
two reviews. The two reviews compose well to cover all the shortcomings
of the convergent attribute.

  D69498: IR: Invert convergent attribute handling
  https://reviews.llvm.org/D69498

The above review aims to make all function calls "convergent" by
default, but it received strong opposition because CPU frontends would
then have to emit a new "noconvergent" attribute on every function
call.

  D85603: IR: Add convergence control operand bundle and intrinsics
  https://reviews.llvm.org/D85603

The above review defines a "convergent operation" in terms of divergent
control flow in multi-threaded executions. It introduces a "convergence
token" passed as an operand bundle argument at a call, representing the
set of threads that together execute that call. This review has
progressed to the point where there don't seem to be any major
objections to it, but there is some interest in combining it with the
original idea of making all calls convergent by default.

Terms Used
==========

The following definitions are paraphrased from D85603:

Convergent Operation

  Some parallel execution environments execute threads in groups that
  allow efficient communication within each group. When control flow
  diverges, i.e. threads of the same group follow different paths
  through the CFG, not all threads of the group may be available to
  participate in this communication. A convergent operation involves
  inter-thread communication or synchronization that occurs outside of
  the memory model, where the set of threads which participate in
  communication is implicitly affected by control flow.

Dynamic Instance

  Every execution of an LLVM IR instruction occurs in a dynamic instance
  of the instruction. Different executions of the same instruction by a
  single thread give rise to different dynamic instances of that
  instruction. Executions of different instructions always occur in
  different dynamic instances. Executions of the same instruction by
  different threads may occur in the same dynamic instance. When
  executing a convergent operation, the set of threads that execute the
  same dynamic instance is the set of threads that communicate with each
  other for that operation.
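
The two definitions above can be illustrated with a small model. This
is only a sketch under assumptions that are not part of D85603: a group
of at most 32 threads is represented as a bitmask, and "ballot" is a
hypothetical convergent operation that returns, to every participating
thread, the set of participants whose predicate is true. All names
(Mask, ballot, thenLanes, ...) are invented for the illustration.

```cpp
#include <cstdint>

// Bit i is set when thread (lane) i of the group is included.
using Mask = std::uint32_t;

// Hypothetical convergent operation: the result depends on which threads
// execute the same dynamic instance of the call (Participants).
constexpr Mask ballot(Mask Participants, Mask Pred) {
  return Participants & Pred;
}

// Threads of Active that take the "then" side of a branch on Cond.
constexpr Mask thenLanes(Mask Active, Mask Cond) { return Active & Cond; }

// Call placed before the branch: a single dynamic instance that is
// executed by all active threads.
constexpr Mask ballotBeforeBranch(Mask Active, Mask Pred) {
  return ballot(Active, Pred);
}

// Same call sunk into the "then" block: the dynamic instance now contains
// only the threads that took that side of the branch.
constexpr Mask ballotInsideThen(Mask Active, Mask Cond, Mask Pred) {
  return ballot(thenLanes(Active, Cond), Pred);
}

// Four threads, predicate true everywhere, branch taken by threads 0 and 2.
static_assert(ballotBeforeBranch(0xFu, 0xFu) == 0xFu,
              "all four threads communicate");
static_assert(ballotInsideThen(0xFu, 0x5u, 0xFu) == 0x5u,
              "only threads 0 and 2 communicate");
```

In this model, moving the call across a divergent branch changes the
set of communicating threads and therefore the observed result, which
is the kind of change the constraints below are meant to rule out.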

Optimization Constraints due to Convergent Calls
================================================

In general, an optimization that modifies control flow in the program
must ensure that the set of threads executing each dynamic instance of a
convergent call is not affected.

By default, every call in LLVM IR is assumed to be convergent. A
frontend may relax this default in the following ways:

  1. The "noconvergent" attribute may be added to indicate that a call
     is not sensitive to the set of threads executing any dynamic
     instance of that call.

  2. A "convergencectrl" operand bundle may be passed to the call. The
     semantics of such a "token" provide fine-grained control over the
     transforms possible near the callsite.

The overall effect is to make the notion of convergence and divergence a
universal property of LLVM IR. This provides a "safe default" in the IR
semantics, so that frontends and optimizations cannot produce incorrect
IR on a GPU target by merely missing an attribute.

At the same time, there is no effect on CPU optimizations. An
optimization may use divergence analysis along with the above
information to determine whether a transformation is possible. The only
impact on CPU compilation flows is the addition of divergence analysis
as a dependency when checking for convergent operations. This analysis
is trivial on CPUs, where branches are never divergent and hence all
control flow is uniform.
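
As a rough sketch of the rule described in this section, the decision
can be modelled with a few stand-in types. These are not LLVM classes:
Call, Branch, DivergenceInfo and mayMoveAcross are hypothetical names
used only for illustration, and the "noconvergent" flag models the
attribute proposed above.

```cpp
// Hypothetical stand-ins for IR concepts; not LLVM classes.
struct Call {
  bool NoConvergent; // models the proposed "noconvergent" attribute
};
struct Branch {
  bool ConditionVariesAcrossThreads;
};
struct DivergenceInfo {
  bool TargetHasDivergence; // false on CPUs, so the analysis is trivial
};

// Every call is convergent unless it is explicitly marked otherwise.
constexpr bool isConvergent(Call C) { return !C.NoConvergent; }

// A branch can only be divergent on a target that has divergence at all.
constexpr bool isDivergent(DivergenceInfo DI, Branch B) {
  return DI.TargetHasDivergence && B.ConditionVariesAcrossThreads;
}

// A transform may move the call across the branch unless the call is
// convergent and the branch is divergent.
constexpr bool mayMoveAcross(Call C, Branch B, DivergenceInfo DI) {
  return !(isConvergent(C) && isDivergent(DI, B));
}

constexpr DivergenceInfo CPU{false};
constexpr DivergenceInfo GPU{true};
constexpr Branch PerThreadBranch{true};
constexpr Branch UniformBranch{false};
constexpr Call PlainCall{false};   // no attribute: convergent by default
constexpr Call RelaxedCall{true};  // marked "noconvergent"

static_assert(mayMoveAcross(PlainCall, PerThreadBranch, CPU),
              "CPU control flow is always uniform");
static_assert(!mayMoveAcross(PlainCall, PerThreadBranch, GPU),
              "safe default blocks the transform on a GPU");
static_assert(mayMoveAcross(PlainCall, UniformBranch, GPU),
              "uniform branches do not restrict the transform");
static_assert(mayMoveAcross(RelaxedCall, PerThreadBranch, GPU),
              "noconvergent lifts the restriction");
```

The model ignores convergence tokens; it only shows why an unannotated
call costs nothing on a CPU while remaining safe on a GPU.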

Implementation
==============

The above proposal will be implemented as follows:

1. Optimizations that check for convergent operations will be updated to
   depend on divergence analysis. For example, the following change will
   be made in llvm/lib/Transforms/Scalar/Sink.cpp:

   Before:

     bool isSafeToMove(Instruction *Inst) {
         ...
         if (auto *Call = dyn_cast<CallBase>(Inst)) {
             ...
             if (Call->isConvergent())
                 return false;
             ...
         }
     }

   After:

     bool isSafeToMove(Instruction *Inst, DivergenceAnalysis &DA, ...) {
         ...
         if (auto *Call = dyn_cast<CallBase>(Inst)) {
             ...
             // Don't sink a convergent call across a divergent branch.
             auto *Term = Inst->getParent()->getTerminator();
             if (Call->isConvergent() && DA.isDivergent(Term))
                 return false;
             ...
         }
     }

2. D69498 will be updated so that the convergent property is made
   default, but the new requirements on CPU frontends will be retracted.

3. D85603 will be revised to include the new default convergent
   property.

Thanks,
Sameer.