[LLVMdev] RFC: Convergent attribute

Owen Anderson resistor at mac.com
Wed May 13 13:17:09 PDT 2015


Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM.  Comments and feedback welcome.

—Owen





In order to make LLVM more suitable for programming models variously called SPMD
and SIMT, we would like to propose a new intrinsic and MachineInstr annotation
called "convergent", which will be used to impose certain control flow and/or
code motion constraints that are necessary for the correct compilation of some
common constructs in these programming models.

Our proposal strives to define the semantics of these annotations *without*
introducing a definition of SPMD/SIMT programming models into LLVM IR.  Rather,
the properties that must be preserved are specified purely in terms of single
thread semantics.  This allows pass authors to reason about the constraints
without having to consider alternative programming models.  The downside to
this approach is that the motivation and necessity of these constraints in not
easily understood without understanding the programming model from which they
derive.

*** WHAT ***

(Thanks to Phil Reames for input on this definition.)

An operation marked convergent may be transformed or moved within the program
if and only the post-transform placement of the convergent operation is
control equivalent (A dominated B, B post-dominates A, or vice-versa) to
its original position.

This definition is overly strict with respect to some SPMD/SIMT models,
but cannot be relaxed without introducing a specific model into LLVM IR. We
believe it is important for LLVM itself to remain agnostic to any specific
model.  This allows core passes to preserve correctness for stricter models,
while more relaxed models can implement additional transforms that use
weaker constraints on top of core LLVM.

*** HOW ***

Once the attribute has been added, we anticipate the following changes to
optimization passes will be required:
  - Restrict Sink and MachineSink for convergent operations
  - Disabling PRE for convergent operations
  - Disabling jump threading of convergent operations
  - Auditing SimplifyCFG for additional transforms that break convergent guarantees

*** WHY ***

SPMD/SIMT programming models are a family of related programming models in
which multiple threads execute in a per-instruction lockstep fashion.
Predication is typically used to implement acyclic control flow that would
otherwise diverge the PC address of the lockstep threads.

In these models, each thread's register set is typically indepedent, but there
exist a small number of important circumstances in which a thread may access
register storage from one of its lockstep neighbors.  Examples include gradient
computation for texture lookups, as well a cross-thread broadcast and shuffle
operations.

These operations that provide access to another thread's register storage pose
a particular challenge to the compiler, particularly when combined with the
use of predication for control flow.  Consider the following example:

// texture lookup that computes gradient of r0, last use of r0
r1 = texture2D(..., r0, ...)
if (...) {
  // r0 used as temporary here
  r0 = ...
  r2 = r0 + ...
} else {
  // only use of r1
  r2 = r1 + ...
}

In this example, various optimizations might try to sink the texture2D operation
into the else block, like so:

if (...) {
  r0 = ...
  r2 = r0 + ...
} else {
  r1 = texture2D(..., r0, ...)
  r2 = r1 + ...
}

At this point, it starts to become clear that a problem can occur when two
neighbor threads want to take different paths through the if-else construct.
Logically, the thread that wishes to execute the texture2D races with its
neighbor to reads the neighbor's value of r0 before it gets overridden.

In most SPMD/SIMT implementations, the fallout of this races is exposed via
the predicated expression of acyclic control flow:

pred0 <- cmp ...
if (pred0)  r0 = ...
if (pred0)  r2 = r0 + ...
if (!pred0) r1 = texture2D(..., r0, ...)
if (!pred0) r2 = r1 + ...

If thread 0 takes the else path and perform the texture2D operation, but
its neighbor thread 1 takes the then branch, then the texture2D will fail
because thread 1 has already overwritten its value of r0 before thread 0 has
a chance to read it.






More information about the llvm-dev mailing list