[llvm-dev] [LLVMdev] RFC: Convergent attribute

James Molloy via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 14 00:56:18 PDT 2015


Hi Mehdi,

My reading of it is that if you have a convergent instruction A, it is
legal to duplicate it to instruction B if (assuming B is after A in program
flow) A dominates B and B post-dominates A.

James

On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi Jingyue,
>
> Convergent is not intended to prevent inlining.  It’s tricky to formalize
> this inter-procedurally, but the intended interpretation is that a
> convergent operation cannot be move either into or out of a conditionally
> executed region.  Normal inlining would not violate that.
>
> I would imagine that it would make sense to use a combination of
> convergent and noduplicate for barrier-like operations.
>
>
> Isn’t convergent implying “noduplicate” inside a function?
> It’s late but I’m not sure I can figure when a transformation would be
> allowed to duplicate a call to a convergent intrinsic?
>
>> Mehdi
>
>
>
> —Owen
>
> On Aug 13, 2015, at 3:12 PM, Jingyue Wu <jingyue at google.com> wrote:
>
> Hi Owen,
>
> According to your design, is LLVM supposed to (partially) disallow
> inlining a function that has convergent instructions? It's hard to define
> control equivalent inter-procedurally. For example, if a function
> containing a convergent instruction is called at two call sites, inlining
> the function produces two convergent instructions. Neither of the two is
> control equivalent to the original, but they combined are in some sense.
>
> I came across this when I am thinking whether __syncthreads in CUDA should
> be tagged "convergent'. Right now, it's tagged as noduplicate so inlining
> and loop unrolling are disallowed. But I think noduplicate is too strong
> for the semantics of convergent.
>
> Jingyue
>
> On Wed, May 13, 2015 at 1:17 PM, Owen Anderson <resistor at mac.com> wrote:
>
>> Below is a proposal for a new "convergent" intrinsic attribute and
>> MachineInstr property, needed for correctly modeling many SPMD/SIMT
>> programming models in LLVM.  Comments and feedback welcome.
>>
>> —Owen
>>
>>
>>
>>
>>
>> In order to make LLVM more suitable for programming models variously
>> called SPMD
>> and SIMT, we would like to propose a new intrinsic and MachineInstr
>> annotation
>> called "convergent", which will be used to impose certain control flow
>> and/or
>> code motion constraints that are necessary for the correct compilation of
>> some
>> common constructs in these programming models.
>>
>> Our proposal strives to define the semantics of these annotations
>> *without*
>> introducing a definition of SPMD/SIMT programming models into LLVM IR.
>> Rather,
>> the properties that must be preserved are specified purely in terms of
>> single
>> thread semantics.  This allows pass authors to reason about the
>> constraints
>> without having to consider alternative programming models.  The downside
>> to
>> this approach is that the motivation and necessity of these constraints
>> in not
>> easily understood without understanding the programming model from which
>> they
>> derive.
>>
>> *** WHAT ***
>>
>> (Thanks to Phil Reames for input on this definition.)
>>
>> An operation marked convergent may be transformed or moved within the
>> program
>> if and only the post-transform placement of the convergent operation is
>> control equivalent (A dominated B, B post-dominates A, or vice-versa) to
>> its original position.
>>
>> This definition is overly strict with respect to some SPMD/SIMT models,
>> but cannot be relaxed without introducing a specific model into LLVM IR.
>> We
>> believe it is important for LLVM itself to remain agnostic to any specific
>> model.  This allows core passes to preserve correctness for stricter
>> models,
>> while more relaxed models can implement additional transforms that use
>> weaker constraints on top of core LLVM.
>>
>> *** HOW ***
>>
>> Once the attribute has been added, we anticipate the following changes to
>> optimization passes will be required:
>>   - Restrict Sink and MachineSink for convergent operations
>>   - Disabling PRE for convergent operations
>>   - Disabling jump threading of convergent operations
>>   - Auditing SimplifyCFG for additional transforms that break convergent
>> guarantees
>>
>> *** WHY ***
>>
>> SPMD/SIMT programming models are a family of related programming models in
>> which multiple threads execute in a per-instruction lockstep fashion.
>> Predication is typically used to implement acyclic control flow that would
>> otherwise diverge the PC address of the lockstep threads.
>>
>> In these models, each thread's register set is typically indepedent, but
>> there
>> exist a small number of important circumstances in which a thread may
>> access
>> register storage from one of its lockstep neighbors.  Examples include
>> gradient
>> computation for texture lookups, as well a cross-thread broadcast and
>> shuffle
>> operations.
>>
>> These operations that provide access to another thread's register storage
>> pose
>> a particular challenge to the compiler, particularly when combined with
>> the
>> use of predication for control flow.  Consider the following example:
>>
>> // texture lookup that computes gradient of r0, last use of r0
>> r1 = texture2D(..., r0, ...)
>> if (...) {
>>   // r0 used as temporary here
>>   r0 = ...
>>   r2 = r0 + ...
>> } else {
>>   // only use of r1
>>   r2 = r1 + ...
>> }
>>
>> In this example, various optimizations might try to sink the texture2D
>> operation
>> into the else block, like so:
>>
>> if (...) {
>>   r0 = ...
>>   r2 = r0 + ...
>> } else {
>>   r1 = texture2D(..., r0, ...)
>>   r2 = r1 + ...
>> }
>>
>> At this point, it starts to become clear that a problem can occur when two
>> neighbor threads want to take different paths through the if-else
>> construct.
>> Logically, the thread that wishes to execute the texture2D races with its
>> neighbor to reads the neighbor's value of r0 before it gets overridden.
>>
>> In most SPMD/SIMT implementations, the fallout of this races is exposed
>> via
>> the predicated expression of acyclic control flow:
>>
>> pred0 <- cmp ...
>> if (pred0)  r0 = ...
>> if (pred0)  r2 = r0 + ...
>> if (!pred0) r1 = texture2D(..., r0, ...)
>> if (!pred0) r2 = r1 + ...
>>
>> If thread 0 takes the else path and perform the texture2D operation, but
>> its neighbor thread 1 takes the then branch, then the texture2D will fail
>> because thread 1 has already overwritten its value of r0 before thread 0
>> has
>> a chance to read it.
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org         http://llvm.cs.uiuc.edu
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org         http://llvm.cs.uiuc.edu
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150814/8d743f2a/attachment.html>


More information about the llvm-dev mailing list