[llvm-dev] [LLVMdev] RFC: Convergent attribute

Fri Aug 14 17:22:52 PDT 2015

Loop unrolling.

On Fri, Aug 14, 2015 at 4:28 PM, Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi James,
>
> That sounds reasonable to me. Any idea of a transformation that would want
> to do that though?
>
> —
> Mehdi
>
> On Aug 14, 2015, at 12:56 AM, James Molloy <james at jamesmolloy.co.uk>
> wrote:
>
> Hi Mehdi,
>
> My reading of it is that if you have a convergent instruction A, it is
> legal to duplicate it to instruction B if (assuming B is after A in program
> flow) A dominates B and B post-dominates A.
>
> James
>
> On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Jingyue,
>>
>> Convergent is not intended to prevent inlining.  It’s tricky to formalize
>> this inter-procedurally, but the intended interpretation is that a
>> convergent operation cannot be moved either into or out of a conditionally
>> executed region.  Normal inlining would not violate that.
>>
>> I would imagine that it would make sense to use a combination of
>> convergent and noduplicate for barrier-like operations.
>>
>>
>> Doesn’t convergent imply “noduplicate” inside a function?
>> It’s late, but I’m not sure I can figure out when a transformation would
>> be allowed to duplicate a call to a convergent intrinsic.
>>
>> —
>> Mehdi
>>
>>
>>
>> —Owen
>>
>> On Aug 13, 2015, at 3:12 PM, Jingyue Wu <jingyue at google.com> wrote:
>>
>> Hi Owen,
>>
>> According to your design, is LLVM supposed to (partially) disallow
>> inlining a function that has convergent instructions? It's hard to define
>> control equivalence inter-procedurally. For example, if a function
>> containing a convergent instruction is called at two call sites, inlining
>> the function produces two convergent instructions. Neither of the two is
>> control equivalent to the original, but combined they are, in some sense.
>>
>> I came across this while thinking about whether __syncthreads in CUDA
>> should be tagged "convergent". Right now, it's tagged as noduplicate, so
>> inlining and loop unrolling are disallowed. But I think noduplicate is too
>> strong for the semantics of convergent.
>>
>> Jingyue
>>
>> On Wed, May 13, 2015 at 1:17 PM, Owen Anderson <resistor at mac.com> wrote:
>>
>>> Below is a proposal for a new "convergent" intrinsic attribute and
>>> MachineInstr property, needed for correctly modeling many SPMD/SIMT
>>> programming models in LLVM.  Comments and feedback welcome.
>>>
>>> —Owen
>>>
>>>
>>>
>>>
>>>
>>> In order to make LLVM more suitable for programming models variously
>>> called SPMD and SIMT, we would like to propose a new intrinsic and
>>> MachineInstr annotation called "convergent", which will be used to
>>> impose certain control flow and/or code motion constraints that are
>>> necessary for the correct compilation of some common constructs in
>>> these programming models.
>>>
>>> Our proposal strives to define the semantics of these annotations
>>> *without* introducing a definition of SPMD/SIMT programming models into
>>> LLVM IR.  Rather, the properties that must be preserved are specified
>>> purely in terms of single-thread semantics.  This allows pass authors to
>>> reason about the constraints without having to consider alternative
>>> programming models.  The downside to this approach is that the
>>> motivation for and necessity of these constraints are not easily
>>> understood without understanding the programming model from which they
>>> derive.
>>>
>>> *** WHAT ***
>>>
>>> (Thanks to Phil Reames for input on this definition.)
>>>
>>> An operation marked convergent may be transformed or moved within the
>>> program if and only if the post-transform placement of the convergent
>>> operation is control equivalent (A dominates B and B post-dominates A,
>>> or vice versa) to its original position.
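
To make the control-equivalence rule above concrete, here is a minimal Python sketch (not part of the original proposal, and not LLVM code) that checks it at basic-block granularity on a toy CFG. The `dominators`, `reverse`, and `control_equivalent` helpers and the block names are hypothetical, chosen only for illustration; a real pass would presumably query LLVM's own dominator and post-dominator trees instead.

```python
def dominators(succ, entry):
    """Iterative dataflow: dom[n] = {n} | intersection of dom[p] over preds p."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Flip every edge so post-dominators can be computed as dominators."""
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            rev[t].append(n)
    return rev


def control_equivalent(succ, entry, exit_block, a, b):
    """True if a dominates b and b post-dominates a, or vice versa."""
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_block)
    return (a in dom[b] and b in pdom[a]) or (b in dom[a] and a in pdom[b])


# Hypothetical if/else diamond: entry -> {then, else} -> merge
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": [],
}

print(control_equivalent(cfg, "entry", "merge", "entry", "merge"))  # True
print(control_equivalent(cfg, "entry", "merge", "entry", "else"))   # False
```

On this diamond, entry and merge are control equivalent, so moving a convergent operation between them would satisfy the rule, whereas sinking it from entry into the else block would not.
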
>>>
>>> This definition is overly strict with respect to some SPMD/SIMT models,
>>> but cannot be relaxed without introducing a specific model into LLVM IR.
>>> We believe it is important for LLVM itself to remain agnostic to any
>>> specific model.  This allows core passes to preserve correctness for
>>> stricter models, while more relaxed models can implement additional
>>> transforms that use weaker constraints on top of core LLVM.
>>>
>>> *** HOW ***
>>>
>>> Once the attribute has been added, we anticipate the following changes to
>>> optimization passes will be required:
>>>   - Restricting Sink and MachineSink for convergent operations
>>>   - Disabling PRE for convergent operations
>>>   - Disabling jump threading of convergent operations
>>>   - Auditing SimplifyCFG for additional transforms that break convergent
>>>     guarantees
>>>
>>> *** WHY ***
>>>
>>> SPMD/SIMT programming models are a family of related programming models
>>> in which multiple threads execute in a per-instruction lockstep fashion.
>>> Predication is typically used to implement acyclic control flow that
>>> would otherwise diverge the PC address of the lockstep threads.
>>>
>>> In these models, each thread's register set is typically independent,
>>> but there exist a small number of important circumstances in which a
>>> thread may access register storage from one of its lockstep neighbors.
>>> Examples include gradient computation for texture lookups, as well as
>>> cross-thread broadcast and shuffle operations.
>>>
>>> These operations that provide access to another thread's register
>>> storage pose a particular challenge to the compiler, particularly when
>>> combined with the use of predication for control flow.  Consider the
>>> following example:
>>>
>>> // texture lookup that computes gradient of r0, last use of r0
>>> r1 = texture2D(..., r0, ...)
>>> if (...) {
>>>   // r0 used as temporary here
>>>   r0 = ...
>>>   r2 = r0 + ...
>>> } else {
>>>   // only use of r1
>>>   r2 = r1 + ...
>>> }
>>>
>>> In this example, various optimizations might try to sink the texture2D
>>> operation into the else block, like so:
>>>
>>> if (...) {
>>>   r0 = ...
>>>   r2 = r0 + ...
>>> } else {
>>>   r1 = texture2D(..., r0, ...)
>>>   r2 = r1 + ...
>>> }
>>>
>>> At this point, it starts to become clear that a problem can occur when
>>> two neighbor threads want to take different paths through the if-else
>>> construct.  Logically, the thread that wishes to execute the texture2D
>>> races with its neighbor to read the neighbor's value of r0 before it
>>> gets overwritten.
>>>
>>> In most SPMD/SIMT implementations, the fallout of this race is exposed
>>> via the predicated expression of acyclic control flow:
>>>
>>> pred0 <- cmp ...
>>> if (pred0)  r0 = ...
>>> if (pred0)  r2 = r0 + ...
>>> if (!pred0) r1 = texture2D(..., r0, ...)
>>> if (!pred0) r2 = r1 + ...
>>>
>>> If thread 0 takes the else path and performs the texture2D operation, but
>>> its neighbor thread 1 takes the then branch, then the texture2D will fail
>>> because thread 1 has already overwritten its value of r0 before thread 0
>>> has a chance to read it.
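
The failure described above can be reproduced with a small Python simulation of two lockstep lanes. This is only a sketch under stated assumptions: `texture2d` here is a hypothetical stand-in that reads the neighboring lane's r0 and returns a difference, the `CLOBBER` value and the `+ 1` arithmetic replace the elided `...` operands, and none of it models any real hardware or LLVM API.

```python
CLOBBER = 100  # arbitrary value the then-branch writes into r0 (assumption)


def texture2d(lane, regs):
    """Hypothetical gradient-like op: reads r0 from the neighboring lane."""
    neighbor = lane ^ 1
    return regs[neighbor]["r0"] - regs[lane]["r0"]


def run(r0_init, pred0, sunk):
    """Execute the predicated sequence one instruction at a time, in lockstep.

    sunk=False: texture2D runs before the branch, with every lane active.
    sunk=True:  texture2D has been sunk into the else side (under !pred0).
    """
    regs = [{"r0": v, "r1": None, "r2": None} for v in r0_init]
    lanes = range(len(regs))

    if not sunk:
        for l in lanes:                      # r1 = texture2D(..., r0, ...)
            regs[l]["r1"] = texture2d(l, regs)
    for l in lanes:                          # if (pred0)  r0 = ...
        if pred0[l]:
            regs[l]["r0"] = CLOBBER
    for l in lanes:                          # if (pred0)  r2 = r0 + ...
        if pred0[l]:
            regs[l]["r2"] = regs[l]["r0"] + 1
    if sunk:
        for l in lanes:                      # if (!pred0) r1 = texture2D(...)
            if not pred0[l]:
                regs[l]["r1"] = texture2d(l, regs)
    for l in lanes:                          # if (!pred0) r2 = r1 + ...
        if not pred0[l]:
            regs[l]["r2"] = regs[l]["r1"] + 1
    return regs


# Thread 0 takes the else path, thread 1 takes the then path.
original = run([3, 7], [False, True], sunk=False)
after_sink = run([3, 7], [False, True], sunk=True)
print(original[0]["r2"])    # 5:  gradient computed from the neighbor's r0 == 7
print(after_sink[0]["r2"])  # 98: neighbor's r0 was already overwritten with 100
```

Thread 0's result changes only because the sunk texture2D observes thread 1's r0 after the then-side write, which is exactly the kind of motion the convergent constraint is meant to rule out.
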
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>>
>
>
>