[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

Nicolai Hähnle via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 31 13:49:41 PDT 2016


On 31.10.2016 19:37, Justin Lebar wrote:
> (I work on CUDA / PTX.)
>
> For one thing I'm in favor of having fewer annotations rather than
> more, so if we can do this in a reasonable way without introducing the
> notion of co-convergent calls, I think that would be a win.  The one
> convergent annotation is difficult enough for the GPU folks to grok
> and then keep in cache, and everyone who works on llvm has to pay the
> cost of keeping their passes compliant with these annotations.

I know. I'd love to find a simpler way to express this constraint, so 
any ideas are welcome :)


> I will think on a way to formalize this "uniform" attribute -- I don't
> have any good ideas at the moment.  Are you going to be at the llvm
> dev meeting this week?  We might be able to make progress
> brainstorming this in person.

Unfortunately I won't be there.

Nicolai


>
> -Justin
>
> On Wed, Oct 26, 2016 at 7:54 AM, Nicolai Hähnle via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> On 25.10.2016 16:28, Nicolai Hähnle wrote:
>>>
>>> But I fear that this path leads to eternal fuzziness. Let me try a
>>> completely different approach to define what we need by augmenting the
>>> semantics of IR with "divergence tokens". In addition to its usual
>>> value, every IR value carries a "divergence set" of divergence tokens.
>>>
>>> The basic rule is: the divergence set of a value is (at least) the union
>>> of the divergence sets of its operands.
>>>
>>> Every function input carries a unique divergence token.
>>>
>>> For phi-nodes, the divergence set also includes the divergence sets
>>> of the branch conditions that can affect which predecessor value is
>>> taken. (A simpler, more conservative rule would be to add a unique
>>> divergence token; this can be per basic block.)
>>>
>>> Loads add a unique divergence token. (This is a very conservative rule
>>> mostly required for per-thread-allocas. It could be relaxed quite a bit
>>> by treating per-thread-memory like SSA-values augmented with divergence
>>> sets.)
>>>
>>> Atomic operations add a unique divergence token.
>>>
>>> Additional target-specific intrinsics may also add a divergence token.
>>>
>>> By transitivity, most function calls have to add a unique divergence
>>> token.
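[Editorial note: the token-propagation rules above can be sketched as a toy model; the Python below is purely illustrative (all names are made up, none of this is LLVM API), assuming the syntactic union rule plus fresh tokens for inputs and loads.]

```python
import itertools

_counter = itertools.count()

def fresh_token(label):
    # Each call mints a unique divergence token.
    return (label, next(_counter))

class Value:
    def __init__(self, name, operands=(), extra_tokens=()):
        self.name = name
        # Basic rule: the divergence set is (at least) the union of the
        # operands' divergence sets, plus any rule-specific fresh tokens.
        self.divergence = set(extra_tokens)
        for op in operands:
            self.divergence |= op.divergence

def function_input(name):
    # Every function input carries a unique divergence token.
    return Value(name, extra_tokens={fresh_token(name)})

def load(name, ptr):
    # Loads add a unique divergence token (the conservative rule).
    return Value(name, operands=(ptr,), extra_tokens={fresh_token("load")})

# idx = input & 1: inherits exactly input's token (constants add nothing).
inp = function_input("input")
one = Value("1")  # constants have empty divergence sets
idx = Value("idx", operands=(inp, one))
assert idx.divergence == inp.divergence

# A load through idx is strictly more divergent than idx itself.
v = load("v", idx)
assert v.divergence > idx.divergence
```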
>>>
>>> The first restriction on the transformation of a call to a function with
>>> 'uniform' function argument is then:
>>>
>>> 1. The corresponding argument of the call can be changed only if the
>>> change preserves (or shrinks) the divergence set (on top of that, the
>>> new value obviously has to be equal to the old one under
>>> single-threaded semantics).
>>
>>
>> Turns out that this isn't actually sufficient. Consider this (admittedly
>> convoluted) example:
>>
>>    idx = input & 1
>>    v0 = texelFetch(s[idx & 0], ...)
>>    v1 = texelFetch(s[idx | 1], ...)
>>    cond = trunc idx to i1
>>    v = select i1 cond, v1, v0
>>
>> transformed into
>>
>>    idx = input & 1
>>    v = texelFetch(s[idx], ...)
>>
>> has the same divergence set for the argument to texelFetch as defined in the
>> quote above, but the transformation is incorrect.
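[Editorial note: to see why restriction (1) alone is too weak here, note that in the token model the rules are syntactic, so `idx & 0`, `idx | 1`, and `idx` all carry exactly idx's tokens. A self-contained sketch, with made-up token names:]

```python
# Divergence sets per the rules above: constants contribute nothing,
# so idx & 0, idx | 1 and idx itself all inherit idx's tokens.
s_tokens   = {"tok_s"}      # the sampler array s is a function input
idx_tokens = {"tok_input"}  # idx = input & 1 inherits input's token

arg_v0 = s_tokens | idx_tokens  # argument s[idx & 0] in the original
arg_v1 = s_tokens | idx_tokens  # argument s[idx | 1] in the original
arg_v  = s_tokens | idx_tokens  # argument s[idx] after the transform

# Rule 1 sees no difference, yet the transformation is incorrect.
assert arg_v0 == arg_v1 == arg_v
```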
>>
>> So it's pretty clear what we want: a way to flag certain function
>> arguments that forbids transformations of the form select(call ...,
>> call ...) -> call(select ..., ...), ...
>>
>> But it would be nice to have a clear definition of _why_ those
>> transformations must be forbidden. It's not clear how to do that without
>> pulling in a full model of SIMT-style parallel execution, and admittedly I
>> don't think we have a sane model for _that_ in the first place :-(
>>
>> Something that at least partially addresses the SIMT-style semantics: For
>> every pair (initial state, function inputs) and every call site of the
>> relevant function, keep a log of function arguments with which the function
>> has been called.
>>
>> The restriction on program transformations is roughly: call two pairs
>> A and B of (initial state, function inputs) compatible on a program
>> if, at each call site, the log of A is a subsequence of the log of B
>> or vice versa. Then every pair A, B that is compatible on the
>> original program must also be compatible on the transformed program.
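[Editorial note: the log-subsequence condition can be made concrete with a small sketch. All names here are hypothetical; a real check would be over per-call-site argument traces.]

```python
def is_subsequence(small, big):
    # True iff every element of small appears in big, in order.
    it = iter(big)
    return all(any(x == y for y in it) for x in small)

def compatible(log_a, log_b):
    # A and B are compatible at this call site if one log is a
    # subsequence of the other.
    return is_subsequence(log_a, log_b) or is_subsequence(log_b, log_a)

# B executes a superset of A's calls in the same relative order: OK.
assert compatible([("s0",)], [("s0",), ("s1",)])
# Same calls in a different order: not compatible.
assert not compatible([("s0",), ("s1",)], [("s1",), ("s0",)])
```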
>>
>> This isn't a great definition either, but it gets closer to the actual
>> SIMT-semantics. Any additional ideas would be very much appreciated.
>>
>> Thanks,
>> Nicolai
>>
>>
>>>
>>> With this definition,
>>>
>>> def @fn(s0, s1, cond) {
>>>   v0 = texelFetch(s0, ...)
>>>   v1 = texelFetch(s1, ...)
>>>   v = select i1 cond, v0, v1
>>> }
>>>
>>> cannot be transformed to
>>>
>>> def @fn(s0, s1, cond) {
>>>   s = select i1 cond, s0, s1
>>>   v = texelFetch(s, ...)
>>> }
>>>
>>> because s has a larger divergence set.
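[Editorial note: in a toy token model this rejection falls out directly, because the select unions in cond's divergence tokens, making s's set strictly larger than s0's or s1's. A self-contained sketch with illustrative token names:]

```python
def divergence(*operand_sets):
    # Basic rule: union of the operands' divergence sets.
    out = set()
    for s in operand_sets:
        out |= s
    return out

# Each function input carries its own unique token.
s0, s1, cond = {"tok_s0"}, {"tok_s1"}, {"tok_cond"}

# Original @fn: each texelFetch argument is just s0 or s1.
arg_v0, arg_v1 = divergence(s0), divergence(s1)

# Transformed @fn: s = select(cond, s0, s1) unions in cond's token.
arg_s = divergence(cond, s0, s1)

assert arg_s > arg_v0 and arg_s > arg_v1  # strictly larger divergence set
```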
>>>
>>> On the other hand, at least the InstCombine transforms should all be
>>> safe because they locally transform a DAG of values in a way that can
>>> only shrink the set of predecessors of the locally transformed region of
>>> the DAG.
>>>
>>> Transforms that modify the CFG could be problematic, especially with
>>> the simpler phi-node rule: e.g., when a new basic block is created,
>>> you end up introducing new divergence tokens for "downstream" values.
>>> That's probably not a problem with the first definition for
>>> phi-nodes, though I'm not sure.
>>>
>>> .....
>>> The above definition still allows the transformation of
>>>
>>>   masked &= 1
>>>   if (masked == 1) {
>>>     v1 = texelFetch(array[masked], ...)
>>>   } else {
>>>     v0 = texelFetch(array[masked], ...)
>>>   }
>>>   v = phi(v1, v0)
>>>
>>> to
>>>
>>>   masked &= 1
>>>   v = texelFetch(array[masked], ...);
>>>
>>> which is incorrect. So for a set of divergence tokens D, let's say A
>>> D-dominates B if the following holds: there exists a way of fixing the
>>> branching directions of conditional branches whose condition's
>>> divergence set is disjoint from D, such that A dominates B in the
>>> resulting CFG. Similarly for D-post-domination.
>>>
>>> If D \subset D', then D'-domination implies D-domination.
>>>
>>> We can then formulate the second condition as:
>>>
>>> 2a. A function call with a uniform argument with divergence set D can be
>>> moved from location A to location B if A D-dominates or D-post-dominates
>>> B.
>>>
>>> 2b. Simpler but more conservative: just treat the function call as
>>> co-convergent (i.e., it can only be moved to positions that are
>>> dominated or post-dominated by its current position in the usual
>>> sense).
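[Editorial note: the check in 2a can be illustrated on the masked/texelFetch example above: the branch on (masked == 1) carries masked's divergence tokens, so for D = divergence(masked) that branch cannot be fixed, and with the branch left in the CFG neither arm dominates the join. A sketch under those assumptions, using a textbook iterative dominator computation; none of the names are real LLVM analyses.]

```python
def dominators(succ, entry):
    """Classic iterative dominator computation on a small CFG."""
    nodes = set(succ)
    preds = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# CFG of the example with the divergent branch left unfixed:
#   entry -> {then, else} -> join
succ = {"entry": ["then", "else"], "then": ["join"],
        "else": ["join"], "join": []}
dom = dominators(succ, "entry")

# Neither call site D-dominates the join, so hoisting the texelFetch
# call out of the if/else is (correctly) rejected by 2a.
assert "then" not in dom["join"] and "else" not in dom["join"]
```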
>>>
>>> .....
>>> The full definition of D-domination is probably too involved to be
>>> turned into practical algorithms. But with this model of divergence
>>> sets, it should be possible to think of DivergenceAnalysis as an
>>> analysis that proves divergence sets of values to be empty (using
>>> additional target specific information to be less conservative).
>>> Combined with DivergenceAnalysis, (1)+(2a) still leave room for some
>>> useful and practical optimizations of functions with uniform arguments
>>> even across basic blocks.
>>>
>>> In practical terms for a first version, the effect of the attribute
>>> should just be to prevent the two transformations shown above.
>>>
>>> Thanks,
>>> Nicolai
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

