[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters
Nicolai Hähnle via llvm-dev
llvm-dev at lists.llvm.org
Tue Oct 25 07:28:53 PDT 2016
On 25.10.2016 01:17, Mehdi Amini wrote:
>
>> On Oct 24, 2016, at 4:15 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>
>> On 25.10.2016 01:11, Nicolai Hähnle wrote:
>>> On 24.10.2016 21:54, Mehdi Amini wrote:
>>>>> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> Some brain-storming on an issue with SPMD/SIMT backend support where
>>>>> I think some additional IR attributes would be useful. Sorry for the
>>>>> somewhat long mail; the short version of my current thinking is that
>>>>> I would like to have the following:
>>>>>
>>>>> 1) convergent: a call to a function with this attribute cannot be
>>>>> moved to have additional control dependencies; i.e., moving it from A
>>>>> to B is only possible if B dominates or post-dominates A.
>>>>>
>>>>> 2) co-convergent (divergent? for lack of a better name...): a call to
>>>>> a function with this attribute cannot be moved to have _fewer_
>>>>> control dependencies; i.e., moving it from A to B is only possible if
>>>>> A dominates or post-dominates B.
>>>>>
>>>>> 3) uniform (for function arguments): transformations are not allowed
>>>>> to introduce additional non-uniformity in this argument.
>>>>
>>>> Can you describe it in terms that are non-SPMD/SIMT?
>>>> I.e. I’m not sure that “uniformity” refers to an existing LLVM IR
>>>> concept.
>>>
>>> Yeah, that's actually the key problem I've been struggling with.
>>>
>>> The first example I sent shows the gist of it. It also shows that the
>>> concept can't be expressed in terms of the CFG, which makes this tricky.
>>>
>>> In a way it's the data-flow analog of the convergent attribute: the
>>> argument cannot be changed to have additional dependencies in its
>>> computation. That captures the basic intention, but I'm not completely
>>> sure that it works.
>>
>> One big question is whether there are transformations in LLVM today that replace a value %a with a value %b, where %b "has additional dependencies in its computation". I can't think of anything obvious where that would be the case, but I'm not sure.
>
> What do you mean by “additional dependencies in its computation" here?
Minus phi-nodes, loads, and intrinsic calls, it could be: look at the
directed graph where %a -> %b if %a is an operand of the instruction
that defines %b. Take all nodes of that graph that do not have predecessors.
But I fear that this path leads to eternal fuzziness. Let me try a
completely different approach to define what we need by augmenting the
semantics of IR with "divergence tokens". In addition to its usual
value, every IR value carries a "divergence set" of divergence tokens.
The basic rule is: the divergence set of a value is (at least) the union
of the divergence sets of its operands.
Every function input carries a unique divergence token.
For phi-nodes, the divergence set also includes the branch conditions
that can affect which predecessor value is taken. (A simpler, more
conservative rule, would be to add a unique divergence token; this can
be per-basic block.)
Loads add a unique divergence token. (This is a very conservative rule
mostly required for per-thread-allocas. It could be relaxed quite a bit
by treating per-thread-memory like SSA-values augmented with divergence
sets.)
Atomic operations add a unique divergence token.
Additional target-specific intrinsics may also add a divergence token.
By transitivity, most function calls have to add a unique divergence token.
The first restriction on the transformation of a call to a function with
'uniform' function argument is then:
1. The corresponding argument of the call can be changed only if the
change preserves (or shrinks) the divergence set (on top of that, the
new value obviously has to be equal to the old one in single threaded
semantics).
With this definition,
def @fn(s0, s1, cond) {
v0 = texelFetch(s0, ...)
v1 = texelFetch(s1, ...)
v = select i1 cond, v0, v1
}
cannot be transformed to
def @fn(s0, s1, cond) {
s = select i1 cond, s0, s1
v = texelFetch(s, ...)
}
because s has a larger divergence set.
On the other hand, at least the InstCombine transforms should all be
safe because they locally transform a DAG of values in a way that can
only shrink the set of predecessors of the locally transformed region of
the DAG.
Transforms that modify the CFG could be problematic especially with the
simpler phi-node-rule, e.g. when a new basic block is created, you end
up introducing new divergence tokens for "downstream" values. That's
probably not a problem with the first definition for phi-nodes, though
I'm not sure.
.....
The above definition still allows the transformation of
masked &= 1
if (masked == 1) {
v1 = texelFetch(array[masked], ...)
} else {
v0 = texelFetch(array[masked], ...)
}
v = phi(v1, v0)
to
masked &= 1
v = texelFetch(array[masked], ...);
which is incorrect. So for a set of divergence tokens D, let's say A
D-dominates B if the following holds: there exists a way of fixing the
branching directions of conditional branches whose condition's
divergence set is disjoint from D, such that A dominates B in the
resulting CFG. Similarly for D-post-domination.
If D \subset D', then D'-domination implies D-domination.
We can then formulate the second condition as:
2a. A function call with a uniform argument with divergence set D can be
moved from location A to location B if A D-dominates or D-post-dominates B.
2b. Simpler but more conservatively, just treat the function call as
co-convergent (i.e., it can only be moved to positions that are
dominated or post-dominated by its current position in the usual sense.)
.....
The full definition of D-domination is probably too involved to be
turned into practical algorithms. But with this model of divergence
sets, it should be possible to think of DivergenceAnalysis as an
analysis that proves divergence sets of values to be empty (using
additional target specific information to be less conservative).
Combined with DivergenceAnalysis, (1)+(2a) still leave room for some
useful and practical optimizations of functions with uniform arguments
even across basic blocks.
In practical terms for a first version, the effect of the attribute
should just be to prevent the two transformations shown above.
Thanks,
Nicolai
More information about the llvm-dev
mailing list