[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters
Nicolai Hähnle via llvm-dev
llvm-dev at lists.llvm.org
Wed Oct 26 07:54:54 PDT 2016
On 25.10.2016 16:28, Nicolai Hähnle wrote:
> But I fear that this path leads to eternal fuzziness. Let me try a
> completely different approach to define what we need by augmenting the
> semantics of IR with "divergence tokens". In addition to its usual
> value, every IR value carries a "divergence set" of divergence tokens.
>
> The basic rule is: the divergence set of a value is (at least) the union
> of the divergence sets of its operands.
>
> Every function input carries a unique divergence token.
>
> For phi-nodes, the divergence set also includes the branch conditions
> that can affect which predecessor value is taken. (A simpler, more
> conservative rule, would be to add a unique divergence token; this can
> be per-basic block.)
>
> Loads add a unique divergence token. (This is a very conservative rule
> mostly required for per-thread-allocas. It could be relaxed quite a bit
> by treating per-thread-memory like SSA-values augmented with divergence
> sets.)
>
> Atomic operations add a unique divergence token.
>
> Additional target-specific intrinsics may also add a divergence token.
>
> By transitivity, most function calls have to add a unique divergence token.
>
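[As an aside, the token bookkeeping above can be sketched in a few lines of Python. The Value class, opcode names, and the idea of storing branch conditions directly on phis are all made up for illustration; this is not LLVM's API, just the rules written out:

```python
import itertools

_fresh = itertools.count()

class Value:
    """A toy SSA value. Inputs, loads, atomics and calls each carry a
    unique divergence token; other values carry none of their own."""
    def __init__(self, opcode, operands=(), branch_conditions=()):
        self.opcode = opcode
        self.operands = list(operands)
        self.branch_conditions = list(branch_conditions)
        self.token = (next(_fresh)
                      if opcode in ("input", "load", "atomicrmw", "call")
                      else None)

def divergence_set(v, memo=None):
    """Union of the operands' divergence sets, plus this value's own
    unique token (if any); phis additionally pull in the divergence
    sets of the branch conditions selecting the incoming value."""
    memo = {} if memo is None else memo
    if v in memo:
        return memo[v]
    s = set()
    for op in v.operands:
        s |= divergence_set(op, memo)
    if v.token is not None:
        s.add(v.token)
    if v.opcode == "phi":
        for c in v.branch_conditions:
            s |= divergence_set(c, memo)
    memo[v] = s
    return s
```

On the @fn example further down, `divergence_set` of `select(cond, s0, s1)` comes out as the union of the sets of cond, s0 and s1, which is strictly larger than the set of s0 alone.]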
> The first restriction on the transformation of a call to a function with
> 'uniform' function argument is then:
>
> 1. The corresponding argument of the call can be changed only if the
> change preserves (or shrinks) the divergence set (on top of that, the
> new value obviously has to be equal to the old one in single threaded
> semantics).
Turns out that this isn't actually sufficient. Consider this (admittedly
convoluted) example:
  idx = input & 1
  v0 = texelFetch(s[idx & 0], ...)
  v1 = texelFetch(s[idx | 1], ...)
  cond = trunc idx to i1
  v = select i1 cond, v1, v0
transformed into
  idx = input & 1
  v = texelFetch(s[idx], ...)
has the same divergence set for the argument to texelFetch as defined in
the quote above, but the transformation is incorrect.
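To see concretely why the transformed program is wrong, here is a toy two-lane SIMT simulation of the example. texelFetch and the descriptor array are modeled abstractly (descriptors are just the indices), and the assertion stands in for the hardware requirement that all active lanes pass the same descriptor:

```python
LANES = 2

def texel_fetch(args_per_lane, log):
    """Toy stand-in for texelFetch: logs the per-lane arguments at this
    call site and requires them to be uniform across the lanes."""
    log.append(tuple(args_per_lane))
    assert len(set(args_per_lane)) == 1, "non-uniform descriptor!"
    return args_per_lane

def original(inputs):
    log = []
    idx = [i & 1 for i in inputs]
    v0 = texel_fetch([i & 0 for i in idx], log)   # index 0 in every lane
    v1 = texel_fetch([i | 1 for i in idx], log)   # index 1 in every lane
    v = [v1[l] if idx[l] else v0[l] for l in range(LANES)]
    return v, log

def transformed(inputs):
    log = []
    idx = [i & 1 for i in inputs]
    v = texel_fetch(idx, log)   # descriptor now varies per lane
    return v, log
```

With inputs 0 and 1, original() sees a uniform argument at each of its two call sites, while transformed() trips the uniformity assertion at its single call site, even though both compute the same per-lane result in single-threaded semantics.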
So it's pretty clear what we want: we want a way to flag certain
function arguments so as to forbid transformations of the form

  select(call ..., call ...) -> call(select ..., ...), ...
But it would be nice to have a clear definition of _why_ those
transformations must be forbidden. It's not clear how to do that without
pulling in a full model of SIMT-style parallel execution, and admittedly
I don't think we have a sane model for _that_ in the first place :-(
Something that at least partially addresses the SIMT-style semantics:
For every pair (initial state, function inputs) and every call site of
the relevant function, keep a log of function arguments with which the
function has been called.
The restriction on program transformations is then roughly: call two
pairs A and B of (initial state, function inputs) compatible on a given
program if, at each call site, the log of A is a subsequence of the log
of B or vice versa. Every pair A, B that is compatible on the original
program must then also be compatible on the transformed program.
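To pin the relation down a little, here is a rough Python sketch of the compatibility test. The representation (logs as per-call-site lists of argument tuples) is made up for illustration, and is_subsequence is the usual not-necessarily-contiguous subsequence test:

```python
def is_subsequence(a, b):
    """True if a is a (not necessarily contiguous) subsequence of b."""
    it = iter(b)
    return all(x in it for x in a)   # 'in' consumes the iterator

def compatible(logs_a, logs_b):
    """Two (initial state, inputs) pairs are compatible on a program if,
    at every call site, one pair's log is a subsequence of the other's.
    Logs are dicts mapping call sites to lists of argument tuples."""
    sites = set(logs_a) | set(logs_b)
    return all(
        is_subsequence(logs_a.get(s, []), logs_b.get(s, [])) or
        is_subsequence(logs_b.get(s, []), logs_a.get(s, []))
        for s in sites)
```

The restriction itself would then quantify over pairs: any A, B for which compatible() holds on the original program's logs must also satisfy it on the transformed program's logs.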
This isn't a great definition either, but it gets closer to the actual
SIMT-semantics. Any additional ideas would be very much appreciated.
Thanks,
Nicolai
>
> With this definition,
>
> def @fn(s0, s1, cond) {
>   v0 = texelFetch(s0, ...)
>   v1 = texelFetch(s1, ...)
>   v = select i1 cond, v0, v1
> }
>
> cannot be transformed to
>
> def @fn(s0, s1, cond) {
>   s = select i1 cond, s0, s1
>   v = texelFetch(s, ...)
> }
>
> because s has a larger divergence set.
>
> On the other hand, at least the InstCombine transforms should all be
> safe because they locally transform a DAG of values in a way that can
> only shrink the set of predecessors of the locally transformed region of
> the DAG.
>
> Transforms that modify the CFG could be problematic especially with the
> simpler phi-node-rule, e.g. when a new basic block is created, you end
> up introducing new divergence tokens for "downstream" values. That's
> probably not a problem with the first definition for phi-nodes, though
> I'm not sure.
>
> .....
> The above definition still allows the transformation of
>
> masked &= 1
> if (masked == 1) {
>   v1 = texelFetch(array[masked], ...)
> } else {
>   v0 = texelFetch(array[masked], ...)
> }
> v = phi(v1, v0)
>
> to
>
> masked &= 1
> v = texelFetch(array[masked], ...);
>
> which is incorrect. So for a set of divergence tokens D, let's say A
> D-dominates B if the following holds: there exists a way of fixing the
> branching directions of conditional branches whose condition's
> divergence set is disjoint from D, such that A dominates B in the
> resulting CFG. Similarly for D-post-domination.
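[The definition can be written out mechanically on an explicit CFG. The representation below (dict of successor lists, plus a map from conditional branches to the divergence set of their condition) is made up for illustration, and enumerating all branch fixings is exponential; this only mirrors the definition, it is not meant as a practical algorithm:

```python
from itertools import product

def dominates(cfg, entry, a, b):
    """a dominates b iff every path from entry to b passes through a,
    i.e. b becomes unreachable from entry once a is taken out."""
    if a == b:
        return True
    seen, work = set(), [entry]
    while work:
        n = work.pop()
        if n == a or n in seen:
            continue
        seen.add(n)
        work.extend(cfg.get(n, []))
    return b not in seen

def d_dominates(cfg, branch_divsets, entry, a, b, D):
    """Try every way of fixing the direction of the conditional branches
    whose condition's divergence set is disjoint from D; succeed if a
    dominates b in any resulting CFG."""
    fixable = [n for n, ds in branch_divsets.items() if not (ds & D)]
    for pick in product(*(cfg[n] for n in fixable)):
        fixed = dict(cfg)
        for n, succ in zip(fixable, pick):
            fixed[n] = [succ]
        if dominates(fixed, entry, a, b):
            return True
    return False
```

On the masked example above (entry branching on masked to then/else blocks that rejoin), neither branch target D-dominates the join block when D contains masked's token, but both do once D is disjoint from it, since the branch may then be fixed.]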
>
> If D \subset D', then D'-domination implies D-domination.
>
> We can then formulate the second condition as:
>
> 2a. A function call with a uniform argument with divergence set D can be
> moved from location A to location B if A D-dominates or D-post-dominates B.
>
> 2b. Simpler but more conservative: just treat the function call as
> co-convergent (i.e., it can only be moved to positions that are
> dominated or post-dominated by its current position in the usual sense).
>
> .....
> The full definition of D-domination is probably too involved to be
> turned into practical algorithms. But with this model of divergence
> sets, it should be possible to think of DivergenceAnalysis as an
> analysis that proves divergence sets of values to be empty (using
> additional target specific information to be less conservative).
> Combined with DivergenceAnalysis, (1)+(2a) still leave room for some
> useful and practical optimizations of functions with uniform arguments
> even across basic blocks.
>
> In practical terms for a first version, the effect of the attribute
> should just be to prevent the two transformations shown above.
>
> Thanks,
> Nicolai