[PATCH] D68994: [RFC] Redefine `convergent` in terms of dynamic instances

Wed Oct 30 07:00:49 PDT 2019

nhaehnle marked 2 inline comments as done.
nhaehnle added a comment.

Thank you for your detailed comments!

In D68994#1725542 <https://reviews.llvm.org/D68994#1725542>, @kariddi wrote:

> 1. Changing the convergent attribute from being sink prevention only to both sink/hoist prevention (which should impact CSE as well I believe even if I didn't see any change on that regard) would have some performance impact. Considering there's another proposal with respect to make convergent the default from Matt, how do we see this impacting performance. Should we have two (or more?) attributes instead? For example quad-scoped texture operations don't really care about being hoisted/cse'd up, but they care about being sinked.

That's a good point. Something like `allowconvergence` and `allowdivergence`?

> 2. llvm.convergence.join is marked IntNoMem. Its not clear to me how is it gonna survive instcombine considering that is not emitting any value , so nothing is in place to keep it alive. What's the mechanism that's keeping it around?

You're right. The immediate answer to your question is that I missed marking the intrinsics as `nounwind`. Once I do that, they are indeed removed like you say. Any ideas on that?

> 3. Is there a plan on how we are gonna figure out all the optimizations that need to be updated and in particular, how are we gonna make sure new optimizations (probably written with single threaded in mind) are not gonna break our invariants. (Basically how do we make everybody aware this exists and make them think about it). In particular there could be classes of optimizations that work uniquely on CFG that now would maybe have to look "inside the blocks" before transforming because they need to make sure there's no convergent instruction/call while operating over the blocks they are considering.

I don't have a good answer for that, unfortunately. In general, education about `convergent` is the only answer that comes to mind on the first half. On the second half, I did go to some length to think about whether transforms need to worry about "spooky action at a distance", i.e. whether they need to scan code for `convergent` that they would not otherwise have to scan, and I believe this is largely not a problem (though I cannot prove it). This is summarized in the "Correctness" section of the new document.

================
Comment at: llvm/docs/DynamicInstances.rst:229
+
+  void @llvm.convergence.point(token %anchor) convergent readnone
+
----------------
simoll wrote:
> nhaehnle wrote:
> > arsenm wrote:
> > > I don't fully understand what the 'point' of this one is. What is the point of ensuring nonreconvergence? Which real example does this correspond to?
> > It allows us to mark regions of code that are semantically part of a loop for the purposes of convergence even when they're not part of the loop as far as loop analysis is concerned. This applies to high-level code such as the following:
> > ```
> > int divergent_key = ...;
> > int v = ...;
> > int sum;
> > for (;;;) {
> >   tok = @llvm.convergence.anchor()
> >   int uniform_key = readfirstlane(divergent_key);
> >   if (uniform_key == divergent_key) {
> >     sum = subgroup_reduce_add(v);
> >     @llvm.convergence.point(tok)
> >     break;
> >   }
> > }
> > ```
> > The point indicates that no reconvergence should happen for the execution of the reduce operation (or in a sense, that the reduce operation should not be moved out of the loop).
> > 
> > Points are in some sense redundant, as the code above could be equivalently rewritten as:
> > ```
> > int divergent_key = ...;
> > int v = ...;
> > int sum;
> > for (;;;) {
> >   tok = @llvm.convergence.anchor()
> >   int uniform_key = readfirstlane(divergent_key);
> >   if (uniform_key == divergent_key) {
> >     sum = subgroup_reduce_add(v);
> >   }
> >   @llvm.convergence.join(tok)
> >   if (uniform_key == divergent_key)
> >     break;
> > }
> > ```
> > Though that reformulation loses some information about control-flow, in particular it leads to more conservative live ranges.
> Couldn't you align the conceptual convergence point with the loop structure by adding a `@llvm.always.true` intrinsic like so:
> 
> 
> ```
> int divergent_key = ...;
> int v = ...;
> int sum;
> for (;;;) {
>   int uniform_key = readfirstlane(divergent_key);
>   if (uniform_key == divergent_key) {
>     sum = subgroup_reduce_add(v);
>     %we.know.its.true = call @llvm.always.true()
>    br i1 %we.know.its.true, %loopExitBlock, %nextBlockInTheLoop
>   }
> }
> ```
> 
> That way you do not need the intrinsic and the DA/SDA already assume that threads only synchronize at (`LoopInfo`) loop exits.
I would like to avoid having to insert fake edges in the CFG: it might trip up generic transforms.

================
Comment at: llvm/lib/Transforms/Utils/SimplifyCFG.cpp:1474
+      // with which the convergent operation communicates.
+      if (C->isConvergent())
+        return false;
----------------
nhaehnle wrote:
> kariddi wrote:
> > Is this the only change in SimplifyCFG we would have to do?
> > 
> > What about FoldBranchToCommonDest() ? It seems to be hoisting instructions into the predecessors , but it just checks for the Speculable attribute (through isSafeToSpeculativelyExecute()) and seem to ignore the Convergent attribute.
> > 
> > In general how much the new behavior has been vetted around LLVM passes with respect respecting hoisting or changes in thread execution mask?
> I've fixed the places that I was aware of based on bugs we've encountered over the years. I wouldn't be surprised if there were more places like you say...
Giving this some more thought, it does raise an interesting question: is there a legitimate use for a function that is `speculatable convergent`?

It seems initially strange, given that speculation changes the set of threads calling the function, which goes counter to the whole point of `convergent`. But then, as you mentioned in your later comment and I've mentioned in D69498, we'll want to relax convergent. Something like a subgroup shuffle could perhaps be `speculatable convergent allowconvergence`, indicating that it can be hoisted but not sunk.

I'm going to add some comments to that effect.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68994/new/

https://reviews.llvm.org/D68994