[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics

Mon Aug 10 02:16:25 PDT 2020

simoll added a reviewer: simoll.
simoll added a comment.

Some comments.

- It would be very helpful put the stress on the token value when it comes to the question which threads converge on the same static instruction. Does the token encode only the instance, does it include the static instruction 'id', ie can two instructions return the same token value?
- Not sure about `llvm.experimental.constrained.anchor`: ie, the current non-definition (it is the 'implementation-defined') invites in almost arbitrary semantics (depending on how the token value comes about).

================
Comment at: llvm/docs/ConvergentOperations.rst:202-203
+
+2. Executions of different static instructions always occur in different
+   dynamic instances.
+
----------------
I suppose this only refers to convergent instructions but it isn't clear to me from the wording: Does this constraint apply to all IR instructions or only those that are convergent?
(Only 4. explicitly mentions convergent operations)

================
Comment at: llvm/docs/ConvergentOperations.rst:212-214
+*Convergence tokens* are values of ``token`` type, i.e. they cannot be used in
+``phi`` or ``select`` instructions. A convergence token value represents the
+dynamic instance of the instruction that produced it.
----------------
This is actually super important and should probably go into the formal semantics: the token value represents the dynamic instance of the producing instruction.
If the token represents the dynamic instance **exactly** then this would also limit the freedom `llvm.experimental.convergence.anchor()` has. For example, this would rule out thread partitioning if it were so because then no token-producing instruction could return different token values per dynamic invocation.

================
Comment at: llvm/docs/ConvergentOperations.rst:245-246
+This intrinsic is a marker that acts as an "anchor" producing an initial
+convergence token. The set of threads executing the same dynamic instance of
+this intrinsic is implementation-defined.
+
----------------
t-tye wrote:
> See questions below. I had been assuming that the set of threads would be well defined by the source language and not be an implementation defined concept. I was thinking this is present to model source language semantics, not different target implementation approaches. I feel I am missing something.
+1

================
Comment at: llvm/docs/ConvergentOperations.rst:290-293
+The expectation is that for program "main" functions, such as kernel entry
+functions, whose caller is not visible to LLVM, the implementation returns a
+convergence token that represents uniform control flow, i.e. that is guaranteed
+to refer to all threads within a (target- or environment-dependent) group.
----------------
Not sure whether the expectation of uniformity makes sense here: there could be a caller with a non-uniform convergence token in a different module. This may only become apparent when everything is linked together.

Would this be a property of the calling convention of the kernel function (ie if it's a GPU kernel we know that the entry token is all-uniform).

================
Comment at: llvm/docs/ConvergentOperations.rst:339-343
+1. Let U be a static controlled convergent operation other than
+   :ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
+   whose convergence token is produced by an instruction D. Two threads
+   executing U execute the same dynamic instance of U if and only if they
+   obtained the token value from the same dynamic instance of D.
----------------
Should suffice to say that they two threads will execute the same instance if they see the same token value.
Above you stated that the token value represents the dynamic instance of the defining instruction.

================
Comment at: llvm/docs/ConvergentOperations.rst:387-388
+
+4. If a convergence region contains a use of a convergence token, then it must
+   also contain its definition.
+
----------------
Isn't 4. implied by the fact that this is SSA and the convergence region consists of all blocks that are dominated by the definition?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85603/new/

https://reviews.llvm.org/D85603