[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics

Mon Apr 5 08:18:02 PDT 2021

sameerds added a comment.

In D85603#2422804 <https://reviews.llvm.org/D85603#2422804>, @jholewinski wrote:

> Few questions below. Please bear with me as I try to grok the proposal and the long stream of comments...
>
>> I believe that with this proposal, we can model this with the attributes we have by saying that `subgroupShuffle` is `convergent readnone`, while `__shfl_sync` is `inaccessiblememonly`.
>
> `subgroupShuffle` would require `convergentctrl` and `__shfl_sync` would not, correct?

The answer seems to depend on whether it is correct to say the following about `__shfl_sync`

1. The only constraint on control flow transformations around `__shfl_sync` is identical to the current definition of the `convergent` intrinsic.
2. Other optimizations such as DCE and reordering, that do not involve changes in the control flow, should be modelled using other constructs like `inaccessiblememonly`

If this is true, then yes, `__shfl_sync` doesn't need the new convergence control bundles.

>> There is a straightforward way for a frontend to insert these intrinsics while ensuring correctness and not overly constraining optimization.
>
> This feels like it could use a bit of discussion in the documentation, at least spelling out the straight-forward mapping for a simple C-like language with one example built-in that uses implicit thread masks. My understanding of this proposal implies the following rules:
>
> 1. Add call to `llvm.experimental.convergence.entry` to the beginning of every `convergent` function (generally assume the program entry-point is `convergent`)
> 2. Add call to `llvm.experimental.convergence.anchor` to the beginning of every non-`convergent` function
> 3. Add call to `llvm.experimental.convergence.loop` to the beginning of every natural loop header block
> 4. For each call to a `convergent` function (intrinsic or other-wise), attach `convergencectrl` bundle pointing to the closest call to `entry`/`anchor`/`loop`, in terms of nesting
>
> Is this correct for general structured code, or am I missing some case?

Right. These //heuristics// are actually proposed in a related patch under review:
https://reviews.llvm.org/D85609

In particular, the above pass is expected to do the right thing when working with cross-thread operations that are "textually aligned" (for example, SPIRV, OpenCL, HIP, and CUDA before sm70).

> Things are less clear when you consider odd looping structures, for example:
> [snip]
> Ignoring the new intrinsics, the `subgroupOp` calls may have defined semantics between groups of threads that reach the two loop tail blocks together, at least if you guarantee maximal convergence. When you add in the proposed intrinsics, what do the `subgroupOp` calls anchor to?

I believe the question needs to be turned around: what do you want `subgroupOp` to anchor to? In general, it should be impossible to infer the correct operand bundles from arbitrary LLVM IR, else we could just express everything as an analysis instead of having these explicit markers. The intention of these markers is for a frontend to be able to produce constraints that cannot be expressed using just the structure of the program.

> Is there any concern over implementing and validating all of the necessary logic in all CFG-altering optimization passes? This seems like a large change that will require modifications through-out the existing optimization passes in order to achieve the goal of "ensuring correctness and not overly constraining optimization". I'm just curious if there is any plan being formulated for tackling this issue.

These intrinsics are expected to be very useful for new techniques being worked out in the AMDGPU backend. The following reviews start off the changes required in the optimizer:
https://reviews.llvm.org/D85604
https://reviews.llvm.org/D85605
https://reviews.llvm.org/D85606

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85603/new/

https://reviews.llvm.org/D85603