[PATCH] D85603: IR: Add convergence control operand bundle and intrinsics

Thu Apr 8 07:06:17 PDT 2021

Anastasia added a comment.

>> - Is it clear how to modify clang's CUDA frontend to use this new form?
>>
>> It's not perfectly clear to me how to do this.  Is it as simple as saying, loops always have a `convergent.loop()` intrinsic at the top, functions always have `convergent.entry()` at the top, and that's it?  If you &co aren't planning to do this work (I know the CUDA frontend shares a lot of code with the HIP frontend), I'd want to be sure that the people who *are* going to do this work (@tra?) are clear on what needs to be done and think it's possible.
>
> There are two kinds of answers to this. One is that you can only //really// know how the frontend should be modified once you've established what the high-level language semantics ought to be. Part of why I'm doing this work is to enable us to experiment with this kind of question and verify our understanding what this should look like (I'm going to caveat this with saying that I'm coming at it from the graphics side).
>
> The other kind of answer is that for most but not all constructs, there's a pretty natural answer that boils down pretty much to what you wrote. Of course it generally breaks down in the face of `goto`, for example. I have a follow-on patch, D85609 <https://reviews.llvm.org/D85609>, which adds a pass that does this kind of insertion on top of LLVM IR. I'd appreciate your review on that if you find the time -- I think what it tries to do is fairly natural, but it is a bit more work to dig through. A reasonable first step for someone working on the CUDA frontend would be to insert that pass early in the pass pipeline. Longer term, it may be necessary to insert them directly during IR generation, but this at least partially depends on the high-level language semantics question.

Regarding the HLL and frontend side, I believe this could be represented fairly similarly in different C/C++-based languages - considering that we already follow the same implementation for existing convergent semantics at least between CUDA and OpenCL. However, it isn't yet in its optimal state and perhaps we can attempt to refine this topic holistically for example also addressing the following rework that removes the need to make everything convergent: https://reviews.llvm.org/D69498. Otherwise, we will likely have to generate the convergent intrinsics absolutely everywhere, which is not ideal!

Looking at the wording in some parts of your convergent semantics definition there might be options resulting in some tradeoff between tooling complexity and optimization opportunities:

  +The
  +:ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
  +intrinsic is typically expected to appear in the header of a natural loop.
  +However, it can also appear in non-header blocks of a loop. In that case, the
  +loop can generally not be unrolled.

I understand this is not in the scope of this work. And I think it is perfectly reasonable to provide experimental support that could help with further evaluation and productization too. However, it would be good to make some preliminary assessment for the frontend support rather soon. What I think could speed up the progress on the frontend/HLL is some sort of description about the conditions where the new intrinsics have to be inserted. My understanding is that the plan is not to expose them to the application code that would require educating the application developers about all the low-level details? Looking at your transformation pass in https://reviews.llvm.org/D69498 it seems that adding those automatically should somehow be possible and you already have some rules defined where and how those can be added? But there are certain things that can be done in IR that are very constrained in AST as it makes Parsing more complicated.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85603/new/

https://reviews.llvm.org/D85603