[PATCH] D149348: RFD: Do not CSE convergent calls in different basic blocks

Fri Apr 28 06:31:31 PDT 2023

nhaehnle added inline comments.

================
Comment at: llvm/lib/IR/Instruction.cpp:578
+    // different basic blocks.
+    if (CI->isConvergent() && CI->getParent() != I->getParent())
+      return false;
----------------
sameerds wrote:
> foad wrote:
> > ruiling wrote:
> > > foad wrote:
> > > > arsenm wrote:
> > > > > I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block
> > > > Surely it's correct? You could even CSE them in different blocks if there is no divergent control flow between them. When we have convergence tokens perhaps this will become trivial: you can CSE them iff they use the convergence token.
> > > > I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block
> > > 
> > > Very good question. CSE convergent operations in the same block may still be helpful if there are several subgroup operations(like mbcnt(ballot)) in one block.
> > > 
> > > Regarding to whether it is correct to CSE convergent operation. I think into two cases:
> > > 1. WWM operations with normal operations. As the inputs to the WWM operation are already `set.inactive`ed, so we will not do wrong CSE between WWM operation and normal operation.
> > > 2. convergent operations before and after amdgcn.kill. A ballot before an amdgcn.kill obviously cannot be CSEed with another one after the kill.
> > > 
> > > So I think we should not try to CSE **ANY** convergent operation up till now.
> > > Last time I think into kill, my idea is we need to add a new type of basic block terminator for it. But I have not go into the details of how to make it work with other parts of our compiler.
> > > So I think we should not try to CSE ANY convergent operation up till now.
> > 
> > I would like to push the current patch because I think it is justified by the target-independent meaning of "convergent".
> > 
> > The problem with amdgcn.kill is very AMDGPU-specific, so I think ideally we should find an AMDGPU-specific way to fix it, instead of pessimizing target-independent code.
> I kinda agree with that sentiment. Is it correct to assume that the kill intrinsic is itself experimental, and its semantics needs to be worked out for convergent operations? Shouldn't that happen before using the intrinsic in production?
I'd say the change should go ahead as-is. `kill` is AMDGPU-specific, and in any case, it looks like this change doesn't make the situation regarding `kill` worse than it is today.

Further, I agree with @ruiling that `kill` should conceptually be a terminator. We should consider just using it with `callbr`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149348/new/

https://reviews.llvm.org/D149348