[PATCH] D149348: RFD: Do not CSE convergent calls in different basic blocks

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 28 03:19:39 PDT 2023


foad added inline comments.


================
Comment at: llvm/lib/IR/Instruction.cpp:578
+    // different basic blocks.
+    if (CI->isConvergent() && CI->getParent() != I->getParent())
+      return false;
----------------
ruiling wrote:
> foad wrote:
> > arsenm wrote:
> > > I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block
> > Surely it's correct? You could even CSE them in different blocks if there is no divergent control flow between them. When we have convergence tokens perhaps this will become trivial: you can CSE them iff they use the convergence token.
> > I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block
> 
> Very good question. CSE convergent operations in the same block may still be helpful if there are several subgroup operations(like mbcnt(ballot)) in one block.
> 
> Regarding to whether it is correct to CSE convergent operation. I think into two cases:
> 1. WWM operations with normal operations. As the inputs to the WWM operation are already `set.inactive`ed, so we will not do wrong CSE between WWM operation and normal operation.
> 2. convergent operations before and after amdgcn.kill. A ballot before an amdgcn.kill obviously cannot be CSEed with another one after the kill.
> 
> So I think we should not try to CSE **ANY** convergent operation up till now.
> Last time I think into kill, my idea is we need to add a new type of basic block terminator for it. But I have not go into the details of how to make it work with other parts of our compiler.
> So I think we should not try to CSE ANY convergent operation up till now.

I would like to push the current patch because I think it is justified by the target-independent meaning of "convergent".

The problem with amdgcn.kill is very AMDGPU-specific, so I think ideally we should find an AMDGPU-specific way to fix it, instead of pessimizing target-independent code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149348/new/

https://reviews.llvm.org/D149348



More information about the llvm-commits mailing list