[PATCH] D67662: [AMDGPU] SIFoldOperands should not fold register across the EXEC definition

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 19 00:48:20 PDT 2019


alex-t marked an inline comment as done.
alex-t added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:429
+  if (!const_cast<MachineBasicBlock *>(DefBB)->canFallThrough() &&
+      (DefBB != MBB)) {
+    MachineBasicBlock::const_iterator IT =
----------------
rampitec wrote:
> What if there are several blocks and branches in between of def and use?
In fact, the only case we need to care about is a definition inside a loop with a use outside of it.
If the loop has a divergent exit, we have a temporal dependency (threads leave the loop on different iterations).
All other cases break down as follows:
1. The definition dominates the divergent branch, but the use does not post-dominate the definition. This means the use is in one of the branch's target blocks, and all threads that take this block observe the same value. This holds irrespective of how many divergent branches are in between.
2. The definition dominates the divergent branch, and the use post-dominates the definition. This means the exec mask is the same at the use and at the definition, irrespective of how many times it was modified in between.

So I'd check for the exact condition: the copy uses exec and is inside the loop, the use is outside the loop, the loop has a divergent exit, and there is a path from this exit to the use block.
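
A rough sketch of that condition, written against a toy CFG rather than the real MachineFunction/MachineLoopInfo classes. ToyCFG, ToyLoop, isReachable and mustNotFold are hypothetical names made up for illustration, not LLVM APIs, and the set of divergent exit blocks is assumed to be given rather than computed:

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <vector>

// Hypothetical toy CFG: block ids index into Succs.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
};

// Hypothetical loop summary. DivergentExits holds the blocks outside the
// loop that are reached through a divergent exit branch (assumed known).
struct ToyLoop {
  std::set<int> Blocks;
  std::set<int> DivergentExits;
};

// Breadth-first search: is there a CFG path from From to To?
static bool isReachable(const ToyCFG &G, int From, int To) {
  assert(From >= 0 && static_cast<std::size_t>(From) < G.Succs.size());
  std::vector<bool> Seen(G.Succs.size(), false);
  std::queue<int> Work;
  Work.push(From);
  Seen[From] = true;
  while (!Work.empty()) {
    int BB = Work.front();
    Work.pop();
    if (BB == To)
      return true;
    for (int Succ : G.Succs[BB]) {
      if (!Seen[Succ]) {
        Seen[Succ] = true;
        Work.push(Succ);
      }
    }
  }
  return false;
}

// Returns true only when all four parts of the condition hold:
// the copy reads exec, the def is inside the loop, the use is outside it,
// and some divergent exit of the loop has a path to the use block.
static bool mustNotFold(const ToyCFG &G, const ToyLoop &L, bool CopyReadsExec,
                        int DefBB, int UseBB) {
  if (!CopyReadsExec)
    return false;
  if (L.Blocks.count(DefBB) == 0 || L.Blocks.count(UseBB) != 0)
    return false;
  for (int Exit : L.DivergentExits)
    if (isReachable(G, Exit, UseBB))
      return true;
  return false;
}
```

In the actual pass the loop membership and exit blocks would presumably come from the loop analysis, and the divergence of the exit would have to be determined separately; the sketch only shows how the four sub-conditions combine.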




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67662/new/

https://reviews.llvm.org/D67662




