[llvm] [OpenMPOpt] Fix incorrect end-of-kernel barrier removal (PR #65670)

Johannes Doerfert via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 8 17:52:50 PDT 2023


================
@@ -2676,17 +2676,19 @@ struct AAExecutionDomainFunction : public AAExecutionDomain {
       if (!ED.EncounteredAssumes.empty() && !A.isModulePass())
         return;
 
-      // We can remove this barrier, if it is one, or all aligned barriers
-      // reaching the kernel end. In the latter case we can transitively work
-      // our way back until we find a barrier that guards a side-effect if we
-      // are dealing with the kernel end here.
+      // We can remove this barrier, if it is one, or aligned barriers reaching
+      // the kernel end (if CB is nullptr). Aligned barriers reaching the kernel
+      // end may have other successors besides the kernel end (especially if
+      // they're in loops) with non-local side-effects, so those barriers can
+      // only be removed if they also only reach the kernel end. If those
----------------
jdoerfert wrote:

> This should be "if they also only reach the kernel end or other aligned barriers", right?

I don't think that is accurate at least not with the rest in place. See my example (loop_barrier3) with an explicit barrier in the end. The one in the loop will hit only aligned barriers. The one in the end is useless and removed. The end -> barrier elimination will skip it and eliminate the one in the loop.

Here are my current thoughts:
The end -> barrier case is different because we delete the "former"(/only) barrier, not the latter one.
The logic we have is basically trying to determine when a barrier is not actually synchronizing/communicating anything, and it is wrong in the kernel end case because the barrier can (evidently) synchronize with itself.

The logic we use right now is fundamentally flawed since end -> barrier does not say anything about other parts from barrier to somewhere else. In fact, there is no need for the threads reaching the barrier to go (directly) to the end at all. I think this is the crucial point. Our logic only holds if the barrier we are about to eliminate, due to the kernel end case, cannot reach any other synchronization point. If it can, the path from it to the end is not enough reason to remove it. Even if it can only reach other aligned barriers not, e.g.,:
```
if (tid == 7)
  G = 1;
aligned_barrier();        // currently eliminated, reached by kernel end
if (runtime_false())
  return; // kernel exit
aligned_barrier();        // currently eliminated, no side effect since the last barrier
use(G);
```

I see two paths forward:
1) Remove the kernel end special case (and change the kernel end everywhere to not be an aligned barrier). This is simple and prevents the problem, but might cause some leftover barriers.
2) Verify that each barrier we remove in the kernel end case will not reach any other sync point. The easiest solution is to start at the kernel end, and only remove barriers in blocks for which the kernel end block is the unique successor.

WDYT?


https://github.com/llvm/llvm-project/pull/65670


More information about the llvm-commits mailing list