[llvm] [OpenMPOpt] Fix incorrect end-of-kernel barrier removal (PR #65670)

Fri Sep 8 12:39:50 PDT 2023

================
@@ -2676,17 +2676,19 @@ struct AAExecutionDomainFunction : public AAExecutionDomain {
       if (!ED.EncounteredAssumes.empty() && !A.isModulePass())
         return;
 
-      // We can remove this barrier, if it is one, or all aligned barriers
-      // reaching the kernel end. In the latter case we can transitively work
-      // our way back until we find a barrier that guards a side-effect if we
-      // are dealing with the kernel end here.
+      // We can remove this barrier, if it is one, or aligned barriers reaching
+      // the kernel end (if CB is nullptr). Aligned barriers reaching the kernel
+      // end may have other successors besides the kernel end (especially if
+      // they're in loops) with non-local side-effects, so those barriers can
+      // only be removed if they also only reach the kernel end. If those
+      // barriers have other barriers reaching them, those can be transitively
+      // removed as well.
       if (CB) {
         DeletedBarriers.insert(CB);
         A.deleteAfterManifest(*CB);
         ++NumBarriersEliminated;
         Changed = ChangeStatus::CHANGED;
       } else if (!ED.AlignedBarriers.empty()) {
----------------
dwoodwor-intel wrote:

I'm still wrapping my head around this pass, but from my understanding it doesn't seem unreasonable to treat the kernel end as an aligned barrier. Even if the kernel exit instructions themselves are divergent and different ones are reached in different threads, it seems like we can still model the control flow as re-converging after the different kernel exits before our implicit post-kernel barrier.

My understanding is that it's legal and useful to remove barriers near the end of kernels in cases like this:

```
define void @simple_end_barrier() "kernel" {
  call void @unknown()
  call void @aligned_barrier() ; Barrier A
  ret void
}
```

It seems like we need a concept of a post-kernel barrier to do this, though, since otherwise `Barrier A` isn't redundant and can't be removed.

https://github.com/llvm/llvm-project/pull/65670