[llvm] [OpenMPOpt] Fix incorrect end-of-kernel barrier removal (PR #65670)

Mon Sep 11 10:25:28 PDT 2023

================
@@ -2676,17 +2676,19 @@ struct AAExecutionDomainFunction : public AAExecutionDomain {
       if (!ED.EncounteredAssumes.empty() && !A.isModulePass())
         return;
 
-      // We can remove this barrier, if it is one, or all aligned barriers
-      // reaching the kernel end. In the latter case we can transitively work
-      // our way back until we find a barrier that guards a side-effect if we
-      // are dealing with the kernel end here.
+      // We can remove this barrier, if it is one, or aligned barriers reaching
+      // the kernel end (if CB is nullptr). Aligned barriers reaching the kernel
+      // end may have other successors besides the kernel end (especially if
+      // they're in loops) with non-local side-effects, so those barriers can
+      // only be removed if they also only reach the kernel end. If those
----------------
dwoodwor-intel wrote:

I also think what we're doing now with special-casing the kernel end is not the right approach, but I think there is some value in generalizing the "former" barrier removal if we can get the conditions for it right. That would help in cases like this:

```
define void @only_first_barrier_redundant(i1 %c) "kernel" {
  call void @unknown()
  br i1 %c, label %if, label %end

if:
  call void @unknown()
  call void @aligned_barrier() ; Barrier A
  br label %end

end:
  call void @aligned_barrier() ; Barrier B
  call void @unknown()
  ret void
}
```

We can't remove `Barrier B` in this case and we can't remove `Barrier A` using our normal "latter" barrier removal because it has an `unknown` call before it. But `Barrier A` is still redundant here because it's always immediately followed by `Barrier B`, so if we had "former" barrier removal in cases other than the kernel end case we could safely remove it.

As in your example with a conditional return between the barriers, one of the big challenges here is that if we're removing both former and latter barriers in the common straight-line case we'll remove the former barrier because it's immediately followed by a barrier and also the latter barrier because it's immediately preceded by a barrier. This is a stale analysis problem, though: once we remove one of the barriers it's no longer the case that the remaining barrier does not have any side effects (or only reaches aligned barriers) before/after it. We could fix this by making sure the analysis is up-to-date, either by propagating the results from barriers as we remove them or by re-computing the analysis between the first sweep removing latter barriers and the second sweep removing former barriers.

This is a more ambitious strategy that will take more time to get right, though. For now I think just doing what we are now but only removing barriers in the end -> barrier case when the kernel end is the unique successor is a reasonable solution since it should fix all of the cases where barriers are being incorrectly removed while still removing barriers in simple cases. I'll update this PR to implement this instead.

https://github.com/llvm/llvm-project/pull/65670