[PATCH] D114146: [DA][NFC] Update publication - add remarks

Fri Nov 19 02:11:23 PST 2021

simoll added a comment.

> Not related to the changes proposed in this review, but it seems the PDT is not used by the implementation. It might be good to take out that dependency.

All good points. PDT and the comment on strict loop compactness are leftovers. I will remove PDT in a followup patch.

================
Comment at: llvm/lib/Analysis/SyncDependenceAnalysis.cpp:103
+//
+// -- Known Limitations & Future Work --
+// * The algorithm requires reducible loops because the implementation
----------------
sameerds wrote:
> Good to see both these points spelled out. We are currently working on an implementation that works with irreducible control flow. It's still a work in progress, but involves the new "CycleInfo" being introduced in D112696. I do believe that the single pass of DFA is a strength that need not be lost when handling irreducible control flow. CycleInfo provides a predictable way to work around the lack of a unique header. One just needs to take extra care about entering the same irreducible loop multiple times when constructing ModifiedPO.
Regarding irreducibility, the unclear part to me always was whether the analysis result (detected joins, synchronization points) in IR still represents the synchronization we will end up getting with the final binary on actual hardware.
So the issue is not really getting some result but making sure the entire stack agrees on synchronization.

D85603 should help here. However, if a kernel is irreducible and doesn't use the tools of the patch, the documentation says that reconvergence should be maximal, as early as possible - this may be ambiguous in irreducible control (IIUC implementation defined with cycles).

This is less of an issue with reducible loops as there is a (mostly unspoken) mutual understanding where synchronization happens and so all transformations will abide to that. This is starting to fade however as more recent hardware breaks with the traditions on synchronization.

================
Comment at: llvm/lib/Analysis/SyncDependenceAnalysis.cpp:149
 // * We (virtually) modify the CFG.
-// * We want a loop-compact block enumeration, that is the numbers assigned by
-//   the traveral to the blocks of a loop are an interval.
+// * We want a loop-compact block enumeration, that is the numbers assigned to
+//   blocks of a loop form an interval
----------------
sameerds wrote:
> Is it correct to say that this is not critical for correctness? From what I understood, if other blocks (beyond the loop exit) got interleaved, the implementation will work, but unnecessarily visit and skip the interleaved blocks when walking backwards on the ModifiedPO.
Correct. Exit blocks must have a lower number than the loop header in the traversal (to account for the virtual edge from header to exit). The PO should take care of all other dependences.
Strict loop compactness is a relic from the earlier implementation that recursed on the loop tree during propagation.

It may help with efficiency though.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114146/new/

https://reviews.llvm.org/D114146