[PATCH] D114146: [DA][NFC] Update publication - add remarks

Fri Nov 19 23:55:12 PST 2021

sameerds added a comment.

In D114146#3142269 <https://reviews.llvm.org/D114146#3142269>, @simoll wrote:

> All good points. PDT and the comment on strict loop compactness are leftovers. I will remove PDT in a followup patch.

So will you be rephrasing the comment on compactness in this patch? While I am convinced that it does not affect correctness, I have not thought about whether it affects accuracy, i.e., whether the lack of compactness may result in some places being marked divergent when they aren't.

================
Comment at: llvm/lib/Analysis/SyncDependenceAnalysis.cpp:103
+//
+// -- Known Limitations & Future Work --
+// * The algorithm requires reducible loops because the implementation
----------------
simoll wrote:
> sameerds wrote:
> > Good to see both these points spelled out. We are currently working on an implementation that works with irreducible control flow. It's still a work in progress, but involves the new "CycleInfo" being introduced in D112696. I do believe that the single pass of DFA is a strength that need not be lost when handling irreducible control flow. CycleInfo provides a predictable way to work around the lack of a unique header. One just needs to take extra care about entering the same irreducible loop multiple times when constructing ModifiedPO.
> Regarding irreducibility, the part that was always unclear to me is whether the analysis result (detected joins, synchronization points) in the IR still represents the synchronization we will end up getting with the final binary on actual hardware.
> So the issue is not really getting some result but making sure that the entire stack agrees on synchronization.
> 
> D85603 should help here. However, if a kernel is irreducible and doesn't use the tools of the patch, the documentation says that reconvergence should be maximal, i.e., as early as possible. This may be ambiguous in irreducible control flow (IIUC, implementation-defined with cycles).
> 
> This is less of an issue with reducible loops, as there is a (mostly unspoken) mutual understanding of where synchronization happens, and so all transformations will abide by that. This is starting to fade, however, as more recent hardware breaks with the traditions on synchronization.
Indeed, D85603 is a big part of the work being done for the AMDGPU backend. As part of its cleanup, I am hoping to make some conservative statements about the default behaviour when the intrinsics are not being used. The minimal goal is to gracefully handle irreducible control flow, unlike the current DA implementation, where we just give up and assume everything is divergent. I will share more once I am confident enough to make specific statements.
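
For readers following the thread, the reducible/irreducible distinction above can be illustrated with a minimal standalone sketch. It is not taken from this patch, from the DA implementation, or from CycleInfo; it uses the classic textbook T1/T2 reduction (drop self-loops, merge a node into its unique predecessor) on a plain edge list. The name `isReducibleCFG` and the edge-list representation are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;

// Hypothetical helper (not an LLVM API): returns true if the CFG over nodes
// 0..NumNodes-1 is reducible, using T1/T2 reductions. Only nodes reachable
// from Entry are considered.
bool isReducibleCFG(std::size_t NumNodes, const std::vector<Edge> &Edges,
                    std::size_t Entry = 0) {
  assert(Entry < NumNodes && "entry must be a valid node");

  // Forward adjacency, used only to find the nodes reachable from Entry.
  std::vector<std::vector<std::size_t>> Adj(NumNodes);
  for (const Edge &E : Edges) {
    assert(E.first < NumNodes && E.second < NumNodes);
    Adj[E.first].push_back(E.second);
  }
  std::vector<bool> Alive(NumNodes, false);
  std::vector<std::size_t> Worklist{Entry};
  Alive[Entry] = true;
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    for (std::size_t S : Adj[N])
      if (!Alive[S]) {
        Alive[S] = true;
        Worklist.push_back(S);
      }
  }

  // Successor and predecessor sets restricted to reachable nodes.
  std::vector<std::set<std::size_t>> Succ(NumNodes), Pred(NumNodes);
  for (const Edge &E : Edges)
    if (Alive[E.first]) {
      Succ[E.first].insert(E.second);
      Pred[E.second].insert(E.first);
    }

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t N = 0; N < NumNodes; ++N) {
      if (!Alive[N])
        continue;
      // T1: remove a self-loop on N.
      Succ[N].erase(N);
      Pred[N].erase(N);
      // T2: merge N into its unique predecessor P (never merge the entry).
      if (N == Entry || Pred[N].size() != 1)
        continue;
      std::size_t P = *Pred[N].begin();
      for (std::size_t S : Succ[N]) {
        Pred[S].erase(N);
        if (S != P) { // An edge N->P would become a self-loop on P; drop it.
          Succ[P].insert(S);
          Pred[S].insert(P);
        }
      }
      Succ[P].erase(N);
      Alive[N] = false;
      Changed = true;
    }
  }

  // Reducible iff everything reachable collapsed into a single node.
  std::size_t Remaining = 0;
  for (std::size_t N = 0; N < NumNodes; ++N)
    if (Alive[N])
      ++Remaining;
  return Remaining == 1;
}
```

With this sketch, a natural loop such as 0->1, 1->2, 2->1, 2->3 is reported as reducible, while the two-entry cycle 0->1, 0->2, 1->2, 2->1 is not. A conservative analysis in the spirit of the fallback described above could treat every value as divergent whenever such a check returns false.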

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114146/new/

https://reviews.llvm.org/D114146