[PATCH] D96754: [NewPM] Use stale divergence analysis with SimpleLoopUnswitch

Sameer Sahasrabuddhe via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 16 21:28:14 PST 2021


sameerds added a comment.

In D96754#2566461 <https://reviews.llvm.org/D96754#2566461>, @asbirlea wrote:

> This is absolutely **not** the right way to resolve this.
> First, the restriction to not allow getting a stale analysis is very much intentional and part of the design of the new pass manager. The API being added here must not exist.
> Second, it is not safe to use the stale DA.

Please see other comments for the specific use-case. It is safe for what loop unswitching does. Maybe the use of the word "stale" provokes a conservative reaction. It might be more useful to think in terms of an analysis being used as a hint rather than a reliable truth.

> LoopUnswitch gets the LegacyDivergenceAnalysis only when making the final unswitching decision, and it does not reuse a stale instance. The same needs to happen in SimpleLoopUnswitch.

That is not true. The flow in the legacy pass manager is more involved. The function LoopPass::preparePassManager() actually makes sure that if a loop transform T running inside a loop pass manager M invalidates analyses used by other passes in M, then T is split out into a separate loop pass manager M'. Every instance of a loop pass manager is thus responsible for making sure that the analyses it uses are recomputed before any of its passes start. The divergence analysis is not computed in the middle of loop unswitching, as you seem to be suggesting here.
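The splitting behaviour can be illustrated with a toy model (plain C++, not actual LLVM code; the names `ToyPass`, `ToyLoopPM` and `schedule` are made up for illustration): when a newly scheduled pass would invalidate an analysis that an already-scheduled pass in the current manager uses, the new pass is placed into a fresh manager, which recomputes its required analyses before running.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model (not LLVM code) of the effect of LoopPass::preparePassManager():
// a pass that invalidates an analysis used by passes already in the current
// loop pass manager M is split out into a new manager M'.
struct ToyPass {
  std::string Name;
  std::set<std::string> Uses;        // analyses this pass uses
  std::set<std::string> Invalidates; // analyses this pass invalidates
};

struct ToyLoopPM {
  std::vector<ToyPass> Passes;
};

// Returns the sequence of loop pass managers after scheduling all passes.
std::vector<ToyLoopPM> schedule(const std::vector<ToyPass> &All) {
  std::vector<ToyLoopPM> Managers;
  Managers.push_back({});
  for (const ToyPass &P : All) {
    bool Conflict = false;
    for (const ToyPass &Prev : Managers.back().Passes)
      for (const std::string &A : P.Invalidates)
        if (Prev.Uses.count(A))
          Conflict = true; // P would hand Prev a stale analysis
    if (Conflict)
      Managers.push_back({}); // split: P starts a new manager M'
    Managers.back().Passes.push_back(P);
  }
  return Managers;
}
```

With this model, scheduling an unswitch pass (uses "DA"), then a transform that invalidates "DA", then unswitch again yields two managers, so the second unswitch never sees the invalidated result.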

But now that I have looked at it more closely, this invalidates my original assumption that the legacy PM is providing a stale analysis. In that sense, my patch should not be seen as reproducing existing behaviour ... it is turning out to be more of a hack because the new PM lacks some functionality: there does not seem to be a way for a transform in the new PM to isolate itself from the effects of other transforms that invalidate analyses in the outer analysis manager.

> A proper solution is to change the divergence analysis pass so it can be created as an object. An example of a pass used as an object is the OptimizationRemarkEmitter.

The DA is already available as a standalone object. But if that were a proper solution, it would undermine the utility of the whole analysis manager framework. Is it really a good idea to build a framework that is not flexible enough, and then advise people to stay out of it when their use-case is not covered? Recomputing a function analysis on every loop seems unnecessarily costly when a stale analysis would have been good enough.



================
Comment at: llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:2886-2891
+/// The `StaleDA` analysis is useful for skipping divergent branches
+/// if it is available: unswitching such a branch is expensive on
+/// targets that have divergence. The analysis is stale since other
+/// loop transforms neither preserve nor update it, but it is safe for
+/// skipping branches.
+///
----------------
tra wrote:
> Could you elaborate on why it is always safe?
> 
> IIUIC, the idea is that by default we'll treat all branches as divergent. The DA will allow treating some branches as non-divergent. Any new branches created by transforms will not be included in the stale DA and will therefore be treated as divergent.
> 
> Perhaps `BaselineDA` would be a better name? Yes, it is potentially stale and that needs to be prominently displayed, but being stale is not particularly descriptive of the parameter.
> 
> 
When you unswitch a divergent branch, you are simply splitting the warp into two active masks, each of which follows the corresponding side of the branch. This does not affect the execution of the threads themselves, but it does force a typical GPU to serialize execution ... while the threads in one active mask execute their side, the others must remain inactive, and vice versa. This usually performs worse than having all the threads go through the single original loop, divergent branch included.
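The serialization cost can be sketched with a small simulation (plain C++, a toy model of SIMT lockstep execution, not tied to any real GPU API): each side of a divergent branch executes while the complementary mask sits idle, so the warp pays for both sides.

```cpp
#include <bitset>

// Toy model of SIMT execution: a warp of 32 lanes reaches a branch.
// If the branch is uniform, the warp takes only one side; if it is
// divergent, the hardware serializes, running the 'then' side under one
// active mask and the 'else' side under the complement.
constexpr unsigned WarpSize = 32;

// Returns the number of instruction "slots" the warp spends, assuming
// each side of the branch costs the given number of instructions.
unsigned branchCost(const std::bitset<WarpSize> &TakenMask,
                    unsigned ThenCost, unsigned ElseCost) {
  bool AnyThen = TakenMask.any();                  // some lane takes 'then'
  bool AnyElse = TakenMask.count() != WarpSize;    // some lane takes 'else'
  unsigned Cost = 0;
  if (AnyThen) Cost += ThenCost; // masked-in lanes run, others idle
  if (AnyElse) Cost += ElseCost; // complementary mask runs, others idle
  return Cost;
}
```

A uniform branch costs one side; as soon as a single lane diverges, the warp pays for both sides, which is why skipping divergent branches during unswitching is a reasonable performance heuristic.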

I can see that `StaleDA` sounds a bit negative, but `BaselineDA` does not reflect the fact that the result is unreliable. How about `DivergenceHint`? That brings out the utility of the result while indicating that it is not entirely reliable.
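Whatever the name, the hint semantics described above fit in a few lines (a sketch in plain C++; `DivergenceHint` here is a hypothetical wrapper, not an existing LLVM class): branches the analysis never saw default to "divergent", so a stale or missing result can only make unswitching skip more branches than strictly necessary, a performance decision rather than a correctness one.

```cpp
#include <optional>
#include <set>

using BranchId = int;

// Hypothetical wrapper (not an existing LLVM class) modelling how a
// possibly-stale divergence analysis can serve as a hint: any branch not
// recorded as uniform is conservatively treated as divergent.
struct DivergenceHint {
  // Branches the last (possibly stale) analysis run proved uniform, if any.
  std::optional<std::set<BranchId>> KnownUniform;

  bool isDivergent(BranchId B) const {
    if (!KnownUniform)
      return true; // no analysis available: assume divergent
    return KnownUniform->count(B) == 0; // unseen branches are divergent
  }
};
```

In particular, a branch created by a later transform is absent from `KnownUniform` and therefore treated as divergent, which is the conservative direction for the unswitching cost decision.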


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96754/new/

https://reviews.llvm.org/D96754


