[PATCH] D87519: [analyzer][Liveness][NFC] Enqueue the CFGBlocks post-order

Kristóf Umann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 12 08:24:46 PST 2021


Szelethus added a comment.

I don't insist on this patch; even if I leave the actual code unchanged, I will still remove the FIXME, as it seems to be outdated.



================
Comment at: clang/lib/Analysis/LiveVariables.cpp:522
 
-  // FIXME: we should enqueue using post order.
-  for (const CFGBlock *B : cfg->nodes()) {
+  for (const CFGBlock *B : *AC.getAnalysis<PostOrderCFGView>()) {
     worklist.enqueueBlock(B);
----------------
baloghadamsoftware wrote:
> martong wrote:
> > xazax.hun wrote:
> > > With `BackwardDataflowWorklist`, each  `enqueueBlock` will insert the block into a `llvm::PriorityQueue`. So regardless of the insertion order, `dequeue` will return the nodes in the reverse post order. 
> > > 
> > > Inserting elements in the right order into the heap might be beneficial if we need to do less work to "heapify". On the other hand, we did more work to insert them in the right order in the first place. All in all, I am not sure whether the comment is still valid and whether this patch would provide any benefit over the original code. 
> > > 
> > Yes, what Gabor says makes sense. On the other hand, I don't see any overhead in the post-order visitation (though I might be wrong), and it makes the code more consistent IMHO.
> > 
> > Well, it would be important to know why the original author put the 
> > ```
> > // FIXME: we should enqueue using post order.
> > ```
> > there. The blamed commit 77ff930fff15c3fc76101b38199dad355be0866b does not say much.
> Please compare the execution time with and without this patch. I think it is difficult to decide in theory which one costs more: the heapification during insertion or the reverse ordering before the insertion.
I think the performance hit, if there is any at all, must be negligible.
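
Just to illustrate why the enqueue order shouldn't change the result: the worklist pops blocks by their post-order priority regardless of the order they were pushed in. Below is a minimal, self-contained sketch of that idea. It is not the actual `BackwardDataflowWorklist`, just a simplified `std::priority_queue` stand-in with a hypothetical `Block` type and post-order numbering:

```
#include <cstdio>
#include <queue>
#include <vector>

// Hypothetical stand-in for a CFG block: only the post-order number matters
// for this illustration.
struct Block {
  int PostOrderNumber;
};

// Dequeue the block with the smallest post-order number first (roughly what a
// post-order-based priority comparator achieves).
struct ByPostOrder {
  bool operator()(const Block *LHS, const Block *RHS) const {
    return LHS->PostOrderNumber > RHS->PostOrderNumber;
  }
};

int main() {
  Block A{0}, B{1}, C{2}, D{3};

  // Push the same blocks in two different orders...
  std::priority_queue<const Block *, std::vector<const Block *>, ByPostOrder>
      WL1, WL2;
  for (const Block *Blk : {&C, &A, &D, &B})
    WL1.push(Blk);
  for (const Block *Blk : {&A, &B, &C, &D})
    WL2.push(Blk);

  // ...and observe that both dequeue in the same (post-order) sequence.
  while (!WL1.empty()) {
    std::printf("%d %d\n", WL1.top()->PostOrderNumber,
                WL2.top()->PostOrderNumber);
    WL1.pop();
    WL2.pop();
  }
}
```

Both queues pop the blocks in the same sequence, so pre-sorting the insertions can only affect how much heapification work is done, not the fixpoint iteration itself.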

On the project [[ https://github.com/rswier/c4 | c4 ]], a tiny C compiler written in 4 large functions, the current liveness analysis runtimes on my machine in debug mode look like this:

Liveness analysis on next: 7.397604e-03s
Liveness analysis on expr: 1.203549e-02s
Liveness analysis on stmt: 5.470070e-04s
Liveness analysis on main: 1.344798e-02s
Liveness analysis on main: 1.415095e-02s
Liveness analysis on stmt: 5.550660e-04s
Liveness analysis on expr: 1.197334e-02s
Liveness analysis on next: 7.181059e-03s

After this patch, they look like this:

Liveness analysis on next: 7.313751e-03s
Liveness analysis on expr: 1.211920e-02s
Liveness analysis on stmt: 5.582670e-04s
Liveness analysis on main: 1.372210e-02s
Liveness analysis on main: 1.437104e-02s
Liveness analysis on stmt: 5.685340e-04s
Liveness analysis on expr: 1.269498e-02s
Liveness analysis on next: 7.094738e-03s

Mind that this measured the entire analysis, not just the enqueuing. I think it's fair to say that even on relatively large functions, the difference is within the margin of error, and is negligible compared to the runtime of its only user, the static analyzer itself.
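
For anyone who wants to reproduce the measurements: a small wrapper along these lines around the per-function liveness computation prints timings in the same format. The `timeLiveness` helper is hypothetical and does not exist in the tree; it is only a sketch using `std::chrono`:

```
#include <chrono>
#include <cstdio>

// Hypothetical helper: run 'Work' (e.g. the liveness computation for one
// function) and print the elapsed wall-clock time in the format quoted above.
// Usage: timeLiveness("next", [&] { /* run the liveness analysis here */ });
template <typename Callable>
void timeLiveness(const char *FunctionName, Callable Work) {
  auto Start = std::chrono::steady_clock::now();
  Work();
  std::chrono::duration<double> Elapsed =
      std::chrono::steady_clock::now() - Start;
  std::printf("Liveness analysis on %s: %es\n", FunctionName, Elapsed.count());
}
```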

> Well, it would be important to know why the original author put the [FIXME] there.

Yep, I think it's just an outdated comment.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87519/new/

https://reviews.llvm.org/D87519


