[PATCH] D124490: [InstrProf] Minimal Block Coverage

Sun Feb 26 00:41:10 PST 2023

MaskRay added a comment.

In `BlockCoverageInference.cpp` , `for (auto &BB : F) { ...  getReachableAvoiding` is strictly quadratic and I think that may be problematic.
There are quite a few programs (e.g. Chisel genrerated C++) which contain functions with many basic blocks (say, >1000).
Also see D137184 <https://reviews.llvm.org/D137184>: a workaround for some programs with too many critical edges.

Applying a strict quadratic algorithm is a compile time pitfall. I understand that it may be fine for some mobile applications.
There are certainly other quadratic algorithms such as an `O((V+E)*c)` algorithm in GCC gcov (V: number of vertices; E: number of edges; c: number of elementary circuits),
an O(V^2) implementation of Semi-NCA algorithm.
For them, the quadratic time complexity is a theoretic upper bound which is really really difficult to achieve in practice (if ever achievable considering that in practice most CFGs are reducible).

Note that the optimality in the number of instrumented basic blocks is not required.
It does not necessarily lead to optimal performance since we have discarded execution count information.
Is it possible to use a faster algorithm which may instrument a few more basic blocks (let's arbitrarily say 1.2-approximation is good enough in practice)?

================
Comment at: llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp:111
+  for (auto &BB : F)
+    if (succ_empty(&BB))
+      ExitBlocks.push_back(&BB);
----------------
Rename ExitBlocks: BB may end with an unreachable/resume/cleanupret/etc. I think we don't name them exit blocks.

================
Comment at: llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp:137
+    BlockSet ReachableFromEntry, ReachableFromExit;
+    getReachableAvoiding(&EntryBlock, &BB, /*IsForward=*/true,
+                         ReachableFromEntry);
----------------
ellis wrote:
> MaskRay wrote:
> > The time complexity is O(|BB|^2) and can be problematic.
> I've added a comment discussing the runtime. We do know of a linear time algorithm that would compute the Dependencies maps, but it is significantly more complicated. We thought it would be best to first add this simpler implementation and in the future we can use the linear algorithm if we find that it gives a significant speedup in practice.
We have quadratic time algorithms like `O((V+E)(c+1))` used in gcov.

================
Comment at: llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp:1717
+                 << (IsBlockDead(BB).value() ? "Dead" : "Covered") << "\n");
+      NumCorruptCoverage++;
+    }
----------------

================
Comment at: llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp:1365
+  bool WasChanged = true;
+  while (WasChanged) {
+    WasChanged = false;
----------------
ellis wrote:
> MaskRay wrote:
> > This can use a flood-fill instead of a fixed-point iteration.
> I opted to use this algorithm for simplicity. Since most CFGs are relatively small, this function doesn't contribute too much to buildtime. If we discover that this causes a slowdown in some real world scenario then we can improve it. In fact, we can use info from the block coverage inference algorithm to infer all blocks in one pass. But again, this would add complexity and I would rather not do this unless we needed to.
I think it's worth fixing it. A flood-fill algorithm is nearly of the same length and avoids the performance pitfall. That style is used much more than the iterative algorithm here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124490/new/

https://reviews.llvm.org/D124490