[PATCH] D139367: [CSSPGO][llvm-profgen] A tail call tracker to infer missing tail call frames.

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 7 00:56:05 PST 2022


wenlei added inline comments.


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:16-17
 #include <float.h>
+#include <map>
+#include <unordered_map>
 #include <unordered_set>
----------------
I didn't see new instance of these containers. Also didn't see these includes being removed from other headers. 


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:781
     Binary->decodePseudoProbe();
+    refineTailCallTargets();
+  }
----------------
why is this only done for probe case?


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:818
+
+  // Populate SampledIndirectCalls based on static indirect call sites, i.e, entries of
+  // CallTargets with zero target value. Similarly to indirect tail calls.
----------------
The way this is done not only turn unknown indirect call targets into known, but also removes/filters some unsampled indirect calls. I'm wondering whether doing such filtering on direct call would also help narrowing down the search space? 




================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:827
+        assert(CallTargets.count(From) == 1 &&
+               "A callsite should only appear once with either a known a zero "
+               "target value at this point");
----------------
> either a known a zero
typo? I also don't understand this message. 


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:847
+
+  // Remove unmaterialized indirect call targets which are place holders and no
+  // longer useuful.
----------------
Maybe just my pet peeve, but I always feel plain, simple description is much less confusing than those ambiguous terms. Through static analysis, indirect call targets are known, but with samples, these targets become known. Can we just simply say known, unknown, and avoid materialized, unmaterialized? 


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:870
+
+#ifndef NODEBUG
+  // Sanity check to make sure one call target only appears once in the
----------------
NDEBUG


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:540
+        } else if (MCDesc.isUnconditionalBranch()) {
+          if (Target) {
+            // Any inter-function unconditional jump is considered tail call at
----------------
Can we assert that `Target` is always available from `evaluateBranch` for direct branch?


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:553
+          });
+        } else if (MCDesc.isIndirectBranch() && MCDesc.isBarrier()) {
+          // This is an indirect branch but not necessarily an indirect tail
----------------
Is `MCDesc.isBarrier()` for filtering *unconditional* branch? Make it explicit in the comments below. 


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.h:252-256
+  // A map of call instructions to their target addresses.
+  std::unordered_multimap<uint64_t, uint64_t> CallTargets;
+
+  // A map of tail call instructions to their target addresses.
+  std::unordered_multimap<uint64_t, uint64_t> TailCallTargets;
----------------
1. Did you choose `unordered_multimap<uint64_t, uint64_t>` over `unordered_map<uint64_t, unordered_set<uint64_t>>` to optimize for mostly 1 target per call? multimap has multiple copies of the keys too, but map doesn't. The later feels a bit cleaner..

2. The name is bit confusing because this contains both call source and target. 

3. I haven't think through this carefully, but these data structures seems quite ad-hoc in that it somewhat overlaps with both CallAddressSet above and the data in the new tracker. I know that each serves slightly different purpose, but it just feels a bit unorganized, and bolted on the way it is now.  


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.h:494
+
+  std::unordered_multimap<uint64_t, uint64_t>& getTailCallTargets() {
+    return TailCallTargets;
----------------
nit: `type& get..` -> `type &get..`


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:42-43
+    if (FuncRange *ToFRange = Binary->findFuncRange(Call.second)) {
+      TailCallTargets.emplace(Call.first, ToFRange->Func);
+      TailCallReachableFuncs.insert(ToFRange->Func);
+    }
----------------
I found the names a bit confusing, how about making these tweaks: 
TailCallTargets -> TailCallToTargetMap
TailCallReachableFuncs -> TailCallTargetFuncs
FuncTailCalls -> FuncToTailCallMap



================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:50
+  // Deduplicate
+  if (!FuncTailCalls.empty()) {
+    auto I = FuncTailCalls.begin();
----------------
Why use `unordered_multimap` to allow duplication in the first place? With `unordered_map<key_t, unordered_set<value_t>>` for both `TailCallTargets` and `FuncTailCalls`, there won't be duplicates. 


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:61-63
+  if (FuncTailCalls.size() == 1) {
+    TailCallFuncSingleTailCalls++;
+  } else if (FuncTailCalls.size() > 1) {
----------------
`FuncTailCalls` contains all functions and their associated tail calls, and it's not a per-function map. Why size 1 means a single tail call function?


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:86
+
+#ifndef NODEBUG
+  if (!FuncTailCalls.empty()) {
----------------
The no debug macro used should be `NDEBUG`


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:115
+
+uint64_t TailCallContextTracker::computeUniqueTailCallPath(
+    BinaryFunction *From, BinaryFunction *To, SmallVectorImpl<uint64_t> &Path) {
----------------
The return value can be simplified to use boolean instead of int, because we only care whether there's unique path? 

We may need to check for cycle at caller side, so we avoid the need to return 0. 


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:115
+
+uint64_t TailCallContextTracker::computeUniqueTailCallPath(
+    BinaryFunction *From, BinaryFunction *To, SmallVectorImpl<uint64_t> &Path) {
----------------
wenlei wrote:
> The return value can be simplified to use boolean instead of int, because we only care whether there's unique path? 
> 
> We may need to check for cycle at caller side, so we avoid the need to return 0. 
Do we want to limit the level of recursive search? Practically, it's probably very rare to have 3+ back to back tail calls, and three consecutive missing frames due to that. I'm not sure how much this can help though.  

If we open this up to general missing frame inference beyond tail call, such pruning might be useful. 


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:154
+    NumPaths += computeUniqueTailCallPath(TailCall->second, To, Path);
+    // Reachable via multiple paths is considered unreachable. Stop analyzing
+    // the remaining if we are already seeing more than one reachable paths.
----------------
> Reachable via multiple paths is considered unreachable 

This is just non-unique path situation. Why do we need to introduce a notion of "unreachable" while it is actually reachable? This is probably pure terminology issue, but I don't think I understand the rational behind "unreachable".


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:186
+
+uint64_t TailCallContextTracker::inferMissingFrames(
+    uint64_t From, uint64_t To, SmallVectorImpl<uint64_t> &UniquePath) {
----------------
It doesn't seem like we need to return num of path here. We bail early when `NumPaths > 1`, so the number isn't accurate anyways. At most, a boolean indicated whether we found a unique path should be enough. 


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:229
+    UniquePath.pop_back_n(UniquePath.size() - Pos);
+    assert(UniquePath.back() == From && "borken path");
+  }
----------------
typo: broken


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:232
+
+#ifndef NODEBUG
+  if (NumPaths == 1) {
----------------
NDEBUG


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.h:23
+
+class TailCallContextTracker {
+public:
----------------
The core improvement is to find unique path between a caller frame and callee frame, so we can infer missing frames. You identified cases of missing frames caused by tail call, so you centered the implementation around tail call. But actually recovering missing frame for FPO is not that different either. 

So what I'm thinking about is, have a way to not limit this to just handling tail call, instead of `TailCallContextTracker`, make it a generic `MissingFrameInferrer`. The trade off is that the search is going to be more expensive and more likely to have multiple path. But implementation wise it should be fairly similar and could be turned on/off via switch. 

We don't have to cover everything now, but I think `MissingFrameInferrer` could be a better name than `TailCallContextTracker`, even if we decided to limit to tail call for now. This really is an inferrer instead of a tracker. WDYT? 

Correspondingly `computeUniqueTailCallPath` would be `computeUniqueCallPath` even if it only deals with tail call now. 


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.h:83-85
+  std::unordered_map<FrameTransition, std::vector<uint64_t>, PairHash>
+      UniquePaths;
+  std::unordered_map<FrameTransition, uint64_t, PairHash> NonUniquePaths;
----------------
1. Add comment for the two data structures - what is the vector and the uint64_t value.

2. I'd avoid introducing terms when more intuitive and plain expression is available. How about just use `CallerCalleePair` instead of `FrameTransition`? This also goes better with `PairHash`

3. For `NonUniquePaths`, do we really care how many paths are there? Would an `unordered_set` be good enough?

4. Can we merge `UniquePaths` and `NonUniquePaths`? so we don't need two separate look up. we could use empty vector to represent non-unique path?


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.h:89
+
+#ifndef NODEBUG
+  DenseSet<std::pair<uint64_t, uint64_t>> Reachables;
----------------
NDEBUG


================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.h:92
+  DenseSet<std::pair<uint64_t, uint64_t>> Unreachables;
+  DenseSet<std::pair<uint64_t, uint64_t>> Multireachables;
+#endif
----------------
Would be nice to unify terms, you used unique vs non-unique elsewhere. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139367/new/

https://reviews.llvm.org/D139367



More information about the llvm-commits mailing list