[PATCH] D139367: [CSSPGO][llvm-profgen] A tail call tracker to infer missing tail call frames.

Wed Dec 7 11:00:18 PST 2022

hoy marked 5 inline comments as done.
hoy added inline comments.

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:16-17
 #include <float.h>
+#include <map>
+#include <unordered_map>
 #include <unordered_set>
----------------
wenlei wrote:
> I didn't see new instance of these containers. Also didn't see these includes being removed from other headers. 
Good catch, removed.

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:781
     Binary->decodePseudoProbe();
+    refineTailCallTargets();
+  }
----------------
wenlei wrote:
> why is this only done for probe case?
Because the current tracker implementation only applies to the pseudo probe case where `AddrBasedCtxKey` is used. The line number base uses `StringBasedCtxKey`. So there's no need to do `refineTailCallTargets` for the line number case for now.

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:818
+
+  // Populate SampledIndirectCalls based on static indirect call sites, i.e, entries of
+  // CallTargets with zero target value. Similarly to indirect tail calls.
----------------
wenlei wrote:
> The way this is done not only turn unknown indirect call targets into known, but also removes/filters some unsampled indirect calls. I'm wondering whether doing such filtering on direct call would also help narrowing down the search space? 
> 
> 
Good point. I think it should be helpful. Will give it a try.

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:827
+        assert(CallTargets.count(From) == 1 &&
+               "A callsite should only appear once with either a known a zero "
+               "target value at this point");
----------------
wenlei wrote:
> > either a known a zero
> typo? I also don't understand this message. 
should be "a known or a zero"

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:847
+
+  // Remove unmaterialized indirect call targets which are place holders and no
+  // longer useuful.
----------------
wenlei wrote:
> Maybe just my pet peeve, but I always feel plain, simple description is much less confusing than those ambiguous terms. Through static analysis, indirect call targets are known, but with samples, these targets become known. Can we just simply say known, unknown, and avoid materialized, unmaterialized? 
Yeah, known/unknown sounds good.

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:870
+
+#ifndef NODEBUG
+  // Sanity check to make sure one call target only appears once in the
----------------
wenlei wrote:
> NDEBUG
fixed.

================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:540
+        } else if (MCDesc.isUnconditionalBranch()) {
+          if (Target) {
+            // Any inter-function unconditional jump is considered tail call at
----------------
wenlei wrote:
> Can we assert that `Target` is always available from `evaluateBranch` for direct branch?
Done.

================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:553
+          });
+        } else if (MCDesc.isIndirectBranch() && MCDesc.isBarrier()) {
+          // This is an indirect branch but not necessarily an indirect tail
----------------
wenlei wrote:
> Is `MCDesc.isBarrier()` for filtering *unconditional* branch? Make it explicit in the comments below. 
It's to filter out conditional branch. Tail call is an unconditional branch. Comment updated.

================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.h:252-256
+  // A map of call instructions to their target addresses.
+  std::unordered_multimap<uint64_t, uint64_t> CallTargets;
+
+  // A map of tail call instructions to their target addresses.
+  std::unordered_multimap<uint64_t, uint64_t> TailCallTargets;
----------------
wenlei wrote:
> 1. Did you choose `unordered_multimap<uint64_t, uint64_t>` over `unordered_map<uint64_t, unordered_set<uint64_t>>` to optimize for mostly 1 target per call? multimap has multiple copies of the keys too, but map doesn't. The later feels a bit cleaner..
> 
> 2. The name is bit confusing because this contains both call source and target. 
> 
> 3. I haven't think through this carefully, but these data structures seems quite ad-hoc in that it somewhat overlaps with both CallAddressSet above and the data in the new tracker. I know that each serves slightly different purpose, but it just feels a bit unorganized, and bolted on the way it is now.  
> Did you choose unordered_multimap<uint64_t, uint64_t> over unordered_map<uint64_t, unordered_set<uint64_t>> to optimize for mostly 1 target per call? multimap has multiple copies of the keys too, but map doesn't. The later feels a bit cleaner..

Yeah, that's the point. Single targets take 95% and with multipmap it wouldn't bother construct a container for each of those targets. Duplications are handled in `refineTailCallTargets` .

> The name is bit confusing because this contains both call source and target.

How about CallPairs or just CallEdges?

> I haven't think through this carefully, but these data structures seems quite ad-hoc in that it somewhat overlaps with both CallAddressSet above and the data in the new tracker. I know that each serves slightly different purpose, but it just feels a bit unorganized, and bolted on the way it is now.

Yeah, they are separated mostly because of efficiency. A `unordered_map<uint64_t, unordered_set<uint64_t>>` would unify them but at the expense of efficiency. 

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:42-43
+    if (FuncRange *ToFRange = Binary->findFuncRange(Call.second)) {
+      TailCallTargets.emplace(Call.first, ToFRange->Func);
+      TailCallReachableFuncs.insert(ToFRange->Func);
+    }
----------------
wenlei wrote:
> I found the names a bit confusing, how about making these tweaks: 
> TailCallTargets -> TailCallToTargetMap
> TailCallReachableFuncs -> TailCallTargetFuncs
> FuncTailCalls -> FuncToTailCallMap
> 
Sounds good.

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:50
+  // Deduplicate
+  if (!FuncTailCalls.empty()) {
+    auto I = FuncTailCalls.begin();
----------------
wenlei wrote:
> Why use `unordered_multimap` to allow duplication in the first place? With `unordered_map<key_t, unordered_set<value_t>>` for both `TailCallTargets` and `FuncTailCalls`, there won't be duplicates. 
It's about efficiency. 95% of tail calls have a single target, using `unordered_set` for them sounds like unnecessary. WDYT?

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:61-63
+  if (FuncTailCalls.size() == 1) {
+    TailCallFuncSingleTailCalls++;
+  } else if (FuncTailCalls.size() > 1) {
----------------
wenlei wrote:
> `FuncTailCalls` contains all functions and their associated tail calls, and it's not a per-function map. Why size 1 means a single tail call function?
Size 1 means there is only one tail call in the program. It in turn means there's only one function having single tail call.

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:115
+
+uint64_t TailCallContextTracker::computeUniqueTailCallPath(
+    BinaryFunction *From, BinaryFunction *To, SmallVectorImpl<uint64_t> &Path) {
----------------
wenlei wrote:
> wenlei wrote:
> > The return value can be simplified to use boolean instead of int, because we only care whether there's unique path? 
> > 
> > We may need to check for cycle at caller side, so we avoid the need to return 0. 
> Do we want to limit the level of recursive search? Practically, it's probably very rare to have 3+ back to back tail calls, and three consecutive missing frames due to that. I'm not sure how much this can help though.  
> 
> If we open this up to general missing frame inference beyond tail call, such pruning might be useful. 
> The return value can be simplified to use boolean instead of int, because we only care whether there's unique path?

Hmm, it currently returns three kind of values, 0, 1 and multiple. I'm not seeing how a boolean could make the logic simpler.

> Do we want to limit the level of recursive search? Practically, it's probably very rare to have 3+ back to back tail calls, and three consecutive missing frames due to that. I'm not sure how much this can help though.

Having a limit of the search level makes sense, but I'm not sure what is a good value there. It' not uncommon to have a tail call chain with more than 3 frames. For example, a chain consists of one-liner getter helpers. I can get some numbers and see.

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:154
+    NumPaths += computeUniqueTailCallPath(TailCall->second, To, Path);
+    // Reachable via multiple paths is considered unreachable. Stop analyzing
+    // the remaining if we are already seeing more than one reachable paths.
----------------
wenlei wrote:
> > Reachable via multiple paths is considered unreachable 
> 
> This is just non-unique path situation. Why do we need to introduce a notion of "unreachable" while it is actually reachable? This is probably pure terminology issue, but I don't think I understand the rational behind "unreachable".
The comment is confusing. `unreachable` has its literal meaning in the algorithm. That reachable via multiple paths has a different meaning. I just simplified the comment to "Stop analyzing the remaining if we are already seeing more than one reachable paths."

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.cpp:186
+
+uint64_t TailCallContextTracker::inferMissingFrames(
+    uint64_t From, uint64_t To, SmallVectorImpl<uint64_t> &UniquePath) {
----------------
wenlei wrote:
> It doesn't seem like we need to return num of path here. We bail early when `NumPaths > 1`, so the number isn't accurate anyways. At most, a boolean indicated whether we found a unique path should be enough. 
Not sure a boolean is enough. E.g, it wouldn't differentiate the following two cases when inferring missing frames between A -> F 

- Case 1 with actually paths:

```
A -> B -> C -> F

A -> B -> D -> F

A -> E -> F
```

The integer return value can tell that we should stop analyzing the path `A -> E -> F` because there are already multiple available paths. Returning 0 or false would not tell it apart from the case below.

- Case 2 with actually paths:

```
A -> B -> C 

A -> E -> F
```

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.h:23
+
+class TailCallContextTracker {
+public:
----------------
wenlei wrote:
> The core improvement is to find unique path between a caller frame and callee frame, so we can infer missing frames. You identified cases of missing frames caused by tail call, so you centered the implementation around tail call. But actually recovering missing frame for FPO is not that different either. 
> 
> So what I'm thinking about is, have a way to not limit this to just handling tail call, instead of `TailCallContextTracker`, make it a generic `MissingFrameInferrer`. The trade off is that the search is going to be more expensive and more likely to have multiple path. But implementation wise it should be fairly similar and could be turned on/off via switch. 
> 
> We don't have to cover everything now, but I think `MissingFrameInferrer` could be a better name than `TailCallContextTracker`, even if we decided to limit to tail call for now. This really is an inferrer instead of a tracker. WDYT? 
> 
> Correspondingly `computeUniqueTailCallPath` would be `computeUniqueCallPath` even if it only deals with tail call now. 
Yeah, renaming `TailCallContextTracker` as `MissingFrameInferrer` sounds good. 

> Correspondingly computeUniqueTailCallPath would be computeUniqueCallPath even if it only deals with tail call now.

I still like `computeUniqueTailCallPath` since what it real does is to compute a unique tail call path. Moving forward we could extend `inferMissingFrames` to make an extra call to another say `computeUniqueFPOPath` or something?

================
Comment at: llvm/tools/llvm-profgen/TailCallTracker.h:83-85
+  std::unordered_map<FrameTransition, std::vector<uint64_t>, PairHash>
+      UniquePaths;
+  std::unordered_map<FrameTransition, uint64_t, PairHash> NonUniquePaths;
----------------
wenlei wrote:
> 1. Add comment for the two data structures - what is the vector and the uint64_t value.
> 
> 2. I'd avoid introducing terms when more intuitive and plain expression is available. How about just use `CallerCalleePair` instead of `FrameTransition`? This also goes better with `PairHash`
> 
> 3. For `NonUniquePaths`, do we really care how many paths are there? Would an `unordered_set` be good enough?
> 
> 4. Can we merge `UniquePaths` and `NonUniquePaths`? so we don't need two separate look up. we could use empty vector to represent non-unique path?
`CallerCalleePair` sounds good.

> For NonUniquePaths, do we really care how many paths are there? Would an unordered_set be good enough?

We don't really care. `unordered_set` should be good enough.

> Can we merge UniquePaths and NonUniquePaths? so we don't need two separate look up. we could use empty vector to represent non-unique path?

Constructing an empty vector also takes extra memory and time, but is probably better than an extra hash lookup? 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139367/new/

https://reviews.llvm.org/D139367