[PATCH] D34863: [XRay][tools] Function call stack based analysis tooling for XRay traces

Tue Jul 11 13:55:17 PDT 2017

kpw added a comment.

Sorry for the delay getting comments on this. My phabricator email address lapsed into an unvalidated state and I haven't been getting messages until today.

Thanks for getting a start on this. The trie data structure looks very analogous to the Chrome Trace Viewer format given some scheme to generate stack ids like FnId->FnId->FnId.

I can pick this up and play with it, using StackTrie within llvm-xray convert. We have a sync later this afternoon, but the moral of the story is that I'll integrate this into my tree and make the changes from my comments to get started.
Would you like me to "Commandeer the revision" once I have some edits?

================
Comment at: tools/llvm-xray/xray-stacks.cc:57
+static cl::opt<std::string> StacksInstrMap(
+    "instr_map", cl::desc("instrumentation map used to identify function ids"),
+    cl::sub(Stack), cl::init(""));
----------------
Might be worth pointing out that this expects a compiler output format (e.g. elf). Are there other supported formats? This type of option is common to all the llvm-xray sub commands. Is there a good way to hyperlink our command line options such as "look over in that manpage or in llvm-xray --help?"

================
Comment at: tools/llvm-xray/xray-stacks.cc:63-64
+
+// The stack command will take a set of XRay traces as arguments, and collects
+// information about the stack traces that appear in the traces. We track the
+// following pieces of information:
----------------
Doxygen comments? 
```
sed 's_//_///_g'
```

================
Comment at: tools/llvm-xray/xray-stacks.cc:89
+//
+//   1. When unwinding record the durations of each unwound functions associated
+//      with the path up to which the unwinding stops. For example:
----------------
functions -> function

================
Comment at: tools/llvm-xray/xray-stacks.cc:92-99
+//        push a <start time>
+//        push b <start time>
+//        push c <start time>
+//        pop  c <end time>
+//        pop  b <end time>
+//        push c <start time> <-- before this, record durations for path a->b->c
+//        pop  c <end time>
----------------
This might be more clear with two columns.

Step                                             Duration State
push a <start time>                   a = ?
push b <start time>                   a = ?, a->b = ?
push c < start time>                  a = ?, a->b = ?, a->b->c = ?
pop c <end time>                      a = ?, a->b = ?, emit duration a->b->c 
pop b <end time>                      a = ?, emit duration a->b
push c <start time>                   a = ?, a->c = ?
pop c <end time>                      a = ?, emit duration  a->c
pop a <end time>                      emit duration a

The reason I find the comment confusing that it doesn't sound like there is a record emitted each time a pop happens, but only when a push follows a pop or the stack is empty.

================
Comment at: tools/llvm-xray/xray-stacks.cc:117-118
+//
+//   3. For cases where we don't have durations for some of the higher levels of
+//      the stack, we can mark them appropriately.
+//
----------------
Why are the higher levels of the stack special?

================
Comment at: tools/llvm-xray/xray-stacks.cc:153
+// represent all the call stacks in an easily traversible manner later on when
+// we do the aggregations and lookups. For the call sequences like the
+// following:
----------------
I think it might be worth inventing terminology like "instrumented call sequences" and "instrumented call stacks".

It's obvious from the implementation that intermediate and leaf nodes will only include xray instrumented functions, but for a user new to the tool and accustomed to sampling profilers, commonly understood terminology like call stack could serve to reinforce a misconception that each function is included in this trie.

================
Comment at: tools/llvm-xray/xray-stacks.cc:182
+
+  // We maintain a pointer to the roots of the tries we see.
+  SmallVector<TrieNode *, 4> Roots;
----------------
Nit: We maintain a pointer -> We maintain pointers

================
Comment at: tools/llvm-xray/xray-stacks.cc:188-189
+
+  DenseMap<uint32_t, SmallVector<std::pair<TrieNode *, uint64_t>, 8>>
+      ThreadStackMap;
+
----------------
Perhaps a comment that the uint64_t pair parameter is for start times is justified.

================
Comment at: tools/llvm-xray/xray-stacks.cc:204
+
+  TrieNode *findRoot(int32_t FuncId) {
+    for (auto Node : Roots)
----------------
Alternatively use a DenseMap of FuncId -> TrieNode* for roots. Expected value of cardinality(root_functions) is small, so it's up to you which is preferable.

================
Comment at: tools/llvm-xray/xray-stacks.cc:212
+public:
+  bool accountRecord(const XRayRecord &R) {
+    auto &TS = ThreadStackMap[R.TId];
----------------
What does the return type mean?

================
Comment at: tools/llvm-xray/xray-stacks.cc:233-235
+        // We didn't find the callee in the stack trie, so we're going to
+        // add to the stack then set up the pointers properly.
+        auto N = createTrieNode(R.FuncId, Top.first);
----------------
We're going to create duplicate nodes for each thread that this stack id appears in if we don't search the FuncId index.

I think that this duplication is actually *good*, because it gives us another dimension to work with for statistics. Users that are interested in profiling work distribution approaches might want to compare stack-id duration across threads. We're going to lose that information once the thread stack is unwound, but we could trivially retain it.

================
Comment at: tools/llvm-xray/xray-stacks.cc:258
+
+      auto Parent =
+          find_if(reverse(TS), [&](const std::pair<TrieNode *, uint64_t> &E) {
----------------
Parent doesn't seem like the right name. We're not looking for the func-id's parent. We're looking for a match.

================
Comment at: tools/llvm-xray/xray-stacks.cc:271-275
+      auto I = std::next(Parent).base();
+      for (auto &E : make_range(I, TS.end()))
+        E.first->Durations.push_back(std::max(E.second, R.TSC) -
+                                     std::min(E.second, R.TSC));
+      TS.erase(I, TS.end());
----------------
This seems really fishy to me. I think it's a bug.

It should be "auto I = Parent.base()" and I think this is more obvious if the Parent was named Match.

Parent is the reverse_iterator into the ThreadStack that points to an entry that has a *matching* function id of the function that an EXIT record is being processed for.

Calling std::next on the reverse_iterator does contain the actual parent (or caller) of that function in the thread stack, which is then turned into the forward_iterator with .base()

The caller function and all of its children have a duration recorded and are removed from the thread stack. This is wrong. The caller function isn't being exited from.

================
Comment at: tools/llvm-xray/xray-stacks.cc:294
+      auto Sum = std::accumulate(F->Durations.begin(), F->Durations.end(), 0LL);
+      auto Fn = FN.SymbolOrNumber(F->FuncId);
+      OS << formatv("#{0,-4} {1,-60} {2,+12} {3,+16}\n", Level++,
----------------
Fn and FN are both variable names at this point (in addition to F). That seems like a recipe for confusion.

================
Comment at: tools/llvm-xray/xray-stacks.cc:327
+        if (Top->Callees.empty()) {
+          ++UniqueStacks;
+          auto TopSum = std::accumulate(Top->Durations.begin(),
----------------
I think this is accurate if you consider the thread id part of the stack identifier, otherwise see above comment. I think we should (optionally) merge stack sums across threads here.

================
Comment at: tools/llvm-xray/xray-stacks.cc:349
+
+          // printStack(OS, Top, FN);
+        }
----------------
Remove?

https://reviews.llvm.org/D34863