[PATCH] D147812: [InstrProf] Use BalancedPartitioning to order temporal profiling trace data

Tue Apr 11 14:47:08 PDT 2023

ellis added a subscriber: jmestre.
ellis added a comment.

In D147812#4259190 <https://reviews.llvm.org/D147812#4259190>, @davidxl wrote:

> A few high level questions:
>
> Different types of traces don't have the same frequency, so it might be useful to support weighting. The frequency with which certain trace patterns appear in the profile data does not necessarily match their frequency in real-world usage. To support this, some kind of symbolic ID may be needed to annotate the trace data.

I see what you are saying. We could have one set of raw profiles collected under type "A" conditions and another set collected under type "B" conditions. If type "A" is common while type "B" is rare, we'd like to weight A's traces more heavily than B's.

This could be implemented by adding a "weight" field to each trace in the trace data. Since I only recently landed https://reviews.llvm.org/D147287, do you think it makes sense to land a patch that adds this field without bumping the format version, so that we don't have to support two trace formats?

The other option is to have the `llvm-profdata merge` command duplicate traces that carry extra weight. Of course, this reduces the data density, so we couldn't store as many profiles.
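To make the two options concrete, here is a rough sketch. `WeightedTrace` and `expandByWeight` are hypothetical names, not part of the format from D147287: the first option stores a `Weight` next to each trace, and the second emulates that weight during merging by repeating the trace.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical trace record: function names in first-execution order plus a
// weight. This is NOT the on-disk trace format from D147287.
struct WeightedTrace {
  std::vector<std::string> FunctionNames;
  uint64_t Weight = 1;
};

// Emulates weighting without a format change by emitting each trace Weight
// times. Every duplicate is stored in full, which is why data density drops.
std::vector<std::vector<std::string>>
expandByWeight(const std::vector<WeightedTrace> &Traces) {
  std::vector<std::vector<std::string>> Expanded;
  for (const auto &T : Traces)
    for (uint64_t I = 0; I < T.Weight; ++I)
      Expanded.push_back(T.FunctionNames);
  return Expanded;
}
```

With a fixed trace budget, a type "A" trace with weight 3 occupies three slots that could otherwise hold distinct profiles, whereas a stored weight field costs a single integer per trace.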

> The cost metric is entropy-like. Have you tried other metrics, such as Gini impurity?

@spupyrev @jmestre Do you have any thoughts on this?
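For comparison, here is a sketch of two generic split measures for a single utility node. These are illustrative only and not the exact cost function in BalancedPartitioning.cpp. `L` and `R` are assumed to be the numbers of function nodes sharing that utility node that land in the left and right halves of a bisection. Both measures are 0 when all of those functions fall on one side and are maximal for an even split.

```cpp
#include <cassert>
#include <cmath>

// Binary entropy (in bits) of how a utility node's functions are split.
double entropyCost(unsigned L, unsigned R) {
  assert(L + R > 0 && "utility node must have at least one function");
  double P = static_cast<double>(L) / (L + R);
  // 0 * log2(0) is taken as 0.
  auto Term = [](double X) { return X > 0 ? -X * std::log2(X) : 0.0; };
  return Term(P) + Term(1 - P);
}

// Gini impurity of the same split: 1 - p^2 - (1-p)^2 = 2p(1-p).
double giniCost(unsigned L, unsigned R) {
  assert(L + R > 0 && "utility node must have at least one function");
  double P = static_cast<double>(L) / (L + R);
  return 2 * P * (1 - P);
}
```

Gini avoids the logarithm, so it is cheaper to evaluate. Whether its flatter shape near the extremes helps or hurts the resulting layout would need to be measured.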

================
Comment at: llvm/lib/Support/BalancedPartitioning.cpp:201
+  // Initialize signatures. Empirically the number of corresponding UtilityNodes
+  // is 4x larger than the number of FunctionNodes.
+  SignatureMapT Signatures;
----------------
davidxl wrote:
> Why 4x larger? Is the number of utility nodes the same as "the number of traces * the number of cutoffs"?
The `Cutoffs` variable is only used with BP when ordering functions from traces. We can also use BP to order functions for better compression by placing similar functions close together. In that case we assign utility nodes to function nodes differently, so the number of signatures will be different.
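For the trace case, a hypothetical sketch of how cutoffs could map to utility nodes is shown below. It may not match the patch's exact assignment. The sketch creates one utility node per (trace, cutoff) pair and attaches every function whose timestamp (position in that trace) is below the cutoff. This gives at most `Traces.size() * Cutoffs.size()` utility nodes. It assumes each function appears at most once per trace, as it would when traces record first executions. `assignUtilityNodes`, `FunctionId`, and `UtilityNodeId` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using FunctionId = uint64_t;
using UtilityNodeId = uint32_t;

// Hypothetical: one utility node per (trace, cutoff) pair. Functions that
// appear early in a trace share more utility nodes, so BP pulls them together.
std::map<FunctionId, std::vector<UtilityNodeId>>
assignUtilityNodes(const std::vector<std::vector<FunctionId>> &Traces,
                   const std::vector<size_t> &Cutoffs) {
  std::map<FunctionId, std::vector<UtilityNodeId>> Signatures;
  UtilityNodeId Next = 0;
  for (const auto &Trace : Traces) {
    for (size_t Cutoff : Cutoffs) {
      UtilityNodeId UN = Next++;
      // Attach every function whose timestamp is below this cutoff.
      for (size_t T = 0; T < Trace.size() && T < Cutoff; ++T)
        Signatures[Trace[T]].push_back(UN);
    }
  }
  return Signatures;
}
```

For example, with traces `{1, 2, 3}` and `{3, 1}` and cutoffs `{1, 2}`, function 1 joins three of the four utility nodes, function 3 joins two, and function 2 joins only one.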

================
Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:3046
+      "cutoffs", cl::CommaSeparated,
+      cl::desc("Timestamp cutoff values used to initialize function nodes for "
+               "ordering functions"),
----------------
davidxl wrote:
> How are the default values selected?
These values were also found empirically by optimizing a large binary. I was debating whether to leave out a default value, since these values are fairly specific to the binary we tested. Now that I think about it, we could derive a similar list from the total number of functions by assigning cutoffs linearly.
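A minimal sketch of the "assign cutoffs linearly" idea follows. `deriveLinearCutoffs` and its parameters are hypothetical and do not correspond to an existing `llvm-profdata` option. The helper spreads `NumCutoffs` timestamp cutoffs evenly over `[1, NumFunctions]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: evenly spaced timestamp cutoffs up to NumFunctions.
std::vector<size_t> deriveLinearCutoffs(size_t NumFunctions,
                                        size_t NumCutoffs) {
  assert(NumCutoffs > 0 && "need at least one cutoff");
  std::vector<size_t> Cutoffs;
  for (size_t I = 1; I <= NumCutoffs; ++I) {
    size_t C = NumFunctions * I / NumCutoffs;
    // Skip zero and duplicate cutoffs, which occur when NumFunctions is small.
    if (C > 0 && (Cutoffs.empty() || C != Cutoffs.back()))
      Cutoffs.push_back(C);
  }
  return Cutoffs;
}
```

For instance, `deriveLinearCutoffs(100, 4)` yields `{25, 50, 75, 100}`. A geometric spacing that places more cutoffs near startup may turn out to fit the empirical list better, but that would need to be tested.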

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D147812