[llvm] [llvm-profgen] Improve sample profile density (PR #92144)

Lei Wang via llvm-commits llvm-commits at lists.llvm.org
Fri May 17 11:35:02 PDT 2024


================
@@ -1032,6 +1035,78 @@ void CSProfileGenerator::convertToProfileMap() {
   IsProfileValidOnTrie = false;
 }
 
+void CSProfileGenerator::calculateAndShowDensity(
+    SampleContextTracker &CTracker) {
+  double Density = calculateDensity(CTracker);
+  showDensitySuggestion(Density, ProfileDensityHotFuncCutOff);
+}
+
+// Calculate profile density:
+// Sort the list of function densities in descending order and iterate over it
+// until the accumulated total samples exceed the percentage threshold of the
+// total profile samples. The profile density is then the last (minimum)
+// function density among the processed functions. This means that if the
+// profile density is good, all functions significant to performance have good
+// density; conversely, if the profile density is bad, the accumulated samples
+// of all bad-density profiles exceed (100% - percentage_threshold).
+// The percentage threshold (--profile-density-hot-func-cutoff) is configurable
+// depending on how much regression the system wants to tolerate.
+double CSProfileGenerator::calculateDensity(SampleContextTracker &CTracker) {
+  double ProfileDensity = 0.0;
+
+  uint64_t TotalProfileSamples = 0;
+  // A list of the function profile density and total samples.
+  std::vector<std::pair<double, uint64_t>> DensityList;
+  for (const auto *Node : CTracker) {
+    const auto *FSamples = Node->getFunctionSamples();
+    if (!FSamples)
+      continue;
+
+    uint64_t TotalBodySamples = 0;
+    uint64_t FuncBodySize = 0;
+    for (const auto &I : FSamples->getBodySamples()) {
+      TotalBodySamples += I.second.getSamples();
+      FuncBodySize++;
----------------
wlei-llvm wrote:

Yeah, I noticed that. Assume the ideal density is
`density = total instruction samples / binary size` (where binary size is the number of instructions).
The problem is that we now use probes and don't have the instruction count for each probe, so total_samples is not actually the sum of all instruction samples; using the binary function size as the denominator is therefore probably not accurate anymore.
Using the probe instead seems more reasonable, because the samples come from the LBR, whose minimum unit is a range, not an address.
For example, say a function has only one BB/probe, but that BB contains 10000 instructions with no jumps, and a single LBR hit covers all of them; then the old one vs. the new one is `1` vs `1/10000`.
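
To make the probe-based unit concrete, here is a minimal standalone sketch of the per-function density this implies, i.e. samples per profiled probe rather than per binary instruction. The helper name and the plain pair-based input are illustrative assumptions, not the PR's code:

#include <cstdint>
#include <utility>
#include <vector>

// Illustrative sketch: per-function density measured as samples per profiled
// probe. Each element of Probes is (probe id, sample count), mirroring the
// body-sample iteration in the hunk above.
double funcDensitySketch(const std::vector<std::pair<uint32_t, uint64_t>> &Probes) {
  uint64_t TotalBodySamples = 0;
  for (const auto &P : Probes)
    TotalBodySamples += P.second;
  // The denominator is the number of probes that appear in the profile,
  // not the function's instruction count from the binary.
  return Probes.empty() ? 0.0
                        : static_cast<double>(TotalBodySamples) / Probes.size();
}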

>IIRC for BodySamples, we omit 0 entries, so counting number of body samples to compute size seems wrong?

We do generate 0-count probes; what we don't generate are probes that are fully optimized out, but since those don't show up in the binary, it should be OK to ignore them.
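
For reference, the selection step described in the comment at the top of the hunk (the hunk above is truncated before that point) can be sketched as a standalone function like the one below. The name, parameters, and cutoff handling are assumptions for illustration, not the PR's code:

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative sketch of the cutoff step: sort per-function (density, samples)
// pairs by density in descending order, accumulate samples until they reach
// CutoffPercent of the total, and report the density of the last function
// taken, i.e. the minimum density among the hot functions.
double profileDensitySketch(std::vector<std::pair<double, uint64_t>> DensityList,
                            uint64_t TotalProfileSamples, unsigned CutoffPercent) {
  std::sort(DensityList.begin(), DensityList.end(),
            [](const auto &L, const auto &R) { return L.first > R.first; });
  uint64_t Accumulated = 0;
  double ProfileDensity = 0.0;
  for (const auto &[Density, Samples] : DensityList) {
    ProfileDensity = Density;
    Accumulated += Samples;
    if (Accumulated >= TotalProfileSamples * static_cast<double>(CutoffPercent) / 100.0)
      break;
  }
  return ProfileDensity;
}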

https://github.com/llvm/llvm-project/pull/92144


More information about the llvm-commits mailing list