[llvm] [clang] [PseudoProbe] Mix and reorder block and call probe ID in lexical order (PR #75092)

Wed Jan 24 19:00:32 PST 2024

================
@@ -471,6 +471,66 @@ ProfileGenerator::getTopLevelFunctionProfile(FunctionId FuncName) {
   return ProfileMap.Create(Context);
 }
 
+// Use a heuristic to determine probe order by checking callsite insertion
+// position relative to the BB probes. Sort all the BB probes is in a list, for
+// each calliste, compute "ratio = insert position / length of the list". For
+// the old order, the probe ids for BB should be all before(smaller than) the
+// probe ids for callsite, this ratio should be equal to or close to 1.
+bool checkProbeIDIsInMixedOrder(const SampleProfileMap &Profiles) {
+  // Use flattned profile to maximize the callsites in the profile.
+  SampleProfileMap flattenProfile;
+  ProfileConverter::flattenProfile(Profiles, flattenProfile);
+
+  uint32_t PossibleOldOrderNum = 0;
+  uint32_t PossibleNewOrderNum = 0;
+
+  for (const auto &I : flattenProfile) {
+    const FunctionSamples &FS = I.second;
+    // Skip small functions whose profile order are likely random.
+    if (FS.getBodySamples().size() < 10)
+      continue;
+
+    std::set<uint32_t> PossibleBBProbeIDs;
+    std::set<uint32_t> CallsiteIDs;
+    for (const auto &I : FS.getBodySamples()) {
+      if (I.second.getCallTargets().empty())
+        PossibleBBProbeIDs.insert(I.first.LineOffset);
+      else
+        CallsiteIDs.insert(I.first.LineOffset);
+    }
+
+    if (PossibleBBProbeIDs.empty() || CallsiteIDs.empty())
+      continue;
+
+    uint32_t DistanceToBeginSum = 0;
+    for (const auto &ID : CallsiteIDs)
+      DistanceToBeginSum += std::distance(PossibleBBProbeIDs.begin(),
+                                          PossibleBBProbeIDs.upper_bound(ID));
+    uint32_t LengthSum = PossibleBBProbeIDs.size() * CallsiteIDs.size();
+
+    // Note that PossibleBBProbeIDs could contains some callsite probe id, the
+    // ratio is not exactly 1 for the old order, hence use a smaller threshold
+    // to determine.
+    if ((float)DistanceToBeginSum / LengthSum > 0.8)
+      PossibleOldOrderNum++;
+    else
+      PossibleNewOrderNum++;
+  }
+  return PossibleNewOrderNum >= PossibleOldOrderNum;
----------------
wlei-llvm wrote:

Sorry, it was also far complicated than I expected, try to explain more about this, we can discuss offline. 

The fact is that, in theory there can be cases that all the calls are after BB probes for the new order ID(like all hot profiled calls are in the last BB), so even we can decide the probe is call or not, we still can't just use one function or one probe as a line to decide the order, that's why here I use this complex statistical way.(It appears it works well)

> If we can't decide whether a probe is originally a call probe or a block probe, that makes the checking less reliable

So far sadly no from profile. We can query the type of probe from the original profiled binary, like parsing the binary asm, but that seems a lot of infra work.(Also as mentioned above, even the case all calls are after all BB also can't decide the order) 

> In reality, do we see cases where there are many PossibleNewOrderNum and PossibleOldOrderNum for one profile?

Yep, both are actually a lot in a large production-used profile, that's why I use a 0.8 not a 0.9 things, I spent some time tuning this threshold, right now it works very well on my large profile test.

> but if that's case we can't even confidently issue error.

To some degree, this heuristic is reliable(as long as there are enough big functions). From 
my statistics, using 0.8 as threshold showed it works really good to detect the order, the two num are distinguishable. For both side, the `PossibleOrderNum` is much larger than the other num, the ratio(PossibleNewOrderNum/PossibleOldOrderNum) is almost ~0.9 for new order, a large room to swing, and vice versa.  (like PossibleNewOrderNum is 9X more than  PossibleOldOrderNum for a new order. Or PossibleOldOrderNum is 9X more than PossibleNewOrderNum for an old order). 
But TBH, I'm sure how it works for a small profile, what i tested is all larger than 100M profile, I know that for a small function, it's quite random, it just has little calls, I have to filter out those small function as showed in the code. 



https://github.com/llvm/llvm-project/pull/75092