[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

Wed Jun 12 13:55:59 PDT 2024

================
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction &Func) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction &Func) {
+bool canApplyInference(const FlowFunction &Func,
+                       const yaml::bolt::BinaryFunctionProfile &YamlBF) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
     return false;
 
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+      opts::MatchedProfileThreshold / 100.0)
+    return false;
----------------
WenleiHe wrote:

Trying to understand the rationale behind using dynamic counts to determine whether profile inference is safe. 

The way I see it, we have two graphs that we are trying to match. If many nodes in the graph have an exact match, chances are higher that we can infer the correct match for the rest of the nodes. With that, we care more about how many nodes we can match statically. 

Say we have 5 blocks with a count distribution of 1M, 1K, 1K, 1K, 1K. If we have an exact match for the four 1K nodes (80% exact match), we should feel reasonably confident about inferring the remaining node, even though, if we look at counts, we have an exact match for less than 1%. 
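
To make the arithmetic concrete, here is a minimal standalone sketch of the two ratios for that example. `BlockSample`, `staticMatchRatio`, `dynamicMatchRatio`, and `exampleBlocks` are hypothetical names used only for illustration; they are not BOLT types or functions, and this is not how the patch computes `MatchedExecCount`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a profiled basic block (not a BOLT type).
struct BlockSample {
  uint64_t ExecCount; // dynamic execution count taken from the profile
  bool Matched;       // true if the block was matched exactly
};

// Static view: fraction of blocks that were matched exactly.
double staticMatchRatio(const std::vector<BlockSample> &Blocks) {
  if (Blocks.empty())
    return 0.0;
  std::size_t Matched = 0;
  for (const BlockSample &B : Blocks)
    if (B.Matched)
      ++Matched;
  return static_cast<double>(Matched) / Blocks.size();
}

// Dynamic view: fraction of the total execution count that belongs to
// exactly matched blocks.
double dynamicMatchRatio(const std::vector<BlockSample> &Blocks) {
  uint64_t Total = 0;
  uint64_t Matched = 0;
  for (const BlockSample &B : Blocks) {
    Total += B.ExecCount;
    if (B.Matched)
      Matched += B.ExecCount;
  }
  if (Total == 0)
    return 0.0; // avoid dividing by zero for functions with no samples
  return static_cast<double>(Matched) / Total;
}

// The distribution from the example: one unmatched 1M block and four
// matched 1K blocks.
std::vector<BlockSample> exampleBlocks() {
  return {{1000000, false}, {1000, true}, {1000, true}, {1000, true},
          {1000, true}};
}
```

For `exampleBlocks()`, the static ratio is 4/5 = 80%, while the dynamic ratio is 4,000/1,004,000, roughly 0.4%, so a count-based threshold and a block-based threshold would reach opposite conclusions about this function.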

WDYT?

https://github.com/llvm/llvm-project/pull/95156