[llvm] [CGData][MachineOutliner] Global Outlining (PR #90074)

Wed Aug 28 14:14:33 PDT 2024

================
@@ -695,6 +859,39 @@ void MachineOutliner::findCandidates(
   }
 }
 
+void MachineOutliner::computeAndPublishHashSequence(MachineFunction &MF,
+                                                    unsigned CandSize) {
+  // Compute the hash sequence for the outlined function.
+  SmallVector<stable_hash> OutlinedHashSequence;
+  for (auto &MBB : MF) {
+    for (auto &NewMI : MBB) {
+      stable_hash Hash = stableHashValue(NewMI);
+      if (!Hash) {
+        OutlinedHashSequence.clear();
+        break;
+      }
+      OutlinedHashSequence.push_back(Hash);
+    }
+  }
+
+  // Append a unique name based on the non-empty hash sequence.
+  if (AppendContentHashToOutlinedName && !OutlinedHashSequence.empty()) {
+    auto CombinedHash = stable_hash_combine(OutlinedHashSequence);
+    auto NewName =
+        MF.getName().str() + ".content." + std::to_string(CombinedHash);
+    MF.getFunction().setName(NewName);
+  }
+
+  // Publish the non-empty hash sequence to the local hash tree.
+  if (OutlinerMode == CGDataMode::Write) {
+    StableHashAttempts++;
+    if (!OutlinedHashSequence.empty())
+      LocalHashTree->insert({OutlinedHashSequence, CandSize});
----------------
kyulee-com wrote:

We create an outlining candidate each time a sequence of instructions matches in the global hash tree. These candidates may overlap. As shown in the `outline` function (https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/MachineOutliner.cpp#L837-L868), we sort the candidates and process them in order, discarding any that overlap a previously processed one. Initially, our global outlining implementation did not use this count information (the `CandSize` recorded with each hash sequence). However, we found that prioritizing candidates by their frequency significantly improves code-size savings on real-world workloads.
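A minimal sketch of the sort-then-discard step, assuming each candidate is a half-open range [Start, Start + Len) in one flat instruction sequence and that its priority is the occurrence count looked up in the global hash tree. `Candidate` and `selectNonOverlapping` are hypothetical names for illustration only; this is not the MachineOutliner's actual data model, and the real ordering may weigh other factors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: instructions [Start, Start + Len) in the module,
// plus how often its hash sequence was seen in the global hash tree.
struct Candidate {
  std::size_t Start;
  std::size_t Len;
  unsigned GlobalCount;
};

// Greedy selection: visit candidates in priority order (higher global count
// first, earlier start as a tie-breaker) and keep a candidate only if none of
// its instructions were claimed by a previously kept candidate.
// Assumes Start + Len <= NumInstrs for every candidate.
std::vector<Candidate> selectNonOverlapping(std::vector<Candidate> Cands,
                                            std::size_t NumInstrs) {
  std::stable_sort(Cands.begin(), Cands.end(),
                   [](const Candidate &A, const Candidate &B) {
                     if (A.GlobalCount != B.GlobalCount)
                       return A.GlobalCount > B.GlobalCount;
                     return A.Start < B.Start;
                   });
  std::vector<bool> Claimed(NumInstrs, false);
  std::vector<Candidate> Kept;
  for (const Candidate &C : Cands) {
    bool Overlaps = false;
    for (std::size_t I = C.Start; I < C.Start + C.Len; ++I) {
      if (Claimed[I]) {
        Overlaps = true;
        break;
      }
    }
    if (Overlaps)
      continue; // Discard: overlaps an already-kept candidate.
    for (std::size_t I = C.Start; I < C.Start + C.Len; ++I)
      Claimed[I] = true;
    Kept.push_back(C);
  }
  return Kept;
}
```

With candidates (Start 0, Len 4, count 2), (Start 2, Len 3, count 9), and (Start 6, Len 2, count 1) over 8 instructions, the count-9 candidate is kept first, the count-2 candidate is dropped because it overlaps it, and the count-1 candidate survives. Ordering by start position instead would keep the count-2 candidate and drop the count-9 one.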

Although it's theoretically possible to order all candidates globally using the hash tree, we can't access other modules to determine the best outlining sequence, as modules are fundamentally independent. In this PR, we collect the candidates that appear in the current module, match them against the global hash tree, and then order these local instances using a global heuristic (their frequency in the hash tree). While this may not be optimal, it has proven effective in our tests.
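A sketch of the module-local matching step, assuming the global hash tree is a trie keyed by per-instruction stable hashes whose terminal nodes carry a global occurrence count. `TrieNode`, `insertSequence`, `matchLocal`, and `Match` are hypothetical and simplified; they are not LLVM's actual hash-tree API, and `StableHash` is only a stand-in for `stable_hash`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

using StableHash = std::uint64_t; // Stand-in for llvm::stable_hash.

// Hypothetical global hash tree: a trie over instruction hashes. A node with
// Terminals > 0 ends a sequence seen Terminals times across all modules.
struct TrieNode {
  std::map<StableHash, std::unique_ptr<TrieNode>> Successors;
  unsigned Terminals = 0;
};

// Record a hash sequence and its global occurrence count in the trie.
void insertSequence(TrieNode &Root, const std::vector<StableHash> &Seq,
                    unsigned Count) {
  TrieNode *N = &Root;
  for (StableHash H : Seq) {
    std::unique_ptr<TrieNode> &Next = N->Successors[H];
    if (!Next)
      Next = std::make_unique<TrieNode>();
    N = Next.get();
  }
  N->Terminals += Count;
}

// A local instance: instructions [Start, Start + Len) and its global count.
struct Match {
  std::size_t Start;
  std::size_t Len;
  unsigned GlobalCount;
};

// For every start position in the module-local hash sequence, walk the trie
// while the hashes agree and record each terminal reached as a candidate.
std::vector<Match> matchLocal(const TrieNode &Root,
                              const std::vector<StableHash> &Local) {
  std::vector<Match> Out;
  for (std::size_t I = 0; I < Local.size(); ++I) {
    const TrieNode *N = &Root;
    for (std::size_t J = I; J < Local.size(); ++J) {
      auto It = N->Successors.find(Local[J]);
      if (It == N->Successors.end())
        break; // No global sequence continues with this hash.
      N = It->second.get();
      if (N->Terminals > 0)
        Out.push_back({I, J - I + 1, N->Terminals});
    }
  }
  return Out;
}
```

Matches found from different start positions can overlap (here a match at position 0 of length 3 and one at position 1 of length 2 share instructions), so they still need the priority-ordered overlap filtering described in the comment.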

For more details, see the candidate-prioritization test added in this PR: `llvm/test/CodeGen/AArch64/cgdata-read-priority.ll`

https://github.com/llvm/llvm-project/pull/90074