[llvm] [SampleFDO] Improve stale profile matching by diff algorithm (PR #87375)

via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 26 20:50:49 PDT 2024


================
@@ -19,6 +19,57 @@
 
 namespace llvm {
 
+// Callsite location based matching anchor.
+struct Anchor {
+  LineLocation Loc;
+  FunctionId FuncId;
+
+  Anchor(const LineLocation &Loc, const FunctionId &FuncId)
+      : Loc(Loc), FuncId(FuncId) {}
+  bool operator==(const Anchor &Other) const {
+    return this->FuncId == Other.FuncId;
+  }
+};
+
+// This class implements the Myers diff algorithm used for stale profile
+// matching. The algorithm provides a simple and efficient way to find the
+// Longest Common Subsequence(LCS) or the Shortest Edit Script(SES) of two
+// sequences. For more details, refer to the paper 'An O(ND) Difference
+// Algorithm and Its Variations' by Eugene W. Myers.
+// In the scenario of profile fuzzy matching, the two sequences are the IR
+// callsite anchors and profile callsite anchors. The subsequence equivalent
+// parts from the resulting SES are used to remap the IR locations to the
+// profile locations.
+class MyersDiff {
+public:
+  struct DiffResult {
+    LocToLocMap EqualLocations;
+#ifndef NDEBUG
+    // New IR locations that are inserted in the new version.
+    std::vector<LineLocation> Insertions;
+    // Old Profile locations that are deleted in the new version.
+    std::vector<LineLocation> Deletions;
+#endif
+    void addEqualLocations(const LineLocation &IRLoc,
+                           const LineLocation &ProfLoc) {
+      EqualLocations.insert({IRLoc, ProfLoc});
+    }
+#ifndef NDEBUG
+    void addInsertion(const LineLocation &IRLoc) {
+      Insertions.push_back(IRLoc);
+    }
+    void addDeletion(const LineLocation &ProfLoc) {
+      Deletions.push_back(ProfLoc);
+    }
+#endif
+  };
+
+  // The basic greedy version of Myers's algorithm. Refer to page 6 of the
+  // original paper.
+  DiffResult longestCommonSequence(const std::vector<Anchor> &A,
----------------
WenleiHe wrote:

It would be simpler if LCS just returned a "CommonSequence", represented by `LocToLocMap`.

The `DiffResult` and `MyersDiff` abstractions seem a bit unnecessary. A standalone function `LocToLocMap longestCommonSequence(..)` would be cleaner, I think.
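
For illustration only, a standalone version along those lines might look roughly like the sketch below. It is not the code from the patch: `LineLocation`, `FunctionId` and `LocToLocMap` are simplified stand-ins (assumptions) for the LLVM types, `Anchor` is reduced to a plain aggregate, and the function just runs the greedy Myers search, backtracks, and returns the matched IR-to-profile locations.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the LLVM types used in the patch.
using LineLocation = std::pair<uint32_t, uint32_t>; // (line offset, discriminator)
using FunctionId = std::string;
using LocToLocMap = std::map<LineLocation, LineLocation>;

struct Anchor {
  LineLocation Loc;
  FunctionId FuncId;
};

// Greedy Myers diff as a free function. Anchors are compared by FuncId only,
// and the result maps each matched IR location to its profile location.
LocToLocMap longestCommonSequence(const std::vector<Anchor> &IRAnchors,
                                  const std::vector<Anchor> &ProfAnchors) {
  const int N = static_cast<int>(IRAnchors.size());
  const int M = static_cast<int>(ProfAnchors.size());
  const int Max = N + M;
  LocToLocMap Equal;
  if (Max == 0)
    return Equal;

  // Diagonal K = X - Y is stored at V[K + Max]; V holds the furthest X
  // reached on each diagonal so far.
  auto Index = [Max](int K) { return K + Max; };
  std::vector<int> V(2 * Max + 1, 0);
  // Trace[D] is a snapshot of V taken before round D, used for backtracking.
  std::vector<std::vector<int>> Trace;

  bool Done = false;
  for (int D = 0; D <= Max && !Done; ++D) {
    Trace.push_back(V);
    for (int K = -D; K <= D; K += 2) {
      int X;
      if (K == -D || (K != D && V[Index(K - 1)] < V[Index(K + 1)]))
        X = V[Index(K + 1)];     // step down: skip one profile anchor
      else
        X = V[Index(K - 1)] + 1; // step right: skip one IR anchor
      int Y = X - K;
      // Follow the diagonal while the anchors match.
      while (X < N && Y < M && IRAnchors[X].FuncId == ProfAnchors[Y].FuncId) {
        ++X;
        ++Y;
      }
      V[Index(K)] = X;
      if (X >= N && Y >= M) {
        Done = true;
        break;
      }
    }
  }

  // Walk back from (N, M) and record only the diagonal (equal) steps.
  int X = N, Y = M;
  for (int D = static_cast<int>(Trace.size()) - 1; X > 0 || Y > 0; --D) {
    const std::vector<int> &P = Trace[D];
    const int K = X - Y;
    const int PrevK =
        (K == -D || (K != D && P[Index(K - 1)] < P[Index(K + 1)])) ? K + 1
                                                                   : K - 1;
    const int PrevX = P[Index(PrevK)];
    const int PrevY = PrevX - PrevK;
    while (X > PrevX && Y > PrevY) {
      --X;
      --Y;
      Equal.insert({IRAnchors[X].Loc, ProfAnchors[Y].Loc});
    }
    if (D == 0)
      break;
    X = PrevX;
    Y = PrevY;
  }
  return Equal;
}
```

With this shape the caller only sees the location map. The debug-only insertion and deletion lists from `DiffResult` are not produced here; if they are still wanted for logging, they could be derived from the anchors that are absent from the returned map.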

https://github.com/llvm/llvm-project/pull/87375

