[llvm] [SampleFDO] Improve stale profile matching by diff algorithm (PR #87375)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 26 20:50:49 PDT 2024
================
@@ -19,6 +19,57 @@
namespace llvm {
+// Callsite location based matching anchor.
+struct Anchor {
+ LineLocation Loc;
+ FunctionId FuncId;
+
+ Anchor(const LineLocation &Loc, const FunctionId &FuncId)
+ : Loc(Loc), FuncId(FuncId) {}
+ bool operator==(const Anchor &Other) const {
+ return this->FuncId == Other.FuncId;
+ }
+};
+
+// This class implements the Myers diff algorithm used for stale profile
+// matching. The algorithm provides a simple and efficient way to find the
+// Longest Common Subsequence (LCS) or the Shortest Edit Script (SES) of two
+// sequences. For more details, refer to the paper 'An O(ND) Difference
+// Algorithm and Its Variations' by Eugene W. Myers.
+// In the scenario of profile fuzzy matching, the two sequences are the IR
+// callsite anchors and profile callsite anchors. The equal (matching) parts
+// of the resulting SES are used to remap the IR locations to the profile
+// locations.
+class MyersDiff {
+public:
+ struct DiffResult {
+ LocToLocMap EqualLocations;
+#ifndef NDEBUG
+ // New IR locations that are inserted in the new version.
+ std::vector<LineLocation> Insertions;
+ // Old profile locations that are deleted in the new version.
+ std::vector<LineLocation> Deletions;
+#endif
+ void addEqualLocations(const LineLocation &IRLoc,
+ const LineLocation &ProfLoc) {
+ EqualLocations.insert({IRLoc, ProfLoc});
+ }
+#ifndef NDEBUG
+ void addInsertion(const LineLocation &IRLoc) {
+ Insertions.push_back(IRLoc);
+ }
+ void addDeletion(const LineLocation &ProfLoc) {
+ Deletions.push_back(ProfLoc);
+ }
+#endif
+ };
+
+ // The basic greedy version of Myers's algorithm. Refer to page 6 of the
+ // original paper.
+ DiffResult longestCommonSequence(const std::vector<Anchor> &A,
----------------
WenleiHe wrote:
It would be simpler if LCS just returned a "CommonSequence" as represented by `LocToLocMap`.
The DiffResult and MyersDiff abstractions seem a bit unnecessary. A standalone function `LocToLocMap longestCommonSequence(..)` would be cleaner, I think.
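A minimal sketch of what such a standalone function might look like, to make the suggestion concrete. It assumes simplified, hypothetical stand-ins for LLVM's types: `LineLocation` as a plain struct, `FunctionId` as `std::string`, and `LocToLocMap` as a `std::map` rather than LLVM's hashed map. It runs the greedy forward pass from page 6 of Myers's paper, then recovers the matched pairs by backtracking through per-round snapshots of `V`. That backtracking strategy is one common way to extract the LCS and is not necessarily what the PR does. Anchors compare by callee only, mirroring `Anchor::operator==` in the diff above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for llvm::LineLocation, FunctionId and LocToLocMap.
struct LineLocation {
  uint32_t LineOffset;
  uint32_t Discriminator;
  bool operator<(const LineLocation &O) const {
    return std::tie(LineOffset, Discriminator) <
           std::tie(O.LineOffset, O.Discriminator);
  }
};

struct Anchor {
  LineLocation Loc;
  std::string FuncId; // Callee name; anchors match by callee only.
};

using LocToLocMap = std::map<LineLocation, LineLocation>;

// Greedy Myers diff over IR anchors (A) and profile anchors (B). Returns the
// LCS as a map from each matched IR location to its profile location.
LocToLocMap longestCommonSequence(const std::vector<Anchor> &A,
                                  const std::vector<Anchor> &B) {
  const int N = static_cast<int>(A.size());
  const int M = static_cast<int>(B.size());
  const int Max = N + M;
  // V[K + Max] is the furthest X reached so far on diagonal K = X - Y.
  std::vector<int> V(2 * Max + 2, 0);
  // Trace[D] is a snapshot of V taken before round D, used to backtrack.
  std::vector<std::vector<int>> Trace;
  LocToLocMap Result;

  for (int D = 0; D <= Max; ++D) {
    Trace.push_back(V);
    for (int K = -D; K <= D; K += 2) {
      // Move down from diagonal K+1 (skip a profile anchor) or right from
      // diagonal K-1 (skip an IR anchor), whichever reaches further.
      int X;
      if (K == -D || (K != D && V[K - 1 + Max] < V[K + 1 + Max]))
        X = V[K + 1 + Max];
      else
        X = V[K - 1 + Max] + 1;
      int Y = X - K;
      // Follow the "snake": consume matching anchors at no edit cost.
      while (X < N && Y < M && A[X].FuncId == B[Y].FuncId) {
        ++X;
        ++Y;
      }
      V[K + Max] = X;
      if (X >= N && Y >= M) {
        // Reached (N, M): replay the choices backwards, recording each
        // diagonal (matched) step of the shortest edit path.
        int CurX = N, CurY = M;
        for (int BD = D; BD >= 0; --BD) {
          const std::vector<int> &PrevV = Trace[BD];
          const int CurK = CurX - CurY;
          const bool Down =
              CurK == -BD ||
              (CurK != BD && PrevV[CurK - 1 + Max] < PrevV[CurK + 1 + Max]);
          const int PrevK = Down ? CurK + 1 : CurK - 1;
          const int PrevX = PrevV[PrevK + Max];
          const int PrevY = PrevX - PrevK;
          while (CurX > PrevX && CurY > PrevY) {
            --CurX;
            --CurY;
            Result[A[CurX].Loc] = B[CurY].Loc;
          }
          CurX = PrevX;
          CurY = PrevY;
        }
        return Result;
      }
    }
  }
  assert(false && "the search always reaches (N, M) by D == N + M");
  return Result;
}
```

For example, IR anchors `foo@1, bar@2, baz@3` against profile anchors `foo@10, baz@30` yield `{1 -> 10, 3 -> 30}`, with `bar@2` left unmatched as an insertion. Storing a copy of `V` per round costs O(D·(N+M)) memory, which is a simple trade-off for a sketch; the insertion/deletion lists from `DiffResult` fall out of the same backtracking loop if a debug build needs them.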
https://github.com/llvm/llvm-project/pull/87375