[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

Mingming Liu via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 26 15:11:09 PDT 2024


================
@@ -322,14 +796,133 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
     if (!NumCandidates ||
         (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
       continue;
+
     auto PromotionCandidates = getPromotionCandidatesForCallSite(
         *CB, ICallProfDataRef, TotalCount, NumCandidates);
-    Changed |= tryToPromoteWithFuncCmp(*CB, PromotionCandidates, TotalCount,
-                                       ICallProfDataRef, NumCandidates);
+
+    VTableGUIDCountsMap VTableGUIDCounts;
+    Instruction *VPtr =
+        computeVTableInfos(CB, VTableGUIDCounts, PromotionCandidates);
+
+    if (isProfitableToCompareVTables(*CB, PromotionCandidates, TotalCount))
+      Changed |= tryToPromoteWithVTableCmp(*CB, VPtr, PromotionCandidates,
+                                           TotalCount, NumCandidates,
+                                           ICallProfDataRef, VTableGUIDCounts);
+    else
+      Changed |= tryToPromoteWithFuncCmp(*CB, VPtr, PromotionCandidates,
+                                         TotalCount, ICallProfDataRef,
+                                         NumCandidates, VTableGUIDCounts);
   }
   return Changed;
 }
 
+// TODO: Returns false if the function addressing and vtable load instructions
+// cannot sink to indirect fallback.
+bool IndirectCallPromoter::isProfitableToCompareVTables(
+    const CallBase &CB, const std::vector<PromotionCandidate> &Candidates,
+    uint64_t TotalCount) {
+  if (!EnableVTableProfileUse || Candidates.empty())
+    return false;
+  uint64_t RemainingVTableCount = TotalCount;
+  const size_t CandidateSize = Candidates.size();
+  for (size_t I = 0; I < CandidateSize; I++) {
+    auto &Candidate = Candidates[I];
+    uint64_t CandidateVTableCount = 0;
+    for (auto &[GUID, Count] : Candidate.VTableGUIDAndCounts)
+      CandidateVTableCount += Count;
+
+    if (CandidateVTableCount < Candidate.Count * ICPVTablePercentageThreshold) {
+      LLVM_DEBUG(dbgs() << "For callsite #" << NumOfPGOICallsites << CB << I
+                        << "-th candidate, function count " << Candidate.Count
+                        << " and its vtable count " << CandidateVTableCount
+                        << " have discrepancies\n");
+      return false;
+    }
+
+    RemainingVTableCount -= Candidate.Count;
+
+    // 'MaxNumVTable' limits the number of vtables to make vtable comparison
+    // profitable. Comparing multiple vtables for one function candidate will
+    // insert additional instructions on the hot path, and allowing more than
+    // one vtable for non last candidates may or may not elongates dependency
+    // chain for the subsequent candidates. Set its value to 1 for non-last
+    // candidate and allow option to override it for the last candidate.
+    int MaxNumVTable = 1;
+    if (I == CandidateSize - 1)
+      MaxNumVTable = ICPMaxNumVTableLastCandidate;
+
+    if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {
----------------
minglotus-6 wrote:

Done with `LLVM_DEBUG`.
* I didn't emit a missed-optimization remark, mainly because function comparison may still be applied when vtable comparison is not profitable. Readers would then have to correlate the missed vtable-comparison message with the applied function-comparison one to make sense of the remarks.

Now `opt llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -debug -S` prints the following log:


```
Work on callsite   call void %1(ptr %d), !prof !10 Num_targets: 3
 Candidate 0 Count=600  Target_func: 3827408714133779784
 Candidate 1 Count=500  Target_func: 5837445539218476403
 Candidate 2 Count=400  Target_func: 9381788221313981078
 
Work on callsite #0  call void %1(ptr %d), !prof !10 Num_targets: 3 Num_candidates: 3
 Candidate 0 Count=600  Target_func: 3827408714133779784
 Candidate 1 Count=500  Target_func: 5837445539218476403
 Candidate 2 Count=400  Target_func: 9381788221313981078

Computing vtable infos for callsite #1
  Cannot find vtable definition for 12345678; maybe the vtable isn't imported

Evaluating vtable profitability for callsite #1  call void %1(ptr %d), !prof !10
  Candidate 0 FunctionCount: 600, VTableCounts: {Derived1, 600}
  Candidate 1 FunctionCount: 500, VTableCounts: {Derived2, 500}
  Candidate 2 FunctionCount: 400, VTableCounts: {Base1, 200} {Derived3, 200}
    allow at most 1 and got 2 vtables. Bail out for vtable comparison.
```
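To make the bail-out in the last log line easier to follow, here is a minimal standalone sketch of the profitability check, fed with the counts from the log above. It is not the code in the patch: `CandidateSketch`, `isProfitableSketch`, and `candidatesFromLog` are hypothetical names, the `EnableVTableProfileUse` check is omitted, the number of profiled vtables stands in for `AddressPoints.size()`, and the percentage threshold is just a parameter (its default value in the pass is not shown in this hunk).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pass's PromotionCandidate; only the fields
// needed for this illustration are modeled.
struct CandidateSketch {
  uint64_t FunctionCount;
  // (vtable name, profiled count) pairs, as printed in the debug log.
  std::vector<std::pair<std::string, uint64_t>> VTableCounts;
};

// Simplified version of the check: bail out when the vtable counts disagree
// with the function count, or when a candidate needs more vtable comparisons
// than allowed (1 for non-last candidates, a configurable limit for the last).
bool isProfitableSketch(const std::vector<CandidateSketch> &Candidates,
                        double PercentageThreshold,
                        int MaxNumVTableLastCandidate) {
  if (Candidates.empty())
    return false;
  for (size_t I = 0; I < Candidates.size(); ++I) {
    const CandidateSketch &C = Candidates[I];
    uint64_t VTableCount = 0;
    for (const auto &[Name, Count] : C.VTableCounts)
      VTableCount += Count;
    // Function count and summed vtable count should roughly agree.
    if (VTableCount < C.FunctionCount * PercentageThreshold)
      return false;
    int MaxNumVTable = 1;
    if (I + 1 == Candidates.size())
      MaxNumVTable = MaxNumVTableLastCandidate;
    // Assumption: use the number of profiled vtables as the vtable count.
    if (static_cast<int>(C.VTableCounts.size()) > MaxNumVTable)
      return false;
  }
  return true;
}

// The three candidates from the debug log above.
std::vector<CandidateSketch> candidatesFromLog() {
  return {{600, {{"Derived1", 600}}},
          {500, {{"Derived2", 500}}},
          {400, {{"Base1", 200}, {"Derived3", 200}}}};
}
```

With a last-candidate limit of 1 (as in the `opt` invocation above), the sketch rejects vtable comparison because candidate 2 has two vtables. With a limit of 2 it would accept, since every candidate's vtable counts sum to its function count.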

https://github.com/llvm/llvm-project/pull/81442

