[PATCH] D100993: [CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss.
Wenlei He via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 21 16:25:03 PDT 2021
wenlei added inline comments.
================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:845
+/// \param SumOrigin Original sum of target counts for indirect call.
+/// \param Sum Prorated sum of target counts for indirect call.
/// \param InlinedCallSite Output vector for new call sites exposed after
----------------
hoy wrote:
> wenlei wrote:
> > nit:
> >
> > SumOrigin -> Original sum of target counts for indirect call before promoting given candidate.
> > Sum -> Sum of remaining target counts for indirect call after promoting given candidate.
> >
> > There's no prorated sum - the Sum is adjust using subtraction
> >
> > > Sum -= Candidate.CallsiteCount;
> `Sum` is actually passed in as prorated. `Candidate.CallsiteCount` is also prorated. Maybe "Prorated sum of remaining target counts for indirect call after promoting given candidate"?
Ah, I see. You're right, sorry for the confusion.
================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:835
+ // excluding promoted targets.
+ if (FullSum)
+ if (Optional<PseudoProbe> Probe = extractProbe(Inst))
----------------
hoy wrote:
> wenlei wrote:
> > This makes sure that the count for the indirect call instruction accurately reflects the counts for remaining indirect call. However I thought later PGO ICP pass relies on the value profiles we put onto metadata, so even if the count for that icall instruction is a bit off, that ICP should not be affected. Is that correct?
> >
> > Is the problem here more about pre-link sample loader ICP screw up distribution factor so post-link sample loader ICP sees a smaller count for the icall instruction which blocks the post-link sample loader ICP? In this case, if we look at call site target counts from profile, with promoted targets excluded, we would arrive at the correct sum even without distribution factor? Relying on distribution could get us the same result, but using prorating to mimic subtraction is a bit weird..
> >
> > I can see how accurate distribution factor helps count quality in general though - it would make counts for icall more consistent before running inference. On that, how do we make sure promoted calls get correct counts for inference input though?
> Yes, this is supposed to make sure that the indirect call and its enclosing block will have a correct count during sample annotation, but looks like doing it here is late. I just realized this is after `computeAndPropagateWeights`. Ideally, the distribution factor should be updated before `computeAndPropagateWeights` and the call target samples should be adjusted with the pre-updated factor. That means `updateIDTMetaData` should be done before `computeAndPropagateWeights`.
>
> The prelink problem doesn't need this to be fixed. We have the pseudo probe update pass that updates distribution factor based on BFI to handle that.
> Ideally, the distribution factor should be updated before computeAndPropagateWeights and the call target samples should be adjusted with the pre-updated factor.
Agreed, that ensures good input for inference.
> That means updateIDTMetaData should be done before computeAndPropagateWeights.
I think that update is meant for value profile metadata, which is orthogonal to computeAndPropagateWeights. In fact, we always set metadata after inference is done. Is it possible to decouple the distribution factor setting from metadata update?
> The prelink problem doesn't need this to be fixed. We have the pseudo probe update pass that updates distribution factor based on BFI to handle that.
Ok, then does this cause immediate problem for ICP (does the fix generate good result for xalancbmk without setting icp threshold to 2)?
> On that, how do we make sure promoted calls get correct counts for inference input though?
Taking another look, line 891 of sampleprofile.cpp took care of this?
================
Comment at: llvm/test/Transforms/SampleProfile/pseudo-probe-icp-factor.ll:261
+;; factors, i.e, roughly 11259 * 0.95 * 0.79 = 8444.
+; CHECK: ![[#PROF]] = !{!"VP", i32 0, i64 8444, i64 7546896869197086323, i64 -1, i64 -2012135647395072713, i64 8444}
----------------
hoy wrote:
> wenlei wrote:
> > What was the value before this fix? Is that much smaller than 0.95?
> It was 0.04 * 11259 = 450, where 0.04 was the distribution factor fixed up earlier in inlining time (line 255).
So with this fix, the distribution factor we see after sample loader changed from 0.04 to 0.95, correct?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100993/new/
https://reviews.llvm.org/D100993
More information about the llvm-commits
mailing list