[PATCH] D100993: [CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss.
Hongtao Yu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 21 18:12:15 PDT 2021
hoy added inline comments.
================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:835
+ // excluding promoted targets.
+ if (FullSum)
+ if (Optional<PseudoProbe> Probe = extractProbe(Inst))
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > This makes sure that the count for the indirect call instruction accurately reflects the counts for remaining indirect call. However I thought later PGO ICP pass relies on the value profiles we put onto metadata, so even if the count for that icall instruction is a bit off, that ICP should not be affected. Is that correct?
> > >
> > > Is the problem here more about pre-link sample loader ICP screw up distribution factor so post-link sample loader ICP sees a smaller count for the icall instruction which blocks the post-link sample loader ICP? In this case, if we look at call site target counts from profile, with promoted targets excluded, we would arrive at the correct sum even without distribution factor? Relying on distribution could get us the same result, but using prorating to mimic subtraction is a bit weird..
> > >
> > > I can see how accurate distribution factor helps count quality in general though - it would make counts for icall more consistent before running inference. On that, how do we make sure promoted calls get correct counts for inference input though?
> > Yes, this is supposed to make sure that the indirect call and its enclosing block will have a correct count during sample annotation, but looks like doing it here is late. I just realized this is after `computeAndPropagateWeights`. Ideally, the distribution factor should be updated before `computeAndPropagateWeights` and the call target samples should be adjusted with the pre-updated factor. That means `updateIDTMetaData` should be done before `computeAndPropagateWeights`.
> >
> > The prelink problem doesn't need this to be fixed. We have the pseudo probe update pass that updates distribution factor based on BFI to handle that.
> > Ideally, the distribution factor should be updated before computeAndPropagateWeights and the call target samples should be adjusted with the pre-updated factor.
>
> Agreed, that ensures good input for inference.
>
> > That means updateIDTMetaData should be done before computeAndPropagateWeights.
>
> I think that update is meant for value profile metadata, which is orthogonal to computeAndPropagateWeights. In fact, we always set metadata after inference is done. Is it possible to decouple the distribution factor setting from metadata update?
>
> > The prelink problem doesn't need this to be fixed. We have the pseudo probe update pass that updates distribution factor based on BFI to handle that.
>
> Ok, then does this cause immediate problem for ICP (does the fix generate good result for xalancbmk without setting icp threshold to 2)?
>
> > On that, how do we make sure promoted calls get correct counts for inference input though?
>
> Taking another look, line 891 of sampleprofile.cpp took care of this?
>
>
> Is it possible to decouple the distribution factor setting from metadata update?
They are coupled in this patch because the target counts from profile need to be prorated based on a pre-updated factor (say from prelink or inlining, but not ICP). The ICP-updated factor is the final one and is needed before `computeAndPropagateWeights`. Decoupling the ICP factor update from metadata update means we need to save the original factor somewhere, which can be possibly done using an intermediate table.
On the other hand, the indirect call metadata update may not rely on `computeAndPropagateWeights`, unlike the branch metdata update. But separating them seems a bigger change and less understandable.
> Ok, then does this cause immediate problem for ICP (does the fix generate good result for xalancbmk without setting icp threshold to 2)?
It causes less promotion with later PGO ICP since the early prorate eats the target count for non-promoted targets. I'm seeing this helps some benchmarks like perlbench and gcc, but regresses other benchmarks like h264. I guess that's due to the heuristics of PGO ICP. Overall no change in geomean.
> Taking another look, line 891 of sampleprofile.cpp took care of this?
Yes.
================
Comment at: llvm/test/Transforms/SampleProfile/pseudo-probe-icp-factor.ll:261
+;; factors, i.e, roughly 11259 * 0.95 * 0.79 = 8444.
+; CHECK: ![[#PROF]] = !{!"VP", i32 0, i64 8444, i64 7546896869197086323, i64 -1, i64 -2012135647395072713, i64 8444}
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > What was the value before this fix? Is that much smaller than 0.95?
> > It was 0.04 * 11259 = 450, where 0.04 was the distribution factor fixed up earlier in inlining time (line 255).
> So with this fix, the distribution factor we see after sample loader changed from 0.04 to 0.95, correct?
The distribution factor is still 0.04, but the call target count for bar is 8444 instead of 450.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100993/new/
https://reviews.llvm.org/D100993
More information about the llvm-commits
mailing list