[PATCH] D125266: [CSSPGO][CSProfileConverter] Remove call target samples when including callee samples into caller.

Wed May 11 12:10:00 PDT 2022

hoy added inline comments.

================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:370
+        Count = NumSamples;
+      NumSamples -= Count;
+      CallTargets.erase(I);
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > The asymmetricity between removeCalledTarget and addCalledTarget can be confusing. We're reducing NumSamples in removeCalledTarget, but we are not adding removeCalledTarget in addCalledTarget. Two questions, 1) should NumSamples include target counts, and are we doing that consistently? 2) Any ways to make them the two functions consistent?
> > NumSamples and target counts are computed separately in llvm-profgen, one is based on lbr ranges while the other is based on branch samples. So they do not have any connection on the profile generator side.
> > 
> > I guess we can separate `removeSamples` out of `removeCalledTarget`  to be consistent. The same would also be needed on the FunctionSample side. Maybe just rename `removeCalledTarget` to `removeCalledTargetAndBodySamples`? For now they should always be reduced together.
> Ok, now I see that they're added separately. I'm wondering what is the definition of `NumSamples`. If the two don't have connection and `NumSamples` is samples for this location, and call target count is the count for specific call targets, then they should be changed independently. 
> 
> Here changing the call/callee representation from CallTargets to nested FunctionSamples shouldn't change the value of `NumSamples` because the we would still derived the same value from the same ranges?
By no connection I mean how they are generated in llvm-profgen and used in the compiler technically. But logically `NumSamples` should be the sum of all target counts. It reflects the the execution count of the callsite. Removing one target means the callsite won't be executed as many times as before. The reduced count will be instead reflected on another code path that promotes the indirect callsite. Annotating block frequency based on the original callsite count (due to the max operation) could be misleading.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125266/new/

https://reviews.llvm.org/D125266