[PATCH] D102007: [CSSPGO] Fix return value of getProbeWeight

Fri May 7 21:33:32 PDT 2021

wenlei added inline comments.

================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:566
   if (!FS)
-    return std::error_code();
+    return 0;

----------------
wmi wrote:
> hoy wrote:
> > wenlei wrote:
> > > Given it more thought, perhaps whether this is 0 or error_code is subject to something like `profile-sample-accurate`? I.e. with that flag, it means cold, otherwise it means unknown (error_code)?
> > > 
> > > The default behavior for `getInstWeight` the AutoFDO counterpart is returning error_code here. We don't have to do it now, but may be both AutoFDO and CSSPGO can check `profile-sample-accurate` here to decide between 0 or error_code? @wmi what do you think about AutoFDO?
> > Good point. Was thinking about this. When it comes here, `Inst` should be mostly from a function that has a profile, like
> > 
> > - If it is from a top-level inliner, the inliner should have a profile, otherwise it wouldn't be compiled.
> > - If it is from an inlinee, the inlinee should have a profile, otherwise it wouldn't be inlined (see `SampleProfileLoader::getInlineCandidate`).
> > 
> > 
> > 
> > 
> Currently profile-sample-accurate is only used on function profile, i.e., if a function profile is not present, and if profile-sample-accurate is true, compiler will set the function entry count to be zero. It is a balance between best performance/smaller binary size and performance stableness. In google currently some targets caring about size more than performance uses it since it will generate smaller binary. 
> 
> If we do the same for instruction level profile, it will make the target using the flag to be more subject to source drift issue. I am not sure whether it will decrease the usability of the flag -- some targets currently using it can get their expected balance between binary size and performance stableness, and the balance may be broken if we introduce more performance unstableness to it.  
I was hoping that by being more stringent on profile and trust zero counts to be zero, we would be able to save size and at the same time improve performance when profile is accurate. 

>From your experience, does profile-sample-accurate often lead to worse performance along with smaller size? If so, that is mostly because of source drift, right? If profile is refreshed very often, or used in instr. PGO like two step builds, it should not lead to worse performance.. 

I think we can leave it as is, or perhaps experiment with an additional switch for profile-sample-block-accurate for block level counts.. in order to avoid affecting existing usage. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102007/new/

https://reviews.llvm.org/D102007