<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:times new roman,serif"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 7, 2021 at 9:33 PM Wenlei He via Phabricator <<a href="mailto:reviews@reviews.llvm.org">reviews@reviews.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">wenlei added inline comments.<br>

<br>

<br>

================<br>

Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:566<br>

   if (!FS)<br>

-    return std::error_code();<br>

+    return 0;<br>

<br>

----------------<br>

wmi wrote:<br>

> hoy wrote:<br>

> > wenlei wrote:<br>

> > > Given it more thought, perhaps whether this is 0 or error_code is subject to something like `profile-sample-accurate`? I.e. with that flag, it means cold, otherwise it means unknown (error_code)?<br>

> > > <br>

> > > The default behavior for `getInstWeight` the AutoFDO counterpart is returning error_code here. We don't have to do it now, but may be both AutoFDO and CSSPGO can check `profile-sample-accurate` here to decide between 0 or error_code? @wmi what do you think about AutoFDO?<br>

> > Good point. Was thinking about this. When it comes here, `Inst` should be mostly from a function that has a profile, like<br>

> > <br>

> > - If it is from a top-level inliner, the inliner should have a profile, otherwise it wouldn't be compiled.<br>

> > - If it is from an inlinee, the inlinee should have a profile, otherwise it wouldn't be inlined (see `SampleProfileLoader::getInlineCandidate`).<br>

> > <br>

> > <br>

> > <br>

> > <br>

> Currently profile-sample-accurate is only used on function profile, i.e., if a function profile is not present, and if profile-sample-accurate is true, compiler will set the function entry count to be zero. It is a balance between best performance/smaller binary size and performance stableness. In google currently some targets caring about size more than performance uses it since it will generate smaller binary. <br>

> <br>

> If we do the same for instruction level profile, it will make the target using the flag to be more subject to source drift issue. I am not sure whether it will decrease the usability of the flag -- some targets currently using it can get their expected balance between binary size and performance stableness, and the balance may be broken if we introduce more performance unstableness to it.  <br>

I was hoping that by being more stringent on profile and trust zero counts to be zero, we would be able to save size and at the same time improve performance when profile is accurate.</blockquote><div><br></div><div><div class="gmail_default" style="font-family:"times new roman",serif">   That is right. More cold callsites can lead to less inlining and smaller size. </div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br>

<br>

>From your experience, does profile-sample-accurate often lead to worse performance along with smaller size? If so, that is mostly because of source drift, right? If profile is refreshed very often, or used in instr. PGO like two step builds, it should not lead to worse performance..</blockquote><div><br></div><div><div class="gmail_default" style="font-family:"times new roman",serif"> profile-sample-accurate will lead to better performance and smaller size if the profile is accurate, but more unstableness (right, from source drift).   </div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br>

<br>

I think we can leave it as is, or perhaps experiment with an additional switch for profile-sample-block-accurate for block level counts.. in order to avoid affecting existing usage.</blockquote><div><br></div><div><div class="gmail_default" style="font-family:"times new roman",serif"> Agree, we can experiment it and see how much performance and size headroom it provides. If the benefit is substantial, we can provide an additional switch for the users who want the benefit and can tolerate the performance unstableness. </div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br>

<br>

<br>

Repository:<br>

  rG LLVM Github Monorepo<br>

<br>

CHANGES SINCE LAST ACTION<br>

  <a href="https://reviews.llvm.org/D102007/new/" rel="noreferrer" target="_blank">https://reviews.llvm.org/D102007/new/</a><br>

<br>

<a href="https://reviews.llvm.org/D102007" rel="noreferrer" target="_blank">https://reviews.llvm.org/D102007</a><br>

<br>

</blockquote></div></div>