[PATCH] D158889: [AsmPrinter][PGO] Adds optional dumping of branch probabilities for PGO metrics.

Fri Sep 1 12:47:12 PDT 2023

red1bluelost added a comment.

In D158889#4633333 <https://reviews.llvm.org/D158889#4633333>, @wenlei wrote:

> We're probably among the few people that will actually benefit from something like this, but honestly I'm still a bit unsure whether the use case is common enough to justify built-in support like this.

Mircea makes a good point that a pass can corrupt PGO information as easily as it can corrupt debug information. Having a tool to track or debug PGO info such as branch_weights, even if it ends up different from this patch, seems beneficial for upstream.

In D158889#4633333 <https://reviews.llvm.org/D158889#4633333>, @wenlei wrote:

> 1. Try to incorporate block counts/frequencies as well. Most of the researches on profile quality use a block overlap metric which relies on block counts rather than branch probabilities. Our internal version also uses block counts, as branch weights cannot represent the profile for branchless code.

I'll try to look into this.
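
For reference, here is a minimal sketch of one common form of a block overlap metric. It assumes each profile is a mapping from a basic block identifier to an execution count. The function name and the dictionary representation are hypothetical and not part of this patch. Each profile is normalized by its total count, and the per-block minimum is summed.

```python
def block_overlap(counts_a, counts_b):
    """Return an overlap score in [0, 1] for two block-count profiles.

    Both arguments map a block identifier to an execution count
    (a hypothetical representation chosen for illustration only).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # An empty profile shares nothing with any other profile.
    if total_a == 0 or total_b == 0:
        return 0.0

    overlap = 0.0
    # Blocks missing from either profile contribute zero, so only
    # blocks present in both need to be visited.
    for block in counts_a.keys() & counts_b.keys():
        overlap += min(counts_a[block] / total_a, counts_b[block] / total_b)
    return overlap
```

Identical distributions score 1.0 and disjoint ones score 0.0. Unlike a comparison of branch probabilities, this also covers blocks in branchless code, which is the limitation raised in the quoted comment.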

In D158889#4633333 <https://reviews.llvm.org/D158889#4633333>, @wenlei wrote:

> 2. Instead of coupling this with a specific consumer, Pin tool in your case, and in the next patch, I suggest we build general support to decode such metadata section, so tools like llvm-objdump can be used to inspect its payload.

For the metrics, we need an execution branch trace for comparison. We've found that Pin tool works for x86, and we have a tool for one of our internal targets. We hope to keep the tracing part minimally coupled to the compiler metadata.

It makes sense to add general support for extracting the section info. I can try to look into it.

In D158889#4633353 <https://reviews.llvm.org/D158889#4633353>, @mtrofin wrote:

> Even if relatively few groups would build up on such information, arguably PGO (as a technique) is probably the most impactful tool we have for improving performance. Like I think we're all observing here, profiles aren't necessarily well maintained throughout passes - it's actually easy to write a pass that accidentally and silently drops it, or that corrupts it somehow. I think having primitives in llvm that can help anyone interested build validation tooling and detect and fix such bugs would end up helping the community way more than their cost.

Agreed that it is easy to corrupt. The team I'm with at MediaTek has found cases in LLVM passes and in our backend where branch weights are mishandled or not updated, which was a driving factor in developing metrics for them.

In D158889#4633353 <https://reviews.llvm.org/D158889#4633353>, @mtrofin wrote:

> Maybe we can even, eventually, have a layer of defense doing this on a build bot with e.g. llvm-test-suite benchmarks. (I have a rfc for doing some even simpler validation transparently as part of opt/llc, would send it after the long weekend)

Internally, we are planning to track some of our benchmarks using these metrics, since we've found it's helpful for this sort of change/regression tracking for our PGO data.

In D158889#4633353 <https://reviews.llvm.org/D158889#4633353>, @mtrofin wrote:

> Actually - @red1bluelost - sorry if this would creep up the scope of your work, but since we're talking design choices, would summarizing in a RFC be maybe a good step? Then we can discuss the motivation and design choices in the community, and since there is similar work (@wenlei's earlier remark), their experience will surely help!

That makes sense. I can see about getting stuff written up. It will take some time. I'll also be giving a talk at the developer meeting in October on how we've been using the metrics and where we think it can help others.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158889/new/

https://reviews.llvm.org/D158889