[all-commits] [llvm/llvm-project] 5c2ae3: [CSSPGO][Preinliner] Trim cold call edges of the p...

Hongtao Yu via All-commits all-commits at lists.llvm.org
Tue Mar 28 16:25:16 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 5c2ae37bbeff948540712bf02ac3eb5aa0b0d43a
  Author: Hongtao Yu <hoy at fb.com>
  Date:   2023-03-28 (Tue, 28 Mar 2023)

  Changed paths:
    M llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
    M llvm/tools/llvm-profgen/CSPreInliner.cpp

  Log Message:
  [CSSPGO][Preinliner] Trim cold call edges of the profiled call graph for a more stable profile generation.

I've noticed that for some services the CSSPGO profile is less stable than the non-CS AutoFDO profile from profiling to profiling without source changes. This shows up when comparing profile similarities. For example, in my experiments, AutoFDO profiles are always 99+% similar over the same binary with different inputs (very close dynamic traffic), while CSSPGO profile similarity is around 90%.

The main source of the profile instability is the top-down order computed on the profiled call graph in the llvm-profgen CS preinliner. The top-down order guides the CS preinliner in pre-computing an inline decision that is later fulfilled by the compiler. A subtle change in the top-down order from run to run can cause a different inline decision to be computed. A deeper look into the divergence of the top-down order revealed that:
	- The topological sorting inside one SCC isn't quite right. This is fixed by {D130717}.
	- The profiled call graphs of the two sides of the A/B run aren't 100% the same. The call edges in the two runs do not subsume each other, and edges that appear in both graphs may not have exactly the same weight. This is inherent to the graphs being dynamic. However, I saw that the graphs can be made closer by removing the cold edges from them, and this bumped the CSSPGO profile stability up to the same level as the AutoFDO profile.
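
To illustrate the second point, here is a minimal standalone sketch of how dropping cold edges can make two dynamic call graphs agree in shape. It is not the code in ProfiledCallGraph.h: the EdgeMap type, the function names, the "weight must exceed the threshold" rule, and the threshold value are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a profiled call graph: (caller, callee) -> weight.
using EdgeMap = std::map<std::pair<std::string, std::string>, uint64_t>;

// Returns a copy of the graph that keeps only edges hotter than ColdThreshold.
// The strict comparison is an assumption of this sketch.
EdgeMap trimColdEdges(const EdgeMap &Edges, uint64_t ColdThreshold) {
  EdgeMap Hot;
  for (const auto &E : Edges)
    if (E.second > ColdThreshold)
      Hot.insert(E);
  return Hot;
}

// True if both graphs contain exactly the same (caller, callee) pairs.
// Weights are ignored; std::map iterates in key order, so a lockstep walk works.
bool sameEdgeSet(const EdgeMap &A, const EdgeMap &B) {
  if (A.size() != B.size())
    return false;
  auto IB = B.begin();
  for (auto IA = A.begin(); IA != A.end(); ++IA, ++IB)
    if (IA->first != IB->first)
      return false;
  return true;
}

// Usage sketch: two runs whose hot edges match but whose cold edges differ.
bool exampleRunsConverge() {
  EdgeMap RunA, RunB;
  RunA[{"main", "foo"}] = 1000;
  RunA[{"foo", "bar"}] = 900;
  RunA[{"main", "rareA"}] = 2; // cold edge seen only in run A
  RunB[{"main", "foo"}] = 1010;
  RunB[{"foo", "bar"}] = 880;
  RunB[{"foo", "rareB"}] = 1; // cold edge seen only in run B
  assert(!sameEdgeSet(RunA, RunB)); // the raw graphs differ in shape
  return sameEdgeSet(trimColdEdges(RunA, 10), trimColdEdges(RunB, 10));
}
```

In this sketch the hot edges still carry slightly different weights after trimming, but the two graphs have the same set of edges, so an ordering derived from graph structure has less room to differ between runs.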

Removing cold call edges from the dynamic call graph may have an impact on cold inlining, but so far I haven't seen any performance issues since the CS preinliner mainly targets hot callsites, and cold inlining can always be done by the compiler CGSCC inliner.

This change also fixes an issue where the largest weight, instead of the accumulated weight, was used for a call edge in the profiled call graph.
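
The difference between the two weighting schemes can be sketched as follows. This is a simplified illustration, not the patch itself: CallSample, EdgeMap, and both function names are hypothetical, and the assumption is that several samples can contribute to the same caller/callee pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one observed contribution to a caller -> callee edge.
struct CallSample {
  std::string Caller;
  std::string Callee;
  uint64_t Count;
};

using EdgeMap = std::map<std::pair<std::string, std::string>, uint64_t>;

// Keeps only the largest single contribution per edge (the behavior
// described above as the issue).
EdgeMap buildWithLargestWeight(const std::vector<CallSample> &Samples) {
  EdgeMap Edges;
  for (const CallSample &S : Samples) {
    uint64_t &W = Edges[{S.Caller, S.Callee}]; // starts at 0 for a new edge
    W = std::max(W, S.Count);
  }
  return Edges;
}

// Sums every contribution per edge (the accumulated weight).
EdgeMap buildWithAccumulatedWeight(const std::vector<CallSample> &Samples) {
  EdgeMap Edges;
  for (const CallSample &S : Samples) {
    uint64_t &W = Edges[{S.Caller, S.Callee}];
    assert(W <= std::numeric_limits<uint64_t>::max() - S.Count && "overflow");
    W += S.Count;
  }
  return Edges;
}
```

With three samples of 30, 50, and 40 for the same edge, the first function records a weight of 50 and the second records 120. Under a weight-based cold threshold, the two schemes can therefore classify the same edge differently.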

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147013
