[all-commits] [llvm/llvm-project] bf317f: [CSSPGO] Sorting nodes in a cycle of profiled call...

Tue Nov 30 09:01:34 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: bf317f66989cac26e17b4cd16ab1c7bdfe73dbe0
      https://github.com/llvm/llvm-project/commit/bf317f66989cac26e17b4cd16ab1c7bdfe73dbe0
  Author: Hongtao Yu <hoy at fb.com>
  Date:   2021-11-30 (Tue, 30 Nov 2021)

  Changed paths:
    M llvm/include/llvm/ADT/SCCIterator.h
    M llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
    M llvm/lib/Transforms/IPO/SampleProfile.cpp
    A llvm/test/Transforms/SampleProfile/Inputs/profile-context-order-scc.prof
    M llvm/test/Transforms/SampleProfile/profile-context-order.ll
    M llvm/tools/llvm-profgen/CSPreInliner.cpp

  Log Message:
  -----------
  [CSSPGO] Sorting nodes in a cycle of profiled call graph.

For nodes that are in a cycle of a profiled call graph, the order that the underlying scc_iter currently computes depends purely on how those nodes are reached from inside and outside the SCC during Tarjan's algorithm. This order does not honor profile edge hotness, so it does not guarantee that hot callsites are inlined before cold callsites. To mitigate that, I'm adding an extra sorter on top of scc_iter that sorts SCC functions in the order of callsite hotness, instead of changing the internals of scc_iter.

Optimally, sorting by callsite hotness would be based on detecting cycles in the directed call graph, i.e., repeatedly removing the coldest edge until the cycles are broken. However, detecting cycles isn't cheap. I'm using an MST-based approach instead, which is faster and appears to deliver some performance wins.
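The message does not spell out how the MST-based sorter works, so the following is only a rough sketch under stated assumptions, not the code from D114204. It assumes a maximum-weight spanning forest is built with Kruskal's algorithm (edges sorted by descending hotness, with a union-find rejecting any edge that would close a cycle when edges are treated as undirected), and that the SCC members are then emitted in BFS order starting from forest nodes that have no incoming tree edge. The names CallEdge, DisjointSet and sortSCCByHotness are hypothetical and are not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <queue>
#include <vector>

// Hypothetical call edge between two members of one SCC. Caller and Callee
// are indices into the SCC's node list; Hotness is the profiled call count.
struct CallEdge {
  int Caller;
  int Callee;
  uint64_t Hotness;
};

// Minimal union-find (disjoint-set) structure used by Kruskal's algorithm.
class DisjointSet {
public:
  explicit DisjointSet(int N) : Parent(static_cast<std::size_t>(N)) {
    std::iota(Parent.begin(), Parent.end(), 0);
  }

  int find(int X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }

  // Merges the sets of X and Y; returns false if they were already joined,
  // i.e. adding the edge X-Y would close a cycle in the spanning forest.
  bool unite(int X, int Y) {
    int RX = find(X), RY = find(Y);
    if (RX == RY)
      return false;
    Parent[RX] = RY;
    return true;
  }

private:
  std::vector<int> Parent;
};

// Returns the SCC members (0..NumNodes-1) in a traversal order that favors
// hot edges: Kruskal builds a maximum spanning forest, then a BFS walks it
// starting from nodes with no incoming tree edge.
std::vector<int> sortSCCByHotness(int NumNodes, std::vector<CallEdge> Edges) {
  // Hottest edges first; stable_sort keeps ties in input order.
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const CallEdge &A, const CallEdge &B) {
                     return A.Hotness > B.Hotness;
                   });

  DisjointSet Sets(NumNodes);
  std::vector<std::vector<int>> TreeSuccs(NumNodes);
  std::vector<int> TreePredCount(NumNodes, 0);
  for (const CallEdge &E : Edges) {
    // Keep an edge only if it joins two different trees; self-recursive and
    // cycle-closing edges are dropped. The edge direction is retained.
    if (Sets.unite(E.Caller, E.Callee)) {
      TreeSuccs[E.Caller].push_back(E.Callee);
      ++TreePredCount[E.Callee];
    }
  }

  // BFS from every node without a tree predecessor. Each tree has at least
  // one such node and every node is reachable from one, so all are emitted.
  std::vector<int> Order;
  std::vector<bool> Visited(NumNodes, false);
  std::queue<int> Worklist;
  for (int N = 0; N < NumNodes; ++N) {
    if (TreePredCount[N] == 0) {
      Visited[N] = true;
      Worklist.push(N);
    }
  }
  while (!Worklist.empty()) {
    int N = Worklist.front();
    Worklist.pop();
    Order.push_back(N);
    for (int Succ : TreeSuccs[N]) {
      if (!Visited[Succ]) {
        Visited[Succ] = true;
        Worklist.push(Succ);
      }
    }
  }
  assert(Order.size() == static_cast<std::size_t>(NumNodes));
  return Order;
}
```

For example, with an SCC of foo (0), bar (1) and baz (2) and edges foo -> bar (10), bar -> baz (1000), baz -> foo (500) and bar -> foo (5), the sketch returns bar, baz, foo: the hottest call bar -> baz is traversed first, regardless of where Tarjan's algorithm happened to enter the SCC. Kruskal with union-find costs O(E log E) for the sort plus near-linear merging, which is one reason a spanning-tree approximation can be cheaper than explicitly enumerating and breaking directed cycles.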

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D114204