[PATCH] D146661: [BOLT] v1 stale profile matching

Sergey Pupyrev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 22 14:45:05 PDT 2023


spupyrev created this revision.
Herald added a reviewer: rafauler.
Herald added subscribers: treapster, ayermolo.
Herald added a reviewer: Amir.
Herald added a reviewer: maksfb.
Herald added a project: All.
spupyrev edited the summary of this revision.
Herald added a subscriber: wenlei.
spupyrev published this revision for review.
Herald added subscribers: llvm-commits, yota9.
Herald added a project: LLVM.

This is the first "serious" version of stale profile matching in BOLT. This diff
extends the hash computation for basic blocks so that we can apply fuzzy
hash-based matching. The idea is to compute several "versions" of a hash value
for a basic block. A loose version of the hash (computed by ignoring instruction
operands) allows matching blocks in functions whose content has changed,
while stricter hash values (computed from instruction opcodes together with
their operands, and even from the hashes of a block's successors/predecessors)
help resolve collisions. To save space and build time, the individual hash
components are blended into a single uint64_t.
There are likely numerous ways to improve the hash computation, but even this
simple variant already provides significant perf benefits.
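
To make the blending idea concrete, here is a minimal, self-contained sketch. It is not the code from this diff: the `Inst`/`Block` types, the `mix` function, the helper names, and the 16/24/24-bit split are all assumptions chosen for illustration, and only successor hashes (not predecessors) are folded into the neighbor component.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified instruction and block model (not BOLT's types).
struct Inst {
  uint16_t Opcode;
  std::vector<int64_t> Operands;
};

struct Block {
  std::vector<Inst> Insts;
  std::vector<size_t> Succs; // indices of successor blocks in the function
};

// Order-dependent 64-bit combiner in the style of boost::hash_combine.
inline uint64_t mix(uint64_t Seed, uint64_t V) {
  return Seed ^ (V + 0x9e3779b97f4a7c15ULL + (Seed << 6) + (Seed >> 2));
}

// Loose hash: opcodes only, so it survives operand changes.
uint64_t looseHash(const Block &B) {
  uint64_t H = 0;
  for (const Inst &I : B.Insts)
    H = mix(H, I.Opcode);
  return H;
}

// Strict hash: opcodes together with their operands.
uint64_t strictHash(const Block &B) {
  uint64_t H = 0;
  for (const Inst &I : B.Insts) {
    H = mix(H, I.Opcode);
    for (int64_t Op : I.Operands)
      H = mix(H, static_cast<uint64_t>(Op));
  }
  return H;
}

// Neighbor hash: combines the strict hashes of the block's successors.
uint64_t neighborHash(const std::vector<Block> &Func, size_t Idx) {
  uint64_t H = 0;
  for (size_t S : Func[Idx].Succs) {
    assert(S < Func.size() && "successor index out of range");
    H = mix(H, strictHash(Func[S]));
  }
  return H;
}

// Blend the components into one uint64_t. Assumed layout:
// [63..48] loose | [47..24] strict | [23..0] neighbor.
uint64_t blend(uint64_t Loose, uint64_t Strict, uint64_t Neighbor) {
  return ((Loose & 0xFFFFULL) << 48) | ((Strict & 0xFFFFFFULL) << 24) |
         (Neighbor & 0xFFFFFFULL);
}

// Extract the loose component for fuzzy comparison.
inline uint64_t looseBits(uint64_t Blended) { return Blended >> 48; }

// Compute one blended hash per block of a function.
std::vector<uint64_t> computeBlendedHashes(const std::vector<Block> &Func) {
  std::vector<uint64_t> Hashes;
  Hashes.reserve(Func.size());
  for (size_t I = 0; I < Func.size(); ++I)
    Hashes.push_back(
        blend(looseHash(Func[I]), strictHash(Func[I]), neighborHash(Func, I)));
  return Hashes;
}
```

With this layout, a matcher could first look for an exact match on the full 64-bit value and fall back to comparing only `looseBits` when the block's operands or neighbors have changed between the profiled and the current binary.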

**Perf testing** on the clang binary: we collect a profile on clang-10 and use
it to optimize clang-11 (with ~1 year of commits in between). We then compare:

- //stale_clang// (clang-11 optimized with profile collected on clang-10 with **infer-stale-profile=0**)
- //opt_clang// (clang-11 optimized with profile collected on clang-11)
- //infer_clang// (clang-11 optimized with profile collected on clang-10 with **infer-stale-profile=1**)

`LTO-only` mode:
//stale_clang// vs //opt_clang//: task-clock [delta(%): 9.4252 ± 1.6582, p-value: 0.000002]
(That is, there is a ~9.4% perf regression)
//infer_clang// vs //opt_clang//: task-clock [delta(%): 2.1834 ± 1.8158, p-value: 0.040702]
(That is, the regression is reduced to ~2%)
Related BOLT logs:

  BOLT-INFO: identified 2114 (18.61%) stale functions responsible for 30.96% samples
  BOLT-INFO: inferred profile for 2101 (18.52% of all profiled) functions responsible for 30.95% samples

`LTO+AutoFDO` mode:
//stale_clang// vs //opt_clang//: task-clock [delta(%): 19.1293 ± 1.4131, p-value: 0.000002]
//infer_clang// vs //opt_clang//: task-clock [delta(%): 7.4364 ± 1.3343, p-value: 0.000002]
Related BOLT logs:

  BOLT-INFO: identified 5452 (50.27%) stale functions responsible for 85.34% samples
  BOLT-INFO: inferred profile for 5442 (50.23% of all profiled) functions responsible for 85.33% samples


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D146661

Files:
  bolt/lib/Profile/StaleProfileMatching.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D146661.507509.patch
Type: text/x-patch
Size: 7703 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230322/4c9a8650/attachment.bin>

