[all-commits] [llvm/llvm-project] a98d6a: [SamplePGO] Stale profile matching(part 1)

Fri Apr 28 13:14:02 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: a98d6a11ea192756ee99dc2547d46599141977f2
      https://github.com/llvm/llvm-project/commit/a98d6a11ea192756ee99dc2547d46599141977f2
  Author: wlei <wlei at fb.com>
  Date:   2023-04-28 (Fri, 28 Apr 2023)

  Changed paths:
    M llvm/include/llvm/ProfileData/SampleProf.h
    M llvm/lib/ProfileData/SampleProf.cpp
    M llvm/lib/Transforms/IPO/SampleProfile.cpp

  Log Message:
  -----------
  [SamplePGO] Stale profile matching(part 1)

AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary.

This patch is the part 1 of stale profile matching, summary of the implementation:
 - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile).
 - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`.
 - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location.
 - Added a new switch `--salvage-stale-profile`.
 - Some refactoring for the staleness detection.

Test case is in part 2 with the matching algorithm.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147456

  Commit: 892daede7201e67f78227d09969504a4b27c4e87
      https://github.com/llvm/llvm-project/commit/892daede7201e67f78227d09969504a4b27c4e87
  Author: wlei <wlei at fb.com>
  Date:   2023-04-28 (Fri, 28 Apr 2023)

  Changed paths:
    M llvm/lib/Transforms/IPO/SampleProfile.cpp
    A llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof
    A llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll

  Log Message:
  -----------
  [SamplePGO] Stale profile matching(part 2)

Part 2 of https://reviews.llvm.org/D147456
Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular:
- Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles.
- Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped.
**The input of the algorithm:**
- IR locations: the anchor is the callee name of direct callsite.
- Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`.
The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor.
For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`.
```
// The current build source code:
   int main() {
1.     ...
2.     foo();
3.     ...
4      ...
5.     ...
6.     bar();
7.     ...
   }
```
IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`.
```
; The "stale" profile:
main:350:1
 1: 1
 2: 3
 3: 100 foo:100
 4: 2
 7: 2
 8: 200 bar:200
 9: 30
```
Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]`
**Matching heuristic:**
- Match all the anchors in lexical order first.
- Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor.
So the example above is matched like:
```
   [1,    2(foo), 3,  5,  6(bar), 7]
    |     |       |   |     |     |
   [1, 2, 3(foo), 4,  7,  8(bar), 9]
```
3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`.
The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9].

For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`)

**Clang-self build benchmark: **
Current build version: clang-10
The profiled version:  clang-9
Results compared to a refresh profile(collected profile on clang-10) and to be fair, we invalidated new functions' profiles(both refresh and stale profile use the same profile list).
1) Regression to using refresh profile with this off : -3.93%
2) Regression to using refresh profile with this on  : -1.1%
So this algorithm can recover ~72% of the regression.
**Internal(Meta) large-scale services.**
we saw one real instance of a 3 week stale profile., it delivered a ~1.8% win.

**Notes or future work:**
- Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number are shared the LineLocation structure.
- The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future.
- Profile call target name are only available when the call is hit by samples, the missing anchor might mislead the matching, this can be mitigated in llvm-profgen to generate the call target for the zero samples.
- This doesn't handle function name mismatch, we plan to solve it in future.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D147545

Compare: https://github.com/llvm/llvm-project/compare/dc275fd03254...892daede7201