[all-commits] [llvm/llvm-project] a7629a: [CSSPGO] Fix MSVC initializing truncation warning ...

ictwanglei via All-commits all-commits at lists.llvm.org
Fri Feb 19 21:28:47 PST 2021


  Branch: refs/heads/release/12.x
  Home:   https://github.com/llvm/llvm-project
  Commit: a7629a2244a325b908ddbd4336aef25a7049bda9
      https://github.com/llvm/llvm-project/commit/a7629a2244a325b908ddbd4336aef25a7049bda9
  Author: Yang Fan <nullptr.cpp at gmail.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h

  Log Message:
  -----------
  [CSSPGO] Fix MSVC initializing truncation warning (NFC)

MSVC warning:
```
\llvm-project\llvm\include\llvm\Transforms\IPO\SampleProfileProbe.h(65): warning C4305: 'initializing': truncation from 'double' to 'const float'
```
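C4305 fires when a `double` literal initializes a `float`. The usual fix is a float-suffixed literal, sketched below. The variable name and value are hypothetical. They are not the actual declaration at line 65 of `SampleProfileProbe.h`.

```cpp
// Hypothetical declaration illustrating MSVC warning C4305:
// const float ExampleRatio = 0.02;  // 'initializing': truncation from 'double' to 'const float'

// Usual fix: use a float literal so no double-to-float conversion happens.
const float ExampleRatio = 0.02f;

float exampleRatio() { return ExampleRatio; }
```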


  Commit: 78b35e278a9f62c2a6cfe3c974155a7e9bb60361
      https://github.com/llvm/llvm-project/commit/78b35e278a9f62c2a6cfe3c974155a7e9bb60361
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
    M llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h
    M llvm/tools/llvm-profgen/ProfiledBinary.h
    M llvm/tools/llvm-profgen/PseudoProbe.cpp
    M llvm/tools/llvm-profgen/PseudoProbe.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Pseudo probe based CS profile generation

This change implements the profile generation infrastructure for pseudo probes in llvm-profgen. During virtual unwinding, the raw profile is extracted into a range counter and a branch counter, then aggregated into a sample counter map indexed by the call stack context. This change adds the last step, which produces the final profile. Function body samples are recorded by walking each probe within a range. Call site target samples are recorded by extracting the call site probe from the branch's source.
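The body-sample step can be sketched as follows: for each executed address range, every probe whose address falls inside the range receives that range's count. This is a minimal sketch under assumptions. `populateBodySamplesSketch` and both input maps are hypothetical stand-ins. They are not llvm-profgen's `populateBodySamplesWithProbes` or its data structures.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical inputs: RangeCounter maps an inclusive [Begin, End] address
// range to its execution count; ProbeAtAddr maps a probe address to a probe id.
std::map<uint32_t, uint64_t> populateBodySamplesSketch(
    const std::map<std::pair<uint64_t, uint64_t>, uint64_t> &RangeCounter,
    const std::map<uint64_t, uint32_t> &ProbeAtAddr) {
  std::map<uint32_t, uint64_t> BodySamples; // probe id -> accumulated count
  for (const auto &[Range, Count] : RangeCounter) {
    // Visit every probe whose address lies inside the executed range.
    for (auto It = ProbeAtAddr.lower_bound(Range.first);
         It != ProbeAtAddr.end() && It->first <= Range.second; ++It)
      BodySamples[It->second] += Count;
  }
  return BodySamples;
}
```

A probe covered by several ranges accumulates the sum of their counts.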

Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

**Implementation**

- Extended `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- `populateBodySamplesWithProbes` reads the range counter and records function body samples. It also infers the caller's body samples.
- `populateBoundarySamplesWithProbes` reads the branch counter and records call site target samples.
- Each sample is recorded with its calling context (named `ContextId`). The probe-based context key doesn't include the leaf frame's probe info. The `ContextId` string is therefore built from two parts: the concatenation of the probe stack strings, and the leaf frame probe.
- Added regression tests.
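Building the two-part `ContextId` amounts to joining the caller frames and then appending the leaf frame. This is a minimal sketch. `buildContextIdSketch` is hypothetical, the `" @ "` separator follows the context examples elsewhere in these messages, and the `name:offset` frame spelling is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical: ProbeStack holds the already-stringified caller frames
// (outermost first). The leaf frame is passed separately because the
// probe-based context key does not include it.
std::string buildContextIdSketch(const std::vector<std::string> &ProbeStack,
                                 const std::string &LeafFrame) {
  std::string Id;
  for (const std::string &Frame : ProbeStack) {
    Id += Frame;
    Id += " @ ";
  }
  Id += LeafFrame; // second part: the leaf frame
  return Id;
}
```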

Test Plan:

ninja & ninja check-llvm

Differential Revision: https://reviews.llvm.org/D92998


  Commit: 6209b0756d5df805f6279d3dadc8d2ba8648c3eb
      https://github.com/llvm/llvm-project/commit/6209b0756d5df805f6279d3dadc8d2ba8648c3eb
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-noprobe.perfbin
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-noprobe.perfscript
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-pseudoprobe.perfbin
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-pseudoprobe.perfscript
    A llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
    A llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.h
    M llvm/tools/llvm-profgen/PseudoProbe.cpp
    M llvm/tools/llvm-profgen/PseudoProbe.h
    M llvm/unittests/tools/CMakeLists.txt
    A llvm/unittests/tools/llvm-profgen/CMakeLists.txt
    A llvm/unittests/tools/llvm-profgen/ContextCompressionTest.cpp

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Compress recursive cycles in calling context

This change compresses the context string by removing cycles caused by recursive functions during CS profile generation. Removing recursion cycles normalizes the calling context. That improves sample aggregation and makes context promotion deterministic.
The implementation recognizes adjacent repeated frames as cycles and deduplicates them over multiple rounds of iteration.
For example:
Consider an input context string stack:
["a", "a", "b", "c", "a", "b", "c", "b", "c", "d"]
The first iteration removes all adjacent repeated frames of size 1:
["a", "b", "c", "a", "b", "c", "b", "c", "d"]
The second iteration removes all adjacent repeated frames of size 2:
["a", "b", "c", "a", "b", "c", "d"]
The third iteration removes all adjacent repeated frames of size 3, so in the end we get the compressed output:
["a", "b", "c", "d"]

Compression is applied in two places: to the sample's context key right after unwinding, and to the final context string ID in the ProfileGenerator.
Added a switch `compress-recursion` to limit the size of duplicated frames. The default, -1, means no size limit.
Added unit tests and a regression test for this.
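The iteration described above can be sketched as a simple, non-in-place pass per window size. For each size I, the pass drops the next I frames whenever they repeat the last I frames already kept. This is a minimal sketch. `compressRecursionSketch` is hypothetical and is not llvm-profgen's implementation. `MaxSize` plays the role of the `compress-recursion` switch, with -1 meaning no limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: deduplicate adjacent repeated frame sequences of
// size 1, 2, 3, ... up to MaxSize (-1 = no limit).
std::vector<std::string> compressRecursionSketch(std::vector<std::string> Ctx,
                                                 int MaxSize = -1) {
  for (size_t I = 1; I <= Ctx.size() / 2; ++I) {
    if (MaxSize != -1 && I > static_cast<size_t>(MaxSize))
      break;
    std::vector<std::string> Out;
    size_t J = 0;
    while (J < Ctx.size()) {
      auto Width = static_cast<std::ptrdiff_t>(I);
      // If the next I frames repeat the last I frames already kept, skip them.
      if (Out.size() >= I && J + I <= Ctx.size() &&
          std::equal(Out.end() - Width, Out.end(),
                     Ctx.begin() + static_cast<std::ptrdiff_t>(J))) {
        J += I;
      } else {
        Out.push_back(Ctx[J]);
        ++J;
      }
    }
    Ctx = std::move(Out); // the next round works on the compressed result
  }
  return Ctx;
}
```

On the example input, this returns `["a", "b", "c", "d"]`. With `MaxSize = 1` it stops after the first iteration, and with `MaxSize = 2` after the second.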

Differential Revision: https://reviews.llvm.org/D93556


  Commit: e562ff08f634d814c1cd1e65e3428ca5308d3022
      https://github.com/llvm/llvm-project/commit/e562ff08f634d814c1cd1e65e3428ca5308d3022
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/PerfReader.h
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

For CS profile generation, call stack unwinding is time-consuming: for each LBR entry, generating the context (hashing, compression, string concatenation) takes linear time. This change speeds up unwinding by grouping all the call frames within one LBR sample into a trie and aggregating the result (sample counter) on it. Context compression and string generation are deferred to the end of unwinding.

Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates it dynamically during virtual unwinding, by popping or pushing a trie node. The raw sample can then simply be recorded on the leaf node, and the path from root to leaf represents its calling context. At the end, it traverses the trie and generates the contexts on the fly.
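The trie idea can be sketched as below. Recording a sample is constant-time, and context strings are only built once, during the final traversal. This is a minimal model under stated assumptions. `FrameNode` and `FrameTrie` are hypothetical types, not llvm-profgen's classes. Frames are plain function names, and contexts are rendered with the `" @ "` separator used elsewhere in these messages.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical trie node: one call frame; the root-to-node path is the context.
struct FrameNode {
  std::string Name;
  FrameNode *Parent = nullptr;
  std::map<std::string, std::unique_ptr<FrameNode>> Children;
  uint64_t SampleCount = 0;

  FrameNode *getOrCreateChild(const std::string &Callee) {
    auto &Slot = Children[Callee];
    if (!Slot) {
      Slot = std::make_unique<FrameNode>();
      Slot->Name = Callee;
      Slot->Parent = this;
    }
    return Slot.get();
  }
};

// Hypothetical unwinder state: StackLeaf always points at the current top frame.
struct FrameTrie {
  FrameNode Root;
  FrameNode *StackLeaf = &Root;

  void pushFrame(const std::string &Callee) { StackLeaf = StackLeaf->getOrCreateChild(Callee); }
  void popFrame() {
    if (StackLeaf->Parent)
      StackLeaf = StackLeaf->Parent;
  }
  // O(1): no context string is built while unwinding.
  void recordSample(uint64_t Count) { StackLeaf->SampleCount += Count; }

  // Deferred step: walk the trie once and build context strings on the fly.
  void collect(const FrameNode &Node, const std::string &Prefix,
               std::map<std::string, uint64_t> &Out) const {
    for (const auto &Entry : Node.Children) {
      const FrameNode &Child = *Entry.second;
      std::string Ctx = Prefix.empty() ? Child.Name : Prefix + " @ " + Child.Name;
      if (Child.SampleCount)
        Out[Ctx] += Child.SampleCount;
      collect(Child, Ctx, Out);
    }
  }
  std::map<std::string, uint64_t> contexts() const {
    std::map<std::string, uint64_t> Out;
    collect(Root, "", Out);
    return Out;
  }
};
```

For example, the sequence push `main`, push `foo`, record 3, pop, push `bar`, record 2 yields contexts `main @ foo` (3) and `main @ bar` (2). Both share a single `main` node.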

Results:
Our internal branch shows about a 5x speed-up on some large workloads in the SPEC06 benchmark suite.

Differential Revision: https://reviews.llvm.org/D94110


  Commit: 87c27020cc6466ae33550f1f1f55d5989afaca2e
      https://github.com/llvm/llvm-project/commit/87c27020cc6466ae33550f1f1f55d5989afaca2e
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
    M llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
    A llvm/test/tools/llvm-profgen/merge-cold-profile.test
    M llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test
    M llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
    M llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
    M llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Merge and trim profile for cold context to reduce profile size

This change allows llvm-profgen to merge and trim cold context profiles, which addresses profile size bloat. When a profile's total sample count is below a threshold (set by a switch), the profile is considered cold. It is then merged into a base context-less profile, which keeps the profile quality at least as good as the baseline (non-CS).

For example, given two input profiles:
 [main @ foo @ bar]:60
 [main @ bar]:50
With threshold = 100, the two profiles are merged into one with the base context, giving:
 [bar]:110

Added two switches:
`--csprof-cold-thres=<value>`: Specifies the total samples threshold below which a context profile is considered cold. The default is 100. By default, any cold context profiles are merged into the context-less base profile.
`--csprof-keep-cold`: Forces profile generation to keep cold context profiles instead of dropping them. By default, cold context profiles are not written to the output profile.
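The merge rule can be sketched as follows. A context whose total falls below the threshold is folded into the profile of its leaf function, and it is dropped unless cold contexts are kept. This is a minimal sketch under assumptions. `mergeColdContexts` and `baseName` are hypothetical helpers, and contexts are plain `"main @ foo @ bar"` strings without the brackets shown in the example. `ColdThreshold` and `KeepCold` mirror `--csprof-cold-thres` and `--csprof-keep-cold`.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: the leaf function of a context such as "main @ foo @ bar".
std::string baseName(const std::string &Context) {
  auto Pos = Context.rfind(" @ ");
  return Pos == std::string::npos ? Context : Context.substr(Pos + 3);
}

// Cold contexts (total < ColdThreshold) are merged into the context-less base
// profile of their leaf function and dropped unless KeepCold is set.
std::map<std::string, uint64_t>
mergeColdContexts(const std::map<std::string, uint64_t> &Profiles,
                  uint64_t ColdThreshold = 100, bool KeepCold = false) {
  std::map<std::string, uint64_t> Out;
  for (const auto &[Context, Total] : Profiles) {
    bool IsContextLess = Context.find(" @ ") == std::string::npos;
    if (IsContextLess || Total >= ColdThreshold) {
      Out[Context] += Total; // hot, or already a base profile: keep as-is
      continue;
    }
    Out[baseName(Context)] += Total; // cold: fold into the base profile
    if (KeepCold)
      Out[Context] += Total;
  }
  return Out;
}
```

On the example above, this yields a single `bar` entry with 110 samples.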

Results:
This has not yet been evaluated with the latest CSSPGO. On our internal branch, performance is neutral and the profile size is significantly reduced. A detailed evaluation of llvm-profgen with CSSPGO will follow.

Differential Revision: https://reviews.llvm.org/D94111


  Commit: db88d92217f185d9ab5b8f0a0eddc5dc9ad30659
      https://github.com/llvm/llvm-project/commit/db88d92217f185d9ab5b8f0a0eddc5dc9ad30659
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript
    M llvm/tools/llvm-profgen/PerfReader.cpp

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Fix bug with parsing hybrid sample trace line

When we skip a call stack that starts with an external address, we should also skip the bottom LBR entry. Otherwise, the context is truncated.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D95480


  Commit: 10712791a9affbea8e6fa474d8a857ea6dfbb955
      https://github.com/llvm/llvm-project/commit/10712791a9affbea8e6fa474d8a857ea6dfbb955
  Author: Wenlei He <aktoon at gmail.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/include/llvm/ProfileData/ProfileCommon.h
    M llvm/lib/ProfileData/ProfileSummaryBuilder.cpp
    M llvm/lib/ProfileData/SampleProfReader.cpp
    M llvm/lib/ProfileData/SampleProfWriter.cpp
    M llvm/test/Transforms/SampleProfile/csspgo-inline.ll
    A llvm/test/Transforms/SampleProfile/csspgo-summary.ll

  Log Message:
  -----------
  [CSSPGO] Use merged base profile for hot threshold calculation

A context-sensitive profile effectively splits a function profile into many copies, each representing the CFG profile of a particular calling context. We now have more function profiles, each with lower counts. That makes the count distribution look flatter, which in turn lowers the hot thresholds. To compensate for the seemingly flat context profile, threshold computation now merges context profiles before calculating percentile-based cutoffs. This can be controlled by the switch `sample-profile-contextless-threshold`.
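The effect on a percentile cutoff can be shown with a simplified model. Sort counts in descending order and return the count at which the running sum first reaches the cutoff fraction of the total. Splitting one profile into contexts lowers that count, and merging first restores it. `hotThresholdSketch` and `mergeToBaseSketch` are hypothetical. Treating each profile's total as a single count is a deliberate simplification, and the cutoff is expressed in parts per million (990000 = 99%).

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Count at which the descending running sum first covers CutoffPPM of the total.
uint64_t hotThresholdSketch(std::vector<uint64_t> Counts, uint64_t CutoffPPM) {
  std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
  uint64_t Total = std::accumulate(Counts.begin(), Counts.end(), uint64_t{0});
  uint64_t Desired = Total * CutoffPPM / 1000000; // may overflow for huge totals
  uint64_t Running = 0;
  for (uint64_t C : Counts) {
    Running += C;
    if (Running >= Desired)
      return C;
  }
  return 0;
}

// Merge context profiles ("main @ foo @ bar") into their leaf-function base profiles.
std::vector<uint64_t> mergeToBaseSketch(const std::map<std::string, uint64_t> &Ctx) {
  std::map<std::string, uint64_t> Base;
  for (const auto &[Context, Count] : Ctx) {
    auto Pos = Context.rfind(" @ ");
    Base[Pos == std::string::npos ? Context : Context.substr(Pos + 3)] += Count;
  }
  std::vector<uint64_t> Out;
  for (const auto &Entry : Base)
    Out.push_back(Entry.second);
  return Out;
}
```

With the two `bar` contexts from the cold-profile example (60 and 50), the split counts give a threshold of 50. The merged base profile gives 110.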

Earlier measurements showed a ~0.4% perf boost with this tuning on spec2k6 for CSSPGO (with pseudo probes and the new inliner).

Differential Revision: https://reviews.llvm.org/D95980


  Commit: e8e45f52d0a8268fe3ee2a3a2afc80bc10a47280
      https://github.com/llvm/llvm-project/commit/e8e45f52d0a8268fe3ee2a3a2afc80bc10a47280
  Author: Hongtao Yu <hoy at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/include/llvm/CodeGen/MachineInstr.h
    M llvm/include/llvm/IR/Instruction.h
    M llvm/lib/CodeGen/LiveRangeShrink.cpp
    M llvm/lib/CodeGen/MachineInstr.cpp
    M llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    M llvm/lib/CodeGen/StackProtector.cpp
    M llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
    M llvm/lib/IR/Instruction.cpp
    M llvm/lib/Transforms/IPO/FunctionAttrs.cpp
    M llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
    M llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
    A llvm/test/Transforms/SampleProfile/pseudo-probe-instcombine.ll
    A llvm/test/Transforms/SampleProfile/pseudo-probe-instsched.ll
    A llvm/test/Transforms/SampleProfile/pseudo-probe-peep.ll
    A llvm/test/Transforms/SampleProfile/pseudo-probe-twoaddr.ll

  Log Message:
  -----------
  [CSSPGO] Unblock optimizations with pseudo probe instrumentation.

The IR/MIR pseudo probe intrinsics are not materialized into real machine instructions, so they incur no direct runtime cost. However, they carry an indirect cost by blocking certain optimizations. Some of this blocking is intentional (such as blocking code merging) to improve count quality. The rest is accidental. This change unblocks perf-critical optimizations that do not affect count quality. They include:

1. IR InstCombine, sinking load operations to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1.
3. MIR TwoAddressInstructionPass, i.e., the opeq transform.
4. MIR function argument copy elision.
5. IR stack protection (not perf-critical, but nice to have).
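One common way such accidental blocking happens is a transformation that requires no other instruction between two instructions. If a pseudo probe counts as an ordinary instruction, the check fails. The sketch below models the fix idea: look past instructions that produce no machine code. The types and functions are hypothetical and are not LLVM's IR/MIR API. The real checks differ from pass to pass.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction model, not LLVM's classes.
enum class InstKind { Regular, Debug, PseudoProbe };

struct Inst {
  InstKind Kind;
  std::string Text;
};

// Debug and pseudo probe instructions produce no machine code.
bool isDebugOrPseudo(const Inst &I) { return I.Kind != InstKind::Regular; }

// True if every instruction strictly between Def and Use is a debug or pseudo
// probe instruction, so the pair counts as adjacent for optimization purposes.
bool adjacentIgnoringMeta(const std::vector<Inst> &Block, size_t Def, size_t Use) {
  for (size_t I = Def + 1; I < Use; ++I)
    if (!isDebugOrPseudo(Block[I]))
      return false;
  return true;
}
```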

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982


  Commit: 1a5bb1e4d540303554c0e891389f699956e5e03b
      https://github.com/llvm/llvm-project/commit/1a5bb1e4d540303554c0e891389f699956e5e03b
  Author: Hongtao Yu <hoy at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/test/Transforms/SampleProfile/pseudo-probe-instsched.ll
    M llvm/test/Transforms/SampleProfile/pseudo-probe-peep.ll
    M llvm/test/Transforms/SampleProfile/pseudo-probe-twoaddr.ll

  Log Message:
  -----------
  [CSSPGO] Restrict pseudo probe tests to x86_64 only.


  Commit: 1f5e2016be9a01e4294dcdd10b3c7b03826b26a1
      https://github.com/llvm/llvm-project/commit/1f5e2016be9a01e4294dcdd10b3c7b03826b26a1
  Author: Hongtao Yu <hoy at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
    M llvm/lib/Transforms/IPO/SampleContextTracker.cpp
    M llvm/lib/Transforms/IPO/SampleProfile.cpp
    A llvm/test/Transforms/SampleProfile/Inputs/profile-context-order.prof
    A llvm/test/Transforms/SampleProfile/Inputs/profile-topdown-order.prof
    A llvm/test/Transforms/SampleProfile/profile-context-order.ll
    A llvm/test/Transforms/SampleProfile/profile-topdown-order.ll

  Log Message:
  -----------
  [CSSPGO] Process functions in a top-down order on a dynamic call graph.

The sample profile loader currently processes functions in a top-down order defined by the static call graph. This change bases the top-down order on the input context-sensitive profile instead. One benefit is that, within one SCC, the processing order of caller and callee follows the context order in the profile, which favors more inlining. Another benefit concerns a caller and callee connected through an indirect call, which is not on the static call graph. Their processing order can now be honored, which in turn allows more inlining.

The profile top-down order for SCC is also extended to support non-CS profiles.

Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced.
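A top-down order over a profile-derived call graph can be sketched like this. Each adjacent pair of frames in a context becomes a caller-to-callee edge, and a reverse post-order DFS visits callers before callees. This is a minimal sketch under assumptions. The helpers are hypothetical, frames are plain function names without call site offsets, and LLVM's actual SCC-based processing is not reproduced. Within a cycle, DFS order decides.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::set<std::string>>;

// Add caller->callee edges implied by a context such as "main @ foo @ bar".
void addContextEdges(Graph &G, const std::string &Context) {
  std::vector<std::string> Frames;
  size_t Start = 0;
  while (true) {
    size_t Pos = Context.find(" @ ", Start);
    Frames.push_back(Context.substr(Start, Pos - Start)); // substr clamps at npos
    if (Pos == std::string::npos)
      break;
    Start = Pos + 3;
  }
  for (size_t I = 0; I + 1 < Frames.size(); ++I)
    G[Frames[I]].insert(Frames[I + 1]);
  G[Frames.back()]; // make sure the leaf exists as a node
}

void postOrder(const Graph &G, const std::string &N, std::set<std::string> &Seen,
               std::vector<std::string> &Out) {
  if (!Seen.insert(N).second)
    return;
  for (const std::string &Callee : G.at(N))
    postOrder(G, Callee, Seen, Out);
  Out.push_back(N); // callees finish before their callers
}

// Reverse post-order: callers come before callees (top-down).
std::vector<std::string> topDownOrderSketch(const Graph &G) {
  std::set<std::string> Seen;
  std::vector<std::string> Out;
  for (const auto &Entry : G)
    postOrder(G, Entry.first, Seen, Out);
  return std::vector<std::string>(Out.rbegin(), Out.rend());
}
```

For example, the contexts `main @ foo @ bar` and `main @ baz @ bar` produce the order `main, foo, baz, bar`. This holds even if `main` reaches `baz` only through an indirect call.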

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95988


  Commit: 989b5c9571922ddfecae78a1351d0c801bfbf97b
      https://github.com/llvm/llvm-project/commit/989b5c9571922ddfecae78a1351d0c801bfbf97b
  Author: Hongtao Yu <hoy at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/test/Transforms/SampleProfile/profile-context-order.ll

  Log Message:
  -----------
  Remove test code that cause MSAN failure.

Summary:
The negative test (with the newly added feature disabled) caused an MSAN failure, which is exactly what the added feature is supposed to fix. Therefore the negative test code is removed.


  Commit: beb80ffee6a1a816cfeb4047926f412c1a2456d9
      https://github.com/llvm/llvm-project/commit/beb80ffee6a1a816cfeb4047926f412c1a2456d9
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/lib/ProfileData/SampleProfWriter.cpp
    A llvm/test/tools/llvm-profgen/cs-extbinary.test
    M llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
    M llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test
    M llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
    M llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Add brackets for context id to support extended binary format

To align with https://reviews.llvm.org/D95547, we need to add brackets around the context ID before initializing the `SampleContext`.

Also added test cases for the extended binary format on the llvm-profgen side.

Differential Revision: https://reviews.llvm.org/D95929


  Commit: 66873fb695370f5bd333e327ec77e4710c7891c2
      https://github.com/llvm/llvm-project/commit/66873fb695370f5bd333e327ec77e4710c7891c2
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/test/tools/llvm-profgen/disassemble.s
    A llvm/test/tools/llvm-profgen/invalid-perfscript.test
    M llvm/test/tools/llvm-profgen/pseudoprobe-decoding.test
    M llvm/test/tools/llvm-profgen/symbolize.ll
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/PerfReader.h
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/llvm-profgen.cpp

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Renovate perfscript check and command line input validation

This includes some changes to PerfReader's input checks and to the command line:

1) For a large workload, the perfscript can contain thousands of leading MMAP event lines. In that case, a 4k-line threshold is not enough to determine whether the script contains hybrid samples. This change renovates `isHybridPerfScript` to scan the script without a line limit. It checks whether a non-empty call stack is immediately followed by an LBR sample and stops once it finds a valid one.

2) Added several input validations for the command line switches in PerfReader.

3) Renamed the command line switch `show-disassembly` to `show-disassembly-only`. It prints to stdout and exits early, which leaves an empty output profile.
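The unbounded check from item 1 can be sketched as a single scan that stops at the first non-empty call stack immediately followed by an LBR line. This is a minimal sketch under a simplified layout assumption: call stack frames are bare hex addresses, one per line, and an LBR line consists of `0xFROM/0xTO/...` entries. `looksLikeHybridPerfScript` and its helpers are hypothetical, not PerfReader's code.

```cpp
#include <cctype>
#include <string>
#include <vector>

static std::string trim(const std::string &S) {
  size_t B = S.find_first_not_of(" \t");
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(" \t");
  return S.substr(B, E - B + 1);
}

// Assumed layout: an LBR line starts with "0x" and contains '/'-separated fields.
static bool isLBRLine(const std::string &L) {
  std::string T = trim(L);
  return T.rfind("0x", 0) == 0 && T.find('/') != std::string::npos;
}

// Assumed layout: a call stack frame is a bare hex address.
static bool isCallStackLine(const std::string &L) {
  std::string T = trim(L);
  if (T.empty())
    return false;
  for (char C : T)
    if (!std::isxdigit(static_cast<unsigned char>(C)))
      return false;
  return true;
}

// No fixed line limit: leading MMAP lines are simply skipped.
bool looksLikeHybridPerfScript(const std::vector<std::string> &Lines) {
  size_t StackDepth = 0;
  for (const std::string &L : Lines) {
    if (isCallStackLine(L))
      ++StackDepth;
    else if (isLBRLine(L) && StackDepth > 0)
      return true; // non-empty call stack immediately followed by LBR
    else
      StackDepth = 0;
  }
  return false;
}
```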

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D96387


  Commit: 610b51c04d3ca6b58555fa30ae52ad9762f9cf86
      https://github.com/llvm/llvm-project/commit/610b51c04d3ca6b58555fa30ae52ad9762f9cf86
  Author: wlei <wlei at fb.com>
  Date:   2021-02-19 (Fri, 19 Feb 2021)

  Changed paths:
    M llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
    M llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test
    M llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Filter out the instructions without location info for symbolizer

Some instructions have no debug location info, and the symbolizer returns an empty call stack for them, which later causes a crash during profile unwinding. We do not record sample info for these instructions anyway, so this change simply filters them out.

These instructions can appear at the beginning and end of the instruction list. Filtering them out therefore requires boundary checks in the IP `advance` and `backward` operations.
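A boundary-checked instruction pointer can be sketched over a sorted list of the addresses that survive filtering. The step operations refuse to move past either end. This is a minimal sketch. `InstructionPointerSketch` and its members are hypothetical and are not llvm-profgen's `InstructionPointer`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: CodeAddrs holds only the addresses that kept their debug
// location info after filtering, sorted ascending.
struct InstructionPointerSketch {
  const std::vector<uint64_t> *CodeAddrs;
  size_t Index;

  uint64_t address() const { return (*CodeAddrs)[Index]; }

  // Step to the next instruction; refuse to move past the last one.
  bool advance() {
    if (Index + 1 >= CodeAddrs->size())
      return false;
    ++Index;
    return true;
  }

  // Step to the previous instruction; refuse to move before the first one.
  bool backward() {
    if (Index == 0)
      return false;
    --Index;
    return true;
  }
};
```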

Also, a pseudo probe based profile doesn't need the symbolized location info, so this change uses an empty stack for it instead. This could save half of the binary loading time.

Differential Revision: https://reviews.llvm.org/D96434


Compare: https://github.com/llvm/llvm-project/compare/d3f9f512a47f...610b51c04d3c

