[all-commits] [llvm/llvm-project] b3154d: [CSSPGO][llvm-profgen] Pseudo probe decoding and d...
ictwanglei via All-commits
all-commits at lists.llvm.org
Wed Jan 13 11:11:40 PST 2021
Branch: refs/heads/master
Home: https://github.com/llvm/llvm-project
Commit: b3154d11bc6dee59e581b731b7561f1ebab3aed6
https://github.com/llvm/llvm-project/commit/b3154d11bc6dee59e581b731b7561f1ebab3aed6
Author: wlei <wlei at fb.com>
Date: 2021-01-13 (Wed, 13 Jan 2021)
Changed paths:
A llvm/test/tools/llvm-profgen/Inputs/inline-cs-pseudoprobe.perfbin
A llvm/test/tools/llvm-profgen/pseudoprobe-decoding.test
M llvm/tools/llvm-profgen/CMakeLists.txt
M llvm/tools/llvm-profgen/ProfiledBinary.cpp
M llvm/tools/llvm-profgen/ProfiledBinary.h
A llvm/tools/llvm-profgen/PseudoProbe.cpp
A llvm/tools/llvm-profgen/PseudoProbe.h
Log Message:
-----------
[CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling
This change implements pseudo probe decoding and disassembling for llvm-profgen/CSSPGO. Please see https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
**ELF section format**
Please see the encoding patch (https://reviews.llvm.org/D91878) for more details of the format; the example from that patch is copied here:
Two sections (`.pseudo_probe_desc` and `.pseudoprobe`) are emitted in the ELF file to support pseudo probes.
The format of the `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
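As a reading aid, the record layout in the listing above (two `.quad` values, one `.byte` length, then the name) can be decoded with a few lines of Python. This is only a sketch: it assumes little-endian data and a one-byte name length exactly as the listing shows, and the helper names `decode_probe_desc` and `encode_probe_desc` are hypothetical, not part of llvm-profgen.

```python
import struct

# Layout assumed from the listing: uint64 GUID, uint64 hash, uint8 name length.
RECORD_HEADER = "<QQB"


def decode_probe_desc(data):
    """Decode (guid, func_hash, name) tuples from raw section bytes."""
    records = []
    pos = 0
    while pos < len(data):
        guid, func_hash, name_len = struct.unpack_from(RECORD_HEADER, data, pos)
        pos += struct.calcsize(RECORD_HEADER)
        name = data[pos:pos + name_len].decode("ascii")
        pos += name_len
        records.append((guid, func_hash, name))
    return records


def encode_probe_desc(guid, func_hash, name):
    """Build one record in the same layout (used here only to make test input)."""
    raw = name.encode("ascii")
    # Mask so a negative .quad such as -2016976694713209516 fits in a uint64.
    header = struct.pack(RECORD_HEADER, guid & 0xFFFFFFFFFFFFFFFF, func_hash, len(raw))
    return header + raw
```

A GUID written as a negative `.quad` decodes to its unsigned 64-bit equivalent, so comparisons against it should mask the value the same way.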
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e., a function with a code entry in the `.text` section). A function record has the following format:
```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees. Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```
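To make the probe record layout above concrete, here is a small Python sketch that reads one PROBE RECORD. Several details are assumptions that the format description does not spell out: TYPE is taken to occupy the low 4 bits of the packed byte, ATTRIBUTE the next 3 bits, and ADDRESS_TYPE the top bit; a delta is taken to be relative to the previous probe's address; and fixed-width values are little-endian. The function names are hypothetical.

```python
import struct

PROBE_TYPES = {0: "Block", 1: "IndirectCall", 2: "DirectCall"}


def read_uleb128(data, pos):
    """Decode one ULEB128 value at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def read_probe_record(data, pos, prev_address=0):
    """Decode one probe record; return (probe_dict, next_pos)."""
    index, pos = read_uleb128(data, pos)
    packed = data[pos]
    pos += 1
    probe_type = packed & 0x0F        # assumed: low 4 bits
    attribute = (packed >> 4) & 0x07  # assumed: next 3 bits (reserved)
    is_delta = (packed >> 7) & 0x01   # assumed: top bit
    if is_delta:
        delta, pos = read_uleb128(data, pos)
        address = prev_address + delta  # assumed: delta from the previous probe
    else:
        (address,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    probe = {
        "index": index,
        "type": PROBE_TYPES.get(probe_type, "Unknown"),
        "attribute": attribute,
        "address": address,
    }
    return probe, pos
```

A caller would read NPROBES with `read_uleb128` and then call `read_probe_record` that many times, threading the previous address through for delta-encoded entries.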
**Disassembling**
A switch `--show-pseudo-probe` is added. Use it together with `--show-disassembly` to print the disassembly annotated with pseudo probe directives.
For example:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
**Implementation**
- `PseudoProbeDecoder` is added in ProfiledBinary as the infrastructure for decoding. It decodes the two sections and generates two maps: `GUIDProbeFunctionMap` stores every `PseudoProbeFunction`, which is the abstraction of a general function, and `AddressProbesMap` stores all the pseudo probe info indexed by address.
- All the inline info is encoded into the binary as a trie (`PseudoProbeInlineTree`), which is reconstructed during decoding. Each pseudo probe can get its inline context (`getInlineContext`) by traversing its inline tree node backwards.
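The backward traversal described above can be sketched in Python as follows. This is an illustration of the idea only, not the C++ code in the patch: the class and method names (`InlineTreeNode`, `Probe`, `inline_context`) are hypothetical, and the address used in the usage note is chosen just for illustration.

```python
from collections import defaultdict


class InlineTreeNode:
    """One trie node: a function, plus the callsite probe it was inlined at."""

    def __init__(self, func_name, parent=None, callsite_probe_id=0):
        self.func_name = func_name
        self.parent = parent
        self.callsite_probe_id = callsite_probe_id
        self.children = {}  # (func_name, callsite_probe_id) -> InlineTreeNode

    def get_or_add_child(self, func_name, callsite_probe_id):
        key = (func_name, callsite_probe_id)
        if key not in self.children:
            self.children[key] = InlineTreeNode(func_name, self, callsite_probe_id)
        return self.children[key]


class Probe:
    def __init__(self, index, probe_type, address, node):
        self.index = index
        self.probe_type = probe_type
        self.address = address
        self.node = node  # inline tree node of the function owning this probe

    def inline_context(self):
        """Walk parent links up to the outlined function; outermost frame first."""
        frames = []
        node = self.node
        while node.parent is not None:
            frames.append(f"{node.parent.func_name}:{node.callsite_probe_id}")
            node = node.parent
        frames.reverse()
        return frames


# Address-indexed lookup, analogous in spirit to AddressProbesMap.
address_probes = defaultdict(list)
```

For instance, building a root node for `foo2`, adding `foo` as a child at callsite probe 6, and attaching a block probe to that child yields the context `["foo2:6"]`, which matches the `Inlined: @ foo2:6` annotation in the disassembly example. A probe attached directly to the root has an empty context.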
Test Plan:
ninja & ninja check-llvm
Differential Revision: https://reviews.llvm.org/D92334
Commit: 414930b91bfd4196c457120932a1dbaf26db711d
https://github.com/llvm/llvm-project/commit/414930b91bfd4196c457120932a1dbaf26db711d
Author: wlei <wlei at fb.com>
Date: 2021-01-13 (Wed, 13 Jan 2021)
Changed paths:
M llvm/tools/llvm-profgen/PerfReader.cpp
M llvm/tools/llvm-profgen/PerfReader.h
M llvm/tools/llvm-profgen/ProfileGenerator.cpp
M llvm/tools/llvm-profgen/ProfileGenerator.h
Log Message:
-----------
[CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter
As we plan to support both CSSPGO and AutoFDO in llvm-profgen, we will have different kinds of perf samples and different kinds of sample counters (cs/non-cs, with/without pseudo probe), all of which need to be aggregated in a hash map. This change implements a hashable interface (`Hashable`) and unified base classes for them, for better extensibility and reusability.
Currently the perf trace sample and the sample counter with context implement `Hashable`, and the class hierarchy looks like:
```
| Hashable
    | PerfSample
        | HybridSample
        | LBRSample
    | ContextKey
        | StringBasedCtxKey
        | ProbeBasedCtxKey
        | CallsiteBasedCtxKey
        | ...
```
- A class deriving from `Hashable` should implement `getHashCode` and `isEqual`. Here `getHashCode` is a non-virtual function to avoid vtable overhead, so a derived class must calculate the hash code and assign it to the base class's `HashCode` manually. This also gives the flexibility to calculate the hash code incrementally (like a rolling hash) during frame stack unwinding.
- `isEqual` is a virtual function, which has performance overhead. In the future, if we design a better hash function, we can skip it or switch to a non-virtual function.
- Added `PerfSample` and `ContextKey` as base classes for perf samples and counter context keys, leveraging LLVM-style RTTI.
- Added the `StringBasedCtxKey` class, which extends `ContextKey` and uses a string as the context id.
- Refactored `AggregationCounter` to take all kinds of `PerfSample` as key.
- Refactored `ContextSampleCounter` to take all kinds of `ContextKey` as key.
- Other refactoring work:
  - Created a wrapper class `SampleCounter` to wrap `RangeCounter` and `BranchCounter`.
  - Hoisted `ContextId` and `FunctionProfile` out of `populateFunctionBodySamples` and `populateFunctionBoundarySamples` to reuse them in ProfileGenerator.
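The division of labor described in the list above (a stored hash code assigned by the derived class, plus an overridable equality check) can be illustrated with a short Python sketch. It mirrors the idea rather than the C++ classes in the patch: the method names are Python-style stand-ins, `aggregate` is a hypothetical helper, and the context strings in the usage note are made-up examples.

```python
class Hashable:
    """Base class that stores a hash code assigned by derived classes."""

    def __init__(self):
        self.hash_code = 0

    def get_hash_code(self):
        # Plain attribute read, standing in for the non-virtual getHashCode.
        return self.hash_code

    def is_equal(self, other):
        # Overridden by derived classes, standing in for the virtual isEqual.
        raise NotImplementedError

    def __hash__(self):
        return self.get_hash_code()

    def __eq__(self, other):
        return isinstance(other, Hashable) and self.is_equal(other)


class ContextKey(Hashable):
    """Base class for counter context keys."""


class StringBasedCtxKey(ContextKey):
    def __init__(self, context):
        super().__init__()
        self.context = context
        # The derived class computes and assigns the base class's hash code.
        self.hash_code = hash(context)

    def is_equal(self, other):
        return isinstance(other, StringBasedCtxKey) and self.context == other.context


def aggregate(keys):
    """Count occurrences of each key in a hash map."""
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Calling `aggregate` on three keys built from the strings `"main:3 @ foo"`, `"main:3 @ foo"`, and `"main:5 @ bar"` produces two entries, with a count of 2 for the first context. Because the hash is stored rather than recomputed, a key could also update it incrementally while frames are appended during unwinding.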
Differential Revision: https://reviews.llvm.org/D92584
Commit: c681400b25a66ae56b74cc1f11ffdc15190a65b8
https://github.com/llvm/llvm-project/commit/c681400b25a66ae56b74cc1f11ffdc15190a65b8
Author: wlei <wlei at fb.com>
Date: 2021-01-13 (Wed, 13 Jan 2021)
Changed paths:
M llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript
A llvm/test/tools/llvm-profgen/Inputs/inline-cs-pseudoprobe.perfscript
M llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
A llvm/test/tools/llvm-profgen/Inputs/noinline-cs-pseudoprobe.perfbin
A llvm/test/tools/llvm-profgen/Inputs/noinline-cs-pseudoprobe.perfscript
A llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
A llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
M llvm/tools/llvm-profgen/PerfReader.cpp
M llvm/tools/llvm-profgen/PerfReader.h
M llvm/tools/llvm-profgen/ProfileGenerator.cpp
M llvm/tools/llvm-profgen/ProfileGenerator.h
M llvm/tools/llvm-profgen/ProfiledBinary.cpp
M llvm/tools/llvm-profgen/ProfiledBinary.h
M llvm/tools/llvm-profgen/PseudoProbe.cpp
M llvm/tools/llvm-profgen/PseudoProbe.h
Log Message:
-----------
[CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe
This change extends the virtual unwinder to support pseudo probes in llvm-profgen. Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
**Implementation**
- Added `ProbeBasedCtxKey`, derived from `ContextKey`, for sample counter aggregation. A string-based context needs string splitting to infer the profile for the callee function, which adds string handling overhead, so here we use a probe-pointer-based context instead.
- For linear unwinding, the inline context is encoded in each pseudo probe, so we don't need to go through each instruction to extract the ranges sharing the same inliner. We just record the range for the context.
- For probe-based context, we should ignore the top frame probe since it will be extracted from the address range. We defer the extraction to `ProfileGeneration`.
- Added `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- Added helper functions to get pseudo probe info (call probe, inline context) from the profiled binary.
- Added a regression test for the unwinder's output.
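The motivation for a probe-pointer-based key in the list above can be sketched in Python. This is a hedged illustration, not the patch's implementation: object identity (`id` and `is`) stands in for C++ pointer comparison, and the `Probe` stand-in and the `leaf_function` accessor are hypothetical.

```python
class Probe:
    """Stand-in for a decoded pseudo probe; each probe exists exactly once."""

    def __init__(self, func_name, index):
        self.func_name = func_name
        self.index = index


class ProbeBasedCtxKey:
    """Context key holding references to probes instead of a formatted string."""

    def __init__(self, probes):
        self.probes = tuple(probes)
        # Hash the object identities, analogous to hashing probe pointers.
        self.hash_code = hash(tuple(id(p) for p in self.probes))

    def __hash__(self):
        return self.hash_code

    def __eq__(self, other):
        return (
            isinstance(other, ProbeBasedCtxKey)
            and len(self.probes) == len(other.probes)
            and all(a is b for a, b in zip(self.probes, other.probes))
        )

    def leaf_function(self):
        # No string splitting: the innermost frame is just the last probe.
        return self.probes[-1].func_name
```

Two keys built from the same probe objects compare equal and land in the same hash map slot, while a key built from a distinct probe object with identical fields does not. That matches pointer semantics and relies on each probe being decoded once and then shared.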
The pseudo probe based profile generation will be in the upcoming patch.
Test Plan:
ninja & ninja check-llvm
Differential Revision: https://reviews.llvm.org/D92896
Compare: https://github.com/llvm/llvm-project/compare/166e5c335cbe...c681400b25a6