[PATCH] D95962: [CSSPGO] Introducing dangling pseudo probes.
Hongtao Yu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 3 10:34:45 PST 2021
hoy created this revision.
Herald added subscribers: dexonsmith, wenlei, JDevlieghere, hiraditya, kristof.beyls.
hoy requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
Dangling probes are the probes associated with an empty block. This usually happens when all real instructions are optimized away from the block. Dangling probes cause a problem during offline counts processing. The sample profiler counts the samples collected on the first physical instruction following a probe towards that probe. This is logically equivalent to treating the instruction next to a probe as if it belonged to the same block as the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, so counting the samples collected on that new block towards the empty block can produce inaccurate counts.
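To make the attribution rule concrete, here is a minimal sketch over a simplified, hypothetical model of a laid-out instruction stream (the `Inst` type and `attributeSamples` function are illustrative, not LLVM or profiler APIs). It assumes that every probe immediately preceding a real instruction is credited with that instruction's samples, which is enough to show how a dangling probe picks up samples that belong to the next block.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: each entry is either a pseudo probe (identified by the
// block it was inserted for) or a real instruction with its sample count.
struct Inst {
  bool IsProbe;
  int ProbeBlock;    // meaningful only when IsProbe is true
  uint64_t Samples;  // meaningful only when IsProbe is false
};

// Credits the samples of the first real instruction following one or more
// probes to each of those probes. Probes not followed by any real
// instruction receive nothing.
std::map<int, uint64_t> attributeSamples(const std::vector<Inst> &Stream) {
  std::map<int, uint64_t> Counts;
  std::vector<int> Pending;  // probes still waiting for a real instruction
  for (const Inst &I : Stream) {
    if (I.IsProbe) {
      Pending.push_back(I.ProbeBlock);
      continue;
    }
    for (int Block : Pending)
      Counts[Block] += I.Samples;
    Pending.clear();
  }
  return Counts;
}
```

For the stream `probe(1), inst(100), probe(2), probe(3), inst(7)`, where block 2 has been emptied, probe 2 is credited with the 7 samples that really belong to block 3.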
To mitigate this issue, we first try to move dangling probes around inside their host block. If there are still native instructions preceding the probe in the same block, we can use them as a placeholder to collect samples for the probe. A pass is added that walks each block backwards, looks for probes not followed by any real instruction, and moves them before the first real instruction. This is done right before object emission.
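A minimal sketch of that backward walk, over a hypothetical block model (`MInst` and `hoistTrailingProbes` are illustrative names, not the pass's real code). It assumes "the first real instruction" means the first one met while walking backwards, i.e. the last real instruction of the block; that reading is an interpretation of the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical machine-block model: each entry is a probe or a real
// instruction, identified by a printable name for easy inspection.
struct MInst {
  bool IsProbe;
  std::string Name;
};

// Probes at the tail of the block (not followed by any real instruction)
// are moved in front of the last real instruction, which is the first real
// instruction met when walking backwards. Returns false if the block has no
// real instruction, in which case the probes are left in place as dangling.
bool hoistTrailingProbes(std::vector<MInst> &Block) {
  std::vector<MInst> Trailing;
  // Walk backwards over the tail, peeling off probes until a real
  // instruction is found.
  while (!Block.empty() && Block.back().IsProbe) {
    Trailing.push_back(Block.back());
    Block.pop_back();
  }
  // Trailing holds the probes in reverse order; restore program order.
  std::reverse(Trailing.begin(), Trailing.end());
  if (Block.empty()) {
    // No native instruction to act as a placeholder: put the probes back.
    Block = Trailing;
    return false;
  }
  // Insert the probes right before the last real instruction.
  Block.insert(Block.end() - 1, Trailing.begin(), Trailing.end());
  return true;
}
```

For example, `p1, add, mov, p2, p3` becomes `p1, add, p2, p3, mov`, so the samples on `mov` are credited to `p2` and `p3` from within their own block.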
If we are unable to find such an in-block preceding instruction for a probe, we tag the probe as dangling so that the samples reported for it will not be trusted by the compiler. We leave it up to the counts inference algorithm to give such probes a reasonable count. The value UINT64_MAX is recorded as the sample count of a dangling probe to mark it as such.
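The sketch below shows how a consumer might treat the UINT64_MAX marker: the count is mapped to "unknown" instead of being trusted. `DanglingCount`, `trustedCount`, and `inferArm` are hypothetical names, and `inferArm` is a toy flow-conservation rule for an if/else diamond, standing in for (not reproducing) the compiler's counts inference algorithm.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Marker from the description: a sample count of UINT64_MAX denotes a
// dangling probe. The constant name is illustrative.
constexpr uint64_t DanglingCount = std::numeric_limits<uint64_t>::max();

// Hypothetical reader helper: the trusted count of a probe, or std::nullopt
// if the probe was dangling and its count has to be inferred.
std::optional<uint64_t> trustedCount(uint64_t RawCount) {
  if (RawCount == DanglingCount)
    return std::nullopt;  // do not trust; leave it to inference
  return RawCount;
}

// Toy flow-conservation rule (not the real inference): in an if/else
// diamond, an arm with an untrusted count gets whatever the entry count
// leaves over after the other arm.
uint64_t inferArm(uint64_t Entry, std::optional<uint64_t> Arm,
                  uint64_t OtherArm) {
  if (Arm)
    return *Arm;
  return Entry >= OtherArm ? Entry - OtherArm : 0;
}
```

With an entry count of 100 and the other arm at 30, a dangling arm is assigned 70 rather than whatever samples happened to land after it.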
Unblocking optimizations
This change also fixes a couple of places where the pseudo probe intrinsic blocks optimizations because it is not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of being literally removed. The reason is that once the original block is removed, we can no longer sample it. Rather than assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling gives the counts inference a chance to assign it a more reasonable weight. We have not seen counts quality degradation in our experiments.
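A minimal sketch of the "move and dangle instead of delete" policy, using a hypothetical IR model (`Probe`, `Block`, and `removeBlockKeepingProbes` are illustrative, not LLVM types or functions). The removed block's probes survive in another block, tagged dangling, so the original block still has a probe whose count inference can fill in.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical IR model, for illustration only.
struct Probe {
  uint64_t FuncGuid;  // function the probe belongs to
  uint32_t Index;     // probe id within the function
  bool Dangling;
};

struct Block {
  std::vector<Probe> Probes;
};

// Instead of deleting the probes of a block that an optimization removes,
// move them into a surviving block and tag them dangling.
void removeBlockKeepingProbes(Block &Removed, Block &Survivor) {
  for (Probe P : Removed.Probes) {  // copy, then mark the copy dangling
    P.Dangling = true;
    Survivor.Probes.push_back(P);
  }
  Removed.Probes.clear();
}
```

The surviving block's own probes are untouched; only the relocated ones are flagged, so their samples are ignored while the original block keeps its identity in the profile.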
The optimizations being unblocked are:
1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their host branch arms are folded, so that they will not be over-counted.
2. Unblock jump threading from removing empty blocks. Pseudo probes prevent jump threading from removing logically empty blocks that contain only an unconditional jump instruction.
3. Unblock SimplifyCFG and MIR tail duplication to thread empty blocks and blocks with redundant branch checks.
Since dangling probes are logically deleted, they should not consume any samples in the LTO postLink stage. This can be achieved by setting their distribution factors to zero when they are dangled.
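A minimal sketch of why a zero distribution factor makes a dangled probe consume no samples, assuming a simplified model in which a probe copy is credited `Samples * Factor`. `ProbeCopy`, `dangle`, and `creditedSamples` are hypothetical names, not the LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical probe record. Factor is the share (0.0 to 1.0) of the
// collected samples attributed to this copy of the probe, e.g. after the
// block carrying it was duplicated.
struct ProbeCopy {
  float Factor;
  bool Dangling;
};

// Tagging a probe dangling also zeroes its distribution factor.
void dangle(ProbeCopy &P) {
  P.Dangling = true;
  P.Factor = 0.0f;
}

// Samples credited to a probe copy under this simplified model.
uint64_t creditedSamples(const ProbeCopy &P, uint64_t Samples) {
  return static_cast<uint64_t>(Samples * P.Factor);
}
```

A copy with factor 0.5 is credited 50 of 100 samples; after `dangle`, the same copy is credited 0.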
Deduplicating dangling pseudo probes
Identical dangling probes are redundant since they all carry the same semantics: relying on counts inference to get a reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading create tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also produce redundant probes, though without an observed impact so far.
This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.
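A minimal sketch of a function-wise deduplication pass over a hypothetical model (a function as a list of blocks, each a list of probes; `Probe`, `Function`, and `dedupDanglingProbes` are illustrative). It assumes a dangling probe is identified by its function GUID and probe index, keeps the first copy it meets, and leaves non-dangling probes alone.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical probe model, for illustration only.
struct Probe {
  uint64_t FuncGuid;
  uint32_t Index;
  bool Dangling;
};

using Function = std::vector<std::vector<Probe>>;  // blocks of probes

// Keep the first copy of each dangling probe in the function and drop the
// later ones. Non-dangling probes are always kept.
void dedupDanglingProbes(Function &F) {
  std::set<std::pair<uint64_t, uint32_t>> Seen;
  for (auto &Block : F) {
    std::vector<Probe> Kept;
    for (const Probe &P : Block) {
      // insert(...).second is false when this dangling probe was seen before.
      if (P.Dangling && !Seen.insert({P.FuncGuid, P.Index}).second)
        continue;
      Kept.push_back(P);
    }
    Block = std::move(Kept);
  }
}
```

Restricting the `Seen` set to a single block would give the block-wise variant described for jump threading.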
An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D95962
Files:
llvm/include/llvm/CodeGen/MachineBasicBlock.h
llvm/include/llvm/CodeGen/MachineInstr.h
llvm/include/llvm/IR/Instruction.h
llvm/include/llvm/IR/PseudoProbe.h
llvm/include/llvm/MC/MCPseudoProbe.h
llvm/include/llvm/ProfileData/SampleProf.h
llvm/lib/CodeGen/BranchFolding.cpp
llvm/lib/CodeGen/MachineBasicBlock.cpp
llvm/lib/CodeGen/PseudoProbeInserter.cpp
llvm/lib/CodeGen/TailDuplicator.cpp
llvm/lib/IR/Instruction.cpp
llvm/lib/IR/PseudoProbe.cpp
llvm/lib/Transforms/IPO/SampleProfile.cpp
llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
llvm/lib/Transforms/Scalar/JumpThreading.cpp
llvm/lib/Transforms/Utils/Local.cpp
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll
llvm/test/Transforms/SampleProfile/pseudo-probe-dedup.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D95962.321154.patch
Type: text/x-patch
Size: 34292 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210203/39d206eb/attachment.bin>