[PATCH] D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes
Nemanja Ivanovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 17 05:57:42 PDT 2019
nemanjai added a comment.
> Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?
>
> Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.
Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.
TL; DR;
The hotness of the loop is checked in `MachineBlockPlacement::alignBlocks()`. I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:
for (int i = 0; i < HugeValue; i++) {
// Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
// Do something short
}
Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D61228/new/
https://reviews.llvm.org/D61228
More information about the llvm-commits
mailing list