[PATCH] D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes
Zhang Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 11 23:46:11 PDT 2019
ZhangKang added a comment.
In D61228#1506487 <https://reviews.llvm.org/D61228#1506487>, @nemanjai wrote:
> > Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?
> >
> > Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.
>
> Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.
>
> TL; DR;
> The hotness of the loop is checked in `MachineBlockPlacement::alignBlocks()`. I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:
>
> for (int i = 0; i < HugeValue; i++) {
> // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
> for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
> // Do something short
> }
>
>
> Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).
@nemanjai , I have tested the case you give, after doing the alignment to 32 bytes without PGO, the performance doesn't degrade, I think we'd better align the inner loop without PGO data. What do you think?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D61228/new/
https://reviews.llvm.org/D61228
More information about the llvm-commits
mailing list