[PATCH] D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes

Zhang Kang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 11 23:46:11 PDT 2019


ZhangKang added a comment.

In D61228#1506487 <https://reviews.llvm.org/D61228#1506487>, @nemanjai wrote:

> > Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?
> > 
> > Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.
>
> Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.
>
> TL; DR;
>  The hotness of the loop is checked in `MachineBlockPlacement::alignBlocks()`. I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:
>
>   for (int i = 0; i < HugeValue; i++) {
>     // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
>     for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
>       // Do something short
>   }
>
>
> Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).


@nemanjai , I have tested the case you give, after doing the alignment to 32 bytes without PGO, the performance doesn't degrade, I think we'd better align the inner loop without PGO data. What do you think?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61228/new/

https://reviews.llvm.org/D61228





More information about the llvm-commits mailing list