[PATCH] D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes

Zhang Kang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 30 19:09:28 PDT 2019

ZhangKang added a comment.

In D61228#1506487 <https://reviews.llvm.org/D61228#1506487>, @nemanjai wrote:

> > Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?
> > 
> > Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.
> Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.
> TL; DR;
>  The hotness of the loop is checked in `MachineBlockPlacement::alignBlocks()`. I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:
>   for (int i = 0; i < HugeValue; i++) {
>     // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
>     for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
>       // Do something short
>   }
> Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).

@nemanjai  @hfinkel , I have tested the spec performance without PGO for this patch, there is no regressions for alignment 5 without PGO, so I will modify the patch to remove the PGO restriction.



More information about the llvm-commits mailing list