[PATCH] D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes

Thu May 30 19:09:28 PDT 2019

ZhangKang added a comment.

In D61228#1506487 <https://reviews.llvm.org/D61228#1506487>, @nemanjai wrote:

> > Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?
> > 
> > Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.
>
> Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.
>
> TL; DR;
>  The hotness of the loop is checked in `MachineBlockPlacement::alignBlocks()`. I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:
>
>   for (int i = 0; i < HugeValue; i++) {
>     // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
>     for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
>       // Do something short
>   }
>
>
> Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).

@nemanjai  @hfinkel , I have tested the spec performance without PGO for this patch, there is no regressions for alignment 5 without PGO, so I will modify the patch to remove the PGO restriction.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61228/new/

https://reviews.llvm.org/D61228