[PATCH] D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes

Nemanja Ivanovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 17 05:57:42 PDT 2019


nemanjai added a comment.

> Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?
> 
> Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.

Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.

TL; DR;
The hotness of the loop is checked in `MachineBlockPlacement::alignBlocks()`. I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:

  for (int i = 0; i < HugeValue; i++) {
    // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
    for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
      // Do something short
  }

Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61228/new/

https://reviews.llvm.org/D61228





More information about the llvm-commits mailing list