[PATCH] D97982: [MC] Introduce NeverAlign fragment type

Mon Jun 14 14:07:09 PDT 2021

Amir added a comment.

In D97982#2813799 <https://reviews.llvm.org/D97982#2813799>, @reames wrote:

> Drive by thought, not intended to be blocking.
>
> Reading over the description, I'm left wondering why not treat this as a bundling problem?  We have precedent for bundles of instructions which need to not be split across an alignment boundary.  Why not simply say that the test/jcc are in a two instruction bundle which can't cross the boundary?  In fact, shouldn't this be able to reuse the existing boundary align mechanism pretty much exactly?  It seems like the same basic problem, just for a different use case.
>
> Or maybe I'm misunderstanding the problem.  Do you get good performance if one of the instructions starts in the first cache line, but ends in the second?  That doesn't match my memory of the performance characteristics, but I don't have that fully loaded any more either.  That seems to be the difference between the two approaches right?
>
> If that is the difference, and it's intentional, I'd suggest updating the commit message to be really explicit about that being the desired behavior in the edge case.

The use case is different: the intent is to prevent the first instruction of a pair from ending at a given alignment boundary, by inserting at most one byte. It's OK if either instruction crosses the cache line. The performance metric for this alignment is whether macro-op fusion is performed. BOLT aggressively removes nop padding to improve icache/iTLB utilization, so using bundles to pad both instructions so that they don't cross the cache line would go against this goal: the performance gain from enabling macro-op fusion in this rare case would be negated by the code size increase from excessive padding. There's also no straightforward way to request that instruction bundling avoid a given end alignment for the first instruction in the bundle.
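
To make the intended behavior concrete, here is a rough sketch of the padding decision. This is an illustration only, not the code in this patch: the helper name `neverAlignPadding` and its parameters are hypothetical, and it assumes the alignment is greater than 1.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of LLVM's MC layer.
// Returns the number of padding bytes (0 or 1) to place before the first
// instruction of a fusible pair so that the instruction does not end
// exactly at an alignment boundary. Assumes alignment > 1.
inline unsigned neverAlignPadding(uint64_t startOffset,
                                  uint64_t firstInsnSize,
                                  uint64_t alignment) {
  uint64_t endOffset = startOffset + firstInsnSize;
  // Shifting by one byte moves the end just past the boundary, which is
  // sufficient because alignment > 1.
  return (endOffset % alignment == 0) ? 1u : 0u;
}
```

For example, with a 64-byte boundary, a 4-byte first instruction starting at offset 60 would end exactly at offset 64, so one byte is inserted; the same instruction starting at offset 62 simply crosses the boundary and needs no padding.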

Following the suggestion by @skan, I experimented with repurposing BoundaryAlign (D101817 <https://reviews.llvm.org/D101817>) and achieved the desired alignment, but this approach has more overhead because it relies on relaxation (which BoundaryAlign requires in the general case); see https://reviews.llvm.org/D97982#2710638.

So neither instruction bundling nor BoundaryAlign satisfies the functional and overhead requirements that NeverAlign addresses. I'll add this information to the commit message.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97982/new/

https://reviews.llvm.org/D97982