[PATCH] D70157: Align branches within 32-Byte boundary

David Zarzycki via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Dec 6 01:54:10 PST 2019


davezarzycki added a comment.

In D70157#1772227 <https://reviews.llvm.org/D70157#1772227>, @annita.zhang wrote:

> > >>> Third, I have not seen a justification for why the complexity of instruction prefix padding is necessary.  All the affected CPUs support multi-byte nops, so we're talking about a *single micro op* difference between the nop form and prefix form.  Can anyone point to a performance delta due to this?  If not, I'd suggest we should start with the nop form, and then build the prefix form in a generic manner for all alignment varieties.
> >> 
> >> +1.
> > 
> > +1. Starting with just NOP padding sounds like a simple and good first step. We can explore segment override prefixes in the future.
>
> I think it's a good suggestion to start with NOP padding as the first step. In our previous experiment, we saw that prefix padding was slightly better than NOP padding, but not by much. We will retest NOP padding and get back to you.
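As a rough illustration of the trade-off in the quoted comments, here is a minimal Python sketch (not the patch's implementation) of how much padding a branch needs and how many NOPs that padding turns into. It assumes the rule is that a branch must neither cross nor end on a 32-byte boundary, and that the longest NOP emitted is 10 bytes. `padding_needed` and `nop_count` are hypothetical helpers, and the 10-byte limit is a placeholder, since the usable NOP length depends on the target CPU.

```python
BOUNDARY = 32  # bytes, from the title of D70157


def padding_needed(offset: int, length: int, boundary: int = BOUNDARY) -> int:
    """Padding bytes to insert before an instruction starting at `offset`
    (with `length` bytes) so that it neither crosses nor ends on a
    `boundary`-byte boundary. The "ends on" part is an assumed rule."""
    end = offset + length  # one past the last byte
    same_block = offset // boundary == (end - 1) // boundary
    ends_on_boundary = end % boundary == 0
    if same_block and not ends_on_boundary:
        return 0
    # Push the instruction to the start of the next boundary-aligned block.
    return boundary - offset % boundary


def nop_count(padding: int, max_nop_len: int = 10) -> int:
    """Number of NOP instructions needed to fill `padding` bytes when each
    NOP is at most `max_nop_len` bytes long (assumed limit)."""
    return -(-padding // max_nop_len)  # ceiling division


# A 6-byte jcc at offset 30 spans bytes 30..35 and crosses offset 32.
pad = padding_needed(30, 6)
print(pad, nop_count(pad))  # 2 1
```

With prefix padding, those same 2 bytes would instead be absorbed as extra prefixes on preceding instructions, so the instruction count stays the same; the one extra NOP is the "single micro op" difference the quoted comment refers to.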


For whatever it may be worth: Agner Fog's empirical research on x86 pipelines, together with his review of the manufacturers' optimization guidelines, also concludes that prefixes are often preferable to NOPs on modern x86 processors. (See: https://www.agner.org/optimize/microarchitecture.pdf) This is arguably unsurprising: the decoder must be good at finding instruction boundaries, but it is not responsible for interpreting instructions, so a NOP of any size still occupies a decode slot and dilutes decode bandwidth, whereas padding an existing instruction with prefixes adds no new instruction.
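The decode-bandwidth argument can be made concrete with a deliberately simplified model, sketched in Python below under loudly stated assumptions: the decoder handles at most `decode_width` instructions per cycle (4 is only a placeholder), NOP padding uses NOPs of at most `max_nop_len` bytes, and prefix padding is assumed to be absorbed entirely by existing instructions. The model ignores fetch-window limits, the uop cache, and the decode penalties some cores incur for instructions with many prefixes, so it shows only the direction of the effect, not its size. `decode_cycles` and `compare_padding` are hypothetical names.

```python
import math


def decode_cycles(num_instructions: int, decode_width: int = 4) -> int:
    """Cycles to decode a block if at most `decode_width` instructions are
    decoded per cycle (toy model; real decoders have more constraints)."""
    return math.ceil(num_instructions / decode_width)


def compare_padding(num_real_instructions: int, padding_bytes: int,
                    max_nop_len: int = 10, decode_width: int = 4) -> tuple:
    """Return (cycles with NOP padding, cycles with prefix padding)."""
    # NOP padding adds whole instructions that the decoder must process.
    nops = math.ceil(padding_bytes / max_nop_len)
    with_nops = decode_cycles(num_real_instructions + nops, decode_width)
    # Prefix padding (assumed to fit) lengthens existing instructions instead.
    with_prefixes = decode_cycles(num_real_instructions, decode_width)
    return with_nops, with_prefixes


print(compare_padding(8, 5))  # (3, 2): the NOP spills into an extra cycle
print(compare_padding(7, 5))  # (2, 2): the NOP fills an otherwise idle slot
```

In this model a NOP is free when it lands in a slot the decoder would have left empty and costs a full cycle when it does not, which is one way to read "dilutes decode bandwidth".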



https://reviews.llvm.org/D70157




