[PATCH] D70157: Align branches within 32-Byte boundary

Tue Nov 26 08:07:23 PST 2019

jyknight added a comment.

I have some other concerns about the code itself, but after pondering this a little bit, I'd like to instead bring up some rather more general concerns about the overall approach used here -- with some suggestions. (As these comments are not really comments on the _code_, but on the overall strategy, they also apply to the gnu binutils patch for this same feature.)

Of course, echoing chandler, it would be nice to see some performance numbers, otherwise it's not clear how useful any of this is.

Segment-prefix padding
----------------------

The ability to pad instructions instead of adding a multibyte-NOP in order to create alignment seems like a generally-useful feature, which should be usable in other situations where we align within a function -- assuming that there is indeed a measurable performance benefit vs NOP instructions. E.g. we should do this for basic-block alignment, as well! As such, it feels like this feature ought to be split out, and implemented separately from the new branch-alignment functionality -- in a way which is usable for any kind of alignment request.

The way I'd imagine it working is to introduce a new pair of asm directives, to enable and disable segment-prefix padding in a certain range of instructions (let's say ".enable_instr_prefix_pad", ".disable_instr_prefix_pad". Within such an enabled range, the assembler would prefix instructions as required in order to handle later nominally-nop-emitting code alignment directives (such as the usual '.p2align 4, 0x90') .

Branch alignment
----------------

The primary goal of this patch, restricting the placement of branch instructions, is a performance optimization. Similar to loop alignment, the desire is to increase speed, at the cost of code-size. However, the way this feature has been designed is a global assembler flag. I find that not ideal, because it cannot take into account hotness of a block/function, as for example loop alignment code does. Basic-block alignment of loops is explicitly opt-in on an block-by-block basis -- the compiler simply emits a p2align directive where it needs, and the assembler honors that. And so, MachineBlockPlacement::alignBlocks has a bunch of conditions under which it will avoid emitting a p2align. This seems like a good model -- the assembler does what it's told by the compiler (or assembly-writer). Making the branch-instruction-alignment work similarly seems like it would be good.

IMO it would be nicest if there could be a directive that requests to specially-align the next instruction. However, the fused-jcc case makes that quite tricky, so perhaps this ought to also be a mode which can be enabled/disabled on a region as well.

Enabling by default
-------------------

Previously, I'd mentioned that it seemed likely we couldn't actually enable branch-alignment by default because it'll probably break people's inline-asm and standalone asm files. That would be solved by making everything controllable within the asm file. The compiler could then insert the directives for its own code, and disable it around inline assembly. And standalone asm files could remain unaffected, unless they opt in. With that, we could actually enable the alignment by default, for compiled output in certain cpu-tuning modes, if it's warranted.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157