[PATCH] D70157: Align branches within 32-Byte boundary

annita.zhang via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Dec 7 01:30:24 PST 2019


annita.zhang added a comment.



In D70157#1760278 <https://reviews.llvm.org/D70157#1760278>, @jyknight wrote:

>




> Branch alignment
> 
>  ---
> 
> The primary goal of this patch, restricting the placement of branch instructions, is a performance optimization. Similar to loop alignment, the desire is to increase speed, at the cost of code-size. However, the way this feature has been designed is a global assembler flag. I find that not ideal, because it cannot take into account hotness of a block/function, as for example loop alignment code does. Basic-block alignment of loops is explicitly opt-in on an block-by-block basis -- the compiler simply emits a p2align directive where it needs, and the assembler honors that. And so, MachineBlockPlacement::alignBlocks has a bunch of conditions under which it will avoid emitting a p2align. This seems like a good model -- the assembler does what it's told by the compiler (or assembly-writer). Making the branch-instruction-alignment work similarly seems like it would be good.
> 
> IMO it would be nicest if there could be a directive that requests to specially-align the next instruction. However, the fused-jcc case makes that quite tricky, so perhaps this ought to also be a mode which can be enabled/disabled on a region as well.

Yes, the primary goal of this patch is a performance optimization or mitigation. The intention is to provide a simple method for users to mitigate the performance impact of JCC MCU with less effort. We also provide users several options to tune the performance. But the basic idea is to make it easy for users to mitigate it and improve the performance.

Your proposal is a good idea. But I'm afraid it may not cover all the scenarios. Firstly, the proposal replies on compiler to detect the hotspots. But the compiler needs LTO and/or PGO to get the precise hot spots. Otherwise, if the compiler misses the hot spots which impact the application performance, the users have no way but have to insert the directives manually. Secondly, for the existing codes written in assembly, the compiler can't handle it. The users have to insert the directives by hand, which are pretty much work.

I think the current patch wants to give a simple and general solution to mitigate JCC MCU performance impact to both C/C++ and Assembly. And it doesn't need any source code change. I think your proposal will be a good enhancement on top of it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157





More information about the cfe-commits mailing list