[PATCH] D70157: Align branches within 32-Byte boundary

Thu Nov 21 20:33:29 PST 2019

skan marked an inline comment as done.
skan added a comment.

In D70157#1755927 <https://reviews.llvm.org/D70157#1755927>, @jyknight wrote:

> Thanks for the comments, they help a little. But it's still somewhat confusing, so let me write down what seems to be happening:
>
> - Before emitting every instruction, a new MCMachineDependentFragment is now emitted, of one of the multiple types:
>   - For most instructions, that'll be BranchPrefix.
>   - For things that need branch-alignment, it'll be BranchPadding, unless there's a fused conditional before, in which case it's BranchSplit
>   - For fused conditionals, it'll be FusedJccPadding.
> - After emitting an instruction that needs branch-alignment, all of those previously-emitted MCMachineDependentFragment are updated to point to the branch's fragment.
> - Thus, every MCDataFragment now only contains a single instruction (this property is depended upon for getInstSize, at least).
>
>   All the MCMachineDependentFragments in a region bounded by a branch at the end and either a branch or a fragment-type which is not type in {FT_Data, FT_MachineDependent, FT_Relaxable, FT_CompactEncodedInst} at the beginning, will reference the ending branch instruction's fragment.
>
>   Then, when it comes time to do relaxation, every one of those machine-dependent-fragments has the opportunity to grow its instruction a little bit. The first instruction in a "block" will grow up to 5 segment prefixes (via modifying the BranchPrefix fragment), and then if more is needed, more prefixes will be added to the next instruction, and so on. Until you run out of instructions in the region. At which point the BranchPadding or FusedJccPadding types (right before the branch/fused-branch) will be able to emit nops to achieve the desired alignment.
>
>   An alternative would be to simply emit NOPs before branches as needed. That would be substantially simpler, since it would only require special handling for a branch or a fused-branch. I assume things were done this substantially-more-complex way in order to reduce performance cost of inserting NOP instructions? Are there numbers for how much better it is to use segment prefixes, vs a separate nop instruction? It seems a little bit surprising to me that it would be that important, but I don't know...
>
>   I'll note that the method here has the semantic issue of making it effectively impossible to ever evaluate an expression like ".if . - symbol == 24" (assuming we're emitting instructions), since every instruction can now change size. I suspect that will make it impossible to turn this on by default without breaking a lot of assembly code. Previously, only certain instructions, like branches or arithmetic ops with constant arguments of unknown value, could change size.

Thanks for your detailed and accurate analysis for my code! I am sorry that this should have been done by me.

================
Comment at: llvm/lib/MC/MCAssembler.cpp:1058
+/// is the address of the symbol, which would casuse performance regression.
+void MCAssembler::moveSymbol(const MCFragment *Src, MCFragment *Dst) const {
+  if (!(Src && Dst && Dst->getKind() == MCFragment::FT_MachineDependent))
----------------
jyknight wrote:
> I don't think this is necessary. AFAICT, the symbols should already be in the right place -- pointing to the relax fragment, not the instruction itself, without this. And removing all this moveSymbol/updateSymbolMap code doesn't make any tests fail.
Yes, I check it and you are right. I will removing all this moveSymbol/updateSymbolMap code.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157