[PATCH] D71238: Align non-fused branches within 32-Byte boundary (basic case)

Mon Dec 16 15:47:32 PST 2019

jyknight added a comment.

I do like the idea of generalizing bundle_lock to mean "keep these instructions together and don't introduce anything extraneous in the middle". So, ".boundary_align <argument>" would apply to the next instruction-or-instruction-bundle, and would emit nops at its location, such that the next instruction is guaranteed to have the proper "within but not ending at a given 2**<arg> block" alignment.
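To make the placement rule concrete, here is a minimal sketch in Python of how much nop padding such a directive might request. It is illustrative only and not taken from the patch: the function name `boundary_align_padding` is hypothetical, and it assumes "within but not ending at" means the instruction may neither cross a 2**<arg> boundary nor have its last byte immediately before one.

```python
def boundary_align_padding(offset: int, size: int, align_log2: int) -> int:
    """Return the number of nop bytes to emit before an instruction.

    Assumption (not from the patch): the instruction must lie entirely
    inside one 2**align_log2 block and must not end exactly at the
    block's upper boundary.
    """
    block = 1 << align_log2
    if size >= block:
        # An instruction this large can never satisfy the constraint.
        raise ValueError("instruction does not fit strictly inside a block")
    start_in_block = offset % block
    if start_in_block + size >= block:
        # Crosses or ends at the boundary: push it to the next block start.
        return block - start_in_block
    return 0


# A 4-byte branch at offset 30 would cross the 32-byte boundary.
print(boundary_align_padding(30, 4, 5))
# A 6-byte branch at offset 26 would end exactly at the boundary.
print(boundary_align_padding(26, 6, 5))
# A 6-byte branch at offset 20 is already fine.
print(boundary_align_padding(20, 6, 5))
```

In both padded cases the instruction is moved to the start of the next block, which is the simplest placement that satisfies the constraint; a real assembler would also have to handle relaxation changing instruction sizes.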

This has the advantage that padding is only added to locations that explicitly ask for it -- unmodified assembly code will not be broken. (BTW, not breaking existing assembly code by introducing unexpected padding seems pretty important IMO, which is why I keep coming back to it. If we don't figure that out, I don't think we can enable the feature by default, which we will probably want to do in certain tunings.)

The disadvantage would be that the compiler needs to know exactly which instructions to request padding for.

But now I'm pondering whether we should leave space in our design to be able to avoid the other possible fallbacks out of the DSB (e.g., having more than 3 jumps or 18 uops per 32-byte block of code). I don't mean to suggest implementing any of those optimizations now, only that it's worth considering how we could implement them as a potential future enhancement.
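As a rough illustration of why these limits are properties of a whole block rather than of a single instruction, the following Python sketch flags 32-byte blocks that exceed the two thresholds mentioned above. Everything here is an assumption made for illustration: the function name, the tuple layout, and in particular the simplification that an instruction is attributed to the block containing its first byte.

```python
from collections import defaultdict


def blocks_over_dsb_limits(insns, block=32, max_jumps=3, max_uops=18):
    """Return indices of blocks with too many jumps or uops.

    insns: iterable of (offset, is_jump, n_uops) tuples.
    Assumption: each instruction counts toward the block that holds
    its first byte; real hardware accounting may differ.
    """
    jumps = defaultdict(int)
    uops = defaultdict(int)
    for offset, is_jump, n_uops in insns:
        idx = offset // block
        jumps[idx] += int(is_jump)
        uops[idx] += n_uops
    return sorted(i for i in uops
                  if jumps[i] > max_jumps or uops[i] > max_uops)


# Four 2-byte jumps packed into the first block, one plain instruction in the second.
code = [(0, True, 1), (2, True, 1), (4, True, 1), (6, True, 1), (32, False, 1)]
print(blocks_over_dsb_limits(code))
```

Fixing a flagged block means moving some instruction into the next block, which depends on everything else in the block. That is hard to express as a directive attached to one instruction.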

And, thinking about that, I can't really see a way to implement any of that with explicit per-instruction directives generated by the compiler. But it seems like it would fit very well with a region-based mode. So, considering that potential future extension, I'm currently thinking that something similar to D71238 <https://reviews.llvm.org/D71238>, but with a directive to opt in rather than an all-or-nothing command-line option, would be a great first step.

And, we need a region-based annotation to mark which instructions can get prefix-padded, too.

So, the compiler would request this mode (let's say, with ".autopadding" and ".noautopadding") for code where it's safe and profitable to do so. This would declare to the assembler that nop or prefix padding may be added as necessary before any instruction within the region to keep the instructions within whatever target microarchitectural constraints exist for the given architecture.

I imagine it not requiring that any particular padding action be taken, only opting in to the assembler being _allowed_ to insert nops or prefix padding.
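A minimal sketch of the opt-in behaviour, in Python and purely illustrative: the `layout` function and the stream encoding are hypothetical, the 32-byte block size is taken from the patch title, and only nop padding before branches is modelled. The point is that the region markers grant permission; the padding decision itself stays with the assembler.

```python
def layout(stream, block=32):
    """Assign offsets to instructions, padding only inside opted-in regions.

    stream: list whose items are either the strings ".autopadding" /
    ".noautopadding" (hypothetical directive names from this comment)
    or (size, is_branch) tuples describing one instruction.
    Returns a list of (offset, padding_bytes) per instruction.
    """
    offset = 0
    allowed = False  # padding is off unless a region opts in
    placed = []
    for item in stream:
        if item == ".autopadding":
            allowed = True
            continue
        if item == ".noautopadding":
            allowed = False
            continue
        size, is_branch = item
        pad = 0
        # Assumed constraint: a branch may not cross or end at a block boundary.
        if allowed and is_branch and (offset % block) + size >= block:
            pad = block - offset % block
        offset += pad
        placed.append((offset, pad))
        offset += size
    return placed


code = [(10, False), (10, False), (10, False), (4, True)]
print(layout(code))                     # no opt-in: nothing moves
print(layout([".autopadding"] + code))  # opted in: the branch is padded
```

With this shape, unmodified assembly (no directives at all) is laid out exactly as before, and later padding strategies can be added inside `layout` without changing what the compiler emits.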

So, my suggestions for next steps to take are:

- Add support for such a directive into this (D71238 <https://reviews.llvm.org/D71238>, basic case) patch.
- Add support to clang for emitting it. By default, I'd think it should be enabled around code clang emits, except for cold code, inline assembly, and patchable instructions emitted for PATCHPOINT, STATEPOINT, and XRay.
- Incrementally extend support to use prefix-padding, and any other such future padding smarts we want.

Unfortunately, I need to disappear right now and will not be able to join the Zoom call. Sorry about that!

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D71238