[PATCH] D121392: [BOLT] Set cold sections alignment explicitly

Mon Mar 14 13:30:00 PDT 2022

yota9 added inline comments.

================
Comment at: bolt/lib/Passes/LongJmp.cpp:301
       continue;
     DotAddress = alignTo(DotAddress, BinaryFunction::MinAlign);
     uint64_t Pad =
----------------
rafauler wrote:
> From what I understand, the real problem though is here, right?
> 
> We align with the minimum value, but we could be aligning more aggressively depending on "max align bytes", correct?
> 
> Also, this is done for every function. So if there is an alignment difference created from MinAlign vs. real alignment, I imagine this difference will increase as we emit more and more functions? What confused me then is why are we aligning only the very first function (via section alignment, since all functions are emitted to the same section). Wouldn't be better to replicate the logic to compute the correct alignment here? (Use a formula that retrieves how many bytes need to align and check it against max align bytes).  The goal of LongJmp is to replicate exactly what binary emitter is doing (I know this is unfortunate and bug prone, but I don't think there is a better way to have access to the exact offsets of the future layout of functions without calling the assembler and emitting everything)
The thing is that we don't align the first function here, we align the section. 
On function emittion we call 
```
Streamer.emitCodeAlignment
```

for each function. 

That function calls emitValueToAlignment located in MCObjectStreamer.cpp that contains the following lines:

```
  if (ByteAlignment > CurSec->getAlignment())
    CurSec->setAlignment(Align(ByteAlignment));
```

So depending on what we passed in emitValueToAlignment we might change the section alignment and (of course) the alignment of the very first function. I don't want to overcomplicate the code trying to first to calculate the section alignment that the functions alignment & etc (this should also be done for golang for example, so for every pass that will calculate the layout), so the easiest and the most straightforward way I see is to align the cold section using the maximum possible value explicitly. As for the space-wasting this is very minor, since the ByteAlignment is limited by uint16_t value (64kb) and most of the time we are speaking about just a ~64 bytes. But such a solution will significantly simplify the calculations for such passes :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121392/new/

https://reviews.llvm.org/D121392