[PATCH] D110365: [llvm][profile] Add padding after binary IDs

Thu Sep 23 18:37:28 PDT 2021

mcgrathr added inline comments.

================
Comment at: llvm/lib/ProfileData/InstrProfReader.cpp:558
     // Increment by binary id data length.
     BI += BinaryIdLen;
     if (BI > (const uint8_t *)DataBuffer->getBufferEnd())
----------------
leonardchan wrote:
> leonardchan wrote:
> > mcgrathr wrote:
> > > What is the packing protocol?  It seems wise to pad after *each* ID to make the next size field naturally-aligned, rather than having the second size field be misaligned if the first build ID length is not a multiple of 8.
> > > 
> > Right now there's no alignment and the size just seems to be added after the previous build ID. I updated `__llvm_write_build_ids` to take this suggestion of adding padding after each ID. This should also prevent us from needing to add padding after all the build IDs if there's padding in between each.
> > 
> > I'm wondering though if maybe adding padding after each ID would lead to some bloat compared to just adding padding once.
> > What is the packing protocol?
> 
> Oh, I may have misunderstood at first. In terms of packing, the only requirement seems to be that each header be 8-byte aligned (https://github.com/llvm/llvm-project/blob/dcadd64986b8a84dc244d4e7faa848fb4c18cea6/llvm/lib/ProfileData/InstrProfReader.cpp#L337). I believe all the other sections like counters, data, and names are aligned as a whole with padding coming after each section to make sure the next is aligned. So if we want to follow suite, we could stick to just padding after the binary IDs section unless padding in between IDs is more desireable.
The reason that other sections are aligned is so that normal, aligned word access can be used on their fields.  The same logic applies to the size field in the BinaryIds section.  There is a) no reason to expect multiple IDs in raw profiles in practice and b) no reason to worry about a few bytes per build ID in the context of the size of the data overall.  IMHO it is far more important to maintain universal invariants like arranging that fields with natural alignment requirements are all kept aligned in the data so that code doesn't need to use memcpy to access them safely.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110365/new/

https://reviews.llvm.org/D110365