[Mlir-commits] [mlir] [MLIR][Bytecode] Followup 8106c81 (PR #157136)

Sat Sep 6 02:46:37 PDT 2025

================
@@ -300,8 +298,19 @@ class EncodingReader {
       // alignment of the root buffer. If it is not, we cannot safely guarantee
       // that the specified alignment is globally correct.
       //
-      // E.g. if the buffer is 8k aligned and the section is 16k aligned,
-      // we could end up at an offset of 24k, which is not globally 16k aligned.
+      // E.g. if the buffer is 8k aligned and the section is marked to be 16k
+      // aligned:
+      // - (a) the alignTo call early returns when the pointer is 16k
+      // aligned but given the original 8k alignment we could offset into the
+      // padding by ~8k giving us 16k pointer alignment leaving another ~8k of
+      // padding in the bytecode file that will inadvertently be read when we
+      // attempt to parse the next section.
+      // - (b) we update alignTo to align relative to the start of the buffer,
+      // but given an 8k aligned buffer and section alignment of 16k, we could
+      // end up with a pointer that is 24k aligned (8k start alignment + 16k
+      // offset) instead of globally 16k aligned (versus 16k start alignment +
+      // 16k offset). This would result in incorrectly stated alignment for
+      // resources that reference data inside of the bytecode buffer.
----------------
joker-eph wrote:

To be honest, I still find this explanation quite confusing.
I iterated with ChatGPT to get something clearer (I think it can likely be pruned though):

```
// If the section specifies an alignment requirement, handle it here.
if (hasAlignment) {
  // Read the requested alignment value from the stream.
  uint64_t alignment;
  if (failed(parseVarInt(alignment)))
    return failure();

  // Sanity check: the requested alignment must not exceed the alignment of the
  // root buffer itself. Otherwise we cannot guarantee that pointers derived
  // from this buffer will actually satisfy the requested alignment globally.
  //
  // Why is this necessary?
  //
  // Consider a root buffer that is guaranteed to be 8k aligned, but not 16k
  // aligned. For example, suppose the buffer starts at absolute address
  // 5×8k = 40960. If a section inside this buffer declares a 16k alignment
  // requirement, two problems can arise:
  //
  //   (a) If we simply "align forward" the current pointer to the next
  //       16k boundary, the amount of padding we skip depends on the buffer's
  //       starting address. For example:
  //
  //         buffer_start = 40960
  //         next 16k boundary = 49152
  //         bytes skipped = 49152 - 40960 = 8192
  //
  //       If the buffer had started at a different 8k-aligned address, the
  //       skipped bytes would change accordingly. This makes the section start
  //       unpredictable and leaves behind variable padding that could be
  //       misinterpreted as part of the next section.
  //
  //   (b) If instead we align relative to the buffer start, we obtain
  //       addresses of the form "buffer_start + k * section_alignment"
  //       rather than truly globally aligned addresses. For example:
  //
  //         buffer_start = 40960 (5×8k, 8k aligned but not 16k)
  //         offset       = 16384  (first multiple of 16k)
  //         section_ptr  = 40960 + 16384 = 57344
  //
  //       57344 is divisible by 8192, so it looks "8k aligned", but:
  //
  //         57344 % 16384 = 8192  ≠ 0
  //
  //       i.e. the section pointer is at absolute address 57344, which is not
  //       truly 16k aligned. Any consumer expecting true 16k alignment would
  //       see this as a violation.
  //
  // In short: the section's declared alignment must not exceed the alignment
  // of the root buffer; otherwise we cannot enforce it in a globally
  // consistent and deterministic way.
  ```
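
The arithmetic in the two cases above can be checked with a small standalone sketch. The helpers `alignForward`, `isAligned`, and `checkWorkedExample` are hypothetical names made up for illustration; they are not the `EncodingReader` API, and the sketch assumes power-of-two alignments:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the MLIR EncodingReader API.
// Both assume `alignment` is a power of two.

// Round `addr` up to the next multiple of `alignment`.
constexpr uint64_t alignForward(uint64_t addr, uint64_t alignment) {
  return (addr + alignment - 1) & ~(alignment - 1);
}

// True if `addr` is a multiple of `alignment`.
constexpr bool isAligned(uint64_t addr, uint64_t alignment) {
  return (addr & (alignment - 1)) == 0;
}

// Replays the numbers used in the comment above.
inline void checkWorkedExample() {
  constexpr uint64_t k8 = 8192, k16 = 16384;
  constexpr uint64_t bufferStart = 5 * k8; // 40960: 8k aligned, not 16k aligned

  assert(isAligned(bufferStart, k8));
  assert(!isAligned(bufferStart, k16));

  // (a) Aligning the absolute pointer forward: the padding skipped depends on
  //     where the buffer happens to start.
  assert(alignForward(bufferStart, k16) == 49152);
  assert(alignForward(bufferStart, k16) - bufferStart == 8192);
  // A buffer that starts 16k aligned (4 * 8k = 32768) skips nothing.
  assert(alignForward(4 * k8, k16) - 4 * k8 == 0);

  // (b) Aligning the offset relative to the buffer start: the offset is a
  //     multiple of 16k, but the absolute address is not.
  constexpr uint64_t offset = alignForward(1, k16); // 16384
  constexpr uint64_t sectionPtr = bufferStart + offset;
  assert(sectionPtr == 57344);
  assert(sectionPtr % k16 == 8192);
  assert(!isAligned(sectionPtr, k16));
}
```

Calling `checkWorkedExample()` from a test or `main` exercises both cases. In either one, the mismatch comes from the 8192-byte remainder of the buffer start modulo 16k, which is what the sanity check rules out.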

https://github.com/llvm/llvm-project/pull/157136