[Mlir-commits] [mlir] [MLIR][Bytecode] Followup 8106c81 (PR #157136)
Nikhil Kalra
llvmlistbot at llvm.org
Mon Sep 8 11:18:17 PDT 2025
https://github.com/nikalra updated https://github.com/llvm/llvm-project/pull/157136
>From 4ace8e2b14ec958b8fe69014a64fa7c1b3beca9a Mon Sep 17 00:00:00 2001
From: Nikhil Kalra <nkalra at apple.com>
Date: Fri, 5 Sep 2025 09:27:08 -0700
Subject: [PATCH 1/2] [MLIR][Bytecode] Followup 8106c81
Addressed code review feedback:
- Fixed some issues in the unit test
- Adjusted line wrapping in the docs
- Clarified comments in the bytecode reader
---
mlir/docs/BytecodeFormat.md | 4 +++-
mlir/lib/Bytecode/Reader/BytecodeReader.cpp | 17 ++++++++++----
mlir/unittests/Bytecode/BytecodeTest.cpp | 25 ++++++++++-----------
3 files changed, 28 insertions(+), 18 deletions(-)
diff --git a/mlir/docs/BytecodeFormat.md b/mlir/docs/BytecodeFormat.md
index 9846df8726295..f50ddeb25cc33 100644
--- a/mlir/docs/BytecodeFormat.md
+++ b/mlir/docs/BytecodeFormat.md
@@ -125,7 +125,9 @@ lazy-loading, and more. Each section contains a Section ID, whose high bit
indicates if the section has alignment requirements, a length (which allows for
skipping over the section), and an optional alignment. When an alignment is
present, a variable number of padding bytes (0xCB) may appear before the section
-data. The alignment of a section must be a power of 2. The input bytecode buffer must satisfy the same alignment requirements as those of every section.
+data. The alignment of a section must be a power of 2.
+The input bytecode buffer must satisfy the same alignment requirements as
+those of every section.
## MLIR Encoding
diff --git a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
index d29053a2b6e65..287d926d66105 100644
--- a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+++ b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -22,8 +22,6 @@
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Endian.h"
-#include "llvm/Support/Format.h"
-#include "llvm/Support/LogicalResult.h"
#include "llvm/Support/MemoryBufferRef.h"
#include "llvm/Support/SourceMgr.h"
@@ -300,8 +298,19 @@ class EncodingReader {
// alignment of the root buffer. If it is not, we cannot safely guarantee
// that the specified alignment is globally correct.
//
- // E.g. if the buffer is 8k aligned and the section is 16k aligned,
- // we could end up at an offset of 24k, which is not globally 16k aligned.
+ // E.g. if the buffer is 8k aligned and the section is marked to be 16k
+ // aligned:
+ // - (a) the alignTo call early returns when the pointer is 16k
+ // aligned but given the original 8k alignment we could offset into the
+ // padding by ~8k giving us 16k pointer alignment leaving another ~8k of
+ // padding in the bytecode file that will inadvertently be read when we
+ // attempt to parse the next section.
+ // - (b) we update alignTo to align relative to the start of the buffer,
+ // but given an 8k aligned buffer and section alignment of 16k, we could
+ // end up with a pointer that is 24k aligned (8k start alignment + 16k
+ // offset) instead of globally 16k aligned (versus 16k start alignment +
+ // 16k offset). This would result in incorrectly stated alignment for
+ // resources that reference data inside of the bytecode buffer.
if (failed(alignmentValidator(alignment)))
return emitError("failed to align section ID: ", unsigned(sectionID));
diff --git a/mlir/unittests/Bytecode/BytecodeTest.cpp b/mlir/unittests/Bytecode/BytecodeTest.cpp
index 9ea6560f712a1..d7b442f6832d0 100644
--- a/mlir/unittests/Bytecode/BytecodeTest.cpp
+++ b/mlir/unittests/Bytecode/BytecodeTest.cpp
@@ -72,6 +72,8 @@ TEST(Bytecode, MultiModuleWithResource) {
ASSERT_TRUE(module);
// Write the module to bytecode.
+ // Ensure that reserveExtraSpace is called with the size needed to write the
+ // bytecode buffer.
MockOstream ostream;
EXPECT_CALL(ostream, reserveExtraSpace).WillOnce([&](uint64_t space) {
ostream.buffer = std::make_unique<std::byte[]>(space);
@@ -128,31 +130,28 @@ TEST(Bytecode, AlignmentFailure) {
ASSERT_TRUE(module);
// Write the module to bytecode.
- MockOstream ostream;
- EXPECT_CALL(ostream, reserveExtraSpace).WillOnce([&](uint64_t space) {
- ostream.buffer = std::make_unique<std::byte[]>(space);
- ostream.size = space;
- });
+ std::string serializedBytecode;
+ llvm::raw_string_ostream ostream(serializedBytecode);
ASSERT_TRUE(succeeded(writeBytecodeToFile(module.get(), ostream)));
// Create copy of buffer which is not aligned to requested resource alignment.
- std::string buffer((char *)ostream.buffer.get(),
- (char *)ostream.buffer.get() + ostream.size);
+ std::string buffer(serializedBytecode);
size_t bufferSize = buffer.size();
- // Increment into the buffer until we get to a power of 2 alignment that is
- // not 32 bit aligned.
+ // Increment into the buffer until we get to an address that is 2 byte aligned
+ // but not 32 byte aligned.
size_t pad = 0;
while (true) {
- if (llvm::isAddrAligned(Align(2), &buffer[pad]) &&
- !llvm::isAddrAligned(Align(32), &buffer[pad]))
+ if (llvm::isAddrAligned(Align(2), buffer.data() + pad) &&
+ !llvm::isAddrAligned(Align(32), buffer.data() + pad))
break;
pad++;
- buffer.reserve(bufferSize + pad);
+ // Pad the beginning of the buffer to push the start point to an unaligned
+ // value.
+ buffer.insert(0, 1, ' ');
}
- buffer.insert(0, pad, ' ');
StringRef alignedBuffer(buffer.data() + pad, bufferSize);
// Attach a diagnostic handler to get the error message.
>From bf9f0a7cdca8591d991b192ef3732d8818862068 Mon Sep 17 00:00:00 2001
From: Nikhil Kalra <nkalra at apple.com>
Date: Mon, 8 Sep 2025 11:18:00 -0700
Subject: [PATCH 2/2] update comment
---
mlir/lib/Bytecode/Reader/BytecodeReader.cpp | 47 ++++++++++++++-------
1 file changed, 31 insertions(+), 16 deletions(-)
diff --git a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
index 287d926d66105..1659437e1eb24 100644
--- a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+++ b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -294,23 +294,38 @@ class EncodingReader {
if (failed(parseVarInt(alignment)))
return failure();
- // Check that the requested alignment is less than or equal to the
- // alignment of the root buffer. If it is not, we cannot safely guarantee
- // that the specified alignment is globally correct.
+ // Check that the requested alignment must not exceed the alignment of
+ // the root buffer itself. Otherwise we cannot guarantee that pointers
+ // derived from this buffer will actually satisfy the requested alignment
+ // globally.
//
- // E.g. if the buffer is 8k aligned and the section is marked to be 16k
- // aligned:
- // - (a) the alignTo call early returns when the pointer is 16k
- // aligned but given the original 8k alignment we could offset into the
- // padding by ~8k giving us 16k pointer alignment leaving another ~8k of
- // padding in the bytecode file that will inadvertently be read when we
- // attempt to parse the next section.
- // - (b) we update alignTo to align relative to the start of the buffer,
- // but given an 8k aligned buffer and section alignment of 16k, we could
- // end up with a pointer that is 24k aligned (8k start alignment + 16k
- // offset) instead of globally 16k aligned (versus 16k start alignment +
- // 16k offset). This would result in incorrectly stated alignment for
- // resources that reference data inside of the bytecode buffer.
+ // Consider a bytecode buffer that is guaranteed to be 8k aligned, but not
+ // 16k aligned (e.g. absolute address 40960. If a section inside this
+ // buffer declares a 16k alignment requirement, two problems can arise:
+ //
+ // (a) If we "align forward" the current pointer to the next
+ // 16k boundary, the amount of padding we skip depends on the
+ // buffer's starting address. For example:
+ //
+ // buffer_start = 40960
+ // next 16k boundary = 49152
+ // bytes skipped = 49152 - 40960 = 8192
+ //
+ // This leaves behind variable padding that could be misinterpreted
+ // as part of the next section.
+ //
+ // (b) If we align relative to the buffer start, we may
+ // obtain addresses that are multiples of "buffer_start +
+ // section_alignment" rather than truly globally aligned
+ // addresses. For example:
+ //
+ // buffer_start = 40960 (5×8k, 8k aligned but not 16k)
+ // offset = 16384 (first multiple of 16k)
+ // section_ptr = 40960 + 16384 = 57344
+ //
+ // 57344 is 8k aligned but not 16k aligned.
+ // Any consumer expecting true 16k alignment would see this as a
+ // violation.
if (failed(alignmentValidator(alignment)))
return emitError("failed to align section ID: ", unsigned(sectionID));
More information about the Mlir-commits
mailing list