[llvm] [llvm lib] Fix endian bug in #189098 (PR #189778)
Roy Shi via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 31 17:46:27 PDT 2026
https://github.com/royitaqi created https://github.com/llvm/llvm-project/pull/189778
Both functions assumed little-endian host memory layout when reading/writing non-power-of-two byte sizes (3, 5, 6, 7). On big-endian hosts (SPARC, s390x), memcpy copies bytes into the MSB end of a uint64_t, leaving the value in the wrong bit position. Fix by offsetting the memcpy/write pointer to always target the LSB bytes.
--
Note: I don't have a big endian machine to test this change!
Details: I have requested an account on cfarm but it can take several days to be accepted. Will be great if someone else can help test this on big endian machines. If not, we can either wait for me to get an account, or we can yolo merge and see if they fix the issue.
>From 0697eb5ea2fd20d27c8b9d17be171c128577caae Mon Sep 17 00:00:00 2001
From: Roy Shi <royshi at meta.com>
Date: Tue, 31 Mar 2026 17:37:45 -0700
Subject: [PATCH] Fix big-endian host bug in DataExtractor::getUnsigned() and
FileWriter::writeUnsigned()
Both functions assumed little-endian host memory layout when reading/writing
non-power-of-two byte sizes (3, 5, 6, 7). On big-endian hosts (SPARC, s390x),
memcpy copies bytes into the MSB end of a uint64_t, leaving the value in the
wrong bit position. Fix by offsetting the memcpy/write pointer to always
target the LSB bytes.
Prompts:
- "Let's work on the gsymutil 64 bit support"
- "Actually, switching to fix an endianness issue"
- "It's a different PR. See https://github.com/llvm/llvm-project/pull/189098.
My guess is that in that PR, the endianness is tested incorrectly or the
condition is written incorrectly so that the tests are failing on big-endian
machines."
- "I don't understand this. Let's walk through an example where we use
DataExtractor to read 3 bytes from a byte array, the array is [1, 2, 3, ...],
the DataExtractor is set to IsLittleEndian=true. Can you show me byte by byte
why my original code is wrong on a big endian machine?"
- "Can you do the same simulator, but this time with IsLittleEndian=false?"
- "I see. This bug can also be fixed by still memcpy, but make sure to copy
into the least significant bytes of the value, correct?"
- "Which would be faster at runtime? I think we should use this to decide which
fix to implement"
- "Let's implement the conditional memcpy approach for runtime performance."
- "constexpr doesn't work in a ternary?"
- "I edited the code, how about now?"
- "K I'm happy with DataExtractor::getUnsigned() as of now. Let's look at
FileWriter::writeUnsigned()"
- "This looks better and more consistent. Can you run the tests and see that
they pass?"
- "memorize, commit, push"
---
llvm/lib/DebugInfo/GSYM/FileWriter.cpp | 14 ++++++--------
llvm/lib/Support/DataExtractor.cpp | 13 ++++++-------
2 files changed, 12 insertions(+), 15 deletions(-)
diff --git a/llvm/lib/DebugInfo/GSYM/FileWriter.cpp b/llvm/lib/DebugInfo/GSYM/FileWriter.cpp
index 7c55ae03b6f0b..5d6d81dfbc3b6 100644
--- a/llvm/lib/DebugInfo/GSYM/FileWriter.cpp
+++ b/llvm/lib/DebugInfo/GSYM/FileWriter.cpp
@@ -55,15 +55,13 @@ void FileWriter::writeUnsigned(uint64_t Value, size_t ByteSize) {
assert((ByteSize == 8 || (Value & (uint64_t)-1 << (8 * ByteSize)) == 0) &&
"potential data loss: higher bits are non-zero");
// Swap and shift bytes if endianness doesn't match.
- if (ByteOrder != llvm::endianness::native) {
- // Say ByteSize is 3.
- // high low
- // Input bytes: 00 00 00 00 00 AA BB CC
- // Swapped bytes: CC BB AA 00 00 00 00 00
- // Shifted bytes: 00 00 00 00 00 CC BB AA
+ if (ByteOrder != llvm::endianness::native)
Value = sys::getSwappedBytes(Value) >> (8 * (8 - ByteSize));
- }
- OS.write(reinterpret_cast<const char *>(&Value), ByteSize);
+ // Write from the least significant bytes of Value regardless of host
+ // endianness.
+ OS.write(reinterpret_cast<const char *>(&Value) +
+ (sys::IsLittleEndianHost ? 0 : 8 - ByteSize),
+ ByteSize);
}
void FileWriter::fixup32(uint32_t U, uint64_t Offset) {
diff --git a/llvm/lib/Support/DataExtractor.cpp b/llvm/lib/Support/DataExtractor.cpp
index 071093504e591..6b58b492002fd 100644
--- a/llvm/lib/Support/DataExtractor.cpp
+++ b/llvm/lib/Support/DataExtractor.cpp
@@ -144,16 +144,15 @@ uint64_t DataExtractor::getUnsigned(uint64_t *offset_ptr, uint32_t byte_size,
uint64_t offset = *offset_ptr;
if (!prepareRead(offset, byte_size, Err))
return val;
- std::memcpy(&val, &Data.data()[offset], byte_size);
+ // Copy into the least significant bytes of val regardless of host
+ // endianness.
+ std::memcpy(reinterpret_cast<char *>(&val) +
+ (sys::IsLittleEndianHost ? 0 : 8 - byte_size),
+ &Data.data()[offset], byte_size);
+ // Swap the least significant bytes of val if endianness doesn't match.
if (sys::IsLittleEndianHost != IsLittleEndian)
- // Say byte_size is 3.
- // high low
- // Read bytes: 00 00 00 00 00 AA BB CC
- // Swapped bytes: CC BB AA 00 00 00 00 00
- // Shifted bytes: 00 00 00 00 00 CC BB AA
val = sys::getSwappedBytes(val) >> (8 * (8 - byte_size));
- // Advance the offset
*offset_ptr += byte_size;
return val;
}
More information about the llvm-commits
mailing list