[Lldb-commits] [lldb] [lldb][NFC] Add some override methods to VirtualDataExtractor (PR #179858)
Jason Molenda via lldb-commits
lldb-commits at lists.llvm.org
Wed Feb 4 20:08:26 PST 2026
https://github.com/jasonmolenda created https://github.com/llvm/llvm-project/pull/179858
This changes VirtualDataExtractor's GetByteSize to return the virtual byte size of the buffer (external users only understand the data contents in terms of the virtual sizes & offsets). There are check methods in DataExtractor that check they are not going off the end of a buffer, they usually use the BytesLeft() method. There are a couple of callers of BytesLeft() externally, but it is predominantly an internal use API. I have BytesLeft() use the physical size of the buffer, not the virtual size, for the benefit of the DataExtractor methods. (and to avoid duplicating all of them down in VirtualDataExtractor)
Another problem is the we call SetData on DataExtractorSP's (e.g. see the ObjectFile ctor) with the DataBuffer it already has, an offset of 0, and the GetByteSize. A no-op for a DataExtractor that is already using that DataBuffer. But SetData would try to use that length as a physical size, and truncate the buffer that the DataExtractor would accept.
I added VirtualDataExtractor subclass methods for the SetData's, detect (1) data being added to an uninitialized DataExtractor, (2) the same data / offset / length as currently being used is added to the DataExtractor (a no-op), or (3) we're genuinely changing the data source or setting an offset / length that is different. This final case we're not ready to handle today, I added asserts for them so we can catch it in debug builds, and then I clear the LookupTable and add a no-op entry so this extractor will behave like a plain DataExtractor -- because I don't know better to do. If we genuinely need to handle this case, and I'm pretty sure we don't need to, I'd have to assume that we're taking a subset of the original data source (an offset & length), so we'd need to update all of the LookupTable entries to reflect the new offsets, and remove entries that are no longer referring to the subsetted range. I'll leave that until there's any evidence it's actually needed.
rdar://148939795
>From 940fbf6578654a6ba9fa65835b49e803447baf99 Mon Sep 17 00:00:00 2001
From: Jason Molenda <jmolenda at apple.com>
Date: Wed, 4 Feb 2026 19:48:21 -0800
Subject: [PATCH] [lldb][NFC] Add some override methods to VirtualDataExtractor
This changes VirtualDataExtractor's GetByteSize to return the virtual
byte size of the buffer (external users only understand the data
contents in terms of the virtual sizes & offsets). There are check
methods in DataExtractor that check they are not going off the end
of a buffer, they usually use the BytesLeft() method. There are a
couple of callers of BytesLeft() externally, but it is predominantly
an internal use API. I have BytesLeft() use the physical size of
the buffer, not the virtual size, for the benefit of the DataExtractor
methods. (and to avoid duplicating all of them down in
VirtualDataExtractor)
Another problem is the we call SetData on DataExtractorSP's (e.g. see
the ObjectFile ctor) with the DataBuffer it already has, an offset of
0, and the GetByteSize. A no-op for a DataExtractor that is already
using that DataBuffer. But SetData would try to use that length as a
physical size, and truncate the buffer that the DataExtractor would
accept.
I added VirtualDataExtractor subclass methods for the SetData's,
detect (1) data being added to an uninitialized DataExtractor, (2)
the same data / offset / length as currently being used is added
to the DataExtractor (a no-op), or (3) we're genuinely changing the
data source or setting an offset / length that is different. This
final case we're not ready to handle today, I added asserts for
them so we can catch it in debug builds, and then I clear the
LookupTable and add a no-op entry so this extractor will behave
like a plain DataExtractor -- because I don't know better to do.
If we genuinely need to handle this case, and I'm pretty sure we
don't need to, I'd have to assume that we're taking a subset of the
original data source (an offset & length), so we'd need to update
all of the LookupTable entries to reflect the new offsets, and
remove entries that are no longer referring to the subsetted range.
I'll leave that until there's any evidence it's actually needed.
rdar://148939795
---
lldb/include/lldb/Utility/DataExtractor.h | 20 +--
.../lldb/Utility/VirtualDataExtractor.h | 21 ++++
.../BSD-Archive/ObjectContainerBSDArchive.cpp | 5 +-
.../ObjectContainerMachOFileset.cpp | 4 +-
.../ObjectContainerUniversalMachO.cpp | 4 +-
.../Plugins/ObjectFile/ELF/ObjectFileELF.cpp | 11 +-
.../Plugins/ObjectFile/ELF/ObjectFileELF.h | 4 +-
.../ObjectFile/XCOFF/ObjectFileXCOFF.cpp | 2 +-
lldb/source/Utility/VirtualDataExtractor.cpp | 116 ++++++++++++++++++
9 files changed, 162 insertions(+), 25 deletions(-)
diff --git a/lldb/include/lldb/Utility/DataExtractor.h b/lldb/include/lldb/Utility/DataExtractor.h
index 7011fa93c2112..d3d9872100320 100644
--- a/lldb/include/lldb/Utility/DataExtractor.h
+++ b/lldb/include/lldb/Utility/DataExtractor.h
@@ -281,7 +281,7 @@ class DataExtractor {
///
/// \return
/// The total number of bytes of data this object refers to.
- uint64_t GetByteSize() const { return m_end - m_start; }
+ virtual uint64_t GetByteSize() const { return m_end - m_start; }
/// Extract a C string from \a *offset_ptr.
///
@@ -858,7 +858,7 @@ class DataExtractor {
return GetSubsetExtractorSP(0);
}
- lldb::DataBufferSP &GetSharedDataBuffer() { return m_data_sp; }
+ const lldb::DataBufferSP &GetSharedDataBuffer() const { return m_data_sp; }
bool HasData() { return m_start && m_end && m_end - m_start > 0; }
@@ -922,8 +922,8 @@ class DataExtractor {
///
/// \return
/// The number of bytes that this object now contains.
- lldb::offset_t SetData(const void *bytes, lldb::offset_t length,
- lldb::ByteOrder byte_order);
+ virtual lldb::offset_t SetData(const void *bytes, lldb::offset_t length,
+ lldb::ByteOrder byte_order);
/// Adopt a subset of \a data.
///
@@ -947,8 +947,8 @@ class DataExtractor {
///
/// \return
/// The number of bytes that this object now contains.
- lldb::offset_t SetData(const DataExtractor &data, lldb::offset_t offset,
- lldb::offset_t length);
+ virtual lldb::offset_t SetData(const DataExtractor &data,
+ lldb::offset_t offset, lldb::offset_t length);
/// Adopt a subset of shared data in \a data_sp.
///
@@ -972,9 +972,9 @@ class DataExtractor {
///
/// \return
/// The number of bytes that this object now contains.
- lldb::offset_t SetData(const lldb::DataBufferSP &data_sp,
- lldb::offset_t offset = 0,
- lldb::offset_t length = LLDB_INVALID_OFFSET);
+ virtual lldb::offset_t SetData(const lldb::DataBufferSP &data_sp,
+ lldb::offset_t offset = 0,
+ lldb::offset_t length = LLDB_INVALID_OFFSET);
/// Set the byte_order value.
///
@@ -1028,7 +1028,7 @@ class DataExtractor {
bool Append(void *bytes, lldb::offset_t length);
- lldb::offset_t BytesLeft(lldb::offset_t offset) const {
+ virtual lldb::offset_t BytesLeft(lldb::offset_t offset) const {
const lldb::offset_t size = GetByteSize();
if (size > offset)
return size - offset;
diff --git a/lldb/include/lldb/Utility/VirtualDataExtractor.h b/lldb/include/lldb/Utility/VirtualDataExtractor.h
index bcd4054912315..af7e76816191a 100644
--- a/lldb/include/lldb/Utility/VirtualDataExtractor.h
+++ b/lldb/include/lldb/Utility/VirtualDataExtractor.h
@@ -59,6 +59,22 @@ class VirtualDataExtractor : public DataExtractor {
llvm::ArrayRef<uint8_t> GetData() const override;
+ uint64_t GetByteSize() const override { return GetVirtualByteSize(); }
+
+ lldb::offset_t BytesLeft(lldb::offset_t offset) const override {
+ return PhysicalBytesLeft(offset);
+ }
+
+ lldb::offset_t SetData(const void *bytes, lldb::offset_t length,
+ lldb::ByteOrder byte_order) override;
+
+ lldb::offset_t SetData(const DataExtractor &data, lldb::offset_t offset,
+ lldb::offset_t length) override;
+
+ lldb::offset_t SetData(const lldb::DataBufferSP &data_sp,
+ lldb::offset_t offset = 0,
+ lldb::offset_t length = LLDB_INVALID_OFFSET) override;
+
/// Unchecked overrides
/// @{
uint8_t GetU8_unchecked(lldb::offset_t *offset_ptr) const override;
@@ -76,6 +92,11 @@ class VirtualDataExtractor : public DataExtractor {
bool ValidateVirtualRead(lldb::offset_t virtual_addr,
lldb::offset_t length) const;
+ uint64_t GetVirtualByteSize() const;
+ uint64_t GetPhysicalByteSize() const;
+ lldb::offset_t VirtualBytesLeft(lldb::offset_t virtual_offset) const;
+ lldb::offset_t PhysicalBytesLeft(lldb::offset_t physical_offset) const;
+
private:
LookupTable m_lookup_table;
};
diff --git a/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp b/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
index bfa543adeffc2..bee84281ebd52 100644
--- a/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
+++ b/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
@@ -441,9 +441,8 @@ size_t ObjectContainerBSDArchive::GetModuleSpecifications(
if (!file || !extractor_sp)
return 0;
- DataExtractorSP data_extractor_sp = extractor_sp->GetSubsetExtractorSP(
- data_offset,
- extractor_sp->GetSharedDataBuffer()->GetByteSize() - data_offset);
+ DataExtractorSP data_extractor_sp =
+ extractor_sp->GetSubsetExtractorSP(data_offset);
// We have data, which means this is the first 512 bytes of the file Check to
// see if the magic bytes match and if they do, read the entire table of
// contents for the archive and cache it
diff --git a/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp b/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp
index b1b0d0067d404..0b99e3faafabd 100644
--- a/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp
+++ b/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp
@@ -227,8 +227,8 @@ size_t ObjectContainerMachOFileset::GetModuleSpecifications(
if (!extractor_sp)
return initial_count;
- DataExtractorSP data_extractor_sp = extractor_sp->GetSubsetExtractorSP(
- data_offset, extractor_sp->GetByteSize());
+ DataExtractorSP data_extractor_sp =
+ extractor_sp->GetSubsetExtractorSP(data_offset);
if (!data_extractor_sp)
return initial_count;
if (MagicBytesMatch(*data_extractor_sp)) {
diff --git a/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp b/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp
index 215c419a0a9c6..366a8852b9b5e 100644
--- a/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp
+++ b/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp
@@ -195,8 +195,8 @@ size_t ObjectContainerUniversalMachO::GetModuleSpecifications(
if (!extractor_sp)
return initial_count;
- DataExtractorSP data_extractor_sp = extractor_sp->GetSubsetExtractorSP(
- data_offset, extractor_sp->GetByteSize());
+ DataExtractorSP data_extractor_sp =
+ extractor_sp->GetSubsetExtractorSP(data_offset);
if (ObjectContainerUniversalMachO::MagicBytesMatch(*data_extractor_sp)) {
llvm::MachO::fat_header header;
std::vector<FatArch> fat_archs;
diff --git a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
index 90afd5b2dc93a..24024ca3fb99c 100644
--- a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+++ b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
@@ -481,7 +481,7 @@ ObjectFile *ObjectFileELF::CreateMemoryInstance(
return nullptr;
}
-bool ObjectFileELF::MagicBytesMatch(DataBufferSP &data_sp,
+bool ObjectFileELF::MagicBytesMatch(const DataBufferSP &data_sp,
lldb::addr_t data_offset,
lldb::addr_t data_length) {
if (data_sp &&
@@ -2768,7 +2768,7 @@ static void ApplyELF64ABS64Relocation(Symtab *symtab, ELFRelocation &rel,
symtab->FindSymbolByID(ELFRelocation::RelocSymbol64(rel));
if (symbol) {
addr_t value = symbol->GetAddressRef().GetFileAddress();
- DataBufferSP &data_buffer_sp = debug_data.GetSharedDataBuffer();
+ const DataBufferSP &data_buffer_sp = debug_data.GetSharedDataBuffer();
// ObjectFileELF creates a WritableDataBuffer in CreateInstance.
WritableDataBuffer *data_buffer =
llvm::cast<WritableDataBuffer>(data_buffer_sp.get());
@@ -2795,7 +2795,7 @@ static void ApplyELF64ABS32Relocation(Symtab *symtab, ELFRelocation &rel,
return;
}
uint32_t truncated_addr = (value & 0xFFFFFFFF);
- DataBufferSP &data_buffer_sp = debug_data.GetSharedDataBuffer();
+ const DataBufferSP &data_buffer_sp = debug_data.GetSharedDataBuffer();
// ObjectFileELF creates a WritableDataBuffer in CreateInstance.
WritableDataBuffer *data_buffer =
llvm::cast<WritableDataBuffer>(data_buffer_sp.get());
@@ -2819,7 +2819,7 @@ static void ApplyELF32ABS32RelRelocation(Symtab *symtab, ELFRelocation &rel,
return;
}
assert(llvm::isUInt<32>(value) && "Valid addresses are 32-bit");
- DataBufferSP &data_buffer_sp = debug_data.GetSharedDataBuffer();
+ const DataBufferSP &data_buffer_sp = debug_data.GetSharedDataBuffer();
// ObjectFileELF creates a WritableDataBuffer in CreateInstance.
WritableDataBuffer *data_buffer =
llvm::cast<WritableDataBuffer>(data_buffer_sp.get());
@@ -2896,7 +2896,8 @@ unsigned ObjectFileELF::ApplyRelocations(
if (symbol) {
addr_t f_offset =
rel_section->GetFileOffset() + ELFRelocation::RelocOffset32(rel);
- DataBufferSP &data_buffer_sp = debug_data.GetSharedDataBuffer();
+ const DataBufferSP &data_buffer_sp =
+ debug_data.GetSharedDataBuffer();
// ObjectFileELF creates a WritableDataBuffer in CreateInstance.
WritableDataBuffer *data_buffer =
llvm::cast<WritableDataBuffer>(data_buffer_sp.get());
diff --git a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
index 4992e9e2482f3..0eba491df77b0 100644
--- a/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
+++ b/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h
@@ -85,8 +85,8 @@ class ObjectFileELF : public lldb_private::ObjectFile {
lldb::offset_t length,
lldb_private::ModuleSpecList &specs);
- static bool MagicBytesMatch(lldb::DataBufferSP &data_sp, lldb::addr_t offset,
- lldb::addr_t length);
+ static bool MagicBytesMatch(const lldb::DataBufferSP &data_sp,
+ lldb::addr_t offset, lldb::addr_t length);
// PluginInterface protocol
llvm::StringRef GetPluginName() override { return GetPluginNameStatic(); }
diff --git a/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp b/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp
index 0bcccfc059bbc..b34c03ea56a05 100644
--- a/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp
+++ b/lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp
@@ -167,7 +167,7 @@ bool ObjectFileXCOFF::MagicBytesMatch(DataExtractorSP &extractor_sp,
lldb::addr_t data_offset,
lldb::addr_t data_length) {
DataExtractorSP magic_extractor_sp =
- extractor_sp->GetSubsetExtractorSP(data_offset, data_length);
+ extractor_sp->GetSubsetExtractorSP(data_offset);
// Need to set this as XCOFF is only compatible with Big Endian
magic_extractor_sp->SetByteOrder(eByteOrderBig);
lldb::offset_t offset = 0;
diff --git a/lldb/source/Utility/VirtualDataExtractor.cpp b/lldb/source/Utility/VirtualDataExtractor.cpp
index 83520072a95a8..e2bd67a07fc8d 100644
--- a/lldb/source/Utility/VirtualDataExtractor.cpp
+++ b/lldb/source/Utility/VirtualDataExtractor.cpp
@@ -86,6 +86,122 @@ const void *VirtualDataExtractor::GetData(offset_t *offset_ptr,
return result;
}
+offset_t VirtualDataExtractor::SetData(const void *bytes, lldb::offset_t length,
+ lldb::ByteOrder byte_order) {
+ // Invoked from the base class ctor
+ if (!m_data_sp || m_start == nullptr)
+ return DataExtractor::SetData(bytes, length, byte_order);
+
+ // A no-op SetData that is setting the same data buffer again
+ if (!m_data_sp && m_start == bytes && length == GetVirtualByteSize())
+ return GetVirtualByteSize();
+
+ assert("SetData(1) called on VirtualDataExtractor that already had data" &&
+ false);
+
+ // FIXME calling SetData on a VirtualDataExtractor that already has a
+ // data buffer means the LookupTable needs to be either replaced, or
+ // if we assume the buffer is a subset of the original, we need to
+ // update all the entries to have correct new offsets into the buffer
+ // and remove entries that are outside the new range.
+ // For now, zero out the LookupTable and behave as if this is a simple
+ // DataExtractor.
+ DataExtractor::SetData(bytes, length, byte_order);
+ m_lookup_table.Clear();
+ m_lookup_table.Append(
+ VirtualDataExtractor::LookupTable::Entry{0, GetPhysicalByteSize(), 0});
+
+ return GetVirtualByteSize();
+}
+
+offset_t VirtualDataExtractor::SetData(const DataExtractor &data,
+ lldb::offset_t offset,
+ lldb::offset_t length) {
+ // Invoked from the base class ctor
+ if (!m_data_sp || m_start == nullptr)
+ return DataExtractor::SetData(data, offset, length);
+
+ // A no-op SetData that is setting the same data buffer again
+ if (m_data_sp && data.GetSharedDataBuffer().get() == m_data_sp.get() &&
+ offset == 0 && length == GetVirtualByteSize())
+ return GetVirtualByteSize();
+ assert("SetData(2) called on VirtualDataExtractor that already had data" &&
+ false);
+
+ // FIXME calling SetData on a VirtualDataExtractor that already has a
+ // data buffer means the LookupTable needs to be either replaced, or
+ // if we assume the buffer is a subset of the original, we need to
+ // update all the entries to have correct new offsets into the buffer
+ // and remove entries that are outside the new range.
+ // For now, zero out the LookupTable and behave as if this is a simple
+ // DataExtractor.
+ DataExtractor::SetData(data, offset, length);
+ m_lookup_table.Clear();
+ m_lookup_table.Append(
+ VirtualDataExtractor::LookupTable::Entry{0, GetPhysicalByteSize(), 0});
+
+ return GetVirtualByteSize();
+}
+
+offset_t VirtualDataExtractor::SetData(const lldb::DataBufferSP &data_sp,
+ lldb::offset_t offset,
+ lldb::offset_t length) {
+ // Invoked from the base class ctor
+ if (!m_data_sp || m_start == nullptr)
+ return DataExtractor::SetData(data_sp, offset, length);
+
+ // A no-op SetData that is setting the same data buffer again
+ if (m_data_sp && data_sp.get() == m_data_sp.get() && offset == 0 &&
+ length == GetVirtualByteSize())
+ return GetVirtualByteSize();
+
+ assert("SetData(3) called on VirtualDataExtractor that already had data" &&
+ false);
+
+ // FIXME calling SetData on a VirtualDataExtractor that already has a
+ // data buffer means the LookupTable needs to be either replaced, or
+ // if we assume the buffer is a subset of the original, we need to
+ // update all the entries to have correct new offsets into the buffer
+ // and remove entries that are outside the new range.
+ // For now, zero out the LookupTable and behave as if this is a simple
+ // DataExtractor.
+ DataExtractor::SetData(data_sp, offset, length);
+ m_lookup_table.Clear();
+ m_lookup_table.Append(
+ VirtualDataExtractor::LookupTable::Entry{0, GetPhysicalByteSize(), 0});
+
+ return GetVirtualByteSize();
+}
+
+uint64_t VirtualDataExtractor::GetVirtualByteSize() const {
+ offset_t lowest = -1ULL;
+ offset_t highest = 0;
+ for (const auto ent : m_lookup_table) {
+ lowest = std::min(lowest, ent.base);
+ highest = std::max(highest, ent.base + ent.size);
+ }
+ return highest - lowest;
+}
+
+uint64_t VirtualDataExtractor::GetPhysicalByteSize() const {
+ return m_end - m_start;
+}
+
+offset_t VirtualDataExtractor::VirtualBytesLeft(offset_t virtual_offset) const {
+ const offset_t size = GetVirtualByteSize();
+ if (size > virtual_offset)
+ return size - virtual_offset;
+ return 0;
+}
+
+offset_t
+VirtualDataExtractor::PhysicalBytesLeft(offset_t physical_offset) const {
+ const offset_t size = m_end - m_start;
+ if (size > physical_offset)
+ return size - physical_offset;
+ return 0;
+}
+
const uint8_t *VirtualDataExtractor::PeekData(offset_t offset,
offset_t length) const {
// Override to treat offset as virtual address.
More information about the lldb-commits
mailing list