[Lldb-commits] [PATCH] D78396: [lldb/Dataformatter] Add support for CoreFoundation Dictionaries and Sets

Wed Apr 22 00:30:01 PDT 2020

labath added inline comments.

================
Comment at: lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp:47-62
+  DataBufferHeap buffer(size, 0);
+  m_exe_ctx_ref.GetProcessSP()->ReadMemory(addr, buffer.GetBytes(), size,
+                                           error);
+  if (error.Fail())
+    return false;
+
+  DataExtractor data_extractor = DataExtractor(
----------------
mib wrote:
> labath wrote:
> > Why don't you just pass `m_ht.get()` to `ReadMemory` ?
> IIUC, `ReadMemory` will read memory matching the inferior endianness. That's why, I'm using a `DataExtractor`, to translate, the copied bytes to the host endianness.
Well, I am glad that you are thinking about non-native endianness, but I am afraid things just don't work that way. There's no way an API like `CopyData(void *, size_t) ` can automatically fix endianness problems. Why? Because the fix depends on the structure of the data.

Imagine this sequence of bytes: `00 01 02 03 04 05 06 07`. If this sequence really represents a sequence of (little-endian)  bytes (`uint8_t`s), then the corresponding "big-endian" sequence would be identical. If, however, it represents 2-byte words (uint16_t), then the big-endian representation would be `01 00 03 02 05 04 07 06`. For 4-byte words, it would be `03 02 01 00 07 06 05 04`, etc.

In short, you need to know the structure of the data to translate endiannes, and `CopyData` does not know that. Indeed, if you look at the implementation of that function you'll see that it just does a `memcpy`.

My suggestion for reading directly into the object was based on the assumption that are fine with things working only if everything is of the same endianness. I wouldn't have allowed that in more cross-platform code, but I didn't feel like policing all corners of lldb.

However, if you do want to make this endian-correct (which I do encourage), then you have a couple of options:
- ditch the structs and use data extractor functions to read (GetU32, GetAddress, etc) the individual fields -- these know the size of the object they are accessing, and can adjust endianness appropriately
- compare host and target endianness and call llvm::sys::swapByteOrder on the individual fields if they differ
- use llvm's `packed_endian_specific_integral` instead of native types in the struct definitions. This would require passing the endiannes as a template parameter and then dispatching to the appropriate struct based on the runtime value similar to how you've already done for byte sizes.

Each of these options has its drawbacks, but I've seen them all places throughout llvm. In this particular case, I have a feeling the simplest option would be to go the DataExtractor route...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78396/new/

https://reviews.llvm.org/D78396