[Lldb-commits] [PATCH] D62503: Add ReadCStringFromMemory for faster string reads

Wed Jun 12 22:00:15 PDT 2019

aadsm marked an inline comment as done.
aadsm added a comment.

Yeah, that's a good question and I did think about it at the time I wrote D62715 <https://reviews.llvm.org/D62715>. Here's my rationale:
Regarding the chunk reading in the ReadMemory I was not sure because I don't know how common that is and it is faster to read the entire size without chunking (the code is also less complex this way). I'd imagine that in the common case we usually know what we're reading? (e.g.: reading the entire function assembly code)
That's why I thought about the string "extraction" as a special case for memory reading. However, even if we had a ReadMemory that did chunking I'd still want to do chunking in the ReadCStringFromMemory. Otherwise we'd be reading much more than needed for a string because there is no way for the ReadMemory to know it could stop (after finding a '\0') so it might end up reading say 2KB word by word with ptrace. (I did think for 5s to pass an optional stop character to ReadMemory but that would be shoehorning an edge case into a generic and simple function, it would be better to put the chunk reading logic into a separate thing that both ReadMemory and ReadCStringFromMemory could reuse).

Valid concern on other platforms that may have better ptrace implementations. @clayborg you suggested to move it from Linux to the Protocol class, any thoughts on this?

================
Comment at: lldb/unittests/Host/NativeProcessProtocolTest.cpp:128-133
+  EXPECT_THAT_ERROR(
+      Process.ReadCStringFromMemory(0x0, &string[0], sizeof(string), bytes_read)
+          .ToError(),
+      llvm::Succeeded());
+  EXPECT_STREQ(string, "hel");
+  EXPECT_EQ(bytes_read, 3UL);
----------------
labath wrote:
> I'm wondering how will the caller know that the read has been truncated. Given that ReadMemory already returns an error in case of partial reads, maybe we could do the same here ? (return an error, but still set bytes_read and the string as they are now). What do you think ?
I'm a bit hesitant on this one, because it's different semantics.
ReadMemory returns an error when it was not able to read everything that it was told to.

In this test case we were able to read everything but didn't find a null terminator. I decided to set a null terminator in this case and say that we read 1 less byte in order to 1) actually have a C string (I don't want people to strlen it and fail) and 2) allow people to read in chunks if they want (however, I don't know how useful this will be if at all) or allow them to partially read a string or even to realize that the string is larger than originally expected.

There is a way to know if it didn't read a full string with `strlen(string) == bytes_read` but that'd be O(n). You can also do `bytes_read == sizeof-1 && string[size-2] != '\0'` but it kind of sucks.
I have no good solution here :D unless we want to return an enum with the 3 different options.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62503/new/

https://reviews.llvm.org/D62503