[Lldb-commits] [PATCH] D62503: Add ReadCStringFromMemory for faster string reads

Pavel Labath via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Thu Jun 13 01:21:46 PDT 2019


labath added a comment.

In D62503#1540959 <https://reviews.llvm.org/D62503#1540959>, @aadsm wrote:

> Yeah, that's a good question and I did think about it at the time I wrote D62715 <https://reviews.llvm.org/D62715>. Here's my rationale:
>  Regarding chunked reading in ReadMemory, I was not sure, because I don't know how common that is, and it is faster to read the entire size without chunking (the code is also less complex this way). I'd imagine that in the common case we usually know what we're reading (e.g., reading an entire function's assembly)?
>  That's why I thought of the string "extraction" as a special case of memory reading. However, even if we had a ReadMemory that did chunking, I'd still want to do chunking in ReadCStringFromMemory. Otherwise we'd be reading much more than needed for a string, because there is no way for ReadMemory to know it could stop (after finding a '\0'), so it might end up reading, say, 2KB word by word with ptrace. (I did briefly consider passing an optional stop character to ReadMemory, but that would be shoehorning an edge case into a generic and simple function; it would be better to put the chunked-reading logic into a separate thing that both ReadMemory and ReadCStringFromMemory could reuse.)


Ok, I think I understand what you mean. It does make sense to do chunked reads here, as most strings will be relatively short, so it does not make sense to read a full 4k bytes only to find the null terminator on byte 2. However, in that case one would expect a fixed chunk size (128, or whatever the "common" string length is), and to read the data like that. So seeing the page size used as input here is surprising (and it means you can still end up reading 4k bytes slowly, if you happen to start reading near the beginning of a page). However, I also get why you're implementing it this way -- you're betting on the fact that most reads will end up in the "fast" memory, and you just want to avoid wandering into the "slow" part needlessly. That sounds like a reasonable thing to assume. So overall, I think you've convinced me, and we can keep this implementation until evidence to the contrary shows up.
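For readers following the thread, here is a minimal sketch of the chunking scheme under discussion -- each read extends only to the next page boundary, and the loop exits as soon as a null terminator shows up in a chunk. This is not the code under review; `ReadCStringChunked` and `ReadMemoryPrimitive` are made-up names for illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include "llvm/Support/Error.h"

// Assumed primitive, standing in for NativeProcessProtocol::ReadMemory:
// reads `size` bytes at `addr`, failing on partial reads.
llvm::Error ReadMemoryPrimitive(uint64_t addr, void *buf, size_t size,
                                size_t &bytes_read);

llvm::Error ReadCStringChunked(uint64_t addr, char *buf, size_t max_size,
                               size_t &bytes_read, size_t page_size) {
  size_t total = 0;
  while (total < max_size) {
    // Read only up to the next page boundary, so a string living entirely
    // in "fast" memory never drags the read into the "slow" part.
    size_t to_boundary = page_size - ((addr + total) % page_size);
    size_t chunk = std::min(max_size - total, to_boundary);
    size_t read = 0;
    if (llvm::Error err =
            ReadMemoryPrimitive(addr + total, buf + total, chunk, read))
      return err;
    // Stop as soon as the terminator appears in this chunk.
    if (void *nul = std::memchr(buf + total, '\0', read)) {
      bytes_read = static_cast<char *>(nul) - buf + 1; // includes the '\0'
      return llvm::Error::success();
    }
    total += read;
  }
  // No terminator found: nul-terminate the truncated result and report one
  // byte fewer, matching the semantics discussed below.
  buf[max_size - 1] = '\0';
  bytes_read = max_size - 1;
  return llvm::Error::success();
}
```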



================
Comment at: lldb/unittests/Host/NativeProcessProtocolTest.cpp:100
+
+TEST(NativeProcessProtocolTest, ReadCStringFromMemory) {
+  NiceMock<MockDelegate> DummyDelegate;
----------------
It would be nice to have one more test where reads cross the page boundary, to make sure chunks are concatenated properly.
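For illustration, such a test might look roughly like the following. This is only a sketch: it borrows the shape of the existing tests in this file, the memory-mocking setup is elided, and a 0x1000-byte fake page is assumed, with "hello" placed two bytes before the boundary so the read has to span two chunks:

```cpp
TEST(NativeProcessProtocolTest, ReadCStringFromMemoryCrossPageBoundary) {
  NiceMock<MockDelegate> DummyDelegate;
  // ... mock process / fake memory set up as in the tests above, with
  // "hello\0" stored at address 0xFFE (the last two bytes of a page) ...
  char string[8];
  size_t bytes_read = 0;
  EXPECT_THAT_ERROR(Process
                        .ReadCStringFromMemory(0xFFE, &string[0],
                                               sizeof(string), bytes_read)
                        .ToError(),
                    llvm::Succeeded());
  // Both chunks must have been concatenated correctly.
  EXPECT_STREQ(string, "hello");
  EXPECT_EQ(bytes_read, 6UL); // "hello" plus the nul terminator
}
```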


================
Comment at: lldb/unittests/Host/NativeProcessProtocolTest.cpp:128-133
+  EXPECT_THAT_ERROR(
+      Process.ReadCStringFromMemory(0x0, &string[0], sizeof(string), bytes_read)
+          .ToError(),
+      llvm::Succeeded());
+  EXPECT_STREQ(string, "hel");
+  EXPECT_EQ(bytes_read, 3UL);
----------------
aadsm wrote:
> labath wrote:
> > I'm wondering how the caller will know that the read has been truncated. Given that ReadMemory already returns an error in case of partial reads, maybe we could do the same here? (Return an error, but still set bytes_read and the string as they are now.) What do you think?
> I'm a bit hesitant on this one, because it's different semantics.
> ReadMemory returns an error when it was not able to read everything that it was told to.
> 
> In this test case we were able to read everything but didn't find a null terminator. I decided to set a null terminator in this case and say that we read one byte fewer, in order to 1) actually have a C string (I don't want people to strlen it and fail) and 2) allow people to read in chunks if they want (though I don't know how useful this will be, if at all), or to partially read a string, or even to realize that the string is larger than originally expected.
> 
> There is a way to know if it didn't read a full string, `strlen(string) == bytes_read`, but that'd be O(n). You can also do `bytes_read == sizeof(string) - 1 && string[sizeof(string) - 2] != '\0'`, but that kind of sucks.
> I have no good solution here :D unless we want to return an enum with the three different outcomes.
> In this test case we were able to read everything but ...
Is that really true? We were told to read a C string. We did not find a C string, so instead we chose to mangle the data we read into a C string. That seems a bit error-ish.

I also find it confusing that in this (truncated) case the returned value corresponds to the length of the string, while in the "normal" case it includes the nul terminator. I think it would be more natural to do it the other way around, as that would be similar to what snprintf does.
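(For reference, snprintf always excludes the terminator from its return value and reports the length the output *would* have had, so truncation is detectable by comparing the return value against the buffer size:)

```cpp
#include <cstdio>

char buf[4];
// snprintf nul-terminates and returns the untruncated length, excluding
// the terminator; ret >= sizeof(buf) therefore signals truncation.
int ret = snprintf(buf, sizeof(buf), "hello"); // buf == "hel", ret == 5
bool truncated = ret >= static_cast<int>(sizeof(buf)); // true
```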

How about we avoid this issue entirely and have the function return `Expected<StringRef>` (the StringRef being backed by the buffer you pass in as an argument) instead? That way you don't have to do any null termination, as the StringRef carries the length implicitly. And one can tell that the read may be incomplete by comparing the StringRef size with the input buffer size.
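A rough sketch of what that alternative could look like -- the signature, the `process` object, and the truncation check are hypothetical, purely to illustrate the suggestion:

```cpp
#include "lldb/lldb-types.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Error.h"

// Hypothetical shape of the suggested API. The returned StringRef points
// into the caller-provided buffer, and its size() excludes any terminator,
// so no manual nul bookkeeping is needed.
llvm::Expected<llvm::StringRef>
ReadCStringFromMemory(lldb::addr_t addr, char *buffer, size_t buffer_size);

// Caller side:
char buf[128];
if (llvm::Expected<llvm::StringRef> str =
        process.ReadCStringFromMemory(addr, buf, sizeof(buf))) {
  // A result that fills the whole buffer may have been truncated.
  bool maybe_truncated = str->size() == sizeof(buf);
  // ... use *str ...
} else {
  // ... handle str.takeError() ...
}
```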


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62503/new/

https://reviews.llvm.org/D62503




