[Lldb-commits] [PATCH] D73860: [lldb/StringPrinter] Avoid reading garbage in uninitialized strings

Tue Feb 4 01:35:30 PST 2020

teemperor requested changes to this revision.
teemperor added inline comments.

================
Comment at: lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp:8
+// A corrupt libcxx string which points to garbage and has a crazy length.
+static unsigned char garbage_string_long[] = {185, 52, 168, 29, 1, 0, 0, 0, 168, 61, 175, 29, 1, 0, 0, 0, 104, 222, 174, 29, 1, 0, 0, 0};
+
----------------
I think those byte arrays need a quick comment about which elements mean what (or how they trigger the respective code paths). Just pointing out which bytes are supposed to overwrite which `std::string` members is good enough. Something like a macro maybe? `#define STD_STRING_BYTES(cap, size, length) {cap, size, length}`
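
To illustrate the kind of annotation meant here, a rough sketch that groups the 24 bytes into three 64-bit words and labels each one. The labels (capacity, size, data pointer) are an assumption about the 64-bit little-endian libc++ ABI v1 "long" layout, and `StringWords` and `decode_words` are hypothetical names, not part of the patch:

```cpp
#include <cassert>  // used by the checks in the note below
#include <cstddef>
#include <cstdint>

// Hypothetical container for the three 64-bit words of the byte array.
struct StringWords {
  std::uint64_t word[3];
};

// Same bytes as in the test, grouped one word per row. The member names in
// the comments are ASSUMED (libc++ ABI v1, 64-bit, little-endian).
static const unsigned char garbage_string_long[] = {
    185, 52,  168, 29, 1, 0, 0, 0,  // word 0: assumed capacity (low bit = "long" flag)
    168, 61,  175, 29, 1, 0, 0, 0,  // word 1: assumed size
    104, 222, 174, 29, 1, 0, 0, 0}; // word 2: assumed data pointer

// Assemble each word least-significant byte first, so the result does not
// depend on the endianness of the machine running this sketch.
inline StringWords decode_words(const unsigned char (&bytes)[24]) {
  StringWords out{};
  for (std::size_t w = 0; w < 3; ++w)
    for (std::size_t b = 0; b < 8; ++b)
      out.word[w] |= static_cast<std::uint64_t>(bytes[w * 8 + b]) << (8 * b);
  return out;
}
```

Under those assumptions the first word has its low bit set, so the string would be treated as "long", and the size word is far larger than any real allocation.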

================
Comment at: lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp:29
+    if (sizeof(std::string) == sizeof(garbage_string_sso))
+      memcpy((void *)&garbage1, &garbage_string_sso, sizeof(std::string));
+    if (sizeof(std::string) == sizeof(garbage_string_long))
----------------
shafik wrote:
> vsk wrote:
> > shafik wrote:
> > > While I get what you are doing here (we know the structure of libc++'s SSO implementation and are manually building a corrupt one), this is fragile to changes in the implementation.
> > > 
> > > I don't have an immediate suggestion for an alternative approach, but if we stick with this we should add a big comment explaining it, perhaps laying out the internal layout we are assuming, and maybe add some sanity checks using `offsetof` to verify that the fields exist and are where we expect them to be.
> > I don't see how this is fragile. The structure of libc++'s SSO implementation is ABI, and is unlikely to change (esp. not in a way that turns either one of the garbage strings into a valid string). I've left comments explaining what's wrong with both of the garbage strings, but I can also leave a pointer to https://joellaity.com/2020/01/31/string.html for more info?
> Sure, that note would be fine.
Can you instead do a `#if _LIBCPP_ABI_VERSION == 1` and have the `#else` be an `#error` saying that this test needs updating? We don't support any libc++ ABI besides 1 in LLDB, but if we ever do, this should not silently pass.
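
Roughly what I have in mind, as a sketch only. The outer `defined(_LIBCPP_VERSION)` guard is my addition so that the snippet also compiles against other standard libraries, and `kKnownStringLayout` and `can_overwrite_with` are hypothetical names:

```cpp
#include <cassert>  // used by the checks in the note below
#include <cstddef>
#include <string>   // pulls in the libc++ configuration macros, if any

#if defined(_LIBCPP_VERSION)
#if _LIBCPP_ABI_VERSION == 1
// The garbage byte arrays were written against this layout.
constexpr bool kKnownStringLayout = true;
#else
#error "This test assumes libc++ ABI version 1 and needs updating."
#endif
#else
// Not libc++: the layout is unknown, so never install the garbage bytes.
constexpr bool kKnownStringLayout = false;
#endif

// Hypothetical helper: only overwrite a std::string with a garbage array
// when the layout is known and the sizes match exactly.
inline bool can_overwrite_with(std::size_t garbage_size) {
  return kKnownStringLayout && sizeof(std::string) == garbage_size;
}
```

With this, a future ABI change fails loudly at compile time instead of turning the `memcpy` calls into silent no-ops.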

================
Comment at: lldb/source/DataFormatters/StringPrinter.cpp:73
                                                           uint8_t *&next) {
+  assert(isInHalfOpenRange(buffer, buffer, buffer_end) &&
+         "Cannot read the first byte of ASCII string buffer");
----------------
Isn't this just `assert(buffer < buffer_end)`? That's less confusing IMHO (and I think in general this check can be in `GetPrintable`, as it should always hold for every `GetPrintableImpl`).
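
A small sketch of why the two forms agree. The body of `isInHalfOpenRange` below is my guess at what the patch's helper does (the real one may differ), and `canReadFirstByte` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED definition of the patch's helper: true when begin <= p < end.
inline bool isInHalfOpenRange(const std::uint8_t *p, const std::uint8_t *begin,
                              const std::uint8_t *end) {
  return begin <= p && p < end;
}

// Hypothetical simplified form: with p == begin, the `begin <= p` half is
// trivially true, so only the comparison against the end remains.
inline bool canReadFirstByte(const std::uint8_t *buffer,
                             const std::uint8_t *buffer_end) {
  return buffer < buffer_end;
}
```

Both return the same result for an empty and a non-empty buffer, so the shorter comparison could be asserted once in the caller.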

================
Comment at: lldb/source/DataFormatters/StringPrinter.cpp:140
                                                          uint8_t *&next) {
+  assert(isInHalfOpenRange(buffer, buffer, buffer_end) &&
+         "Cannot read the first byte of UTF8 string buffer");
----------------
Same as above.

================
Comment at: lldb/source/DataFormatters/StringPrinter.cpp:149
+  if ((utf8_encoded_len == 0 || utf8_encoded_len > 4) ||
+      !isInHalfOpenRange(buffer + (utf8_encoded_len - 1), buffer, buffer_end))
     return retval;
----------------
Isn't `isInHalfOpenRange(buffer + (utf8_encoded_len - 1), buffer, buffer_end)` just `buffer + (utf8_encoded_len - 1U) < buffer_end`? `utf8_encoded_len` is always positive here, so adding it to `buffer` can only produce a pointer smaller than `buffer` through an integer overflow, IIUC (which we should then probably check for more explicitly).
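
One way to make the check explicit without forming a possibly out-of-range pointer is to compare the encoded length against the number of bytes left. This is only a sketch; `sequenceFits` is a hypothetical name and the parameter types are assumed:

```cpp
#include <cassert>  // used by the checks in the note below
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when a UTF-8 sequence of `utf8_encoded_len`
// bytes starting at `buffer` lies entirely before `buffer_end`.
inline bool sequenceFits(const std::uint8_t *buffer,
                         const std::uint8_t *buffer_end,
                         unsigned utf8_encoded_len) {
  // Valid UTF-8 sequences are 1 to 4 bytes long.
  if (utf8_encoded_len == 0 || utf8_encoded_len > 4)
    return false;
  // Nothing can be read from an empty (or inverted) range.
  if (buffer >= buffer_end)
    return false;
  // Compare lengths rather than pointers: no pointer arithmetic past the
  // end of the buffer, so no overflow to worry about.
  return utf8_encoded_len <= static_cast<std::size_t>(buffer_end - buffer);
}
```

For lengths in the valid range this is equivalent to `buffer + (utf8_encoded_len - 1U) < buffer_end`.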

https://reviews.llvm.org/D73860