<div dir="ltr">Thanks Greg for the detailed explanation, very helpful. <div>1. Just to confirm, is the weird string <span style="color:rgb(80,0,80);font-size:12.8px">displayed because '</span><span style="color:rgb(80,0,80);font-size:12.8px">data_' points to some random memory? So what gdb displays is also some random memory content, not something more meaningful than what we display? I thought we (lldb) did not display the std::string content well but gdb did it correctly. </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">2. I guess the std::string formatter did not kick in because our company may link against some special STL implementation. Let me share our binary so you can confirm.</span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">3. I dumped the content of the object we are trying to json.dumps(); here it is:</span></div><div><font color="#500050"><span style="font-size:12.8px">response: {'id': 57, 'result': {'result': [{'name': 'data_', 'value': {'type': 'object', 'description': '(char *) "\xc9\xc3UH\\x89\xe5H\x89}\xf8H\\x8bE\xf8]\xc3\x90UH\\x89\xe5H\x83\xec\x10H\\x89}\xf8H\\x8bE\xf8H\\x89\xc7\xe8~\\xb4\xca\xff\\x90\xc9\xc3UH\\x89\xe5SH\\x83\xec\x18H\\x89}\xe8H\x89u\xe0H\x8bE\xe0H\x89\xc7\xe8\\x9e\xff\xff\xffH\\x8b\\x18H\\x8bE\xe8H\x89\xc7\xe8O\\xb4\xca\xffH\\x89\xc6\xbf\\b"', 'objectId': 'RemoteObjectManager.118'}}, {'name': 'size_', 'value': {'type': 'object', 'description': '(std::size_t) 0'}}, {'name': 'capacity_', 'value': {'type': 'object', 'description': '(std::size_t) 1441151880758558720'}}]}}</span></font></div><div><span style="color:rgb(80,0,80);font-size:12.8px">So it seems the problem is that json.dumps() tries to treat the raw byte array as UTF-8, which fails. </span><br></div><div><span style="color:rgb(80,0,80);font-size:12.8px">So we need to figure out how to escape the raw byte array into a string so that we can json.dumps() it. 
</span><span style="color:rgb(80,0,80);font-size:12.8px">The key question is how we know the correct encoding of the byte array. Is my understanding correct that only the formatter has the knowledge to decode the byte array correctly? If we fail to find a type formatter (which is the case here) and get a raw field with a byte array, we have no knowledge of the encoding, so we either have to guess a default encoding and try it, or just display the raw byte array content instead of decoding it? </span><span style="color:rgb(80,0,80);font-size:12.8px"> </span></div><div><span style="color:rgb(80,0,80);font-size:12.8px"><br></span></div><div><span style="color:rgb(80,0,80);font-size:12.8px">Jeffrey</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 28, 2016 at 9:47 AM, Greg Clayton <span dir="ltr"><<a href="mailto:gclayton@apple.com" target="_blank">gclayton@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">So you need to be prepared to escape any text that can have special characters. A "std::string" or any container can contain special characters. If you are encoding stuff into JSON, you will either need to escape any special characters, or hex encode the string into ASCII hex bytes.<br>
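<br>
A minimal sketch of both options in Python, using the first few bytes of the 'data_' summary from this thread as sample input. The dictionary key "description" mirrors the dumped response; nothing here is LLDB API, and choosing latin-1 is an assumption made only so that every byte decodes, not a claim about the real encoding:<br>
<br>
```python
import binascii
import json

# Sample input: the first bytes of the 'data_' summary seen in this thread.
raw = b"\xc9\xc3UH\x89\xe5"

# Option 1: escape. latin-1 maps each byte 0x00-0xff to the code point with
# the same number, so decoding cannot fail; json.dumps() then writes the
# non-ASCII characters as \uXXXX escapes (ensure_ascii defaults to True).
escaped = json.dumps({"description": raw.decode("latin-1")})

# Option 2: hex encode the bytes into plain ASCII hex digits.
hexed = json.dumps({"description": binascii.hexlify(raw).decode("ascii")})

print(escaped)
print(hexed)
```
<br>
The escaped form stays readable for the ASCII parts, while the hex form is unambiguous and can be reversed with binascii.unhexlify().<br>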
<br>
In debuggers we often get bogus data because variables are not initialized: the compiler tells us that a variable is valid in address range [0x1000-0x2000), but it actually is only valid in [0x1200-0x2000). If we read a variable in this case, a std::string might contain bogus data and the bytes might not make sense. So you always have to be prepared for bad data.<br>
<br>
If we look at:<br>
<span class=""><br>
store_ = {<br>
= {<br>
small_ = "www"<br>
ml_ = (data_ =<br>
"��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",<br>
size_ = 0, capacity_ = 1441151880758558720)<br>
}<br>
}<br>
}<br>
<br>
</span>We can see the "size_" is zero, and capacity_ is 1441151880758558720 (which is 0x1400000000000000). "data_" seems to be some random pointer.<br>
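<br>
One way to read these numbers, offered as a sketch rather than a statement about the actual string implementation: assume a little-endian 64-bit target and assume that ml_ shares storage with the 24-byte small_ buffer, which the gdb output quoted in this thread prints as "www", twenty NUL bytes, then "\024". Reinterpreting those bytes as three 64-bit fields (Python 3) reproduces the displayed data_, size_ and capacity_ values, which would mean data_ is just the text "www" viewed as a pointer. The helper name field() is made up for this example:<br>
<br>
```python
# The 24 bytes gdb prints for small_: "www", twenty NUL bytes, then "\024".
small = b"www" + b"\x00" * 20 + b"\x14"


def field(index):
    """Read the index-th 8-byte little-endian field from the buffer."""
    return int.from_bytes(small[index * 8:(index + 1) * 8], "little")


# Assumption: ml_ overlays small_ as (data_, size_, capacity_).
data_, size_, capacity_ = field(0), field(1), field(2)

print(hex(data_))   # 0x777777, the ASCII bytes of "www"
print(size_)        # 0
print(capacity_)    # 1441151880758558720, i.e. 0x1400000000000000
```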
<br>
On MacOSX, we have special formatting code in CPlusPlusLanguage.cpp that displays std::string; it gets installed in the LoadLibCxxFormatters() or LoadLibStdcppFormatters() functions with code like:<br>
<br>
lldb::TypeSummaryImplSP std_string_summary_sp(new CXXFunctionSummaryFormat(stl_summary_flags, lldb_private::formatters::LibcxxStringSummaryProvider, "std::string summary provider"));<br>
cpp_category_sp->GetTypeSummariesContainer()->Add(ConstString("std::__1::string"), std_string_summary_sp);<br>
<br>
Special flags are set on std::string to say "don't show children of this and just show a summary". So for the following code, where a std::string contains "hello":<br>
<br>
std::string h ("hello");<br>
<br>
You should just see:<br>
<br>
(lldb) fr var h<br>
(std::__1::string) h = "hello"<br>
<br>
If you take a look at the raw value we see:<br>
<br>
(lldb) fr var --raw h<br>
(std::__1::string) h = {<br>
__r_ = {<br>
std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, std::__1::allocator<char>, 2> = {<br>
__first_ = {<br>
= {<br>
__l = {<br>
__cap_ = 122511465736202<br>
__size_ = 0<br>
__data_ = 0x0000000000000000<br>
}<br>
__s = {<br>
= {<br>
__size_ = '\n'<br>
__lx = '\n'<br>
}<br>
__data_ = {<br>
[0] = 'h'<br>
[1] = 'e'<br>
[2] = 'l'<br>
[3] = 'l'<br>
[4] = 'o'<br>
[5] = '\0'<br>
[6] = '\0'<br>
[7] = '\0'<br>
[8] = '\0'<br>
[9] = '\0'<br>
[10] = '\0'<br>
[11] = '\0'<br>
[12] = '\0'<br>
[13] = '\0'<br>
[14] = '\0'<br>
[15] = '\0'<br>
[16] = '\0'<br>
[17] = '\0'<br>
[18] = '\0'<br>
[19] = '\0'<br>
[20] = '\0'<br>
[21] = '\0'<br>
[22] = '\0'<br>
}<br>
}<br>
__r = {<br>
__words = {<br>
[0] = 122511465736202<br>
[1] = 0<br>
[2] = 0<br>
}<br>
}<br>
}<br>
}<br>
}<br>
}<br>
}<br>
<br>
So the main question is why our "std::string" formatters are not kicking in for you. That comes down to either the typename not matching, or the format of the string not being what the formatter is expecting.<br>
<br>
But again, since your std::string can contain anything, you will need to escape any and all text that is encoded into JSON to ensure it doesn't contain anything JSON can't deal with.<br>
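<br>
As a sketch of what escaping "any and all text" could look like on the Python side, assuming Python 3.5 or newer and assuming raw summaries arrive as bytes: the hypothetical helper json_safe() below walks a response shaped like the one dumped in this thread and turns every bytes value into text, rendering undecodable bytes as \xNN escapes instead of raising:<br>
<br>
```python
import json


def json_safe(value):
    """Recursively replace bytes inside dicts and lists with JSON-safe text."""
    if isinstance(value, bytes):
        # Valid UTF-8 decodes normally; any other byte becomes a \xNN escape.
        return value.decode("utf-8", errors="backslashreplace")
    if isinstance(value, dict):
        return {json_safe(k): json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [json_safe(v) for v in value]
    return value


# Shortened stand-in for the response dumped in this thread.
response = {"id": 57, "result": {"result": [
    {"name": "data_",
     "value": {"type": "object",
               "description": b'(char *) "\xc9\xc3UH\x89\xe5"'}},
    {"name": "size_",
     "value": {"type": "object", "description": b"(std::size_t) 0"}},
]}}

text = json.dumps(json_safe(response))
print(text)
```
<br>
Sanitizing at this one choke point means json.dumps() never sees raw bytes, whichever formatter did or did not kick in.<br>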
<div><div class="h5"><br>
> On Mar 27, 2016, at 9:20 PM, Jeffrey Tan via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org">lldb-dev@lists.llvm.org</a>> wrote:<br>
><br>
> Thanks Siva. All the DW_TAG_member related errors seem to have gone away after patching with your fix. The current problem is handling the decoding.<br>
><br>
> Here is the correct decoding from gdb, which might be useful:<br>
> (gdb) p corpus<br>
> $3 = (const std::string &) @0x7fd133cfb888: {<br>
> static npos = 18446744073709551615, store_ = {<br>
> static kIsLittleEndian = <optimized out>,<br>
> static kIsBigEndian = <optimized out>, {<br>
> small_ = "www", '\000' <repeats 20 times>, "\024", ml_ = {<br>
> data_ = 0x777777 <std::_Any_data::_M_access<void folly::fibers::Baton::waitFiber<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}, void>::type::value_type folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}>(folly::fibers::FiberManager&, folly::fibers::FirstArgOf<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}, void>::type::value_type folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}, void>::type::value_type)::{lambda(folly::fibers::Fiber&)#1}*>() const+25> 
"\311\303UH\211\345H\211}\370H\213E\370]ÐUH\211\345H\203\354\020H\211}\370H\213E\370H\211\307\350~\264\312\377\220\311\303UH\211\345SH\203\354\030H\211}\350H\211u\340H\213E\340H\211\307\350\236\377\377\377H\213\030H\213E\350H\211\307\350O\264\312\377H\211ƿ\b", size_ = 0,<br>
> capacity_ = 1441151880758558720}}}}<br>
><br>
> Utf-16 does not seem to decode it, while 'latin-1' does:<br>
> >>> '\xc9'.decode('utf-16')<br>
> Traceback (most recent call last):<br>
> File "<stdin>", line 1, in <module><br>
> File "/mnt/gvfs/third-party2/python/55c1fd79d91c77c95932db31a4769919611c12bb/2.7.8/centos6-native/da39a3e/lib/python2.7/encodings/utf_16.py", line 16, in decode<br>
> return codecs.utf_16_decode(input, errors, True)<br>
> UnicodeDecodeError: 'utf16' codec can't decode byte 0xc9 in position 0: truncated data<br>
> >>> '\xc9'.decode('latin-1')<br>
> u'\xc9'<br>
><br>
> Instead of guessing what kind of decoding I should use, I would use 'ensure_ascii=False' to prevent the crash for now.<br>
><br>
> I tried to reproduce this crash, but it seems that the crash might be related to some internal STL implementation we are using. I will see if I can narrow it down to a small repro later.<br>
><br>
> Thanks<br>
> Jeffrey<br>
><br>
> On Sun, Mar 27, 2016 at 2:49 PM, Siva Chandra <<a href="mailto:sivachandra@gmail.com">sivachandra@gmail.com</a>> wrote:<br>
> On Sat, Mar 26, 2016 at 11:58 PM, Jeffrey Tan <<a href="mailto:jeffrey.fudan@gmail.com">jeffrey.fudan@gmail.com</a>> wrote:<br>
> > Btw: after patching with Siva's fix <a href="http://reviews.llvm.org/D18008" rel="noreferrer" target="_blank">http://reviews.llvm.org/D18008</a>, the<br>
> > first field 'small_' is fixed, however the second field 'ml_' still emits<br>
> > garbage:<br>
> ><br>
> > (lldb) fr v corpus<br>
> > (const string &const) corpus = error: summary string parsing error: {<br>
> > store_ = {<br>
> > = {<br>
> > small_ = "www"<br>
> > ml_ = (data_ =<br>
> > "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",<br>
> > size_ = 0, capacity_ = 1441151880758558720)<br>
> > }<br>
> > }<br>
> > }<br>
><br>
> Do you still see the DW_TAG_member related error?<br>
><br>
> A wild (and really wild at that) guess: Is it utf16 data that is being<br>
> decoded as utf8?<br>
><br>
> As David Blaikie mentioned on the other thread, it would really help<br>
> if you provide us with a minimal example to repro this. At least, repro<br>
> instructions.<br>
><br>
</div></div>> _______________________________________________<br>
> lldb-dev mailing list<br>
> <a href="mailto:lldb-dev@lists.llvm.org">lldb-dev@lists.llvm.org</a><br>
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev</a><br>
<br>
</blockquote></div><br></div>