<div dir="ltr">Thanks Enrico. This is very detailed! I will take a look. <div>Btw: originally, I was hoping that data formatters could be added without changing the source code. For example, given an XML/JSON file describing the memory layout/structure of a data structure, lldb could parse it and deduce the formatting. This is the approach used by the data visualizers in the VS debugger: <a href="https://msdn.microsoft.com/en-us/library/jj620914.aspx">https://msdn.microsoft.com/en-us/library/jj620914.aspx</a></div><div>This would make adding data formatters more extensible/flexible. Any reason we did not take this approach? </div><div><br></div><div>Jeffrey</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 6, 2016 at 11:49 AM, Enrico Granata <span dir="ltr"><<a href="mailto:egranata@apple.com" target="_blank">egranata@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><blockquote type="cite"><div>On Apr 5, 2016, at 2:42 PM, Jeffrey Tan <<a href="mailto:jeffrey.fudan@gmail.com" target="_blank">jeffrey.fudan@gmail.com</a>> wrote:</div><br><div><div dir="ltr">Hi Enrico,<div><br></div><div>Any suggestions or examples of how to add a data formatter for our own STL string? 
From the output below I can see we are using our own "<b>fbstring_core</b>" which I assume I need to write a type summary for this type:</div><div><br></div><div><div>frame variable corpus -T</div><div>(const string &const) corpus = error: summary string parsing error: {</div><div>  (std::<b>fbstring_core</b><char>) store_ = {</div><div>    (std::<b>fbstring_core</b><char>::(anonymous union))  = {</div><div>      (char [24]) small_ = "www"</div><div>      (std::fbstring_core<char>::MediumLarge) ml_ = {</div><div>        (char *) data_ = 0x0000000000777777 "H\x89U\xa8H\x89M\xa0L\x89E\x98H\x8bE\xa8H\x89��_U��D\x88e�H\x8bE\xa0H\x89��]U��H\x89�H\x8dE�H\x89�H\x89��� ��L\x8dm�H\x8bE\x98H\x89��IU��\x88]�L\x8be\xb0L\x89��</div><div>        (std::size_t) size_ = 0</div><div>        (std::size_t) capacity_ = 1441151880758558720</div><div>      }</div><div>    }</div><div>  }</div><div>}</div></div><div><br></div></div></div></blockquote><div><br></div></span><div>Admittedly, this is going to be a little vague since I haven’t really seen your code and I am only working off of one sample</div><div><br></div><div>There’s going to be two parts to getting this to work:</div><div><br></div><div><b>Part 1 - Formatting fbstring_core</b></div><div><br></div><div>At a glance, an fbstring_core<char> can be backed by two representations. 
A “small” representation (a char array), and a “medium/large” representation (a char* + a size)</div><div>I assume that the way you tell one from the other is</div><div><br></div><div>if (size == 0) small</div><div>else medium-large</div><div><br></div><div>If my assumption is not correct, you’ll need to discover what the correct discriminator logic is - the class has to know, and so do you :-)</div><div><br></div><div>Armed with that knowledge, look in the LLDB tree at source/Plugins/Language/CPlusPlus/Formatters/LibCxx.cpp</div><div>There’s a bunch of code that deals with formatting llvm’s libc++ std::string - which follows very similar logic to your class</div><div><br></div><div><div style="margin:0px;font-size:13px;line-height:normal;font-family:'CMU Typewriter Text'"><span>ExtractLibcxxStringInfo() </span>is the function that handles discovering which layout the string uses - where the data lives - and how much data there is</div></div><div><br></div><div>Once you know how much data there is (the size) and where it lives (array or pointer), <span style="font-family:'CMU Typewriter Text';font-size:13px">LibcxxStringSummaryProvider() </span>has the easy task - it sets up a StringPrinter, tells it how much data to print and where to get it from, and then delegates the grunt work to the StringPrinter</div><div>StringPrinter is a nifty little tool - it can handle generating summaries for different kinds of strings (UTF8? UTF16? we got it - is a \0 a terminator? what quote character would you like? 
…) - you point it at some data, set up a few options, and it will generate a printable representation for you - if your string type is doing anything out of the ordinary, let’s talk - I am definitely open to extending StringPrinter to handle even more magic</div><div><br></div><div><b>Part 2 - Teaching std::string that it can be backed by an fbstring_core</b></div><div><br></div><div>At the end of part 1, you’ll probably end up with an FBStringCoreSummaryProvider() - now you need to teach LLDB about it</div><div>The obvious thing you could do would be to go into <span style="color:rgb(79,129,135);font-family:'CMU Typewriter Text';font-size:13px">CPlusPlusLanguage</span><span style="font-family:'CMU Typewriter Text';font-size:13px">::GetFormatters() </span>and add a LoadFBStringFormatter(g_category) call to it - and then imitate - say - <span style="font-family:'CMU Typewriter Text';font-size:13px">LoadLibCxxFormatters()</span></div><div><br></div><div><div style="margin:0px;font-size:13px;line-height:normal;font-family:'CMU Typewriter Text';color:rgb(209,47,27)"><span style="color:#000000">    </span><span style="color:#31595d">AddCXXSummary</span><span style="color:#000000">(cpp_category_sp, </span><span style="color:#4f8187">lldb_private</span><span style="color:#000000">::</span><span style="color:#4f8187">formatters</span><span style="color:#000000">::</span><span style="color:#31595d">FBStringCoreSummaryProvider</span><span style="color:#000000">, </span><span>"fbstringcore summary provider"</span><span style="color:#000000">, </span><span style="color:#4f8187">ConstString</span><span style="color:#000000">(</span><span>"std::fbstring_core<.+>"</span><span style="color:#000000">), stl_summary_flags, </span><span style="color:#bb2ca2">true</span><span style="color:#000000">);</span></div><div><span style="color:#000000"><br></span></div><div><span style="color:#000000">That will work - but what you would see is:</span></div><div><span 
style="color:#000000"><br></span></div><div><span style="color:#000000"><blockquote type="cite"><div dir="ltr"><div><span class=""><div>(const string &const) corpus = error: summary string parsing error: {</div></span><div>  (std::<b>fbstring_core</b><char>) store_ = “www"</div></div></div></blockquote><div><div dir="ltr"><div><div><br></div><div>You wanna do</div><div><br></div><div>(lldb) log enable lldb formatters</div><div>(lldb) frame variable -T corpus</div><div><br></div><div>It will list one or more typenames - the most specific one is the one you like (e.g. for libc++ we get std::__1::string - this is how we tell ourselves this is the std::string from libc++)</div><div>Once you find that typename, you’ll make a new formatter - FBStringSummaryProvider() - and register that formatter with that very specific typename</div><div><br></div><div>All that FBStringSummaryProvider() has to do is get the “store_” member (ValueObject::GetChildMemberWithName() is your friend) - and pass it down to FBStringCoreSummaryProvider()</div><div><br></div><div><br></div><div>I understand this may seem a little convoluted and arcane at first - but feel free to ask more questions, and I’ll try to help out!</div></div></div></div></span></div></div><div><div class="h5"><br><blockquote type="cite"><div><div dir="ltr"><div>Thanks.</div><div>Jeffrey<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 28, 2016 at 11:38 AM, Enrico Granata <span dir="ltr"><<a href="mailto:egranata@apple.com" target="_blank">egranata@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>This is kind of orthogonal to your problem, but the reason why you are not seeing the kind of simplified printing Greg is suggesting, is because your std::string doesn’t look like any of the kinds we recognize</div><div><br></div><div>Specifically, LLDB data formatters work by 
matching against type names, and once they recognize a typename, they inspect the variable in order to produce a summary</div><div>In your example, your std::string exposes a layout that we are not handling - hence we bail out of the formatter and we fall back to the raw view</div><div><br></div><div>If you want pretty printing to work, you’ll need to write a data formatter</div><div><br></div><div>There are a few avenues. The obvious easy one is to extend the existing std::string formatter to recognize your type’s internal layout.</div><div>If one were signing up for more infrastructure work, they could decide to try and detect shared library loads and load formatters that match whatever libraries are being loaded.</div><div><div><br><div><blockquote type="cite"><div>On Mar 28, 2016, at 9:47 AM, Greg Clayton via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org" target="_blank">lldb-dev@lists.llvm.org</a>> wrote:</div><br><div><div>So you need to be prepared to escape any text that can have special characters. A "std::string" or any container can contain special characters. If you are encoding stuff into JSON, you will either need to escape any special characters, or hex encode the string into ASCII hex bytes. <br><br>In debuggers we often get bogus data because variables are not yet initialized: the compiler tells us that a variable is valid in the address range [0x1000-0x2000), when it is actually valid only in [0x1200-0x2000). If we read a variable in this case, a std::string might contain bogus data and the bytes might not make sense. 
So you always have to be prepared for bad data.<br><br>If we look at:<br><br>  store_ = {<br>     = {<br>      small_ = "www"<br>      ml_ = (data_ =<br>"��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",<br>size_ = 0, capacity_ = 1441151880758558720)<br>    }<br>  }<br>}<br><br>We can see that "size_" is zero, and capacity_ is 1441151880758558720 (which is 0x1400000000000000). "data_" seems to be some random pointer. <br><br>On Mac OS X, we have special formatting code for std::string in CPlusPlusLanguage.cpp, which gets installed by the LoadLibCxxFormatters() or LoadLibStdcppFormatters() functions with code like:<br><br>    lldb::TypeSummaryImplSP std_string_summary_sp(new CXXFunctionSummaryFormat(stl_summary_flags, lldb_private::formatters::LibcxxStringSummaryProvider, "std::string summary provider"));<br>    cpp_category_sp->GetTypeSummariesContainer()->Add(ConstString("std::__1::string"), std_string_summary_sp);<br><br>Special flags are set on std::string to say "don't show children of this and just show a summary". 
So for the following code:<br><br>std::string h ("hello");<br><br>You should just see:<br><br>(lldb) fr var h<br>(std::__1::string) h = "hello"<br><br>If you take a look at the normal value in the raw we see:<br><br>(lldb) fr var --raw h<br>(std::__1::string) h = {<br>  __r_ = {<br>    std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, std::__1::allocator<char>, 2> = {<br>      __first_ = {<br>         = {<br>          __l = {<br>            __cap_ = 122511465736202<br>            __size_ = 0<br>            __data_ = 0x0000000000000000<br>          }<br>          __s = {<br>             = {<br>              __size_ = '\n'<br>              __lx = '\n'<br>            }<br>            __data_ = {<br>              [0] = 'h'<br>              [1] = 'e'<br>              [2] = 'l'<br>              [3] = 'l'<br>              [4] = 'o'<br>              [5] = '\0'<br>              [6] = '\0'<br>              [7] = '\0'<br>              [8] = '\0'<br>              [9] = '\0'<br>              [10] = '\0'<br>              [11] = '\0'<br>              [12] = '\0'<br>              [13] = '\0'<br>              [14] = '\0'<br>              [15] = '\0'<br>              [16] = '\0'<br>              [17] = '\0'<br>              [18] = '\0'<br>              [19] = '\0'<br>              [20] = '\0'<br>              [21] = '\0'<br>              [22] = '\0'<br>            }<br>          }<br>          __r = {<br>            __words = {<br>              [0] = 122511465736202<br>              [1] = 0<br>              [2] = 0<br>            }<br>          }<br>        }<br>      }<br>    }<br>  }<br>}<br><br>So the main question is why are our "std::string" formatters not kicking in for you. 
That comes down to either a typename mismatch, or a string layout that isn't what the formatter is expecting.<br><br>But again, since your std::string can contain anything, you will need to escape any and all text that is encoded into JSON to ensure it doesn't contain anything JSON can't deal with.<br><br><blockquote type="cite">On Mar 27, 2016, at 9:20 PM, Jeffrey Tan via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org" target="_blank">lldb-dev@lists.llvm.org</a>> wrote:<br><br>Thanks Siva. All the DW_TAG_member related errors seem to have gone away after patching with your fix. The current problem is handling the decoding. <br><br>Here is the correct decoding from gdb, which might be useful:<br>(gdb) p corpus<br>$3 = (const std::string &) @0x7fd133cfb888: {<br>  static npos = 18446744073709551615, store_ = {<br>    static kIsLittleEndian = <optimized out>,<br>    static kIsBigEndian = <optimized out>, {<br>      small_ = "www", '\000' <repeats 20 times>, "\024", ml_ = {<br>        data_ = 0x777777 <std::_Any_data::_M_access<void folly::fibers::Baton::waitFiber<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}, void>::type::value_type folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}>(folly::fibers::FiberManager&, 
folly::fibers::FirstArgOf<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}, void>::type::value_type folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}, void>::type::value_type)::{lambda(folly::fibers::Fiber&)#1}*>() const+25> "\311\303UH\211\345H\211}\370H\213E\370]ÐUH\211\345H\203\354\020H\211}\370H\213E\370H\211\307\350~\264\312\377\220\311\303UH\211\345SH\203\354\030H\211}\350H\211u\340H\213E\340H\211\307\350\236\377\377\377H\213\030H\213E\350H\211\307\350O\264\312\377H\211ƿ\b", size_ = 0,<br>        capacity_ = 1441151880758558720}}}}<br><br>Utf-16 does not seem to decode it, while 'latin-1' does:<br><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">'\xc9'.decode('utf-16')<br></blockquote></blockquote></blockquote>Traceback (most recent call last):<br>  File "<stdin>", line 1, in <module><br>  File "/mnt/gvfs/third-party2/python/55c1fd79d91c77c95932db31a4769919611c12bb/2.7.8/centos6-native/da39a3e/lib/python2.7/encodings/utf_16.py", line 16, in decode<br>    return codecs.utf_16_decode(input, errors, True)<br>UnicodeDecodeError: 'utf16' codec can't decode byte 0xc9 in position 0: truncated data<br><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite">'\xc9'.decode('latin-1')<br></blockquote></blockquote></blockquote>u'\xc9'<br><br>Instead of guessing what kind of decoding I should use, I would use 'ensure_ascii=False' to prevent the crash for now.<br><br>I tried to reproduce this crash, but it seems that the crash might be related with some internal stl implementation we are using. I will see if I can narrow down to a small repro later. <br><br>Thanks<br>Jeffrey<br><br>On Sun, Mar 27, 2016 at 2:49 PM, Siva Chandra <<a href="mailto:sivachandra@gmail.com" target="_blank">sivachandra@gmail.com</a>> wrote:<br>On Sat, Mar 26, 2016 at 11:58 PM, Jeffrey Tan <<a href="mailto:jeffrey.fudan@gmail.com" target="_blank">jeffrey.fudan@gmail.com</a>> wrote:<br><blockquote type="cite">Btw: after patching with Siva's fix <a href="http://reviews.llvm.org/D18008" target="_blank">http://reviews.llvm.org/D18008</a>, the<br>first field 'small_' is fixed, however the second field 'ml_' still emits<br>garbage:<br><br>(lldb) fr v corpus<br>(const string &const) corpus = error: summary string parsing error: {<br>  store_ = {<br>     = {<br>      small_ = "www"<br>      ml_ = (data_ =<br>"��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",<br>size_ = 0, capacity_ = 1441151880758558720)<br>    }<br>  }<br>}<br></blockquote><br>Do you still see the DW_TAG_member related error?<br><br>A wild (and really wild at that) guess: Is it utf16 data that is being<br>decoded as utf8?<br><br>As David Blaikie mentioned on the other thread, it would really help<br>if you provide us with a minimal example to repro this. 
At least, repro<br>instructions.<br><br>_______________________________________________<br>lldb-dev mailing list<br><a href="mailto:lldb-dev@lists.llvm.org" target="_blank">lldb-dev@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev</a><br></blockquote><br></div></div></blockquote></div><br></div></div><div>

<div style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br>Thanks,</div><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><i>- Enrico</i><br>📩 egranata@<font color="#ff2600"></font>.com ☎️ 27683</div>

</div>

<br></div></blockquote></div><br></div>

</div></blockquote></div></div></div><div><div class="h5"><br><div>

<div style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br>Thanks,</div><div style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><i>- Enrico</i><br>📩 egranata@<font color="#ff2600"></font>.com ☎️ 27683</div>

</div>

<br></div></div></div></blockquote></div><br></div>
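<div><br></div><div>For reference, here is a minimal Python 3 sketch of the two options Greg describes in the quoted thread (escape the text, or hex-encode the bytes) before a string value read from the debuggee goes into JSON. The helper name json_safe and the sample bytes are made up for illustration, and the session quoted above used Python 2.7, where str/bytes handling differs. Decoding as latin-1 only guarantees that the decode step cannot fail; it says nothing about what encoding the string really uses.</div>
<pre>
```python
import json


def json_safe(raw):
    """Return two JSON-safe views of arbitrary bytes read from the debuggee."""
    # latin-1 maps every byte value 0x00..0xFF to exactly one code point,
    # so this decode cannot raise, whatever the memory happens to contain.
    text = raw.decode("latin-1")
    return {
        # json.dumps escapes quotes, backslashes and control characters;
        # with the default ensure_ascii=True it also escapes everything
        # above 0x7F, so the result is plain ASCII.
        "escaped": json.dumps(text),
        # Hex encoding needs no escaping and makes no encoding assumption.
        "hex": raw.hex(),
    }


if __name__ == "__main__":
    # Hypothetical sample: "www", a NUL, one high byte and a double quote.
    sample = b'www\x00\xc9"'
    print(json_safe(sample))
```
</pre>
<div>Either field can be embedded in a JSON message. The hex form sidesteps the encoding question entirely, at the cost of being unreadable without decoding on the receiving side.</div>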