<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath <<a href="mailto:pavel@labath.sk">pavel@labath.sk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote:<br>

> Hi Everyone,<br>

> <br>

> In September I sent out an RFC [1] about adding reproducers to LLDB. <br>

> Over the<br>

> past few months, I landed the reproducer framework, support for the GDB <br>

> remote<br>

> protocol and a bunch of preparatory changes. There's still an open code <br>

> review<br>

> [2] for dealing with files, but that one is currently blocked by a change to<br>

> the VFS in LLVM [3].<br>

> <br>

> The next big piece of work is supporting user commands (e.g. in the <br>

> driver) and<br>

> SB API calls. Originally I expected these two things to be separate, but <br>

> Pavel<br>

> made a good case [4] that they're actually very similar.<br>

> <br>

> I created a prototype of how I envision this to work. As usual, we can<br>

> differentiate between capture and replay.<br>

> <br>

> ## SB API Capture<br>

> <br>

> When capturing a reproducer, every SB function/method is instrumented <br>

> using a<br>

> macro at function entry. The added code tracks the function identifier<br>

> (currently we use its name with __PRETTY_FUNCTION__) and its arguments.<br>

> <br>

> It also tracks when a function crosses the boundary between internal and<br>

> external use. For example, when someone (be it the driver, the python <br>

> binding<br>

> or the RPC server) call SBFoo, and in its implementation SBFoo calls <br>

> SBBar, we<br>

> don't need to record SBBar. When invoking SBFoo during replay, it will <br>

> itself<br>

> call SBBar.<br>

> <br>

> When a boundary is crossed, the function name and arguments are <br>

> serialized to a<br>

> file. This is trivial for basic types. For objects, we maintain a table that<br>

> maps pointer values to indices and serialize the index.<br>

> <br>

> To keep our table consistent, we also need to track return for functions <br>

> that<br>

> return an object by value. We have a separate macro that wraps the returned<br>

> object.<br>

> <br>

> The index is sufficient because every object that is passed to a <br>

> function has<br>

> crossed the boundary and hence was recorded. During replay (see below) <br>

> we map<br>

> the index to an address again which ensures consistency.<br>

> <br>

> ## SB API Replay<br>

> <br>

> To replay the SB function calls we need a way to invoke the corresponding<br>

> function from its serialized identifier. For every SB function, there's a<br>

> counterpart that deserializes its arguments and invokes the function. These<br>

> functions are added to the map and are called by the replay logic.<br>

> <br>

> Replaying is just a matter looping over the function identifiers in the<br>

> serialized file, dispatching the right deserialization function, until <br>

> no more<br>

> data is available.<br>

> <br>

> The deserialization function for constructors or functions that return <br>

> by value<br>

> contains additional logic for dealing with the aforementioned indices. The<br>

> resulting objects are added to a table (similar to the one described <br>

> earlier)<br>

> that maps indices to pointers. Whenever an object is passed as an <br>

> argument, the<br>

> index is used to get the actual object from the table.<br>

> <br>

> ## Tool<br>

> <br>

> Even when using macros, adding the necessary capturing and replay code is<br>

> tedious and scales poorly. For the prototype, we did this by hand, but we<br>

> propose a new clang-based tool to streamline the process.<br>

> <br>

> For the capture code, the tool would validate that the macro matches the<br>

> function signature, suggesting a fixit if the macros are incorrect or <br>

> missing.<br>

> Compared to generating the macros altogether, it has the advantage that we<br>

> don't have "configured" files that are harder to debug (without faking line<br>

> numbers etc).<br>

> <br>

> The deserialization code would be fully generated. As shown in the prototype<br>

> there are a few different cases, depending on whether we have to account for<br>

> objects or not.<br>

> <br>

> ## Prototype Code<br>

> <br>

> I created a differential [5] on Phabricator with the prototype. It <br>

> contains the<br>

> necessary methods to re-run the gdb remote (reproducer) lit test.<br>

> <br>

> ## Feedback<br>

> <br>

> Before moving forward I'd like to get the community's input. What do you <br>

> think<br>

> about this approach? Do you have concerns or can we be smarter <br>

> somewhere? Any<br>

> feedback would be greatly appreciated!<br>

> <br>

> Thanks,<br>

> Jonas<br>

> <br>

> [1] <a href="http://lists.llvm.org/pipermail/lldb-dev/2018-September/014184.html" rel="noreferrer" target="_blank">http://lists.llvm.org/pipermail/lldb-dev/2018-September/014184.html</a><br>

> [2] <a href="https://reviews.llvm.org/D54617" rel="noreferrer" target="_blank">https://reviews.llvm.org/D54617</a><br>

> [3] <a href="https://reviews.llvm.org/D54277" rel="noreferrer" target="_blank">https://reviews.llvm.org/D54277</a><br>

> [4] <a href="https://reviews.llvm.org/D55582" rel="noreferrer" target="_blank">https://reviews.llvm.org/D55582</a><br>

> [5] <a href="https://reviews.llvm.org/D56322" rel="noreferrer" target="_blank">https://reviews.llvm.org/D56322</a><br>

> <br>

> _______________________________________________<br>

> lldb-dev mailing list<br>

> <a href="mailto:lldb-dev@lists.llvm.org" target="_blank">lldb-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev</a><br>

> <br>

<br></blockquote><div><br></div><div>Thanks for the feedback Pavel! </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

[Adding Tamas for his experience with recording and replaying APIs.]<br>

<br>

<br>

Thank you for sharing the prototype Jonas. It looks very interesting, <br>

but there are a couple of things that worry me about it.<br>

<br>

The first one is the usage of __PRETTY_FUNCTION__. That sounds like a <br>

non-starter even for an initial implementation, as the string that <br>

expands to is going to differ between compilers (gcc and clang will <br>

probably agree on it, but I know for a fact it will be different on <br>

msvc). It that was just an internal property of the serialization <br>

format, then it might be fine, but it looks like you are hardcoding the <br>

values in code to connect the methods with their replayers, which is <br>

going to be a problem.<br></blockquote><div><br></div><div>I should've made it clear that I didn't intend to have that checked in. Originally the idea was that we would always use the clang-based tool to generate the IDs (the __PRETTY_FUNCTION__ was just a placeholder for the prototype). I didn't give that much more thought, but if the tool would not generate the ID but just check the macro, that's becomes a valid concern again. </div><div>  </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

I've been thinking about how could this be done better, and the best <br>

(though not ideal) way I came up with is using the functions address as <br>

the key. That's guaranteed to be unique everywhere. Of course, you <br>

cannot serialize that to a file, but since you already have a central <br>

place where you list all intercepted functions (to register their <br>

replayers), that place can be also used to assign unique integer IDs to <br>

these functions. So then the idea would be that the SB_RECORD macro <br>

takes the address of the current function, that gets converted to an ID <br>

in the lookup table, and the ID gets serialized.<br></blockquote><div><br></div><div>It sound like you would generate the indices at run-time. How would that work with regards to the the reverse mapping? </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

The part that bugs me about this is that taking the address of an <br>

overloaded function is extremely tedious (you have to write something <br>

like static_cast<function_prototype>(&SBFoo::Bar)). That would mean all <br>

of these things would have to be passed to the RECORD macro. OTOH, the <br>

upshot of this would be that the macro would now have sufficient <br>

information to perform pretty decent error checking on its invocation. <br>

Another nice about this could be that once you already have a prototype <br>

and an address of the function, it should be possible (with sufficient <br>

template-fu) to synthesize replay code for the function automatically, <br>

at least in the simple cases, which would avoid the repetitiveness of <br>

the current replay code. Together, this might obviate the need for any <br>

clang plugins or other funny build steps.<br></blockquote><div><br></div><div>See my previous question. I see how the signature would help with decoding but still missing how you'd get the mapping.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

The second thing I noticed is the usage of pointers for identifying <br>

object. A pointer is good for that but only while the object it points <br>

to is alive. Once the object is gone, the pointer can (and most likely <br>

will) be reused. So, it sounds to me like you also need to track the <br>

lifetime of these objects. That may be as simple as intercepting <br>

constructor/destructor calls, but I haven't seen anything like that yet <br>

(though I haven't looked at all details of the patch).<br></blockquote><div><br></div><div>This shouldn't be a problem. When a new object is created it will be recorded in the table with a new identifier. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

Tying into that is the recording of return values. It looks like the <br>

current RECORD_RETURN macro will record the address of the temporary <br>

object in the frame of the current function. However, that address will <br>

become invalid as soon as the function returns as the result object will <br>

be copied into a location specified by the caller as a part of the <br>

return processing. Are you handling this in any way?<br></blockquote><div><br></div><div>We capture the temporary and the call to the copy-assignment constructor. This is not super efficient but it's the best we can do.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

The final thing, which I noticed is the lack of any sign of threading <br>

support. I'm not too worried about that, as that sounds like something <br>

that could be fitted into the existing framework incrementally, but it <br>

is something worth keeping in mind, as you're going to run into that <br>

pretty soon.<br></blockquote><div><br></div><div>Yup, I've intentially ignored this for now. </div></div></div>