[lldb-dev] [Reproducers] SBReproducer RFC

Mon Jan 7 10:26:12 PST 2019

On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath <pavel at labath.sk> wrote:

> On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote:
> > Hi Everyone,
> >
> > In September I sent out an RFC [1] about adding reproducers to LLDB.
> > Over the
> > past few months, I landed the reproducer framework, support for the GDB
> > remote
> > protocol and a bunch of preparatory changes. There's still an open code
> > review
> > [2] for dealing with files, but that one is currently blocked by a
> change to
> > the VFS in LLVM [3].
> >
> > The next big piece of work is supporting user commands (e.g. in the
> > driver) and
> > SB API calls. Originally I expected these two things to be separate, but
> > Pavel
> > made a good case [4] that they're actually very similar.
> >
> > I created a prototype of how I envision this to work. As usual, we can
> > differentiate between capture and replay.
> >
> > ## SB API Capture
> >
> > When capturing a reproducer, every SB function/method is instrumented
> > using a
> > macro at function entry. The added code tracks the function identifier
> > (currently we use its name with __PRETTY_FUNCTION__) and its arguments.
> >
> > It also tracks when a function crosses the boundary between internal and
> > external use. For example, when someone (be it the driver, the python
> > binding
> > or the RPC server) call SBFoo, and in its implementation SBFoo calls
> > SBBar, we
> > don't need to record SBBar. When invoking SBFoo during replay, it will
> > itself
> > call SBBar.
> >
> > When a boundary is crossed, the function name and arguments are
> > serialized to a
> > file. This is trivial for basic types. For objects, we maintain a table
> that
> > maps pointer values to indices and serialize the index.
> >
> > To keep our table consistent, we also need to track return for functions
> > that
> > return an object by value. We have a separate macro that wraps the
> returned
> > object.
> >
> > The index is sufficient because every object that is passed to a
> > function has
> > crossed the boundary and hence was recorded. During replay (see below)
> > we map
> > the index to an address again which ensures consistency.
> >
> > ## SB API Replay
> >
> > To replay the SB function calls we need a way to invoke the corresponding
> > function from its serialized identifier. For every SB function, there's a
> > counterpart that deserializes its arguments and invokes the function.
> These
> > functions are added to the map and are called by the replay logic.
> >
> > Replaying is just a matter looping over the function identifiers in the
> > serialized file, dispatching the right deserialization function, until
> > no more
> > data is available.
> >
> > The deserialization function for constructors or functions that return
> > by value
> > contains additional logic for dealing with the aforementioned indices.
> The
> > resulting objects are added to a table (similar to the one described
> > earlier)
> > that maps indices to pointers. Whenever an object is passed as an
> > argument, the
> > index is used to get the actual object from the table.
> >
> > ## Tool
> >
> > Even when using macros, adding the necessary capturing and replay code is
> > tedious and scales poorly. For the prototype, we did this by hand, but we
> > propose a new clang-based tool to streamline the process.
> >
> > For the capture code, the tool would validate that the macro matches the
> > function signature, suggesting a fixit if the macros are incorrect or
> > missing.
> > Compared to generating the macros altogether, it has the advantage that
> we
> > don't have "configured" files that are harder to debug (without faking
> line
> > numbers etc).
> >
> > The deserialization code would be fully generated. As shown in the
> prototype
> > there are a few different cases, depending on whether we have to account
> for
> > objects or not.
> >
> > ## Prototype Code
> >
> > I created a differential [5] on Phabricator with the prototype. It
> > contains the
> > necessary methods to re-run the gdb remote (reproducer) lit test.
> >
> > ## Feedback
> >
> > Before moving forward I'd like to get the community's input. What do you
> > think
> > about this approach? Do you have concerns or can we be smarter
> > somewhere? Any
> > feedback would be greatly appreciated!
> >
> > Thanks,
> > Jonas
> >
> > [1] http://lists.llvm.org/pipermail/lldb-dev/2018-September/014184.html
> > [2] https://reviews.llvm.org/D54617
> > [3] https://reviews.llvm.org/D54277
> > [4] https://reviews.llvm.org/D55582
> > [5] https://reviews.llvm.org/D56322
> >
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> >
>
>
Thanks for the feedback Pavel!

> [Adding Tamas for his experience with recording and replaying APIs.]
>
>
> Thank you for sharing the prototype Jonas. It looks very interesting,
> but there are a couple of things that worry me about it.
>
> The first one is the usage of __PRETTY_FUNCTION__. That sounds like a
> non-starter even for an initial implementation, as the string that
> expands to is going to differ between compilers (gcc and clang will
> probably agree on it, but I know for a fact it will be different on
> msvc). It that was just an internal property of the serialization
> format, then it might be fine, but it looks like you are hardcoding the
> values in code to connect the methods with their replayers, which is
> going to be a problem.
>

I should've made it clear that I didn't intend to have that checked in.
Originally the idea was that we would always use the clang-based tool to
generate the IDs (the __PRETTY_FUNCTION__ was just a placeholder for the
prototype). I didn't give that much more thought, but if the tool would not
generate the ID but just check the macro, that's becomes a valid concern
again.

> I've been thinking about how could this be done better, and the best
> (though not ideal) way I came up with is using the functions address as
> the key. That's guaranteed to be unique everywhere. Of course, you
> cannot serialize that to a file, but since you already have a central
> place where you list all intercepted functions (to register their
> replayers), that place can be also used to assign unique integer IDs to
> these functions. So then the idea would be that the SB_RECORD macro
> takes the address of the current function, that gets converted to an ID
> in the lookup table, and the ID gets serialized.
>

It sound like you would generate the indices at run-time. How would that
work with regards to the the reverse mapping?

> The part that bugs me about this is that taking the address of an
> overloaded function is extremely tedious (you have to write something
> like static_cast<function_prototype>(&SBFoo::Bar)). That would mean all
> of these things would have to be passed to the RECORD macro. OTOH, the
> upshot of this would be that the macro would now have sufficient
> information to perform pretty decent error checking on its invocation.
> Another nice about this could be that once you already have a prototype
> and an address of the function, it should be possible (with sufficient
> template-fu) to synthesize replay code for the function automatically,
> at least in the simple cases, which would avoid the repetitiveness of
> the current replay code. Together, this might obviate the need for any
> clang plugins or other funny build steps.
>

See my previous question. I see how the signature would help with decoding
but still missing how you'd get the mapping.

> The second thing I noticed is the usage of pointers for identifying
> object. A pointer is good for that but only while the object it points
> to is alive. Once the object is gone, the pointer can (and most likely
> will) be reused. So, it sounds to me like you also need to track the
> lifetime of these objects. That may be as simple as intercepting
> constructor/destructor calls, but I haven't seen anything like that yet
> (though I haven't looked at all details of the patch).
>

This shouldn't be a problem. When a new object is created it will be
recorded in the table with a new identifier.

> Tying into that is the recording of return values. It looks like the
> current RECORD_RETURN macro will record the address of the temporary
> object in the frame of the current function. However, that address will
> become invalid as soon as the function returns as the result object will
> be copied into a location specified by the caller as a part of the
> return processing. Are you handling this in any way?
>

We capture the temporary and the call to the copy-assignment constructor.
This is not super efficient but it's the best we can do.

> The final thing, which I noticed is the lack of any sign of threading
> support. I'm not too worried about that, as that sounds like something
> that could be fitted into the existing framework incrementally, but it
> is something worth keeping in mind, as you're going to run into that
> pretty soon.
>

Yup, I've intentially ignored this for now.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20190107/3a12a0db/attachment.html>