[lldb-dev] Inquiry for performance monitors

Tamas Berghammer via lldb-dev lldb-dev at lists.llvm.org
Mon Feb 1 04:05:08 PST 2016


If you want to go down the path of implementing it outside LLDB, then I
would suggest implementing it as an out-of-tree plugin written in C++. You
can use the SB API the same way as you can from Python, and additionally it
has a few advantages:
* You have a C/C++ API, which makes it easy to integrate the functionality
into an IDE (they just have to link to your shared library)
* You can generate a Python API with SWIG if you need one, the same way we
do it for the SB API
* You don't have to worry about making the code both Python 2.7 and Python
3.5 compatible

You can see a very simple example of implementing an out-of-tree C++
plugin in <lldb>/examples/plugins/commands.
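
To give a rough idea of its shape, here is an untested sketch along the same
lines as that example (the include paths, command name and class name are
made up for illustration and may need adjusting):

#include "lldb/API/SBCommandInterpreter.h"
#include "lldb/API/SBCommandReturnObject.h"
#include "lldb/API/SBDebugger.h"

namespace lldb {
bool PluginInitialize(lldb::SBDebugger debugger);
}

// Hypothetical command that would set up tracing through the SB API.
class TraceStartCommand : public lldb::SBCommandPluginInterface {
public:
  virtual bool DoExecute(lldb::SBDebugger debugger, char **command,
                         lldb::SBCommandReturnObject &result) {
    // Drive the debugger through the SB API here (run expressions, read
    // memory, etc.), the same way a Python command would.
    result.Printf("tracing started\n");
    return true;
  }
};

bool lldb::PluginInitialize(lldb::SBDebugger debugger) {
  lldb::SBCommandInterpreter interpreter = debugger.GetCommandInterpreter();
  interpreter.AddCommand("trace-start", new TraceStartCommand(),
                         "Start collecting trace data (sketch)");
  return true;
}

You would load it at runtime with something like "plugin load
/path/to/libtraceplugin.so", and the command then has the full SB API
available, just like a Python command.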

On Mon, Feb 1, 2016 at 10:53 AM Pavel Labath via lldb-dev <
lldb-dev at lists.llvm.org> wrote:

> Speaking for Android Studio, I think that we *could* use a
> python-based implementation (hard to say exactly without knowing the
> details of the implementation), but I believe a different
> implementation could be *easier* to integrate. Plus, if the solution
> integrates more closely with lldb, we could surface some of the data
> in the command-line client as well.
>
> pl
>
> On 1 February 2016 at 10:30, Ravitheja Addepally
> <ravithejawork at gmail.com> wrote:
> > And what about the ease of integration into an IDE? I don't really know
> > if the python-based approach would be usable or not in this context.
> >
> > On Mon, Feb 1, 2016 at 11:17 AM, Pavel Labath <labath at google.com> wrote:
> >>
> >> It feels to me that the python based approach could run into a dead
> >> end fairly quickly: a) you can only access the data when the target is
> >> stopped; b) the self-tracing means that the evaluation of these
> >> expressions would introduce noise in the data; c) overhead of all the
> >> extra packets(?).
> >>
> >> So, I would be in favor of an lldb-server-based approach. I'm not
> >> telling you that you shouldn't do that, but I don't think that's an
> >> approach I would take...
> >>
> >> pl
> >>
> >>
> >> On 1 February 2016 at 08:58, Ravitheja Addepally
> >> <ravithejawork at gmail.com> wrote:
> >> > Ok, that is one option, but one of the aims of this activity is to
> >> > make the data available for use by IDEs like Android Studio or Xcode
> >> > or any other that may want to display this information in their
> >> > environment. Keeping that in consideration, would the complete
> >> > python-based approach be useful? Or would providing LLDB APIs to
> >> > extract raw perf data from the target be useful?
> >> >
> >> > On Thu, Jan 21, 2016 at 10:00 PM, Greg Clayton <gclayton at apple.com>
> >> > wrote:
> >> >>
> >> >> One thing to think about is that you can actually just run an
> >> >> expression in the program that is being debugged without needing to
> >> >> change anything in the GDB remote server. So this can all be done via
> >> >> python commands and would require no changes to anything. So you can
> >> >> run an expression to enable the buffer. LLDB supports multi-line
> >> >> expressions that can define their own local variables and local types,
> >> >> so the expression could be something like:
> >> >>
> >> >> int perf_fd = (int)perf_event_open(...);
> >> >> struct PerfData
> >> >> {
> >> >>     void *data;
> >> >>     size_t size;
> >> >> };
> >> >> PerfData result = read_perf_data(perf_fd);
> >> >> result
> >> >>
> >> >>
> >> >> The result is then a structure that you can access from your python
> >> >> command (it will be an SBValue), and then you can read memory in order
> >> >> to get the perf data.
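> >> >>
> >> >> For illustration, here is a minimal, untested sketch of that flow
> >> >> using the C++ SB API (the equivalent Python SB calls look the same).
> >> >> 'expr' is assumed to hold the multi-line expression above, and
> >> >> perf_event_open()/read_perf_data() are the same placeholders as above:
> >> >>
> >> >> #include "lldb/API/SBDebugger.h"
> >> >> #include "lldb/API/SBError.h"
> >> >> #include "lldb/API/SBProcess.h"
> >> >> #include "lldb/API/SBTarget.h"
> >> >> #include "lldb/API/SBValue.h"
> >> >> #include <cstdint>
> >> >> #include <vector>
> >> >>
> >> >> // Run the expression in the inferior, then copy the resulting perf
> >> >> // buffer out of the inferior's memory.
> >> >> std::vector<uint8_t> ReadPerfData(lldb::SBTarget target, const char *expr) {
> >> >>   lldb::SBValue result = target.EvaluateExpression(expr);
> >> >>   lldb::addr_t data =
> >> >>       result.GetChildMemberWithName("data").GetValueAsUnsigned();
> >> >>   size_t size =
> >> >>       result.GetChildMemberWithName("size").GetValueAsUnsigned();
> >> >>
> >> >>   std::vector<uint8_t> buffer(size);
> >> >>   lldb::SBError error;
> >> >>   target.GetProcess().ReadMemory(data, buffer.data(), buffer.size(), error);
> >> >>   return buffer;
> >> >> }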
> >> >>
> >> >> You can also split things up into multiple calls where you can run
> >> >> perf_event_open() on its own and return the file descriptor:
> >> >>
> >> >> (int)perf_event_open(...)
> >> >>
> >> >> This expression will return the file descriptor
> >> >>
> >> >> Then you could allocate memory via the SBProcess:
> >> >>
> >> >> (void *)malloc(1024);
> >> >>
> >> >> The result of this expression will be the buffer that you use...
> >> >>
> >> >> Then you can read 1024 bytes at a time into this newly created buffer.
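> >> >>
> >> >> For example, something along these lines (again just an untested
> >> >> sketch, using the same headers as the sketch above). Here
> >> >> 'copy_next_chunk' is a hypothetical helper in the inferior that copies
> >> >> the next piece of perf data into the buffer and returns how many bytes
> >> >> (at most 1024) it copied, and 'buffer_addr' is the pointer returned by
> >> >> the malloc expression:
> >> >>
> >> >> #include <cstdio>  // snprintf
> >> >>
> >> >> // Repeatedly ask the inferior to refill the 1024-byte buffer, then
> >> >> // read it out of the inferior's memory.
> >> >> void DrainPerfData(lldb::SBTarget target, lldb::addr_t buffer_addr) {
> >> >>   lldb::SBProcess process = target.GetProcess();
> >> >>   uint8_t chunk[1024];
> >> >>   char expr[128];
> >> >>   lldb::SBError error;
> >> >>   for (;;) {
> >> >>     snprintf(expr, sizeof(expr),
> >> >>              "(int)copy_next_chunk((void *)%llu, 1024)",
> >> >>              (unsigned long long)buffer_addr);
> >> >>     uint64_t copied = target.EvaluateExpression(expr).GetValueAsUnsigned();
> >> >>     if (copied == 0 || copied > sizeof(chunk))
> >> >>       break;
> >> >>     process.ReadMemory(buffer_addr, chunk, copied, error);
> >> >>     // ... append 'copied' bytes from 'chunk' to the host-side trace ...
> >> >>   }
> >> >> }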
> >> >>
> >> >> So a solution that is completely done in python would be very
> >> >> attractive.
> >> >>
> >> >> Greg
> >> >>
> >> >>
> >> >> > On Jan 21, 2016, at 7:04 AM, Ravitheja Addepally
> >> >> > <ravithejawork at gmail.com> wrote:
> >> >> >
> >> >> > Hello,
> >> >> >       Regarding the questions in this thread please find the
> >> >> > answers ->
> >> >> >
> >> >> > How are you going to present this information to the user? (I know
> >> >> > debugserver can report some performance data... Have you looked into
> >> >> > how that works? Do you plan to reuse some parts of that
> >> >> > infrastructure?) and How will you get the information from the
> >> >> > server to the client?
> >> >> >
> >> >> > Currently I plan to show a list of instructions that have been
> >> >> > executed so far. I saw the implementation suggested by Pavel; the
> >> >> > already present infrastructure is a little bit lacking in terms of
> >> >> > the needs of the project, but I plan to follow a similar approach,
> >> >> > i.e. to extract the raw trace data by querying the server (which can
> >> >> > use perf_event_open to get the raw trace data from the kernel) and
> >> >> > transport it through gdb packets (qXfer packets,
> >> >> > https://sourceware.org/gdb/onlinedocs/gdb/Branch-Trace-Format.html#Branch-Trace-Format
> >> >> > ). At the client side the raw trace data could be passed on to a
> >> >> > python-based command that could decode the data. This also eliminates
> >> >> > the dependency on libipt, since LLDB would not decode the data
> >> >> > itself.
> >> >> >
> >> >> > There is also the question of this third party library. Do we take
> >> >> > a hard dependency on libipt (probably a non-starter), or only use it
> >> >> > if it's available (much better)?
> >> >> >
> >> >> > With the above-mentioned approach LLDB would not need the library;
> >> >> > whoever wants to use the python command would have to install it
> >> >> > separately, but LLDB won't need it.
> >> >> >
> >> >> > With the performance counters, the interface would still be
> >> >> > perf_event_open, so if there were a perf_wrapper in lldb-server then
> >> >> > it could be reused to configure and use the software performance
> >> >> > counters as well; you would just need to pass different attributes in
> >> >> > the perf_event_open system call. Plus, I think the perf_wrapper could
> >> >> > be reused to get CoreSight information as well (see
> >> >> > https://lwn.net/Articles/664236/ ).
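> >> >> >
> >> >> > Just to illustrate that point, here is a rough, untested sketch of
> >> >> > the two kinds of attributes such a wrapper might pass to the kernel
> >> >> > (for Intel PT the PMU type has to be read from
> >> >> > /sys/bus/event_source/devices/intel_pt/type):
> >> >> >
> >> >> > #include <linux/perf_event.h>
> >> >> > #include <sys/syscall.h>
> >> >> > #include <sys/types.h>
> >> >> > #include <unistd.h>
> >> >> > #include <cstdint>
> >> >> > #include <cstring>
> >> >> >
> >> >> > // There is no glibc wrapper for perf_event_open, so call it directly.
> >> >> > static int perf_event_open(perf_event_attr *attr, pid_t pid, int cpu,
> >> >> >                            int group_fd, unsigned long flags) {
> >> >> >   return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
> >> >> > }
> >> >> >
> >> >> > // A software performance counter (e.g. page faults) for one thread.
> >> >> > int OpenSoftwareCounter(pid_t tid) {
> >> >> >   perf_event_attr attr;
> >> >> >   memset(&attr, 0, sizeof(attr));
> >> >> >   attr.size = sizeof(attr);
> >> >> >   attr.type = PERF_TYPE_SOFTWARE;
> >> >> >   attr.config = PERF_COUNT_SW_PAGE_FAULTS;
> >> >> >   attr.exclude_kernel = 1;
> >> >> >   return perf_event_open(&attr, tid, -1, -1, 0);
> >> >> > }
> >> >> >
> >> >> > // Intel PT tracing for the same thread: same call, different 'type'.
> >> >> > int OpenIntelPt(pid_t tid, uint32_t intel_pt_pmu_type) {
> >> >> >   perf_event_attr attr;
> >> >> >   memset(&attr, 0, sizeof(attr));
> >> >> >   attr.size = sizeof(attr);
> >> >> >   attr.type = intel_pt_pmu_type;
> >> >> >   attr.exclude_kernel = 1;
> >> >> >   return perf_event_open(&attr, tid, -1, -1, 0);
> >> >> > }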
> >> >> >
> >> >> >
> >> >> > On Wed, Oct 21, 2015 at 8:57 PM, Greg Clayton <gclayton at apple.com>
> >> >> > wrote:
> >> >> > One main benefit of doing this externally is to allow this to be
> >> >> > done remotely over any debugger connection. If you can run
> >> >> > expressions to enable/disable/set up the memory buffer and access the
> >> >> > buffer contents, then you don't need to add code into the debugger to
> >> >> > actually do this.
> >> >> >
> >> >> > Greg
> >> >> >
> >> >> > > On Oct 21, 2015, at 11:54 AM, Greg Clayton <gclayton at apple.com>
> >> >> > > wrote:
> >> >> > >
> >> >> > > IMHO the best way to provide this information is to implement
> >> >> > > reverse debugging packets in a GDB server (lldb-server). You would
> >> >> > > enable this feature via some packet to lldb-server, and that would
> >> >> > > enable the gathering of data that keeps the last N instructions run
> >> >> > > by all threads in some buffer that gets overwritten. The
> >> >> > > lldb-server enables it and gives a buffer to the
> >> >> > > perf_event_interface(). Then clients can ask the lldb-server to
> >> >> > > step back in any thread. Only when the data is requested do we
> >> >> > > actually use the data to implement the reverse stepping.
> >> >> > >
> >> >> > > Another way to do this would be to use a python based command
> >> >> > > that can be added to any target that supports this. The plug-in
> >> >> > > could install a set of LLDB commands. To see how to create new lldb
> >> >> > > command line commands in python, see the section named "CREATE A
> >> >> > > NEW LLDB COMMAND USING A PYTHON FUNCTION" on the
> >> >> > > http://lldb.llvm.org/python-reference.html web page.
> >> >> > >
> >> >> > > Then you can have some commands like:
> >> >> > >
> >> >> > > intel-pt-start
> >> >> > > intel-pt-dump
> >> >> > > intel-pt-stop
> >> >> > >
> >> >> > > Each command could have options and arguments as desired. The
> >> >> > > "intel-pt-start" command could enable the feature in the target by
> >> >> > > running an expression that makes the perf_event_interface calls
> >> >> > > that allocate some memory and hand it to the Intel PT stuff. The
> >> >> > > "intel-pt-dump" could just give a raw dump of all the history for
> >> >> > > one or more threads (again, add options and arguments as needed to
> >> >> > > this command). The python code could bridge to C and use the Intel
> >> >> > > libraries that know how to process the data.
> >> >> > >
> >> >> > > If this all goes well we can think about building it into LLDB as
> >> >> > > a built-in command.
> >> >> > >
> >> >> > >
> >> >> > >> On Oct 21, 2015, at 9:50 AM, Zachary Turner via lldb-dev
> >> >> > >> <lldb-dev at lists.llvm.org> wrote:
> >> >> > >>
> >> >> > >> There are two different kinds of performance counters: OS
> >> >> > >> performance counters and CPU performance counters. It sounds like
> >> >> > >> you're talking about the latter, but it's worth considering
> >> >> > >> whether this could be designed in a way to support both (i.e. even
> >> >> > >> if you don't do both yourself, at least make the machinery
> >> >> > >> reusable and applicable to both for when someone else wants to
> >> >> > >> come through and add OS perf counters).
> >> >> > >>
> >> >> > >> There is also the question of this third party library. Do we
> >> >> > >> take a hard dependency on libipt (probably a non-starter), or only
> >> >> > >> use it if it's available (much better)?
> >> >> > >>
> >> >> > >> As Pavel said, how are you planning to present the information to
> >> >> > >> the user? Through some sort of top level command like "perfcount
> >> >> > >> instructions_retired"?
> >> >> > >>
> >> >> > >> On Wed, Oct 21, 2015 at 8:16 AM Pavel Labath via lldb-dev
> >> >> > >> <lldb-dev at lists.llvm.org> wrote:
> >> >> > >> [ Moving this discussion back to the list. I pressed the wrong
> >> >> > >> button
> >> >> > >> when replying.]
> >> >> > >>
> >> >> > >> Thanks for the explanation Ravi. It sounds like a very useful
> >> >> > >> feature indeed. I've found a reference to the debugserver profile
> >> >> > >> data in GDBRemoteCommunicationClient.cpp:1276, so maybe that will
> >> >> > >> help with your investigation. Maybe also someone more
> >> >> > >> knowledgeable can explain what those A packets are used for (?).
> >> >> > >>
> >> >> > >>
> >> >> > >> On 21 October 2015 at 15:48, Ravitheja Addepally
> >> >> > >> <ravithejawork at gmail.com> wrote:
> >> >> > >>> Hi,
> >> >> > >>>   Thanks for your reply. Some of the future processors to be
> >> >> > >>> released by Intel have hardware support for recording the
> >> >> > >>> instructions that were executed by the processor, and this
> >> >> > >>> recording process is also quite fast and does not add too much
> >> >> > >>> computational load. This hardware is made accessible via the
> >> >> > >>> perf_event_interface, where one could map a region of memory for
> >> >> > >>> this purpose by passing it as an argument to this
> >> >> > >>> perf_event_interface. The recorded instructions are then written
> >> >> > >>> to the memory region assigned. This is basically the raw
> >> >> > >>> information, which can be obtained from the hardware. It can be
> >> >> > >>> interpreted and presented to the user in the following ways ->
> >> >> > >>>
> >> >> > >>> 1) Instruction history - where the user gets basically a list of
> >> >> > >>> all instructions that were executed
> >> >> > >>> 2) Function Call History - It is also possible to get a list of
> >> >> > >>> all the functions called in the inferior
> >> >> > >>> 3) Reverse Debugging with limited information - In GDB this is
> >> >> > >>> only the functions executed.
> >> >> > >>>
> >> >> > >>> This raw information also needs to be decoded (even before you
> >> >> > >>> can disassemble it); there is already a library released by Intel
> >> >> > >>> called libipt which can do that. At the moment we plan to work
> >> >> > >>> with Instruction History. I will look into the debugserver
> >> >> > >>> infrastructure and get back to you. I guess for the server-client
> >> >> > >>> communication we would rely on packets only. In case of concerns
> >> >> > >>> about too much data being transferred, we can limit the number of
> >> >> > >>> entries we report, because anyway the amount of data recorded is
> >> >> > >>> too big to present all at once, so we would have to resort to
> >> >> > >>> something like a viewport.
> >> >> > >>>
> >> >> > >>> Since a lot of instructions can be recorded this way, the
> >> >> > >>> function call history can be quite useful for debugging,
> >> >> > >>> especially since it is a lot faster to collect function traces
> >> >> > >>> this way.
> >> >> > >>>
> >> >> > >>> -ravi
> >> >> > >>>
> >> >> > >>> On Wed, Oct 21, 2015 at 3:14 PM, Pavel Labath <labath at google.com>
> >> >> > >>> wrote:
> >> >> > >>>>
> >> >> > >>>> Hi,
> >> >> > >>>>
> >> >> > >>>> I am not really familiar with the perf_event interface (and I
> >> >> > >>>> suspect others aren't either), so it might help if you explain
> >> >> > >>>> what kind of information you plan to collect from there.
> >> >> > >>>>
> >> >> > >>>> As for the PtraceWrapper question, I think that really depends
> >> >> > >>>> on bigger design decisions. My two main questions for a feature
> >> >> > >>>> like this would be:
> >> >> > >>>> - How are you going to present this information to the user? (I
> >> >> > >>>> know debugserver can report some performance data... Have you
> >> >> > >>>> looked into how that works? Do you plan to reuse some parts of
> >> >> > >>>> that infrastructure?)
> >> >> > >>>> - How will you get the information from the server to the
> >> >> > >>>> client?
> >> >> > >>>>
> >> >> > >>>> pl
> >> >> > >>>>
> >> >> > >>>>
> >> >> > >>>> On 21 October 2015 at 13:41, Ravitheja Addepally via lldb-dev
> >> >> > >>>> <lldb-dev at lists.llvm.org> wrote:
> >> >> > >>>>> Hello,
> >> >> > >>>>>       I want to implement support for reading performance
> >> >> > >>>>> measurement information using the perf_event_open system call.
> >> >> > >>>>> The motive is to add support for the Intel PT hardware feature,
> >> >> > >>>>> which is available through the perf_event interface. I was
> >> >> > >>>>> thinking of implementing a new wrapper like PtraceWrapper in
> >> >> > >>>>> the NativeProcessLinux files. My query is: is this the correct
> >> >> > >>>>> place to start or not? If not, could someone suggest another
> >> >> > >>>>> place to begin with?
> >> >> > >>>>>
> >> >> > >>>>> BR,
> >> >> > >>>>> A Ravi Theja
> >> >> > >>>>>
> >> >> > >>>>>
> >> >> > >>>
> >> >> > >>>
> >> >> > >
> >> >> >
> >> >> >
> >> >>
> >> >
> >
> >
>