[Lldb-commits] [PATCH] D107079: [lldb] Add an option for emitting LLDB logs as JSON

Wed Aug 11 07:54:34 PDT 2021

teemperor added a comment.

In D107079#2918185 <https://reviews.llvm.org/D107079#2918185>, @jasonmolenda wrote:

> Sorry for not commenting on this on earlier, I wanted to think about it a bit.
>
> I think the usefulness of this can depend on the type of logging.  l often turn on gdb-remote packet logging, or thread step logging, in an interactive debug session and watch the log messages streaming as I do lldb commands.  I think interposing a JSON formatter in this is less straightforward - not that reading JSON is that difficult, but it's just adding layering between me and the log messages.  WIth this patch the logging will be like {msg="..."} and fine, but we'll add more fields to this over time and it will become less easy to read.
>
> Maybe this can be an *option* to logging like verbose logging is.  I don't have any problem with the default mode being JSON and people who want rawness can add an option to log enable to not do that.
>
> I can't get too high and mighty with my philosophy of disintermediation -- gdb-remote packets are often actually using binary data, and even compressed, and lldb reformats all of that before logging so my terminal doesn't get borked by escape/high-bit characters being dumped on to it. But I read the gdb-remote packet logs every day and wrapping it in JSON doesn't make anything better for this use case, I don't think.  This may be more a Jason opinion than a gdb-remote packet reader opinion!

Yeah, I actually wanted to write finish my comment above that started as "Do we want to keep the old plain text format too? To be honest, I don't really see much" with "point in that but I don't mind keeping the old format around if someone thinks its useful." Somehow that part of my sentence got deleted. Anyway, the plain text format is still around and still the default so I think this concern is addressed.

In D107079#2914189 <https://reviews.llvm.org/D107079#2914189>, @jingham wrote:

> When you are using the logging for a reliably reproducible problem (some parsing issue, for instance) then there's no harm to the general "turn on all logging and we'll sort it out on our end".  But for any issues that are timing sensitive the problem is that too much logging changes the timing and often causes bugs to disappear.  This patch will add volume to log emission and slow its emission down, so it goes a bit in the wrong direction.  I don't think that's a reason NOT to do this, hopefully the overhead is small.  But it means we can't assume "turning on all logs" is a reasonable default behavior for logging.  If anything, we need to go in the other direction and add a more fine-grained verbosity so we can ask for "a bit of log channel A and a bit of B and more of C" or whatever.

I think this is also addressed by having the plain text format around in its current form.

> I'm not entirely sure what you intend when you say:
>
>> Every log message is now a serialized JSON object (that is now also guaranteed to occupy exactly one line) that is just appended to the log file.
>
> Log messages are going to have new-lines in them because it's quite convenient to use the various object Description or Dump methods to log objects and those are generally laid out to be human readable (and are used for that purpose in other contexts.)  By "single line" it seems you mean "all log messages with new lines will have the newline converted to "\n" as a character string before being inserted into the JSON.  This seems more an implementation detail, I'm not sure why you call it out.  In any case, preserving the multi-line structure of log content is important for eventually being able to read them, so as long as the encoders & printers do the right thing, it doesn't really matter how they are encoded.

This is just about the fact that every JSON log message is guaranteed to be one line (even if the message contains newlines which will just be escaped). You can still log multiline messages as before  The `1 object = 1 line` property is just relevant for my comment below: `What about logs that are printed to stdin and we want to parse? I think you can easily make a tool that filters out the non-JSON output. Logs are guaranteed to have every object on its own line, so you can just go line-by-line and filter them based on that.`. It's technically also possible now with both formats around to have a plain text channel and a JSON channel print to the same file, so because of this `1 object = 1 line` property that is something that log parser could handle if we wanted to. Not that we should ever actually tell someone to log both formats into one file, but at least the final output would be salvageable this way.

> Conversely, however, it is NOT the case that each LLDB_LOGF call is a single Log message.  For instance, at the beginning and end of each round of "ShouldStop" negotiation, I run through the ThreadPlanStack of each thread that had a non-trivial stop reason and print each thread plan's description in order.  I LOG a header for the output, then (fortunately) I have a routine that dumps all the info to a stream, so I can log that part in one go.  Then I print an end message with another LLDB_LOGF.  Putting these into separate JSON entries makes keeping them together harder - thus making automated tools harder to write - harder to scan with anything but a plain text dumper.
>
> We could require that all unitary log messages build up the whole message off-line into a stream, then log in one go.  But it might be more convenient to have a "start message/end message" pair, so you can call LLDB_LOGF in a loop printing individual objects, then close off the message.  Some kind of RAII deal would be convenient for this.
>
> It would be even more convenient if this allows nesting.  Then you can do "logging the shared library state change - start the list of dylibs - start the log of dylib A, end dylib A, etc..."  That way the individual entity dumps would be structured in the log message that dumped them, and you could progressively reveal them in a neat way.
>
> We definitely need a plain text dumper that we provide.  A very common task in looking through logs is to find an address of some entity in one log message, and then search through the other logs looking at each one to see if it is interesting.  And very often to see if it is embedded in another interesting log transaction, so the context is important.  That's best done with a plain text dump of the log.

I agree with the point of making it easier to make one big structures log message. I don't see how the JSON format really changes the current situation that much. Right now if you do multiple LLDB_LOG calls there is no guarantee that the final log contains them next to each other and each LLDB_LOG call gets its own metainformation dumped next to it. The JSON format does exactly the same and with a plain text dumper the output is 100% identical. I think the RAII idea that just collects a log and then sends it all as one (potentially somehow structured) message seems like a good improvement, but it seems more a like a future work item (especially now that the plain text format is still a thing).

In D107079#2913826 <https://reviews.llvm.org/D107079#2913826>, @DavidSpickett wrote:

>> With the added channels in each message we could just do the filtering on our side and tell people to just enable every channel in every category.
>
> +1 to this, was the first thing I tried to do on the viewer you linked.
>
>> Will there be an upstream log viewer? I don't know, but we could easily put a simple one on the website that just formats a log as a simple HTML table. That is easily done in vanilla JavaScript. But that should be a follow up patch.
>
> As in on llvm.org? That would be neat.

Yeah, I think a simple JSON-to-HTML-table converter would fit on the official website. Not sure if we want to go full JS framework and start including external resources and so on. But writing a parser for the format is a matter of seconds, so I expect other people to write more complicated viewers.

> Python does great with JSON so I'm sure it would be easy to hack on more specific scripts later too.
>
> Will you be adding documentation about what the format is? (said web page would be a good place for it actually)

Done, I just put it on the website where the other internal design stuff is documented.

> I know it won't be that complicated but it's always nice to see the total set of fields you might expect.

Agreed, I added a table with the fields + type + explanation on the website.

In D107079#2913864 <https://reviews.llvm.org/D107079#2913864>, @mib wrote:

> This looks like a great effort to improve logging and debugging! Have you considered also adding logging levels (info, debug, warning, error ...) ? I think the `CommandReturnObject` and `Status` classes already makes the distinction between warnings and errors messages. We could enrich logging further more by having logging levels to filter / color-code the messages.

Jepp, see my 'possible follow ups" comment above :). Having a log highlighting/filter for all the cases where we error but just keep chugging along would be really nice.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107079/new/

https://reviews.llvm.org/D107079