[lldb-dev] logging in lldb

Tue Dec 6 10:20:08 PST 2016

On 6 December 2016 at 17:39, Jim Ingham <jingham at apple.com> wrote:
>
>> On Dec 6, 2016, at 8:23 AM, Pavel Labath <labath at google.com> wrote:
>>
>> Hello all,
>>
>> I've been thinking about how to simplify the way we do logging in
>> lldb. The main two problems I see are:
>> a) inability to print complex types easily - this was always tricky,
>> but lately has become even more problematic, as we're starting to
>> introduce llvm::StringRef, chrono, and other non-C types everywhere
>
> Streaming output tends to chunk the output text into smaller bits.  That's a problem when you are doing "I have a log line from some customer, I have no idea where it is in the code, so I'm going to search for the text in the code."  That actually happens pretty frequently.  The more you chunk up the text, the harder this task is.  This problem exists in the current method especially since the restriction to 80 chars meant that lots of contiguous log lines are now nonsensically broken up into chunks.  It would be nice not to make it worse.  This is more about ways to write log lines, however; more cultural than implementation.
>
> Like Greg, I find streaming harder to read than printf, particularly if I'm trying to make things indent nicely for reading.  So again, I'd argue for adding these macros, but not trying to replace the current mechanism.
Fair enough. llvm::formatv() (see lldb-commits thread, we seem do have
diverged a bit...) should satisfy both of these requirements
("monolithic strings" and "not-streams"). Would you be OK with that?
(BTW, having a file+function name in each log message makes it easier
to find the source)

>
>
>> b) the need to explicitly specify the log source (class and function
>> names) explicitly - this leads to increased verboseness and
>> inconsistency
>
> I think having a convenience to add function name decorators is fine, but we need a way not to do it all the time.  We have logs that don't have function names because they log over several functions, and having the different names adds visual noise, would make the rest of the log impossible to line up nicely, and adds no important information.  You will write log statements infrequently, but you will have to read them much more frequently, so taking a little more effort to make them readable is well worth it.

What I am optimizing for is readability of the source code (the fact
that it coincides with "the time to write the log statement" is a
lucky coincidence). I often find that I am looking at a function which
spans several screens, even though it doesn't actually do much,
because half of the function is logging. Normally, I would try to
split this function up into smaller ones, but I find that the amount
of work done by the function is actually quite reasonable and
undivisible, but due to logging it gets spread out enormously.

I would be fine with some way of inhibiting decorations. But, I want
to hear what you think about this alternative: Just have file names
and line numbers always aligned. For lines, we could reserve four
characters, for filenames 35 (only 7 files currently are longer than
35 chars -- these would get truncated, or misaligned, as you see fit).

>
>> c) the need to write if(log) everywhere -- also increases visual noise
>
> Sometimes you need to do some preliminary calculations to gather the materials you want to log, so the code in the "if (log)" is not trivial.  It has to be inside the "if (log)" however, since you only want to do logging computations if you are logging.  I'd really rather not have to write this in a macro as that makes it hard to read and harder to debug.  That doesn't argue against adding these macros, but it does argue that we need to keep the old form.
>
> Aesthetically, your form does not make it clear that the logging statements are not getting done if logging is off.  I don't want somebody casually working on the code not to log something because they were unsure of this.  We could address that through documentation, a solution whose goodness is weighted by the faith you have in how many people read this sort of documentation.

I would assume people who care about performance to this level will be
familiar with the performance characteristics of logging. For me this
potential confusion is less important than making the code flow
nicely, although I see how you may have different opinion.

>> - LLDB_LOG() vs. LLDB_LOG(log) -- All our log variables are named log,
>> so maybe we don't have to reference them explicitly? I couldn't decide
>> whether that helps or hurts readability. (I wanted to keep the log
>> variables, instead of specifying the log flags explicitly, as that
>> means individual log statements can be very cheap, and short.)
>
> I'd rather keep this explicit.  I don't know if we do it ever, but it is totally reasonable that one function might want to funnel information to two different channels.  Also, it makes explicit the connection between the log object and the channel it is opened for.  It is a bit weird to have a seemingly unused log creation line, then a bunch of logs later on that don't look connected to it.  The more oddities you put in the code the harder it is for a new engineer to begin to grok it.

OK.

>
> For starters, it has builtin support for formatting ranges.  So if you're ever looping over entities formatting them, as long as the entity itself has a formatter defined, you can simply pass it directly to formatv and it will do the right thing.  (It has some customization support built in as well so you can control how to separate items).
> Secondly, When you have to do some work to get the value you want to format, you could either define a custom formatter for the type in question, and the formatter does all the work, or you could write a function to return the value you want to format, and call that in the argument list, as long as the return type has an associated formatter.
> I'm sure there will still be cases where you want to use the old method, but I think this would handle quite a few of them and reduce the amount of code you have to read / write when dealing with logging code.

Agreed. We're bound to run into cases where we want to have code that
is run only when logging is enabled, but it isn't directly a logging
statement. So we should have some form of "if(log)", but it should not
need to be used that often.

> So the stream stuff all now needs to mark where the logging begins and ends:
>
>    if (log) *log << log::start_message << 1 << ' ' << 2 << ' ' << 3 << '\n' << std::end_message;
> And that starts to ruin the nice concrete log calls we had with log->Printf() since each call to log->Printf() was a complete message. I am open to ideas on how to solve this. Anyone have any ideas?

So, the way I solve that in my mockup is by having a temporary object.
It's constructor does any initialization work necessary (in my case,
printing file and line number). In the destructor I flush the log
message to the underlying streams. You could use the same approach,
but synchronize via mutexes, or whatever else you want.