[lldb-dev] logging in lldb

Tue Dec 6 10:31:54 PST 2016

> On Dec 6, 2016, at 10:20 AM, Pavel Labath <labath at google.com> wrote:
> 
> On 6 December 2016 at 17:39, Jim Ingham <jingham at apple.com> wrote:
>> 
>>> On Dec 6, 2016, at 8:23 AM, Pavel Labath <labath at google.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I've been thinking about how to simplify the way we do logging in
>>> lldb. The main two problems I see are:
>>> a) inability to print complex types easily - this was always tricky,
>>> but lately has become even more problematic, as we're starting to
>>> introduce llvm::StringRef, chrono, and other non-C types everywhere
>> 
>> Streaming output tends to chunk the output text into smaller bits.  That's a problem when you are doing "I have a log line from some customer, I have no idea where it is in the code, so I'm going to search for the text in the code."  That actually happens pretty frequently.  The more you chunk up the text, the harder this task is.  This problem exists in the current method especially since the restriction to 80 chars meant that lots of contiguous log lines are now nonsensically broken up into chunks.  It would be nice not to make it worse.  This is more about ways to write log lines, however; more cultural than implementation.
>> 
>> Like Greg, I find streaming harder to read than printf, particularly if I'm trying to make things indent nicely for reading.  So again, I'd argue for adding these macros, but not trying to replace the current mechanism.
> Fair enough. llvm::formatv() (see lldb-commits thread, we seem do have
> diverged a bit...) should satisfy both of these requirements
> ("monolithic strings" and "not-streams"). Would you be OK with that?
> (BTW, having a file+function name in each log message makes it easier
> to find the source)
> 

That would be a great option to add to "log enable"!

>> 
>> 
>>> b) the need to explicitly specify the log source (class and function
>>> names) explicitly - this leads to increased verboseness and
>>> inconsistency
>> 
>> I think having a convenience to add function name decorators is fine, but we need a way not to do it all the time.  We have logs that don't have function names because they log over several functions, and having the different names adds visual noise, would make the rest of the log impossible to line up nicely, and adds no important information.  You will write log statements infrequently, but you will have to read them much more frequently, so taking a little more effort to make them readable is well worth it.
> 
> What I am optimizing for is readability of the source code (the fact
> that it coincides with "the time to write the log statement" is a
> lucky coincidence). I often find that I am looking at a function which
> spans several screens, even though it doesn't actually do much,
> because half of the function is logging. Normally, I would try to
> split this function up into smaller ones, but I find that the amount
> of work done by the function is actually quite reasonable and
> undivisible, but due to logging it gets spread out enormously.
> 
> I would be fine with some way of inhibiting decorations. But, I want
> to hear what you think about this alternative: Just have file names
> and line numbers always aligned. For lines, we could reserve four
> characters, for filenames 35 (only 7 files currently are longer than
> 35 chars -- these would get truncated, or misaligned, as you see fit).
> 
I would like this to be an option you can enable and not require it to be on.

> 
>> 
>>> c) the need to write if(log) everywhere -- also increases visual noise
>> 
>> Sometimes you need to do some preliminary calculations to gather the materials you want to log, so the code in the "if (log)" is not trivial.  It has to be inside the "if (log)" however, since you only want to do logging computations if you are logging.  I'd really rather not have to write this in a macro as that makes it hard to read and harder to debug.  That doesn't argue against adding these macros, but it does argue that we need to keep the old form.
>> 
>> Aesthetically, your form does not make it clear that the logging statements are not getting done if logging is off.  I don't want somebody casually working on the code not to log something because they were unsure of this.  We could address that through documentation, a solution whose goodness is weighted by the faith you have in how many people read this sort of documentation.
> 
> I would assume people who care about performance to this level will be
> familiar with the performance characteristics of logging. For me this
> potential confusion is less important than making the code flow
> nicely, although I see how you may have different opinion.

Different parts of the code do different amounts of logging. The expression parses is quite verbose in its logging, but this logging is required to figure out bugs that people run into.

> 
>>> - LLDB_LOG() vs. LLDB_LOG(log) -- All our log variables are named log,
>>> so maybe we don't have to reference them explicitly? I couldn't decide
>>> whether that helps or hurts readability. (I wanted to keep the log
>>> variables, instead of specifying the log flags explicitly, as that
>>> means individual log statements can be very cheap, and short.)
>> 
>> I'd rather keep this explicit.  I don't know if we do it ever, but it is totally reasonable that one function might want to funnel information to two different channels.  Also, it makes explicit the connection between the log object and the channel it is opened for.  It is a bit weird to have a seemingly unused log creation line, then a bunch of logs later on that don't look connected to it.  The more oddities you put in the code the harder it is for a new engineer to begin to grok it.
> 
> OK.
> 
> 
>> 
>> For starters, it has builtin support for formatting ranges.  So if you're ever looping over entities formatting them, as long as the entity itself has a formatter defined, you can simply pass it directly to formatv and it will do the right thing.  (It has some customization support built in as well so you can control how to separate items).
>> Secondly, When you have to do some work to get the value you want to format, you could either define a custom formatter for the type in question, and the formatter does all the work, or you could write a function to return the value you want to format, and call that in the argument list, as long as the return type has an associated formatter.
>> I'm sure there will still be cases where you want to use the old method, but I think this would handle quite a few of them and reduce the amount of code you have to read / write when dealing with logging code.
> 
> Agreed. We're bound to run into cases where we want to have code that
> is run only when logging is enabled, but it isn't directly a logging
> statement. So we should have some form of "if(log)", but it should not
> need to be used that often.

It should be a formatter option for collections to allow you to print out different ranges. If I have a target and a target knows how to print the process and the process knows how to print the threads and the threads know how to print the stack frames, it doesn't mean I always want to see everything. Sounds like we can easily control this in the llvm::formatv case with extra formatting options:

This might print just a summary of the target:

llvm::formatv("{0}", target);

This might print the target and info about the process:

llvm::formatv("{0;process}", target);

The question becomes, can we then forward arguments into the "Process" logger to control if the threads are displayed?

llvm::formatv("{0;process{threads{frames}}}", target);

This would allow recursive formatters to be called with specified arguments. Just a thought, but if we are going to delve into recursive formatters, we might need to think about that.

> 
>> So the stream stuff all now needs to mark where the logging begins and ends:
>> 
>>   if (log) *log << log::start_message << 1 << ' ' << 2 << ' ' << 3 << '\n' << std::end_message;
>> And that starts to ruin the nice concrete log calls we had with log->Printf() since each call to log->Printf() was a complete message. I am open to ideas on how to solve this. Anyone have any ideas?
> 
> 
> So, the way I solve that in my mockup is by having a temporary object.
> It's constructor does any initialization work necessary (in my case,
> printing file and line number). In the destructor I flush the log
> message to the underlying streams. You could use the same approach,
> but synchronize via mutexes, or whatever else you want.

So the file and line number should be done by the logging logic probably at the same time as any of the other options (pid/tid, thread name, timestamp, backtrace, etc).