[lldb-dev] logging in lldb

Fri Dec 9 07:36:23 PST 2016

On Fri, Dec 9, 2016 at 4:00 AM Jason Molenda via Phabricator <
reviews at reviews.llvm.org> wrote:

>
> I'm not thrilled with the formatv reinvention of format specification.
> The printf formatters are a bizarre little invention, but it's a bizarre
> invention that we've all worked with for decades and everyone who works in
> C knows well.  The formatv format specification may be superior, but I
> don't want to see innovation here, I want clarity with the linga franca
> that every C programmer knows.  I think the barrier to adopting something
> non-standard is very high.
>
While everyone knows it well, somehow it's still complicated enough that
its usage is riddled with errors.  I can't even count the number of bugs
I've fixed related to incorrect usage of format specifiers.

>
> The formatv format specification string seems to explicitly mention the
> ordering of the arguments, like llvm::formatv("{0} {1} {2}", first, second,
> third).  I'm guessing this allows you to print the arguments in different
> order than they appear?  Like  llvm::formatv("{0} {2} {2} {1}", first,
> second, third)?  What's the benefit of this vrs. the uncertainty of which
> arguments are used in different parts of the string (short of counting them
> out by hand)?  If I have a formatv format specification like "{0} {1} {2}
> {3} {4}" and I want to insert an argument between 2 and 3, I need to
> renumber 3 and 4 now?  Or do I do "{0} {1} {2} {5} {3} {4}" and throw my
> argument on to the end?
>
Either / or would be fine.  BTW, this is not the primary advantage of
having the arguments be numbered.  The primary advantage is that it allows
you to print the same argument multiple times without computing it.
formatv("{0} {0} {0}", 7);   will print "7 7 7".

> If formatv could take something a LOT closer to printf, I'd be better with
> it.  Something like the python string formatters would be a real
> improvement - just %d for base 10 numbers, not having to worry about the
> size of the argument in the format string, for instance, and would still be
> readable & understandable by any C programmer out there.
>
I don't think familiarity with C is a compelling argument.  We're using C++
here, and the more C-isms we can get rid of the better, as long as there is
a compelling advantage.

BTW, you mentioned Python, but incidentally the %d %s style of string
formatting in Python is not recommended (being considered the "old" way),
and is replaced with a "new" way which, it turns out, is almost identical
to the syntax of formatv  (along with C#, Python is where I took
inspiration from in designing it)

Aside from the points mentioned above, formatv has some other nice
advantages.

1) Instead of silently compiling your code and then segfaulting when you
pass a type that printf doesn't know how to format, formatv will report a
helpful compiler error.

2) It allows you to write formatters for user-defined types, so that
instead of re-inventing this every time you write a format line, and
potentially passing many arguments to printf, you can pass one argument to
formatv, and have consistency in every single log line where you ever try
to print this.

Aside from not being familiar to C programmers, I believe it to be better
in every possible way.

On Fri, Dec 9, 2016 at 4:19 AM Pavel Labath <labath at google.com> wrote:

> There were additional comments on the lldb-commits thread. I am moving
> them here and responding:
>
> Jason Molenda wrote on the lldb-commits thread:
> > A couple of thoughts / two cents.
> > I don't mind the current if (log) { ... } style of logging, even with
> the PRIx64's and having to do filepath.GetPath().c_str() and all that. I
> like being able to do extra work in a if (log) block of code -- create a
> SymbolContext for something or whatever, and as I'm debugging that section
> of code I like to be able to put a breakpoint there and look more closely.
> If this is all in a preprocessor macro, a breakpoint is going to stop me on
> the if() and then I'm trying to step through a macro? And if my computation
> is down in a lambda I need to figure out how to put a breakpoint in that or
> step into that somehow to look at the values being computed. How do I
> create a SymbolContext my_symctx(...); and then include that in what I log
> if I have to do all of this in the formatv arguments?
>
> I am not proposing we completely eliminate the ability to do have
> blocks of code that run only when logging is enabled. I recognize that
> there are situations where that is necessary. However, 99% of our
> current logging code does not do that, and I am trying to reduce the
> footprint of that. So, I don't think you have to worry about this use
> case.
>
> > I'm not thrilled with the formatv reinvention of format specification.
> The printf formatters are a bizarre little invention, but it's a bizarre
> invention that we've all worked with for decades and everyone who works in
> C knows well. The formatv format specification may be superior, but I don't
> want to see innovation here, I want clarity with the linga franca that
> every C programmer knows. I think the barrier to adopting something
> non-standard is very high.
> > The formatv format specification string seems to explicitly mention the
> ordering of the arguments, like llvm::formatv("{0} {1} {2}", first, second,
> third). I'm guessing this allows you to print the arguments in different
> order than they appear? Like llvm::formatv("{0} {2} {2} {1}", first,
> second, third)? What's the benefit of this vrs. the uncertainty of which
> arguments are used in different parts of the string (short of counting them
> out by hand)? If I have a formatv format specification like "{0} {1} {2}
> {3} {4}" and I want to insert an argument between 2 and 3, I need to
> renumber 3 and 4 now? Or do I do "{0} {1} {2} {5} {3} {4}" and throw my
> argument on to the end?
> > Simply *ability* to put the arguments in any order is my concern. Of
> course I'm never going to do it, but I'm going to have to work on code
> where other people have done it.
>
> I think being able to refer to arguments directly i'a useful feature
> to have in some circumstances (printf also allows it, although with a
> syntax I always have to look up when I need it), but I agree it should
> be used sparingly. I don't think anyone will put the arguments out of
> order just for the fun of it. We can make that an official policy
> somewhere, if you're worried about it. I guess one of the reasons for
> the simplicity of the design was to only support the direct-reference
> versions (I admit I originally also though I could just write {}, and
> get the arguments in order).
>
> FWIW, if I had a printf statement with more than five arguments, I'd
> also have to count them by hand to make sure I got them matched right,
> especially if PRIx64's get involved.
>
> > I don't personally want to see the file / function / line numbers in my
> logging. It's fine if that's an option that other people can enable if they
> want it. There's going to be a growth in the size of our text for the
> constant strings for every log point having its filename and function, but
> it will probably not be significant (maybe the strings can be coalesced, I
> wouldn't bet one way or the other.)
>
> I am not sure what what be a good way to try this out, but I wouldn't
> be surprised if the overall size effect of this was a reduction in
> binary size -- the thing is, most of our logging statements contain
> this information already, but it's a part of the format string, so it
> definitely cannot be coalesced. If they were referenced as __FILE__,
> any sane compiler/linker will merge those references into a single
> string constant.
>
> I am not sure which part of the code base you are working with mostly,
> but in my experience most code adds file/function information already
> (RegisterContextLLDB seems to be the exception), and you have no way
> of turning it off. Currently we are going toward making that optional,
> so this would be an improvement in that regard as well.
>
>
> > If formatv could take something a LOT closer to printf, I'd be better
> with it. Something like the python string formatters would be a real
> improvement - just %d for base 10 numbers, not having to worry about the
> size of the argument in the format string, for instance, and would still be
> readable & understandable by any C programmer out there.
> Funnily enough, the inspiration for the syntax  actually did come from
> python, although not the % operator, but format() function (which is
> also the "new" way to do formatting in python). It was chosen because
> it is more powerful. I personally wouldn't go too overboard like some
> of the proposals floated around here, but I think in the common cases,
> the syntax is simple enough -- instead of %d you write {0}. If you
> happen to want hex, you write {0:x}. And the ability to format a
> StringRef without jumping through massive hoops is a killer feature
> for me.
>
> > If we're going to start using this alternate logging scheme, eventually
> there will be a push to use it everywhere, or someone will go through all
> the other classes in the code base and switch it out without consultation.
> We need to all sign on to this idea before it goes into the code base at
> all.
>
> This is why we are discussing it here. Does my response alleviate your
> concerns?
>
>
> Zachary Turner wrote on the lldb-commits thread:
> > Personally I'm not really a fan of source/line information at all.
> There is only a very small probability that anyone using a released LLDB
> (e.g. through Xcode) is going to have their log lines match up with ToT,
> because we're changing files all the time.  You change one thing on line
> 10, and now every other log line that comes after it, potentially tens of
> thousands of lines in the same file, are going to have their line
> information out of sync.
>
> This is the reason I propose to not embed (and assume the reason that
> the current log lines don't do it either) *line* information into the
> logs. However, file+function are changing much less frequently, and
> they are helpful, as often you just want to say "I've reached this
> place, with these variable values". This way, you don't have to write
> the first part out explicitly. Even if the function name changed
> during a refactor, it's still likely you will be able to find it under
> a similar name.
>
> cheers,
> pl
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20161209/d667f385/attachment-0001.html>