[llvm-dev] RFC: General purpose type-safe formatting library

Zachary Turner via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 2 15:54:47 PDT 2016


* UDL Syntax is removed in the latest version of the patch
<https://reviews.llvm.org/D25587>.
* Name changed to `formatv` since `format_string` is too much to type.
* Added conversion operators for `std::string` and `llvm::SmallString`.

I had some feedback offline (not on this thread, unfortunately) that it
might be worth using a printf style syntax instead of this Python-esque
syntax.  FTR, I actually somewhat object to this, for a few reasons:

1) It makes back-reference syntax ugly.   "{0} {1} {0}" is much clearer to
me than "%1$s %2$s %1$s".  The latter syntax is also not a very well known
feature of printf and so unlikely to be used by people with a printf-style
implementation, whereas it's un-missable with the python-style syntax.

2) I don't see why we should need to specify the type of the argument with
%d if the compiler knows it's an integer.  Even if we can add compile-time
checking to make a mismatch an error, it seems unnecessary to even
encounter this situation in the first place.  I believe the compiler should
simply format what you give it.

3) One of the most useful aspects of the current approach is the ability to
plug in custom formatters for application specific data types.  This is not
straightforward with a printf-style syntax.

You might be able to hook up a template-specialization-like mechanism to the
processing of %s (similar to my current approach), but it's not obvious how
you proceed from there to get custom format strings for individual types,
for example a formatter which can print a TimeSpan in different units
depending on the style options you pass in.  This is especially useful when
printing ranges, where you often want to specify a different separator or
control the formatting of the underlying type (e.g. it's not clear how you
would elegantly format a range of integers in hex with a printf-style
approach).
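
To make points 1) through 3) concrete, here is a minimal standalone sketch
of the idea.  It is not the code from D25587: FormatProvider, formatSketch,
and the layout of TimeSpan are hypothetical names made up for illustration,
it writes to std::ostream rather than raw_ostream, the range separator is
hard-coded, and malformed replacement fields are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical customization point. The default streams the value with the
// operator<< that the compiler selects from the static type; Style is ignored.
template <typename T> struct FormatProvider {
  static void format(const T &V, std::ostream &OS, const std::string &) {
    OS << V;
  }
};

// Integers understand one style option in this sketch: "x" for hex.
template <> struct FormatProvider<int> {
  static void format(const int &V, std::ostream &OS, const std::string &Style) {
    if (Style == "x")
      OS << std::hex << V << std::dec;
    else
      OS << V;
  }
};

// An application-specific type whose style string selects the unit.
struct TimeSpan {
  long long Micros;
};

template <> struct FormatProvider<TimeSpan> {
  static void format(const TimeSpan &V, std::ostream &OS,
                     const std::string &Style) {
    if (Style == "ms")
      OS << V.Micros / 1000 << "ms";
    else if (Style == "s")
      OS << V.Micros / 1000000 << "s";
    else
      OS << V.Micros << "us";
  }
};

// Ranges forward the style string to each element (separator fixed to ", ").
template <typename T> struct FormatProvider<std::vector<T>> {
  static void format(const std::vector<T> &Range, std::ostream &OS,
                     const std::string &Style) {
    const char *Sep = "";
    for (const T &Elem : Range) {
      OS << Sep;
      FormatProvider<T>::format(Elem, OS, Style);
      Sep = ", ";
    }
  }
};

using Renderer = std::function<void(std::ostream &, const std::string &)>;

// Replaces each "{N}" or "{N:style}" in Fmt with argument N. A field that does
// not start with a number makes std::stoul throw std::invalid_argument.
template <typename... Ts>
std::string formatSketch(const std::string &Fmt, const Ts &...Vals) {
  // One type-erased renderer per argument, so index N can be chosen at run time.
  const std::vector<Renderer> Renderers = {
      [&](std::ostream &OS, const std::string &Style) {
        FormatProvider<Ts>::format(Vals, OS, Style);
      }...};
  std::ostringstream OS;
  std::size_t I = 0;
  while (I < Fmt.size()) {
    const std::size_t Close =
        Fmt[I] == '{' ? Fmt.find('}', I) : std::string::npos;
    if (Close == std::string::npos) {
      OS << Fmt[I++]; // Literal text is copied through unchanged.
      continue;
    }
    // The text between the braces is "N" or "N:style".
    const std::string Spec = Fmt.substr(I + 1, Close - I - 1);
    const std::size_t Colon = Spec.find(':');
    const std::size_t Index = std::stoul(Spec.substr(0, Colon));
    const std::string Style =
        Colon == std::string::npos ? std::string() : Spec.substr(Colon + 1);
    assert(Index < Renderers.size() && "replacement index out of range");
    Renderers[Index](OS, Style);
    I = Close + 1;
  }
  return OS.str();
}
```

With these definitions, formatSketch("{0} {1} {0}", "a", "b") yields
"a b a", formatSketch("{0:ms}", TimeSpan{1500000}) yields "1500ms", and a
std::vector<int> holding 10, 255, 16 formatted with "{0:x}" yields
"a, ff, 10".  No type is ever spelled in the format string; the style
after the colon is interpreted by whichever provider matches the argument.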

I'm open to feedback here, so if you have an opinion one way or the other,
please LMK.


On Tue, Nov 1, 2016 at 5:39 AM Zachary Turner <zturner at google.com> wrote:

> The big problem I see is that to get compile time checking without the UDL
> we're going to have to do something like FORMAT_STRING("{0}") where this is
> a macro. It just seems really gross. It is true that it is harder to find
> the documentation, but that could be alleviated by putting all of this in
> its own namespace like llvm::formatv, then one could search the namespace.
>
> On Mon, Oct 31, 2016 at 11:41 PM Sean Silva <chisophugis at gmail.com> wrote:
>
> On Mon, Oct 31, 2016 at 5:21 PM, Chandler Carruth via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Mon, Oct 31, 2016 at 3:46 PM Zachary Turner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi all,
>
> Tentatively final version is up here: https://reviews.llvm.org/D25587
>
> It has a verbal LGTM, but I plan to wait a bit longer just in case anyone
> has some additional thoughts.  It's a large patch, but if you're
> interested, one way you can help without doing a full-blown review is to
> look at the large comment blocks in FormatVariadic.h and
> FormatProviders.h.  Here I provide a formal description of the grammar of
> the replacement sequences and format syntax.  So you can look at this
> without looking at the code behind it and see if you have comments just on
> the format language.
>
> Here's a summary of (most) everything contained in this patch:
>
> 1) UDL Syntax for outputting to a stream or converting to a string.
>     outs() << "{0}"_fmt.stream(1);
>     std::string S = "{0}"_fmt.string(1);
>
>
> I continue to have a strong objection to using UDLs for this (or anything
> else in LLVM).
>
> I think this feature is poorly known by many programmers. I think it will
> produce error messages that are confusing and hard to debug. I think it
> will have a significant negative impact on compile time. I also think that
> it will exercise substantially less well tested parts of every host
> compiler for LLVM and subject us to an increased rate of mysterious host
> compiler bugs.
>
> I also think it forces programmers to be aware of a "magical" construct
> that doesn't really fit with the rest of the language.
>
> It isn't that any of these issues in isolation cannot be overcome, it is
> that I think the value provided by the UDL specifically is substantially
> smaller than the cost.
>
> I would *very strongly* prefer that this is accomplished with "normal" C++
> syntax, and that compile time checking is done with constexpr when
> available. I think that will give the overwhelming majority of the benefit
> with dramatically lower cost.
>
>
> +1, the UDL seems a bit too automagical.
> `format_string("{0}", 1)` is not that much longer than
> `"{0}"_fmt.string(1)`, but significantly less magical.
>
> Simple example: what should I type into a search engine to find the LLVM
> doxygen for the UDL? I know to search "llvm format_string" for the format
> string, but just from looking at a use of the UDL syntax it might not be
> clear that format_string is even called.
>
> -- Sean Silva
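
On the constexpr suggestion quoted above, the check itself is easy to write
as an ordinary function.  The sketch below assumes C++14 (loops inside
constexpr functions), and requiredArgs is a hypothetical name, not something
from the patch.  It only shows that a format string can be inspected in a
constant expression; getting the literal from a normal-looking call into
such a context without a macro or a UDL is the part the thread is still
debating.

```cpp
#include <cstddef>

// Returns how many arguments a format string needs: one more than the largest
// N that appears in a well-formed "{N}" field, or 0 if there are no fields.
constexpr std::size_t requiredArgs(const char *Fmt) {
  std::size_t Needed = 0;
  for (std::size_t I = 0; Fmt[I] != '\0'; ++I) {
    if (Fmt[I] != '{')
      continue;
    std::size_t J = I + 1;
    std::size_t Index = 0;
    // Reading stops at the first non-digit, which includes the terminator.
    while (Fmt[J] >= '0' && Fmt[J] <= '9') {
      Index = Index * 10 + static_cast<std::size_t>(Fmt[J] - '0');
      ++J;
    }
    // Count the field only if at least one digit was read and '}' follows.
    if (J > I + 1 && Fmt[J] == '}' && Index + 1 > Needed)
      Needed = Index + 1;
  }
  return Needed;
}

// Evaluated entirely at compile time; a wrong count would fail the build.
static_assert(requiredArgs("{0} {1} {0}") == 2, "two distinct arguments");
static_assert(requiredArgs("no replacements") == 0, "no arguments needed");
```

A wrapper such as the FORMAT_STRING macro mentioned earlier could compare
this count against the number of arguments in a static_assert.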