[llvm-dev] RFC: General purpose type-safe formatting library

Zachary Turner via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 11 21:18:47 PDT 2016


On Tue, Oct 11, 2016 at 8:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:

> Hi,
>
> On Oct 11, 2016, at 6:22 PM, Zachary Turner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> A while back llvm::format() was introduced that made it possible to
> combine printf-style formatting with llvm streams.  However, this still
> comes with all the risks and pitfalls of printf.  Everyone is no doubt
> familiar with these problems, but here are just a few anyway:
>
> 1. *Not type-safe.*  Not all compilers warn when you mess up the format
> specifier.  And when you're writing your own printf-like functions, you
> need to tag them with __attribute__((format(printf, ...))), which again not
> all compilers support.
>
>
> I’m not very sensitive to the “not all compilers have” argument; however,
> it is worth mentioning that the format may not be a string literal, which
> defeats the “sanitizer”.
>
>   If you change a const char * to a StringRef, it can silently succeed
> while passing your StringRef object to printf.  It should fail to compile!
>
>
> llvm::format now fails to compile as well :)
>
> However, this does not address other issues, like
> `format("%d", float_var)`.
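For reference, this is roughly what tagging a printf-like function looks like with GCC and Clang. `myFormat` and the `PRINTF_LIKE` macro are hypothetical names used only for illustration. The checking only kicks in when the format argument is a string literal, which is Mehdi's point above.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical macro: GCC and Clang (Clang also defines __GNUC__) accept the
// format attribute; on other compilers it expands to nothing.
#if defined(__GNUC__)
#define PRINTF_LIKE(FmtIdx, FirstArgIdx) \
  __attribute__((format(printf, FmtIdx, FirstArgIdx)))
#else
#define PRINTF_LIKE(FmtIdx, FirstArgIdx)
#endif

// Hypothetical printf-like function: parameter 1 is the format string and the
// variadic arguments start at parameter 2, so calls with a literal format
// can be checked by the compiler.
std::string myFormat(const char *Fmt, ...) PRINTF_LIKE(1, 2);

std::string myFormat(const char *Fmt, ...) {
  char Buf[256];
  std::va_list Args;
  va_start(Args, Fmt);
  std::vsnprintf(Buf, sizeof(Buf), Fmt, Args); // truncates instead of overflowing
  va_end(Args);
  return Buf;
}
```

For a non-static member function the implicit `this` counts as argument 1, so both indices shift up by one.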
>
>
> 2. *Not security safe.*  Functions like sprintf() will happily smash your
> stack for you if you're not careful.
>
> 3. *Not portable (well, kinda).*  Quick, how do you print a size_t?  You
> probably said %zu.  Well, MSVC didn't even support %zu until 2015, which we
> aren't even officially requiring yet.  So you've gotta write (uint64_t)x
> and then use PRIx64.  Ugh.
>
> 4. *Redundant.*  If you're giving it an integer, why do you need to
> specify %d?  It's an integer!  We should be able to use the type system to
> our advantage.
>
> 5. *Not flexible.*  How do you print a std::chrono::time_point with
> llvm::format()?  You can't.  You have to resort to providing an overloaded
> streaming operator or formatting it some other way.
>
>
> It seems to me that there is no silver bullet for that: whether with
> llvm::format() or your new proposal, some sort of glue/helper code needs to
> be provided for each and every non-standard type.
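On point 3 above: `z` is only a length modifier, so the full specifier for a size_t is `%zu` (or `%zx` for hex). The sketch below, with hypothetical helper names, contrasts it with the cast-and-macro workaround; `PRIu64` is the decimal counterpart of the `PRIx64` macro mentioned in the text.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: "%zu" ('z' length modifier + 'u' conversion) is
// standard in C99/C++11, but older MSVC runtimes did not accept it.
std::string sizeWithZ(std::size_t N) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%zu", N);
  return Buf;
}

// Hypothetical helper for the workaround: widen to uint64_t, then let the
// PRIu64 macro supply the right conversion specifier for the platform.
std::string sizeWithMacro(std::size_t N) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%" PRIu64, static_cast<std::uint64_t>(N));
  return Buf;
}
```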
>
>
> So I've been working on a library that will solve all of these problems
> and more.
>
>
> Great! I appreciate the effort. When I talked about this with Duncan last
> week, he mentioned that we should do it :)
>
>
>
> The high level design of my library is borrowed heavily from C#.  But if
> you're not familiar with C#, I believe boost has something similar in
> spirit.  The best way to show it off is with some examples:
>
> 1. os << format_string("Test");   // writes "Test"
> 2. os << format_string("{0}", 7);  // writes "7"
>
> Immediately we can see one big difference between this and llvm::format()
> / printf.  You don't have to specify the type.  If you pass in an int, it
> formats it as an int.
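A minimal sketch of the general idea of letting the type system choose the formatting instead of a `%d`: ordinary overload resolution on the argument's static type. `formatValue` is a hypothetical name, and this is not claimed to be how the proposed library dispatches.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Integers: viable only when T is an integral type (enable_if), so no "%d".
template <typename T>
typename std::enable_if<std::is_integral<T>::value, std::string>::type
formatValue(T V) {
  return std::to_string(V);
}

// Floating point (float arguments are promoted to double).
inline std::string formatValue(double V) {
  std::ostringstream OS;
  OS << V; // default iostream formatting
  return OS.str();
}

// Strings are passed through unchanged.
inline std::string formatValue(const std::string &S) { return S; }
```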
>
> 3. os << format_string("{0} {0}", 7); // writes "7 7"
>
> #3 is an example of something that cannot be done elegantly with printf.
> Sure, you can pass it in twice, but if it's expensive to compute, this
> means you have to save it into a temporary.
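To make the index semantics concrete, here is a deliberately small sketch, assuming every argument can be written with `operator<<`. `formatIndexed` and `toText` are hypothetical names; unlike the proposal, it does not support `:options`, escaped braces, or compile-time rejection of types that have no formatter.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Converts one argument to text with operator<<.
template <typename T> std::string toText(const T &V) {
  std::ostringstream OS;
  OS << V;
  return OS.str();
}

// Replaces each "{N}" with the N-th argument. Every argument is converted
// exactly once, before any substitution happens.
template <typename... Ts>
std::string formatIndexed(const std::string &Fmt, const Ts &...Args) {
  const std::vector<std::string> Texts{toText(Args)...};
  std::string Out;
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] != '{') {
      Out += Fmt[I];
      continue;
    }
    std::size_t Close = Fmt.find('}', I);
    if (Close == std::string::npos)
      throw std::invalid_argument("unterminated replacement field");
    // std::stoul throws on a non-numeric index; at() throws on a bad index.
    std::size_t Index = std::stoul(Fmt.substr(I + 1, Close - I - 1));
    Out += Texts.at(Index);
    I = Close; // the loop's ++I then skips the closing brace
  }
  return Out;
}
```

Because each argument is converted once before substitution, `{0} {0}` needs only a single evaluation of the argument expression, which is the benefit described in #3.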
>
>
> What about: printf("%0$ %0$", 7);
>
Well, umm...  I didn't even know about that, and I wonder how many others
don't either.  How does it choose the type?  There doesn't seem to be a d in there.
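For comparison, POSIX positional arguments are numbered from 1, and the conversion specifier is still required, so the type does have to be spelled out. This is a POSIX extension rather than ISO C/C++ (MSVC's CRT only offers it through its separate `_printf_p` family); `twice` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// "%1$d" means "argument 1, converted with %d". Positional numbering starts
// at 1, one argument may be referenced several times, and the conversion
// specifier (the 'd') is still mandatory.
std::string twice(int V) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%1$d %1$d", V);
  return Buf;
}
```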

>
>
> 4. os << format_string("{0:X}", 255);  // writes "0xFF"
> 5. os << format_string("{0:X7}", 255);  // writes "0x000FF"
> 6. os << format_string("{0}", foo_object); // fails to compile!
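A sketch of the `X` styles from #4 and #5, under one stated assumption: the width in `X7` counts the `0x` prefix, since that is the reading under which 255 becomes "0x000FF" (seven characters). Later in this thread the integer precision is described as a digit count, which would instead give "0x00000FF", so this detail looks open. `formatHex` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Formats V in hex. Upper selects "X" vs "x"; Prefix=false models the "-"
// variants ("X-", "x-"). Assumption: Width includes the "0x" prefix.
std::string formatHex(std::uint64_t V, bool Upper, bool Prefix,
                      std::size_t Width = 0) {
  const char *DigitChars = Upper ? "0123456789ABCDEF" : "0123456789abcdef";
  std::string Digits;
  do {
    Digits = DigitChars[V & 0xF] + Digits; // prepend the lowest nibble
    V >>= 4;
  } while (V != 0);
  const std::size_t Used = Digits.size() + (Prefix ? 2 : 0);
  if (Width > Used)
    Digits = std::string(Width - Used, '0') + Digits; // zero-pad on the left
  return (Prefix ? "0x" : "") + Digits;
}
```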
>
> Here is another example of an improvement over traditional formatting
> mechanisms.  If you pass an object for which it cannot find a formatter, it
> fails to compile.
>
> However, you can always define custom formatters for your own types.  If
> you write:
>
> namespace llvm {
>   template<>
>   struct format_provider<Foo> {
>     static void format(raw_ostream &S, const Foo &F, int Align,
>                        StringRef Options) {
>       // Write F to S, interpreting Align and Options as Foo sees fit.
>     }
>   };
> }
>
> Then #6 will magically compile, and invoke the function above to do the
> formatting.  There are other ways to customize the formatting behavior, but
> I'll keep going with some more examples:
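A self-contained mock of this extension point, to show how the specialization is found and how the option text reaches it. Everything here is a stand-in rather than LLVM's API: `std::ostream` replaces `raw_ostream`, `std::string` replaces `StringRef`, namespace `sketch` replaces `llvm`, and `formatWithProvider`, the `"short"` option and `Foo`'s members are invented. Leaving the primary template undefined is one way to get the "fails to compile" behavior of #6.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {
// Declared but never defined: formatting a type with no specialization is a
// compile-time error.
template <typename T> struct format_provider;

// Built-in provider for int, ignoring Align and Options.
template <> struct format_provider<int> {
  static void format(std::ostream &S, const int &V, int /*Align*/,
                     const std::string & /*Options*/) {
    S << V;
  }
};

// Looks up the provider for T and hands it the option text (the part after
// ':' in "{0:...}"). Align is always 0 in this sketch.
template <typename T>
std::string formatWithProvider(const T &V, const std::string &Options) {
  std::ostringstream OS;
  format_provider<T>::format(OS, V, 0, Options);
  return OS.str();
}
} // namespace sketch

// A user-defined type and its provider, mirroring the snippet above.
struct Foo {
  int A, B;
};

namespace sketch {
template <> struct format_provider<Foo> {
  static void format(std::ostream &S, const Foo &F, int /*Align*/,
                     const std::string &Options) {
    if (Options == "short") // invented option, for illustration only
      S << F.A;
    else
      S << "Foo(" << F.A << ", " << F.B << ")";
  }
};
} // namespace sketch
```

Removing the `format_provider<Foo>` specialization turns the `Foo` calls into compile errors rather than runtime surprises.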
>
>
> 7. os << format_string("{0:N}", -1234567);  // Writes "-1,234,567".  Note
> the commas.
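A possible implementation of the integer case of `N`, assuming a fixed ',' separator with no locale lookup. `groupDigits` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Groups the digits of V in threes with ','. The separator is fixed, so the
// output does not depend on the current locale.
std::string groupDigits(std::int64_t V) {
  // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
  const std::uint64_t Mag = V < 0 ? 0 - static_cast<std::uint64_t>(V)
                                  : static_cast<std::uint64_t>(V);
  const std::string Digits = std::to_string(Mag);
  std::string Out;
  for (std::size_t I = 0; I < Digits.size(); ++I) {
    // A comma goes before each digit that starts a full group of three.
    if (I != 0 && (Digits.size() - I) % 3 == 0)
      Out += ',';
    Out += Digits[I];
  }
  return V < 0 ? "-" + Out : Out;
}
```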
>
>
> Why add commas? Because of the “:N”?
> This seems localization-dependent: how do you handle that?
>
Yes, it is localization-dependent.  That being said, LLVM has no existing
support for localization.  We already print floating-point numbers with "."
as the decimal separator, messages in English, etc.

The purpose of this example was to illustrate that each formatter can have
its own custom set of options.  For the case of integral arithmetic types,
those would be:

X : Uppercase hex
X- : Uppercase hex without the 0x prefix
x : Lowercase hex
x- : Lowercase hex without the 0x prefix
N : comma grouped digits
E : scientific notation with uppercase E
e : scientific notation with lowercase e
P : percent
F : fixed point
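One way a provider could split an option string such as `X7` or `x-` into a style and a trailing number, following the table above. `IntegralOptions` and `parseIntegralOptions` are hypothetical names, and validating the style is left to the caller.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Parsed form of an option string such as "X7", "x-" or "N".
struct IntegralOptions {
  std::string Style;  // e.g. "X", "X-", "x", "x-", "N", "E", "e", "P", "F", ""
  int Precision = -1; // -1 means no number was given
};

// Everything before the first digit is the style; the digits that follow,
// if any, are the precision. Unknown styles are left for the caller to reject.
IntegralOptions parseIntegralOptions(const std::string &Opts) {
  IntegralOptions R;
  std::string::size_type I = 0;
  while (I < Opts.size() &&
         !std::isdigit(static_cast<unsigned char>(Opts[I])))
    ++I;
  R.Style = Opts.substr(0, I);
  if (I < Opts.size())
    R.Precision = std::stoi(Opts.substr(I));
  return R;
}
```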

But for floating point types, a different set of format specifiers would be
valid (for example, it doesn't make sense to print a floating point number
as hex).

If you wrote your own formatter (as described earlier in #6), the field
following the : would be passed in as the `Options` parameter, and the
implementation is free to use it however it wants.  The std::chrono
formatter takes strings similar to those described in #11, for example.


>
> What happens with the following?
>
> os << format_string("{0:N}", -123.455);
>

You would get "-123.46" (the default precision of floating-point types is 2
decimal places).  If you had -1234.566 it would print "-1,234.57".  You
could change the precision by specifying an integer after the N, so {0:N3}
would print "-1,234.566".  For integral types the "precision" is the
number of digits, so if it's greater than the length of the number it would
pad left with 0s.  For floating-point types it's the number of decimal
places, so it would pad right with 0s.
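A sketch of the floating-point `N` behavior described above, assuming rounding is delegated to `snprintf` and the comma rule is applied only to the integer part. `formatGroupedDouble` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Rounds V to Precision decimal places (2 by default) via snprintf, then
// inserts ',' separators into the integer part only.
std::string formatGroupedDouble(double V, int Precision = 2) {
  char Buf[512];
  std::snprintf(Buf, sizeof(Buf), "%.*f", Precision, V);
  std::string S = Buf;
  const std::string::size_type Start = (S[0] == '-') ? 1 : 0;
  const std::string::size_type Dot = S.find('.');
  const std::string::size_type End =
      (Dot == std::string::npos) ? S.size() : Dot;
  // Walk left from the end of the integer part; each group of three digits
  // that still has a digit before it gets a comma in front.
  for (std::string::size_type Pos = End; Pos > Start + 3; Pos -= 3)
    S.insert(Pos - 3, 1, ',');
  return S;
}
```

One consequence of delegating to `snprintf`: it rounds the exact binary value, and -123.455 is stored as a double whose magnitude is slightly below 123.455, so this sketch prints "-123.45". Producing "-123.46" would require decimal rather than binary rounding.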

Of course, all these details are open for debate, that's just my initial
plan.