[llvm-dev] RFC: General purpose type-safe formatting library

Zachary Turner via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 11 21:47:56 PDT 2016


Ok, well another example would be if you pass a pointer.  The only valid
options are various flavors of hex.  You wouldn't want to print a pointer
in scientific notation, for example.
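
A minimal sketch of that restriction, assuming a hypothetical `formatPointer` helper (not part of LLVM or of the proposed library) and borrowing the hex option letters from the integer list quoted further down.  Treating an empty option as lowercase hex is also an assumption.  Anything that is not a flavor of hex is rejected:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: formats a pointer, accepting only hex-style options.
// Returns std::nullopt for anything else (e.g. "E" for scientific notation).
std::optional<std::string> formatPointer(const void *P, std::string_view Opt) {
  bool Upper = false, Prefix = true;
  if (Opt == "X") {
    Upper = true;
  } else if (Opt == "X-") {
    Upper = true;
    Prefix = false;
  } else if (Opt == "x-") {
    Prefix = false;
  } else if (Opt != "x" && !Opt.empty()) {
    return std::nullopt; // not a flavor of hex
  }

  std::ostringstream OS;
  if (Prefix)
    OS << "0x";
  OS << std::hex;
  if (Upper)
    OS << std::uppercase;
  OS << reinterpret_cast<std::uintptr_t>(P);
  return OS.str();
}

// Usage: hex options succeed, scientific notation is refused.
void pointerExample() {
  const void *P = reinterpret_cast<const void *>(0xFF);
  assert(formatPointer(P, "X").value() == "0xFF");
  assert(!formatPointer(P, "E").has_value());
}
```

This sketch reports a bad option at run time; whether the real library would diagnose it earlier is not something this thread settles.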

On Tue, Oct 11, 2016 at 9:30 PM Mehdi Amini <mehdi.amini at apple.com> wrote:

> On Oct 11, 2016, at 9:18 PM, Zachary Turner <zturner at google.com> wrote:
>
>
>
> On Tue, Oct 11, 2016 at 8:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> Hi,
>
> On Oct 11, 2016, at 6:22 PM, Zachary Turner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> A while back llvm::format() was introduced that made it possible to
> combine printf-style formatting with llvm streams.  However, this still
> comes with all the risks and pitfalls of printf.  Everyone is no doubt
> familiar with these problems, but here are just a few anyway:
>
> 1. *Not type-safe.*  Not all compilers warn when you mess up the format
> specifier.  And when you're writing your own printf-like functions, you
> need to tag them with __attribute__((format(printf, m, n))), which again
> not all compilers have.
>
>
> I’m not very sensitive to the “not all compilers have” argument; however,
> it is worth mentioning that the format may not be a string literal, which
> defeats the “sanitizer”.
>
>   If you change a const char * to a StringRef, it can silently succeed
> while passing your StringRef object to printf.  It should fail to compile!
>
>
> llvm::format now fails to compile as well :)
>
> However, this does not address other issues, like:
> `format("%d", float_var)`
>
>
> 2. *Not security safe.*  Functions like sprintf() will happily smash your
> stack for you if you're not careful.
>
> 3. *Not portable (well kinda).*  Quick, how do you print a size_t?  You
> probably said %zu.  Well MSVC didn't even support %zu until 2015, which we
> aren't even officially requiring yet.  So you've gotta write (uint64_t)x
> and then use PRIx64.  Ugh.
>
> 4. *Redundant.*  If you're giving it an integer, why do you need to
> specify %d?  It's an integer!  We should be able to use the type system to
> our advantage.
>
> 5. *Not flexible.*  How do you print a std::chrono::time_point with
> llvm::format()?  You can't.  You have to resort to providing an overloaded
> streaming operator or formatting it some other way.
>
>
> It seems to me that there is no silver bullet for that: whether for
> llvm::format() or your new proposal, some sort of glue/helper needs to be
> provided for each and every non-standard type.
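
Points 1, 3 and 4 above come down to letting the argument's static type choose the formatting.  A minimal sketch of that idea, using a hypothetical `formatValue` helper and `std::ostringstream` standing in for LLVM streams (both are assumptions, not the proposed API).  The two-decimal default for floating point follows the plan described later in this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: the formatting is chosen from the static type of V,
// so the caller never writes %d, %zu or PRIx64.
template <typename T>
std::string formatValue(const T &V) {
  // Types without a formatter are rejected at compile time; supporting them
  // would need per-type glue code.
  static_assert(std::is_arithmetic<T>::value,
                "no formatter available for this type");
  std::ostringstream OS;
  if (std::is_floating_point<T>::value) {
    OS.setf(std::ios::fixed, std::ios::floatfield);
    OS.precision(2); // assumed default of two decimal places
  }
  OS << V;
  return OS.str();
}

// Usage: the same call works for int, size_t and double.
void typeDrivenExample() {
  assert(formatValue(7) == "7");
  assert(formatValue(std::size_t{42}) == "42");
  assert(formatValue(1.5) == "1.50");
}
```

Passing a type that is not arithmetic, such as a struct, trips the `static_assert`, which is the compile-time failure asked for in point 1.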
>
>
> So I've been working on a library that will solve all of these problems
> and more.
>
>
> Great! I appreciate the effort.  When I talked about this with Duncan last
> week, he mentioned that we should do it :)
>
>
>
> The high level design of my library is borrowed heavily from C#.  But if
> you're not familiar with C#, I believe boost has something similar in
> spirit.  The best way to show it off is with some examples:
>
> 1. os << format_string("Test");   // writes "Test"
> 2. os << format_string("{0}", 7);  // writes "7"
>
> Immediately we can see one big difference between this and llvm::format()
> / printf.  You don't have to specify the type.  If you pass in an int, it
> formats it as an int.
>
> 3. os << format_string("{0} {0}", 7); // writes "7 7"
>
> #3 is an example of something that cannot be done elegantly with printf.
> Sure, you can pass it in twice, but if it's expensive to compute, this
> means you have to save it into a temporary.
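
A minimal sketch of how index-based fields such as `{0} {0}` could be expanded, assuming a hypothetical `formatString` function that returns a `std::string` and handles only plain `{N}` fields with no options.  It is not the implementation proposed in this thread:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Renders one argument through operator<<, so its type picks the format.
template <typename T>
std::string renderArg(const T &V) {
  std::ostringstream OS;
  OS << V;
  return OS.str();
}

// Hypothetical formatString: replaces each "{N}" with the N-th argument.
// Every argument is rendered once, however often the format refers to it.
template <typename... Args>
std::string formatString(const std::string &Fmt, const Args &...As) {
  const std::vector<std::string> Rendered{renderArg(As)...};
  std::string Out;
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] != '{') {
      Out += Fmt[I];
      continue;
    }
    // Parse the decimal index that follows the opening brace.
    std::size_t J = I + 1, Index = 0;
    while (J < Fmt.size() && std::isdigit(static_cast<unsigned char>(Fmt[J])))
      Index = Index * 10 + static_cast<std::size_t>(Fmt[J++] - '0');
    const bool Valid = J > I + 1 && J < Fmt.size() && Fmt[J] == '}' &&
                       Index < Rendered.size();
    if (Valid) {
      Out += Rendered[Index];
      I = J; // continue after the closing brace
    } else {
      Out += Fmt[I]; // not a valid field: copy the brace literally
    }
  }
  return Out;
}

// Usage: one argument referenced twice, and arguments used out of order.
void indexExample() {
  assert(formatString("{0} {0}", 7) == "7 7");
  assert(formatString("{1}-{0}", "a", 2) == "2-a");
}
```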
>
>
> What about: printf("%0$ %0$", 7);
>
> Well, umm..  I didn't even know about that.  And I wonder how many others
> also don't.  How does it choose the type?  It seems there is no d in there.
>
>
> Sorry, I meant printf("%0$d %0$d", 7);
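
For reference, a small sketch of the positional form on a POSIX C library.  Positional arguments are a POSIX extension rather than ISO C or C++, and POSIX numbers them from 1, so the sketch uses `%1$d`.  The `printTwice` wrapper is a hypothetical name used only for illustration:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Relies on the POSIX "%n$" positional-argument extension.  The conversion
// specifier (d) is still required: printf cannot infer the type.
std::string printTwice(int V) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%1$d %1$d", V);
  return Buf;
}

// Usage: the single argument is printed twice.
void positionalExample() { assert(printTwice(7) == "7 7"); }
```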
>
>
>
>
>
> 4. os << format_string("{0:X}", 255);  // writes "0xFF"
> 5. os << format_string("{0:X7}", 255);  // writes "0x000FF"
> 6. os << format_string("{0}", foo_object); // fails to compile!
>
> Here is another example of an improvement over traditional formatting
> mechanisms.  If you pass an object for which it cannot find a formatter, it
> fails to compile.
>
> However, you can always define custom formatters for your own types.  If
> you write:
>
> namespace llvm {
>   template<>
>   struct format_provider<Foo> {
>     static void format(raw_ostream &S, const Foo &F, int Align,
>                        StringRef Options) {
>     }
>   };
> }
>
> Then #6 will magically compile, and invoke the function above to do the
> formatting.  There are other ways to customize the formatting behavior, but
> I'll keep going with some more examples:
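
A self-contained sketch of that customization point, with `std::ostream` and `std::string_view` standing in for `raw_ostream` and `StringRef`, and everything in the global namespace rather than `llvm`.  The `formatOne` entry point and the contents of `Foo` are hypothetical:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// The primary template is declared but never defined, so formatting a type
// that has no specialization fails to compile.
template <typename T> struct format_provider;

template <> struct format_provider<int> {
  static void format(std::ostream &S, const int &V, int /*Align*/,
                     std::string_view /*Options*/) {
    S << V;
  }
};

// Hypothetical user type with its own formatter.
struct Foo {
  int Id;
};

template <> struct format_provider<Foo> {
  static void format(std::ostream &S, const Foo &F, int /*Align*/,
                     std::string_view Options) {
    S << "Foo#" << F.Id;
    if (!Options.empty())
      S << '[' << Options << ']'; // the formatter interprets Options itself
  }
};

// Hypothetical entry point: looks up the provider for T and invokes it.
template <typename T>
std::string formatOne(const T &V, std::string_view Options = {}) {
  std::ostringstream OS;
  format_provider<T>::format(OS, V, /*Align=*/0, Options);
  return OS.str();
}

// Usage: built-in and user-defined types go through the same entry point.
void providerExample() {
  assert(formatOne(7) == "7");
  assert(formatOne(Foo{3}, "v") == "Foo#3[v]");
}
```

Calling `formatOne` on a type with no `format_provider` specialization stops at compile time because the primary template is an incomplete type.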
>
>
> 7. os << format_string("{0:N}", -1234567);  // Writes "-1,234,567".  Note
> the commas.
>
>
> Why add commas? Because of the “:N”?
> This seems localization-dependent: how do you handle that?
>
> Yes, it is localization dependent.  That being said, llvm has 0 existing
> support for localization.  We already print floating point numbers with
> decimals, messages in English, etc.
>
> The purpose of this example was to illustrate that each formatter can have
> its own custom set of options.  For the case of integral arithmetic types,
> those would be:
>
> X  : uppercase hex
> X- : uppercase hex without the 0x prefix
> x  : lowercase hex
> x- : lowercase hex without the 0x prefix
> N  : comma-grouped digits
> E  : scientific notation with uppercase E
> e  : scientific notation with lowercase e
> P  : percent
> F  : fixed point
>
> But for floating point types, a different set of format specifiers would
> be valid (for example, it doesn't make sense to print a floating point
> number as hex)
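
A minimal sketch of how part of the integer option list could be interpreted, covering only `X`, `X-`, `x`, `x-` and `N`.  The `formatInteger` and `groupDigits` names are hypothetical, and printing negative values in hex as their two's-complement bit pattern is an assumption the thread does not address:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>

// Inserts a comma before each group of three digits, counted from the
// right.  Expects an unsigned decimal digit string.
std::string groupDigits(const std::string &Digits) {
  std::string Out;
  for (std::size_t I = 0; I < Digits.size(); ++I) {
    if (I != 0 && (Digits.size() - I) % 3 == 0)
      Out += ',';
    Out += Digits[I];
  }
  return Out;
}

// Hypothetical integer formatter.  Returns std::nullopt for options that
// this sketch does not implement (E, e, P, F and anything unknown).
std::optional<std::string> formatInteger(std::int64_t V,
                                         std::string_view Opt) {
  if (Opt == "X" || Opt == "X-" || Opt == "x" || Opt == "x-") {
    std::ostringstream OS;
    if (Opt.size() == 1)
      OS << "0x"; // a trailing '-' suppresses the prefix
    OS << std::hex;
    if (Opt[0] == 'X')
      OS << std::uppercase;
    // Assumption: negative values print as their two's-complement bits.
    OS << static_cast<std::uint64_t>(V);
    return OS.str();
  }
  if (Opt == "N") {
    // Negate in unsigned arithmetic so the minimum value cannot overflow.
    const std::uint64_t Mag = V < 0 ? 0 - static_cast<std::uint64_t>(V)
                                    : static_cast<std::uint64_t>(V);
    return (V < 0 ? "-" : "") + groupDigits(std::to_string(Mag));
  }
  return std::nullopt;
}

// Usage: mirrors examples 4 and 7 from earlier in the message.
void integerExample() {
  assert(formatInteger(255, "X").value() == "0xFF");
  assert(formatInteger(-1234567, "N").value() == "-1,234,567");
}
```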
>
>
> Not sure if it is the best example: hexadecimal is the default format for
> printing float literals in the IR, I believe. But OK, I see how it works!
>
>
> If you wrote your own formatter (as described earlier in #6), the field
> following the : would be passed in as the `Options` parameter, and the
> implementation is free to use it however it wants.  The std::chrono
> formatter takes strings similar to those described in #11, for example.
>
>
>
> What happens with the following?
>
> os << format_string("{0:N}", -123.455);
>
>
> You would get "-123.46" (default precision of floating point types is 2
> decimal places).  If you had -1234.566 it would print "-1,234.57" (you
> could change the precision by specifying an integer after the N.  So {0:N3}
> would print "-1,234.566").  For integral types the "precision" is the
> number of digits, so if it's greater than the length of the number it would
> pad left with 0s.  For floating point types it's the number of decimal
> places, so it would pad right with 0s.
>
> Of course, all these details are open for debate, that's just my initial
> plan.
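
The floating point precision rule above can be sketched as follows, assuming a hypothetical `formatGrouped` helper that takes the precision as a separate parameter instead of parsing it out of `N3`.  Because decimal values such as -123.455 are not exactly representable in binary, the last rounded digit of such borderline inputs may differ from what decimal arithmetic suggests:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper for the N option on floating point values: fixed
// notation, Precision decimal places, commas in the integer part.
std::string formatGrouped(double V, int Precision = 2) {
  std::ostringstream OS;
  OS.setf(std::ios::fixed, std::ios::floatfield);
  OS.precision(Precision);
  OS << std::fabs(V); // the sign is added back at the end
  const std::string Text = OS.str();

  // Split at the decimal point; with Precision == 0 there is none.
  const std::size_t Dot = Text.find('.');
  const std::string Whole = Text.substr(0, Dot);
  const std::string Frac =
      Dot == std::string::npos ? std::string() : Text.substr(Dot);

  std::string Out = V < 0 ? "-" : "";
  for (std::size_t I = 0; I < Whole.size(); ++I) {
    if (I != 0 && (Whole.size() - I) % 3 == 0)
      Out += ',';
    Out += Whole[I];
  }
  return Out + Frac;
}

// Usage: default precision of 2, then an explicit precision of 3.
void groupedExample() {
  assert(formatGrouped(-1234.566) == "-1,234.57");
  assert(formatGrouped(-1234.566, 3) == "-1,234.566");
}
```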
>
>