[llvm-dev] RFC: General purpose type-safe formatting library

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 11 23:15:52 PDT 2016


This is awesome. +1

Copying a time-tested design like C#'s (which Python also uses) seems
like a really sound approach.

Do you have any particular plans w.r.t. converting existing uses of the
other formatting constructs? At the very least we can hopefully get rid of
format_hex/format_hex_no_prefix since I don't think there are too many uses
of those functions.

Also, since the format string can already embed the surrounding literal
strings, do you anticipate a use case where you would want to write `OS <<
format_string(...) << ...something else...`?
Would `print(OS, "....", ....)` make more sense?
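
To make the comparison concrete, here is a rough sketch of the two call
shapes. It is only an illustration under assumptions: std::ostream stands in
for raw_ostream, and this toy `print`/`format_string` pair (both hypothetical,
not the proposed API) understands nothing beyond "{N}" placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy formatter: understands only "{N}" placeholders and prints
// each argument with operator<<. std::ostream stands in for raw_ostream.
template <typename... Ts>
void print(std::ostream &OS, const std::string &Fmt, const Ts &...Args) {
  // Type-erase each argument into a callable that streams it on demand.
  std::vector<std::function<void(std::ostream &)>> Printers = {
      [&](std::ostream &S) { S << Args; }...};

  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] != '{') {
      OS << Fmt[I]; // literal text is copied through unchanged
      continue;
    }
    std::size_t Close = Fmt.find('}', I);
    assert(Close != std::string::npos && "unterminated placeholder");
    std::size_t Index = std::stoul(Fmt.substr(I + 1, Close - I - 1));
    assert(Index < Printers.size() && "placeholder index out of range");
    Printers[Index](OS); // the same argument may be printed more than once
    I = Close;
  }
}

// The `OS << format_string(...)` shape needs an intermediate value; in this
// sketch it is simply the fully rendered string.
template <typename... Ts>
std::string format_string(const std::string &Fmt, const Ts &...Args) {
  std::ostringstream SS;
  print(SS, Fmt, Args...);
  return SS.str();
}
```

With these, `OS << "x = " << format_string("{0}", 7) << "\n"` and
`print(OS, "x = {0}\n", 7)` produce the same text; the second form just keeps
everything in one format string.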

-- Sean Silva

On Tue, Oct 11, 2016 at 6:22 PM, Zachary Turner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> A while back llvm::format() was introduced that made it possible to
> combine printf-style formatting with llvm streams.  However, this still
> comes with all the risks and pitfalls of printf.  Everyone is no doubt
> familiar with these problems, but here are just a few anyway:
>
> 1. *Not type-safe.*  Not all compilers warn when you mess up the format
> specifier.  And when you're writing your own printf-like functions, you
> need to tag them with __attribute__((format(printf, ...))), which again not
> all compilers have.  If you change a const char * to a StringRef, it can
> silently succeed while passing your StringRef object to printf.  It should
> fail to compile!
>
> 2. *Not security safe.*  Functions like sprintf() will happily smash your
> stack for you if you're not careful.
>
> 3. *Not portable (well kinda).*  Quick, how do you print a size_t?  You
> probably said %zu.  Well MSVC didn't even support %zu until 2015, which we
> aren't even officially requiring yet.  So you've gotta write (uint64_t)x
> and then use PRIx64.  Ugh.
>
> 4. *Redundant.*  If you're giving it an integer, why do you need to
> specify %d?  It's an integer!  We should be able to use the type system to
> our advantage.
>
> 5. *Not flexible.*  How do you print a std::chrono::time_point with
> llvm::format()?  You can't.  You have to resort to providing an overloaded
> streaming operator or formatting it some other way.
>
> So I've been working on a library that will solve all of these problems
> and more.
>
>
> The high level design of my library is borrowed heavily from C#.  But if
> you're not familiar with C#, I believe boost has something similar in
> spirit.  The best way to show it off is with some examples:
>
> 1. os << format_string("Test");   // writes "Test"
> 2. os << format_string("{0}", 7);  // writes "7"
>
> Immediately we can see one big difference between this and llvm::format()
> / printf.  You don't have to specify the type.  If you pass in an int, it
> formats it as an int.
>
> 3. os << format_string("{0} {0}", 7); // writes "7 7"
>
> #3 is an example of something that cannot be done elegantly with printf.
> Sure, you can pass it in twice, but if it's expensive to compute, this
> means you have to save it into a temporary.
>
> 4. os << format_string("{0:X}", 255);  // writes "0xFF"
> 5. os << format_string("{0:X7}", 255);  // writes "0x000FF"
>
> 6. os << format_string("{0}", foo_object); // fails to compile!
>
> Here is another example of an improvement over traditional formatting
> mechanisms.  If you pass an object for which it cannot find a formatter, it
> fails to compile.
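
As an aside, one way the "fails to compile" behavior could be obtained is to
declare the provider template without defining it. The sketch below is an
assumption about the mechanism, not the patch itself: std::ostream stands in
for raw_ostream, the signature is simplified (no Align), and has_formatter,
format_one, and the "X" handling are hypothetical names and behavior chosen to
mirror example 4.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Declared but never defined: a type without a specialization has no
// provider, so any attempt to format it is rejected at compile time.
template <typename T> struct format_provider;

// Built-in support for int: "" prints decimal, "X" prints prefixed hex.
template <> struct format_provider<int> {
  static void format(std::ostream &OS, const int &V,
                     const std::string &Options) {
    assert((Options.empty() || Options == "X") && "unsupported option");
    if (Options == "X")
      OS << "0x" << std::hex << std::uppercase << V;
    else
      OS << V;
  }
};

// Detects whether format_provider<T> is a complete type (C++11 SFINAE).
template <typename T, typename = void>
struct has_formatter : std::false_type {};
template <typename T>
struct has_formatter<T, decltype(void(sizeof(format_provider<T>)))>
    : std::true_type {};

// Formats a single value; the static_assert turns a missing provider into a
// readable diagnostic.
template <typename T>
std::string format_one(const T &V, const std::string &Options = "") {
  static_assert(has_formatter<T>::value, "no format_provider for this type");
  std::ostringstream OS;
  format_provider<T>::format(OS, V, Options);
  return OS.str();
}

struct Foo {}; // no provider: format_one(Foo{}) would not compile
```

Here `format_one(255, "X")` yields "0xFF", while uncommenting a call such as
`format_one(Foo{})` stops the build at the static_assert.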
>
> However, you can always define custom formatters for your own types.  If
> you write:
>
> namespace llvm {
>   template<>
>   struct format_provider<Foo> {
>     static void format(raw_ostream &S, const Foo &F, int Align,
>                        StringRef Options) {
>       // Write F to S here, honoring Align and Options.
>     }
>   };
> }
>
> Then #6 will magically compile, and invoke the function above to do the
> formatting.  There are other ways to customize the formatting behavior, but
> I'll keep going with some more examples:
>
> 7. os << format_string("{0:N}", -1234567);  // Writes "-1,234,567".  Note
> the commas.
> 8. os << format_string("{0:P}", 0.76);  // Writes "76.00%"
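
For readers unfamiliar with the C# "N" and "P" styles, the output in examples
7 and 8 can be reproduced with a few lines of standard C++. group_digits and
percent are hypothetical helpers written for this note; how the library
implements these styles is not stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper for the "N" style: insert a comma between every group
// of three digits, counting from the right and leaving the sign alone.
std::string group_digits(long long Value) {
  std::string Digits = std::to_string(Value);
  std::size_t First = (Value < 0) ? 1 : 0; // position of the first digit
  for (std::size_t Pos = Digits.size(); Pos > First + 3;) {
    Pos -= 3;
    Digits.insert(Pos, ",");
  }
  return Digits;
}

// Hypothetical helper for the "P" style: scale a fraction by 100 and print
// it in fixed notation followed by a percent sign.
std::string percent(double Fraction, int Precision = 2) {
  assert(Precision >= 0 && "precision must be non-negative");
  std::ostringstream OS;
  OS << std::fixed << std::setprecision(Precision) << Fraction * 100.0 << '%';
  return OS.str();
}
```

With these, `group_digits(-1234567)` gives "-1,234,567" and `percent(0.76)`
gives "76.00%", matching the two examples.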
>
> You can also left justify and right justify.  For example:
>
> 9. os << format_string("{0,8:P}", 0.76);  // Writes "  76.00%"
> 10. os << format_string("{0,-8:P}", 0.76);  // Writes "76.00%  "
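
The "index,align:options" layout used in example 9 could be handled roughly
as follows. This is a sketch under assumptions: FieldSpec, parse_field, and
pad are hypothetical names, and the sign convention (positive right-justifies,
negative left-justifies) is taken from C#, which matches the outputs shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical parsed form of the text between '{' and '}'.
struct FieldSpec {
  std::size_t Index;   // which argument to print
  int Align;           // 0 = no padding, >0 = right-justify, <0 = left-justify
  std::string Options; // everything after the first ':'
};

// Splits "index[,align][:options]". Only the first ':' is significant, so
// the options themselves may contain ':' or ','.
FieldSpec parse_field(const std::string &Spec) {
  FieldSpec F{0, 0, ""};
  std::size_t Colon = Spec.find(':');
  std::string Head = Spec.substr(0, Colon); // "index[,align]"
  if (Colon != std::string::npos)
    F.Options = Spec.substr(Colon + 1);
  std::size_t Comma = Head.find(',');
  assert(!Head.empty() && Comma != 0 && "missing argument index");
  F.Index = std::stoul(Head.substr(0, Comma));
  if (Comma != std::string::npos)
    F.Align = std::stoi(Head.substr(Comma + 1));
  return F;
}

// Pads Text with spaces to |Align| columns; text that is already wide enough
// is returned unchanged.
std::string pad(const std::string &Text, int Align) {
  std::size_t Width = static_cast<std::size_t>(Align < 0 ? -Align : Align);
  if (Text.size() >= Width)
    return Text;
  std::string Fill(Width - Text.size(), ' ');
  return Align < 0 ? Text + Fill : Fill + Text;
}
```

For instance, `pad("76.00%", 8)` returns the text with two leading spaces and
`pad("76.00%", -8)` returns it with two trailing spaces.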
>
> And you can also format complicated types.  For example:
>
> 11. os << format_string("{0:MM/DD/YYYY hh:mm:ss}",
> std::chrono::system_clock::now());  // writes "10/11/2016 18:19:11"
>
>
> I already have a working proof of concept that supports most of the
> fundamental data types and formatting options such as percents, exponents,
> comma grouping, fixed point, hex, etc.
>
> To summarize, the advantages of this approach are:
>
> 1) *Safe.*  If it can't format your type, it won't even compile.
> 2) *Concise.*  You can re-use parameters multiple times without
> re-specifying them.
> 3) *Simple.*  You don't have to remember whether to use %llu or PRIx64 or
> %zu, because format specifiers don't exist!
> 4) *Flexible.*  You can format types in a multitude of different ways
> while still having the nice format-string style syntax.
> 5) *Extensible.*  If you don't like the behavior of a built-in formatter,
> you can override it with your own.  If you have your own type which you'd
> like to be able to format, you can add formatting support for it in
> multiple different ways.
>
> I am hoping to have something ready for submitting later this week.  If
> this interests you, please help me out by reviewing my patch!  And if you
> think this would not be helpful for LLVM and I should not worry about this,
> let me know as well!
>
> Thanks,
> Zach