<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Oct 11, 2016, at 9:18 PM, Zachary Turner <<a href="mailto:zturner@google.com" class="">zturner@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Tue, Oct 11, 2016 at 8:59 PM Mehdi Amini <<a href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div class="gmail_msg" style="word-wrap: break-word;">Hi,<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg">On Oct 11, 2016, at 6:22 PM, Zachary Turner via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="gmail_msg" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><blockquote type="cite" class="gmail_msg"><br class="gmail_msg m_-4054375192632283874Apple-interchange-newline"><div class="gmail_msg"><div dir="ltr" class="gmail_msg">A while back, llvm::format() was introduced, which made it possible to combine printf-style formatting with LLVM streams. 
However, this still comes with all the risks and pitfalls of printf. Everyone is no doubt familiar with these problems, but here are just a few anyway:<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">1.<span class="Apple-converted-space"> </span><b class="gmail_msg">Not type-safe.</b><span class="Apple-converted-space"> </span> Not all compilers warn when you mess up the format specifier. And when you're writing your own printf-like functions, you need to tag them with __attribute__((format(printf, ...))), which again not all compilers have.</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg">I’m not very sensitive to the “not all compilers have” argument; however, it is worth mentioning that the format may not be a string literal, which defeats the “sanitizer”.</div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"> <span class="Apple-converted-space"> </span>If you change a const char * to a StringRef, it can silently succeed while passing your StringRef object to printf. 
It should fail to compile!</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg">llvm::format now fails to compile as well :)</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">However this does not address other issues, like: `format(“%d”, float_var)` </div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">2.<span class="Apple-converted-space"> </span><b class="gmail_msg">Not security safe. </b>Functions like sprintf() will happily smash your stack for you if you're not careful. </div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">3.<span class="Apple-converted-space"> </span><b class="gmail_msg">Not portable (well kinda). </b>Quick, how do you print a size_t? You probably said %z. Well MSVC didn't even support %z until 2015, which we aren't even officially requiring yet. So you've gotta write (uint64_t)x and then use PRIx64. Ugh.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">4.<span class="Apple-converted-space"> </span><b class="gmail_msg">Redundant.</b> <span class="Apple-converted-space"> </span>If you're giving it an integer, why do you need to specify %d? It's an integer! We should be able to use the type system to our advantage.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">5.<span class="Apple-converted-space"> </span><b class="gmail_msg">Not flexible.</b> <span class="Apple-converted-space"> </span>How do you print a std::chrono::time_point with llvm::format()? You can't. 
You have to resort to providing an overloaded streaming operator or formatting it some other way.</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg">It seems to me that there is no silver bullet for that: whether for llvm::format() or your new proposal, some sort of glue or helper needs to be provided for each and every non-standard type.</div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><br class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg">So I've been working on a library that will solve all of these problems and more.</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg">Great! I appreciate the effort; when I talked about this with Duncan last week, he mentioned that we should do it :)</div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><br class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">The high-level design of my library is borrowed heavily from C#. But if you're not familiar with C#, I believe Boost has something similar in spirit. The best way to show it off is with some examples:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">1. os << format_string("Test"); // writes "Test"</div><div class="gmail_msg">2. 
os << format_string("{0}", 7); // writes "7"</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Immediately we can see one big difference between this and llvm::format() / printf. You don't have to specify the type. If you pass in an int, it formats it as an int.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">3. os << format_string("{0} {0}", 7); // writes "7 7"</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">#3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it's expensive to compute, this means you have to save it into a temporary.</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg">What about: printf(“%0$ %0$”, 7); </div></div></div></div></blockquote><div class="">Well, umm.. I didn't even know about that. And I wonder how many others also don't. How does it choose the type? It seems there is no d in there. 
</div></div></div></div></blockquote><div><br class=""></div><div>Sorry, I meant printf(“%0$d %0$d”, 7); </div><div><br class=""></div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">4. os << format_string("{0:X}", 255); // writes "0xFF"</div><div class="gmail_msg">5. os << format_string("{0:X7}", 255); // writes "0x000FF"</div><div class="gmail_msg">6. os << format_string("{0}", foo_object); // fails to compile!</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Here is another example of an improvement over traditional formatting mechanisms. If you pass an object for which it cannot find a formatter, it fails to compile.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">However, you can always define custom formatters for your own types. 
If you write:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">namespace llvm {</div><div class="gmail_msg"> <span class="Apple-converted-space"> </span>template&lt;&gt;</div><div class="gmail_msg"> <span class="Apple-converted-space"> </span>struct format_provider&lt;Foo&gt; {</div><div class="gmail_msg"> <span class="Apple-converted-space"> </span>static void format(raw_ostream &S, const Foo &F, int Align, StringRef Options) {</div><div class="gmail_msg"> <span class="Apple-converted-space"> </span>}</div><div class="gmail_msg"> <span class="Apple-converted-space"> </span>};</div><div class="gmail_msg">}</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Then #6 will magically compile, and invoke the function above to do the formatting. There are other ways to customize the formatting behavior, but I'll keep going with some more examples:</div></div></div></blockquote><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">7. os << format_string("{0:N}", -1234567); // Writes "-1,234,567". Note the commas.</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg">Why add commas? Because of the “:N”?</div><div class="gmail_msg">This seems localization-dependent: how do you handle that?</div></div></div></div></blockquote><div class="">Yes, it is localization-dependent. That being said, LLVM has zero existing support for localization. We already print floating point numbers with decimals, messages in English, etc. </div><div class=""><br class=""></div><div class="">The purpose of this example was to illustrate that each formatter can have its own custom set of options. 
For the case of integral arithmetic types, those would be:</div><div class=""><br class=""></div><div class="">X : Uppercase hex</div><div class="">X- : Uppercase hex without the 0x prefix<br class=""></div><div class="">x : Lowercase hex</div><div class="">x- : Lowercase hex without the 0x prefix</div><div class="">N : comma grouped digits</div><div class="">E : scientific notation with uppercase E</div><div class="">e : scientific notation with lowercase e</div><div class="">P : percent</div><div class="">F : fixed point</div><div class=""><br class=""></div><div class="">But for floating point types, a different set of format specifiers would be valid (for example, it doesn't make sense to print a floating point number as hex)</div></div></div></div></blockquote><div><br class=""></div><div>I'm not sure that is the best example: I believe hexadecimal is the default format for printing float literals in the IR. But OK, I see how it works!</div><div><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_quote"><div class=""><br class=""></div><div class="">If you wrote your own formatter (as described earlier in #6), the field following the : would be passed in as the `Options` parameter, and the implementation is free to use it however it wants. 
The std::chrono formatter takes strings similar to those described in #11, for example.</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div class="gmail_msg" style="word-wrap: break-word;"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">What happens with the following?</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">os << format_string("{0:N}", -123.455);</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">You would get "-123.46" (default precision of floating point types is 2 decimal places). If you had -1234.566 it would print "-1,234.57" (you could change the precision by specifying an integer after the N. So {0:N3} would print "-1,234.566"). For integral types the "precision" is the number of digits, so if it's greater than the length of the number it would pad left with 0s. For floating point types it's the number of decimal places, so it would pad right with 0s.</div><div class=""><br class=""></div><div class="">Of course, all these details are open for debate, that's just my initial plan.</div></div></div></div></blockquote></div><br class=""></body></html>