[cfe-dev] [RFC] new format string attributes

Arthur O'Dwyer via cfe-dev cfe-dev at lists.llvm.org
Wed Mar 25 10:52:45 PDT 2020


On Wed, Mar 25, 2020 at 12:59 PM Aaron Ballman via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On Wed, Mar 25, 2020 at 11:51 AM Marcus Johnson <
> marcusljohnson1991 at gmail.com> wrote:
> > > That doesn't answer why we need a new format archetype. The archetypes
> > > are used because we want the check to model behavior specific to some
> > > API. If I understand your proposal properly, you're not proposing to
> > > add anything like uprintf() to a C library (and such an API doesn't
> > > already exist), so adding a new archetype surprises me. I would have
> > > thought the existing archtypes would suffice, but maybe I'm still
> > > misunderstanding a part of your proposal.
>

In particular, if all you want to do is support
`__attribute__((format(printf, x, y)))` on function parameters that happen
to be of type `char8_t*`, `char16_t*` or `char32_t*`, that should be
trivial.  Just look at how Clang works for arguments of type `wchar_t*` and
copy that.

...Oh wait, it looks like neither GCC nor Clang actually implements
format-string checking for wchar_t format strings!
https://godbolt.org/z/Tk9YCA

    std::wprintf(L"%s", 42);  // no diagnostic emitted

So that would be a *very* good place to start, IMO. Once the code is in
place to format-check wide string literals, it should be trivial to extend
it to also format-check char{8,16,32}_t literals.
Here's the existing bug report: https://bugs.llvm.org/show_bug.cgi?id=16810

*Orthogonally*, you seem to be proposing that there should be some new
printf format specifiers besides %s %c %[ (for char) and %ls %lc %l[ (for
wchar_t).  This is not a Clang issue; this is a library-design issue that
you should think about as you write your library function that takes a
format string (you know, the one you want to apply __attribute__((format))
to).  If you are not writing a library function, then you have nothing to
apply the attribute to, and therefore there's no reason for you to need
*anything* changed.
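
For concreteness, here is a minimal sketch of the kind of library function I mean, using the attribute as it exists today for plain `char` format strings. The name `my_format` is made up for illustration; nothing here is specific to char8_t/char16_t/char32_t.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical printf-like function. The attribute says: parameter 1 is the
// format string, and the variadic arguments to check start at parameter 2.
#if defined(__GNUC__) || defined(__clang__)
__attribute__((format(printf, 1, 2)))
#endif
std::string my_format(const char *fmt, ...);

std::string my_format(const char *fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);

    // First pass: measure the formatted length using a copy of the va_list.
    va_list copy;
    va_copy(copy, args);
    int n = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);

    // Second pass: format into a string of exactly that length.
    std::string out;
    if (n > 0) {
        out.resize(static_cast<std::size_t>(n));
        std::vsnprintf(&out[0], out.size() + 1, fmt, args);
    }
    va_end(args);
    return out;
}
```

With that declaration in place, a call such as `my_format("%s", 42)` should draw a -Wformat warning from both GCC and Clang, while `my_format("%d-%s", 42, "x")` is accepted.
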
You throw out the ideas of %us for char16_t, %Us for char32_t, and have no
suggestion for char8_t. However, you cannot use %us as a format specifier,
because printf already gives that sequence a valid meaning:

    printf("hello %us world", 42u);  // prints "hello 42s world"

My off-the-top-of-my-head idea is that you should take a hint from MSVC;
they provide %I32d, %I64d, etc., for integer types, so how about %C8s,
%C16s, and %C32s for Unicode character string types?  However, again, this
is an issue to think about as you design your `MyPrintfLikeFunction` within
your own codebase. Maybe you'll find that you don't even need a format
specifier for those types.
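
If you did go the %C8s/%C16s/%C32s route, recognizing those sequences inside your own function is the easy part. A rough sketch, assuming those exact spellings (which are only my off-the-cuff suggestion, not anything printf understands) and a made-up helper name `unicode_spec_width`:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper for a user-written printf-like function: returns the
// code-unit width (8, 16, or 32) if p points at "%C8s", "%C16s", or "%C32s",
// and 0 if it points at anything else.
int unicode_spec_width(const char *p) {
    assert(p != nullptr);
    if (std::strncmp(p, "%C8s", 4) == 0) return 8;
    if (std::strncmp(p, "%C16s", 5) == 0) return 16;
    if (std::strncmp(p, "%C32s", 5) == 0) return 32;
    return 0;
}
```

Note that "%us" falls through to 0 here, so it would keep its ordinary printf meaning of "%u" followed by a literal "s".
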

(FWIW, the C and C++ party line seems to be that no "%C16s" or "%C32s" is
needed, because the modern approach is to separate transcoding from output.
You shouldn't be printf'ing Unicode strings directly; you should be first
transcoding them into `char` or `wchar_t` strings, and then printf'ing or
wprintf'ing those strings. Personally I don't think that approach is very
helpful in practice, though.)
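
To illustrate what "separate transcoding from output" looks like, here is a sketch that converts a UTF-16 string to UTF-8 by hand and then hands the result to plain printf with %s. The names `utf16_to_utf8` and `print_utf16` are hypothetical, and the sketch assumes the output stream expects UTF-8; a real codebase would more likely use an existing transcoding library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: transcode UTF-16 to UTF-8. Unpaired surrogates are
// replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Transcode first, then print with the ordinary %s specifier.
int print_utf16(const std::u16string &s) {
    return std::printf("%s\n", utf16_to_utf8(s).c_str());
}
```

The format string never has to know about char16_t at all, which is the point of that approach; the cost is the extra conversion and allocation at every call site.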

–Arthur