[cfe-dev] Overzealousness of -Wformat causes problems for LLDB, others

Mon May 21 10:06:20 PDT 2012

On Fri, May 18, 2012 at 10:11 PM, M.E. O'Neill <oneill at cs.hmc.edu> wrote:
> If you build LLDB on Linux, you get many many warnings from Clang following this pattern:
>
> /whereever/lldb/llvm/tools/lldb/source/Symbol/ClangASTType.cpp:1012:31: warning:
>      conversion specifies type 'long long' but the argument has type 'int64_t'
>      (aka 'long') [-Wformat]
>                s->Printf("%lli", enum_value);
>                           ~~~^   ~~~~~~~~~~
>                           %ld

This was actually brought up on the #llvm IRC channel the other week.
I agree, it is a pain. :/

> /whereever/lldb/llvm/tools/lldb/source/Target/ThreadPlan.cpp:142:103: warning:
>      conversion specifies type 'unsigned long long' but the argument has type
>      'uint64_t' (aka 'unsigned long') [-Wformat]
>  ...#%u: tid = 0x%4.4llx, pc = 0x%8.8llx, sp = 0x%8.8llx, fp = 0x%8.8llx, "
>                                                                  ~~~~~~^
>                                                                  %8.8lu

With a more recent Clang version, it will at least suggest "%8.8lx".

>
> /whereever/lldb/llvm/tools/lldb/tools/driver/Driver.cpp:915:85: warning:
>      conversion specifies type 'unsigned long long' but the argument has type
>      'lldb::pid_t' (aka 'unsigned long') [-Wformat]
>  ...(message, sizeof(message), "Process %llu %s\n", process.GetProcessID(),
>                                         ~~~^        ~~~~~~~~~~~~~~~~~~~~~~
>                                         %lu
>
> For x86_64 code on OS X and Linux, pid_t, int64_t and uint64_t are typedefs to 64-bit quantities.  Exactly what those typedefs are is up to the platform though -- for example, possible definitions for int64_t include:
>
>        typedef long      int64_t       // Used by Linux
>  or    typedef long long int64_t       // Used by OS X
>  or    typedef intmax_t  int64_t       // Would also work on Linux and OS X
>  or    typedef ssize_t   int64_t       // Would also work on Linux and OS X
>  or    typedef ptrdiff_t int64_t       // Would also work on Linux and OS X
>  or    typedef quad_t    int64_t       // Would also work on Linux and OS X
>
> All these types are basically the same, they're 64-bit signed integers.  But each one has its own length modifier for printf (i.e., l, ll, j, z, t, and q, respectively).
>
> Even though they're structurally identical, Clang's format string checks care (somewhat) about which length modifier is chosen; Clang (usually!?!) warns when you pass a long long to a format that wants a long (or vice versa), because although they're structurally identical the types are considered distinct.  GCC does this too.
>
> If you know your C standard, you may say that portable code should really use the relevant inttypes.h macro, and thus the first example should have been written:
>
>        s->Printf("%" PRId64, enum_value);
>
> but first, who actually does this, and second, that still leaves the question of what to do about pid_t, since it has no such macro.

For pid_t, I guess a portable solution would be to cast it to intmax_t
and print that with "%jd".
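
A rough sketch of both approaches, in case it helps. The helper names
format_i64 and format_pid are made up for illustration, and snprintf
stands in for LLDB's Printf; this assumes a POSIX system that provides
pid_t in <sys/types.h>:

```c
#include <inttypes.h>   /* PRId64, intmax_t */
#include <stdio.h>      /* snprintf, size_t */
#include <sys/types.h>  /* pid_t (POSIX) */

/* Hypothetical helper: PRId64 expands to the conversion ("ld", "lld", ...)
   that matches int64_t's underlying type on this platform. */
int format_i64(char *buf, size_t len, int64_t value)
{
    return snprintf(buf, len, "%" PRId64, value);
}

/* Hypothetical helper: pid_t has no PRI macro, so widen it to intmax_t
   and print it with the 'j' length modifier. */
int format_pid(char *buf, size_t len, pid_t pid)
{
    return snprintf(buf, len, "Process %jd", (intmax_t)pid);
}
```

Neither call should trigger -Wformat on Linux or OS X, since the
conversion always matches the argument type after the macro expansion
or the cast.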

> Interestingly, if we use "%zd" as our format, Clang is permissive about it, allowing us to pass in both longs and long longs without a complaint (is this a bug or a feature? is it documented somewhere? GCC isn't permissive in the same way...),

What happens here is that for "%zu", Clang will expect the type that
size_t is typedefed to, which on my system is unsigned long. In C,
there is no built-in distinct type for size_t, but Clang does keep
track of which integer type it uses for sizes, i.e. the result of
sizeof(), etc.

We want the result of sizeof to be printable with "%zu", and therefore
don't strictly enforce that the type of the argument is a typedef with
name "size_t".

From what I can tell, GCC does this too, i.e. "printf("%zu\n",
sizeof(int))" doesn't yield a warning, even though the type of the
return value from sizeof isn't actually size_t, but unsigned long.
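
As a minimal sketch of that case (format_sizeof_int is a hypothetical
helper name, and snprintf is used only so the result can be inspected):

```c
#include <stdio.h>  /* snprintf, size_t */

/* Hypothetical helper: formats sizeof(int) with "%zu".
   The argument is the raw result of sizeof, not a variable declared
   through the size_t typedef, and "%zu" is still the matching
   conversion for it. */
int format_sizeof_int(char *buf, size_t len)
{
    return snprintf(buf, len, "%zu", sizeof(int));
}
```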

> but it isn't permissive about the others, including strangeness like allowing "%ld" to be used with intmax_t but not "%llu" (this is a bizarre choice because if long long was actually bigger than long, intmax_t would have to be long long).

We're not pedantic enough to warn about using "%ld" with intmax_t on a
system where intmax_t is typedefed to long. We will warn when using
"%lld", because "long" and "long long" are distinct types even if
they're the same size.
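
One way to see that the two types are distinct is C11's _Generic, which
would reject a selection listing both "long" and "long long" if they
were the same type. This is only a sketch: TYPE_NAME and the name_of_*
functions are made up for illustration, and a C11 compiler is assumed.

```c
#include <stdint.h>  /* int64_t */

/* Hypothetical macro: reports which built-in type the compiler sees.
   Listing both long and long long compiles only because they are
   distinct types, even where both are 64 bits wide. */
#define TYPE_NAME(x) _Generic((x), \
    int:       "int",              \
    long:      "long",             \
    long long: "long long",        \
    default:   "other")

const char *name_of_long(void)      { return TYPE_NAME((long)0); }
const char *name_of_long_long(void) { return TYPE_NAME((long long)0); }

/* The result depends on the platform's typedef for int64_t. */
const char *name_of_int64(void)     { return TYPE_NAME((int64_t)0); }
```

Which name name_of_int64 returns depends on how the platform defines
int64_t, and that is the same distinction the format checker is
looking at.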

> It seems to me that we ought to have two kinds of warnings for format strings, one for things that are actually problems, and a -Wpedantic-format for things that are technically wrong, but are not actually a problem for the platform we're compiling on, like using "%jd" to print a long long.  Pedantic warnings might even include using "%ld" for the intmax_t type, because even if it is typedefed to long on this platform, it might not be on another platform and for that reason you should really be using "%jd".

We already have some format warnings under -pedantic that warn about
using non-ISO C features.

I agree that it would be nice to separate warnings that are concerned
with portability from warnings about code that is broken on the target
machine. But I also think this could be pretty tricky.

Not sure how much this helps, but hopefully it at least explains the
situation a little.

Thanks,
Hans