[cfe-dev] vprintf(3) and "format string is not a string literal"

Ted Kremenek kremenek at apple.com
Mon Dec 17 06:58:31 PST 2007

On Dec 17, 2007, at 5:48 AM, Shantonu Sen wrote:

>> In  case you are not aware, format-string attacks are an extremely  
>> serious and often exploited attack vector for a hacker to  
>> compromise a system, and have been the source of much grief.  
>> Clearly the logmessage function in your example code snippet is a  
>> possible exception because it serves as a wrapper to vprintf (so  
>> this warning could be considered a false positive).  If the format  
>> string checker was more powerful (i.e., it performed an inter- 
>> procedural program analysis), it should similarly flag casual uses  
>> of the logmessage function where the format string is not specified  
>> with a string constant.
> There would still be cases where you register a function like  
> logmessage() with a system library API as a callback, so LLVM would  
> not know either at compile time or runtime what the callers are.

That's a good point.  What you suggest essentially requires a whole  
program analysis, and the issue that comes to my mind is whether or  
not doing a whole program analysis to perform a simple check like this  
makes it less practical in the typical compile/debug cycle and more of  
a check you perform once and a while.  Moreover, the issue of whether  
or not registering a function like logmessage in a callback is a good  
idea is ultimately up to the programmer (or whoever is setting the  
security policies for a code base).  One option is that we could use  
an attribute to silence such warnings (especially if they are indeed  
false positives).  This would require people to at least think about  
what they are doing before they silence the warning.

>> Your question specifically regarding vprintf is an interesting  
>> one.  Clearly vprintf was designed with the idea of passing in  
>> arguments from some other source, be it via parameters from a  
>> caller of the current function or pulled from a data structure.   
>> Depending on how careful you are, this can be okay (such as with  
>> the logmessage function), but in general it's just a bad idea.  The  
>> va_list argument to vprintf (and friends) allows a tremendous  
>> amount of flexibility in how these functions are used (specifically  
>> be a dynamically specified argument list almost always implies a  
>> dynamically specified format string); the horribly consequence is  
>> that this code is very difficult to check statically for  
>> correctness, and can be the source of awful security holes and  
>> other bugs when these functions aren't used properly.   
>> Unfortunately, people often underestimate how easy it is to screw  
>> up how these functions are used, either when they are called  
>> directly or called indirectly via wrappers.
> I think what I'd like the eventual behavior to be is:
> 1) Functions that call printf(3) directly should have the warning  
> enabled

Sounds good.

> 2) Functions that call vprintf(3) and all callers are known, enable  
> the warning

Again this ties in with my previous point.  Knowing all callers  
requires a whole-program analysis, and defeats the notion of making  
this a quick check.  Building an accurate callgraph statically is not  
always possible, so there is always a chance of emitting a false  
positive.  Further, once you are doing whole-program analysis you can  
do an entirely different kind of checking; e.g. actually track tainted  
data to see if they are ever used as a format string (this applies to  
both a static and dynamic analysis of the program).  The other problem  
is that often programs are "open"; e.g. what policy should we set when  
analyzing library code?

One other suggestion: functions like vprintf are not used frequently.   
Perhaps requiring the user to add an attribute to silence a warning  
(see below) is the simplest solution.

> 3) If the function calling vprintf(3) is itself annotated with  
> something like GCC's __attribute__((__format__ ... )), make it the  
> responsibility of the second-order caller to verify arguments

Again, this requires a whole-program analysis to relax the constraints  
to the second-order caller.  It's almost better to add an attribute to  
the first-order caller of vprintf.  Then the local checker would see  
that vprintf is called by such a function (and not emit a warning for  
the direct call to vprintf), and when checking the second-order  
callers they can consult the attribute of the wrapper function  
(assuming the attribute is in the header file).  This kind of  
attribute/annotation trick has been used before in other systems to  
bubble up through an API where the actual checking takes place.

Just to be clear, I have no aversion against doing a whole-program  
analysis, and if this is what makes the most amount of sense to do  
this kind of checking the "right way" then that's what should be  
done.   As I have already mentioned, doing a whole-program analysis is  
a different beast altogether, and would complicate issuing warnings.

> 4) Otherwise disable it.
> For comparison, GCC has:
>       -Wformat-nonliteral
>           If -Wformat is specified, also warn if the format string  
> is not a
>           string literal and so cannot be checked, unless the format  
> function
>           takes its format arguments as a "va_list".

Understood.  The issue is whether or not GCC has the best policy.  I  
think some of the ideas we have tossed around in this thread point to  
some better solutions in how to handle format functions that take its  
formal argument as a va_list.

>> Ultimately, we should probably make warnings like these an option.   
>> People can then decide their own policy on when such warnings are  
>> emitted.
> How about this:
> $ clang -std=c99 a.c
> a.c:10:23: warning: format string is not a string literal  
> (potentially insecure)
>        ret = vprintf(fmt, ap);
>              ~~~~~~~ ^
> 1 diagnostic generated.
> $ clang -std=c99 a.c -Wno-format-nonliteral
> $

Sounds reasonable.  I'll look at the patch today.  Thanks!


More information about the cfe-dev mailing list