[cfe-dev] Static Analyzer Rocks Hard

Ted Kremenek kremenek at apple.com
Sun Jun 22 23:05:18 PDT 2008


On Jun 19, 2008, at 10:36 PM, Matthew Jimenez wrote:

>
> On Jun 19, 2008, at 4:54 PM, Ted Kremenek wrote:
>
>>
>> On Jun 19, 2008, at 1:03 PM, Matthew Jimenez wrote:
>>>
>>> Yes, this was what I was thinking, however I am unsure how I feel
>>> about it.
>>> Obviously any case where the allocation or initialization fails here
>>> would cause the code to misbehave, and this is something I had not
>>> taken into full consideration.
>>> Asserting the values after allocating them causes the analyzer to
>>> not report any problems, but that might be overkill.
>>
>> Assertions are funny things.
>
> I think that's my new favorite quote ;-)
>
>> If you know that the allocation could never fail then the assertion
>> properly silences the warning.  If the allocation *could* fail, it
>> merely turns an error that could be caught with static analysis into
>> a run-time error (i.e., the assertion fires, causing the program to
>> halt).
>>
>>> Perhaps what I'm more interested in are flags for the analyzer for
>>> assumptions it may be allowed to make.
>>> Normal behavior would assume allocation always succeeds, with a flag
>>> to assume allocation could fail (or perhaps the reverse).
>>> Although, too many such flags might also be quite bad.
>>
>> Most code is not engineered to handle the case when memory
>> allocation fails.  When memory allocation starts to fail, programs
>> tend to do weird things since they don't know how to recover.  For
>> such programs, flagged leaks (or any other kind of bug) that occur
>> when memory allocation fails are likely going to be deemed false
>> positives.
>>
>> That said, there is code out there that does care when allocations
>> fail.  Such code should be designed to handle and recover from
>> failure, and for such programs the most robust test is to allow any
>> arbitrary allocation to fail.  There, users can use assertions to
>> silence particular warnings about allocations they are 100% certain
>> won't fail (the assertions would turn into run-time errors if those
>> allocations actually did fail).  Writing code, however, that assumes
>> that any arbitrary allocation can fail is difficult at best.
>>
>> In order to be pragmatic, the analyzer assumes that an allocation
>> has succeeded *unless* you check for failure (i.e., a pointer is
>> tested for null).  This is exactly what happens in the test case;
>> since "sql" is tested against null on every iteration of the loop,
>> the analyzer assumes that the allocation could fail.
>
> This behavior does make sense, but should it really extend past the
> first usage of the variable or after the first pass of a loop?
> There may be cases where extending it might be a good idea, but I'd
> imagine those cases to be very rare in code.

I myself am often surprised how logic bugs can cross loop iterations.   
Programmers sometimes make assumptions that hold on the first loop  
iteration and not the second.
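
For concreteness, here is a rough sketch of the shape I understand
your test case to have; the names, sizes, and the allocation call are
my guesses rather than your actual code:

    #include <stdlib.h>

    void run_queries(unsigned n) {
      char *sql = 0;             /* intended meaning: "not assigned yet" */
      for (unsigned i = 0; i < n; ++i) {
        if (!sql)                /* the analyzer reads this as a test for */
          sql = malloc(128);     /* allocation failure, so it keeps alive */
                                 /* a path where malloc returned null     */
        /* ... build and run the query using sql ... */
      }
      free(sql);
    }

Since "sql" is compared against null on every iteration, that
failed-allocation path survives across trips through the loop, and I
suspect it is the path on which your report is being generated.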

>> I believe that your test case, however, intended the "if (!sql)" to
>> really test to see if "sql" had not been assigned a new value after
>> its initial declaration.  We could potentially engineer the analyzer
>> to handle such cases.  For example, sql is initialized with the
>> value 0, so we can prove that the branch would unconditionally be
>> taken on the first iteration of the loop, and thus we shouldn't
>> treat that particular branch like it was testing for the success of
>> the allocation.  Such heuristics go into the realm of black art,
>> which is fine if they are the right thing to do in 99% of the cases,
>> but in this case they may silence more real bugs than intended.
>
> I'm inclined to agree with the above - it is tricky and probably not
> needed at this point.
> At the moment, I can't think of additional real bugs that could be
> silenced, but I'm probably being too short-sighted.
>
>> So, I really see two options.  We can add a flag indicating that
>> allocations are assumed not to fail, or we just keep the current
>> behavior.
>
> After reading the description of the behavior above, I'm thinking a
> flag would not be a good idea.
>
>> Assuming the checker is run using the current behavior, it looks
>> like it is doing exactly the right thing.  From my perspective it
>> probably makes the most sense to add an assertion after the
>> allocation to "sql" since that really is the assumption you are
>> making.  In this way the code itself documents your assumptions.
>> Alternatively, the test case could be restructured, etc., but an
>> assertion seems like an easy answer and would leave your code
>> better documented.
>
> Adding the asserts would probably be the best option in the projects
> I work on, but there are going to be many cases of this, and the
> original case was a little confusing at first glance.

Just a thought: In a way, adding a flag to the static analyzer to
assume that all allocations succeed is almost the same as adding an
assertion after every allocation.  The only difference is that the
flag provides none of the run-time checking that an assertion does.
The question is whether or not observing the assertion actually firing
would be useful, since it is often obvious when a program is running
out of memory.  This of course depends on the application and the
preferences of the developer.
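
To make the comparison concrete, the per-allocation assertion I have
in mind is just this (a sketch; the names and size are assumed):

    #include <assert.h>
    #include <stdlib.h>

    void example(void) {
      char *sql = malloc(128);
      assert(sql && "allocation assumed not to fail");

      /* Past this point the analyzer treats sql as non-null, so any
         report that only arises on the failed-allocation path goes
         away; at run time the assertion halts the program instead if
         the assumption ever turns out to be wrong. */

      /* ... use sql ... */
      free(sql);
    }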

While I don't like the idea of having a proliferation of options to
the checker which could overly complicate its interface, it does seem
to me that "assuming allocations don't fail" would be a very common
escape hatch when suppressing false positives for memory leaks and
other bugs.  It's an assumption that holds 99% of the time; when it
doesn't, most programs are screwed anyway because they aren't written
to handle such failure (this isn't really a bug, but a design
decision).  Having a flag for this to the static analyzer allows one
to assert this assumption across an entire project.  In many ways it's
a personal decision about how software should be written, and whether
or not a bug-finding tool should always scold you about it.

Just a thought.

>> What do you think?  We're trying to make the tool as useful as
>> possible.  Different people will want different operating models for
>> the tool, but another benefit of the tool is that it makes one think
>> about their assumptions and how their code could be improved,
>> especially since some assumptions are not always correct.
>
> It's definitely made me think about the way I code, but it would make
> me think the best option would be to always assert against all
> allocations - which would be madness in a large pre-existing code
> base :-)

I think that's awesome if you (and others) want to document your code  
so thoroughly with assertions.  The problem I see is that you may need  
a lot of these assertions since allocations are frequent.  I'm  
concerned that some programmers may feel that it is necessary to add  
an assertion after every allocation just to reduce the noise from the  
checker.  That's an extra line of code for every allocation; that's  
cumbersome for both pre-existing and new code.
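
One way to avoid repeating that extra line, purely as an illustration
on my part rather than anything from your code, is to fold the
assertion into a small wrapper and allocate through it:

    #include <assert.h>
    #include <stdlib.h>

    /* Centralizes the "allocations never fail" assumption so that
       call sites don't each need their own assert.  This keeps the
       run-time check in one place; whether the checker can see
       through the call and benefit from it is a separate question. */
    static void *checked_malloc(size_t size) {
      void *p = malloc(size);
      assert(p && "allocation assumed not to fail");
      return p;
    }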

I'm curious about the following: if I add an option to have the static
analyzer assume that all allocations succeed, what portion of your
false warnings disappear?  If it is a significant number, I think this
option is worth including (or even making it the default behavior).
Would you be interested in running this experiment?

> Seriously though, my case may still be rare enough outside of the code
> I work on that it could be successfully ignored. I'll gladly ignore
> the bug if it comes to that. I'm already ignoring some cases of dead
> stores and memory leaks that are in code that was not relevant
> (obvious because the original author named the function
> "never_called_but_needed"), and I'm not aiming for complete
> coverage... at least not yet.

I really appreciate your perspective.  As you continue to use the  
static analyzer (especially as it evolves) I'm very interested to hear  
any feedback on how the workflow of the tool can be improved to fit  
into your model of finding and fixing bugs.

Ted



