[cfe-dev] my experience with clang

Nuno Lopes nunoplopes at sapo.pt
Sun Jan 13 08:17:55 PST 2008


>> it is usually used like this:
>> zend_parse_parameters(ZEND_NUM_ARGS(), "s|l", &str, &str_len,  &number);
>
> So if I understand correctly, zend_parse_parameters has the following 
> postcondition:
>
> "return value" != FAILURE   =>   str == INITIALIZED,   str_len == INITIALIZED
>
> "return value" == FAILURE   =>   str == UNINITIALIZED, str_len == UNINITIALIZED

yes, and 'number' may or may not be initialized.


> What you would like to do is expand the "uninitialized values" analysis
> to take into account the "return value" so that you can flag possible bad
> uses of "str" and "str_len"?

exactly.


>> because I want to check if the parameters after the '|' are used before
>> initialization
>
> Let me see if I understand what you mean.  After a call to
> "zend_parse_parameters", you want to track the possible
> initialized/uninitialized state of the "str" and "str_len" arguments
> (which depends on the "return value" of zend_parse_parameters).  If you
> use "str" or "str_len" (or whatever other variables were used as
> arguments) while they could be in the "uninitialized" state, you want to
> flag an error.  Is this what you mean?

yep :)


>> and if the ones before are not initialized unnecessarily.
>
> This one I'm not certain what you mean.  I'm not certain what you mean by
> "not initialized unnecessarily."

expanding the previous example:

1: char *str = NULL;
2: int str_len; long number = 3;
3:
4: if (zend_parse_parameters(ZEND_NUM_ARGS(), "s|l", &str, &str_len, &number) == FAILURE) {
5:     return;
6: }
7:
8: printf("got the string: %s and the number: %ld\n", str, number);

In this case 'str' didn't need to be initialized, because after line 6 it is
guaranteed to have been filled in by zend_parse_parameters. 'number', however,
does need to be initialized: it is used on line 8, and since it comes after
the '|' it is optional, so zend_parse_parameters isn't guaranteed to fill it in.


> I'm not proposing, however, that we implement ESC/Java for clang,
> although a subset of those features might be extremely useful, as it is
> better to encode such properties concerning the contract associated with
> a function's interface in the actual source code (e.g. header files)
> instead of hardwiring such knowledge into a specific tool.  This not only
> allows the tool to become more extensible as more code is annotated, but
> also means that the knowledge is more portable, and doesn't die out when
> a specific tool dies out.

Uhm, interesting... I wasn't aware of this ESC/Java tool. I'll investigate it
further, thanks.


> The other thing that I would like to mention is that the particular
> property you are describing is a little more than extending a
> flow-sensitive uninitialized values analysis.  Because the
> uninitialized/initialized state of "str" and "str_len" depends on the
> return value of zend_parse_parameters, it almost inherently becomes a
> path-sensitive property if you want to check it with any real precision.
> We will likely extend the uninitialized values analysis to work in the
> new path-sensitive dataflow engine that we are building; in that case
> adding such information might actually be pretty easy and should give
> you the precision that you need to not spit out too much noise to the
> user.

Yes, you are right :) But in this case the usage of that function is pretty
standard: if it returns FAILURE, the code simply returns. So most cases can
be handled with this heuristic. Anyway, it'll report far fewer false
positives than my current regex-based script :)
But sure, I'm waiting for your path-sensitive solver, so that I can trash
mine :P


Thanks,
Nuno 



