[cfe-commits] strncpy checker - proposed patch

Lenny Maiorani lenny at Colorado.EDU
Wed Feb 16 18:39:19 PST 2011


Ted,

This checker certainly is getting a bit complicated. Maybe it is time for a good scrubbing. Sorry about that patch that didn't apply cleanly to TOT. There have been some big changes in that section, so I am not surprised. More comments inline below.

-Lenny


On Feb 11, 2011, at 9:46 PM, Ted Kremenek wrote:

> Hi Lenny,
> 
> This is looking better.  The patch doesn't apply cleanly to TOT, so would it be possible to regenerate it?  Some of the patch doesn't really match with the current contents of the checker, so it's hard to evaluate.
> 
> A few comments:
> 
> - Could you add comments about what the 'IsPotential' flag is for?  This checker is really lacking in comments, and the logic is starting to look really complicated.

In addition to adding some comments, I have a question. With strncpy(), obviously the dest buffer needs to be large enough for the number of bytes being copied. The number of bytes to be copied is the lesser of strlen(src) and the max value, size_t n, the 3rd argument to strncpy(). Should we also have another check to ensure that the size_t n (3rd argument) is always less than or equal to (<=) the size of the destination buffer?

This is what I was adding, using the IsPotential flag to indicate that a different bug needs to be reported, since this is only a potential future bug and not a current problem. Do we even want that? It is certainly making the code more confusing.
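
To make the two cases concrete, here is a minimal standalone sketch of the distinction, assuming all three sizes are known constants. The names StrncpyDiag, bytesCopied and classify are hypothetical and are not part of the checker; the model also ignores the terminating NUL. Note that the C standard has strncpy() pad dest with null characters up to n bytes, which this sketch does not model; it only follows the copy-length reasoning above.

```cpp
#include <cstddef>

// Hypothetical names; this is an illustration, not the checker's code.
enum class StrncpyDiag {
  None,              // nothing to report
  Overflow,          // the bytes copied from src already exceed dst
  PotentialOverflow  // fine today, but n alone would allow an overflow
};

// Bytes taken from src: the lesser of strlen(src) and n.
constexpr std::size_t bytesCopied(std::size_t srcLen, std::size_t n) {
  return srcLen < n ? srcLen : n;
}

// Strict check first, then the pessimistic "n > size of dst" check.
constexpr StrncpyDiag classify(std::size_t dstSize, std::size_t srcLen,
                               std::size_t n) {
  return bytesCopied(srcLen, n) > dstSize ? StrncpyDiag::Overflow
         : n > dstSize                    ? StrncpyDiag::PotentialOverflow
                                          : StrncpyDiag::None;
}
```

With an 8-byte destination, a 3-byte source and n = 16 would fall into the "potential" case, while a 12-byte source with the same n would be a strict overflow.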

> 
> - While I can't quite tell because the patch doesn't apply correctly, the following code bothers me a bit:
> 
>> +  NonLoc * lenValNL;
> 
>> +  SVal lenVal;
>> +  bool checkPotentialLen = false;
>>    if (isStrncpy) {
>>      // Get the max number of characters to copy
>>      const Expr *lenExpr = CE->getArg(2);
>> -    SVal lenVal = state->getSVal(lenExpr);
>> -
>> +    lenVal = state->getSVal(lenExpr);
>> +    
>>      NonLoc * strLengthNL = dyn_cast<NonLoc>(&strLength);
>> -    NonLoc * lenValNL = dyn_cast<NonLoc>(&lenVal);
>> +    lenValNL = dyn_cast<NonLoc>(&lenVal);
> ... <SNIP>
>> +    // Max number to copy is greater than the length of the src buffer. So
>> +    // also check that it is still <= length of destination buffer.
>> +    if (checkPotentialLen) {
>> +      SVal lastElement =
>> +        C.getSValBuilder().evalBinOpLN(state, BO_Add, *dstRegVal,
>> +                                       *lenValNL, Dst->getType());
>> +      
>> +   
> 
> 
> I'm not a huge fan of declaring variables (e.g., lenValNL) and conditionally initializing them on one branch, and then conditionally using them later on another branch.  I often feel that makes the logic of the checker not well-composed, difficult to follow, and error prone.  I can't make more specific comments since the patch doesn't apply cleanly, but if the method could be further factored into additional methods, where the shared logic is composed using calls to sub-functions rather than a bunch of branches, it would honestly be much easier to follow.

I am with you on the declaring of variables and conditionally initializing them. I was trying to avoid declaring them twice in different places and initializing them both times. This is only needed if we want the pessimistic checker described above and we want to keep the current checker design. It sounds like we are in agreement that this checker is more complicated than it needs to be, though. I think I am slowly talking myself into completely rewriting it.
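
As a rough idea of what the factored shape could look like, here is a minimal standalone sketch in which each question is answered by its own small helper, so no variable is declared in one branch and used in another. Everything here (CopyCall, effectiveCopyLength, overflowsNow, boundExceedsDst) is a hypothetical stand-in using plain integers, not SVals or the real checker API, and it ignores the terminating NUL.

```cpp
#include <cstddef>

// Hypothetical model of one strcpy()/strncpy() call with known sizes.
struct CopyCall {
  std::size_t dstSize;  // size of the destination buffer
  std::size_t srcLen;   // strlen(src)
  std::size_t maxLen;   // the size_t n argument; ignored for strcpy()
  bool isStrncpy;
};

// How many bytes are taken from src for this call.
constexpr std::size_t effectiveCopyLength(const CopyCall &c) {
  return (c.isStrncpy && c.maxLen < c.srcLen) ? c.maxLen : c.srcLen;
}

// Strict check: does the copy overflow dst as written?
constexpr bool overflowsNow(const CopyCall &c) {
  return effectiveCopyLength(c) > c.dstSize;
}

// Pessimistic check: is the strncpy() bound itself larger than dst?
constexpr bool boundExceedsDst(const CopyCall &c) {
  return c.isStrncpy && c.maxLen > c.dstSize;
}
```

The caller would then just ask overflowsNow() first and boundExceedsDst() second, and pick the warning message accordingly.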

Thoughts?

> 
> On Feb 4, 2011, at 4:39 PM, Lenny Maiorani wrote:
> 
>> 
>> On Feb 3, 2011, at 3:13 PM, Lenny Maiorani wrote:
>> 
>>> 
>>> On Dec 21, 2010, at 9:52 AM, Ted Kremenek wrote:
>>> 
>>>> Hi Lenny,
>>>> 
>>>> Thank you for your patience.  Overall the patch looks great, but I'm a little confused about the following section:
>>>> 
>>>>>    // Get the string length of the source.
>>>>>    SVal strLength = getCStringLength(C, state, srcExpr, srcVal);
>>>>>  
>>>>> +  if (isStrncpy) {
>>>>> +    // Check if the number of bytes to copy is less than the size of the src
>>>>> +    const Expr *lenExpr = CE->getArg(2);
>>>>> +    strLength = state->getSVal(lenExpr);
>>>>> +  }
>>>>> +
>>>>>    // If the source isn't a valid C string, give up.
>>>>>    if (strLength.isUndef())
>>>>>      return;
>>>> 
>>>> This looks like an intermingling of logic that it's not clear should compose in this way.
>>>> 
>>>> At the beginning we (a) fetch a value for 'strLength', then (b) overwrite that value if 'isStrncpy' is true, and then (c) we check if strLength is undefined.  Both (a) and (b) look like competing logic.  If they are truly mutually exclusive, I'd rather have one, but not both, get computed.  This logic also looks slightly pessimistic, as the length of the string can be smaller than the max number of bytes specified to strncpy().  If the value retrieved at (a) is less than the value retrieved at (b), should we use the strLength from (a) and not (b)?  I can see the argument to always use the most pessimistic value, but then our error reporting should probably reflect that the 'size_t n' argument to strncpy() is too large, and not necessarily that we have a buffer overflow.  That would make it clearer to the user what they actually need to fix in their code (i.e., while it might not be a buffer overflow, it is one waiting to happen, etc.).
>>>> 
>>>> Overall, this looks great.  I'd just like to iron these last details out a bit (and document the final design decision in the code itself with comments) so it's clear the checker is always doing what you intend and that the user understands why they are getting a warning for their code.
>>>> 
>>>> Cheers,
>>>> Ted
>>> 
>>> After a hiatus, I am back. Ted, you were correct. My patch was pessimistic. I have modified it to accurately reflect whether or not there is a buffer overflow. Now, it compares the size of the src buffer and the value of the size_t and takes the smaller. See attached patch.
>>> 
>>> Maybe there should be an additional check to see if the size_t (3rd arg) is larger than the size of dst. This would be more of a potential logic error waiting to happen when the code changed sometime in the future. This patch does not contain that.
>>> 
>>> -Lenny
>>> <strncpy-checker.diff>
>>> 
>>>        __o
>>>      _`\<,_
>>>     (*)/ (*)
>>> ~~~~~~~~~~~~~~~~~~~~
>>> 
>>> _______________________________________________
>>> cfe-commits mailing list
>>> cfe-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> 
>> This patch extends my previous patch to also check for a separate pessimistic case. It ensures that the size_t n (3rd arg to strncpy()) is less than the size of the destination buffer. It contains a different warning message than the other strict buffer overruns since this one is not actually a buffer overrun, only a chance of a buffer overrun in the future.
>> 
>> -Lenny
>> 
>> <strncpy-pessimistic-checker.diff>
>> 
>> 
>>        __o
>>      _`\<,_
>>     (*)/ (*)
>> ~~~~~~~~~~~~~~~~~~~~
>> 
> 



the definition of open: "mkdir android ; cd android ; repo init -i git://android.git.kernel.org/platform/manifest.git ; repo sync ; make" - Andy Rubin
