[cfe-commits] PATCH: Enhance array bounds checking

Kaelyn Uhrain rikka at google.com
Mon Aug 1 15:33:27 PDT 2011


Hi Ted,

I've gone ahead and attached the full patch to facilitate seeing all of the
changes in better context.

On Mon, Aug 1, 2011 at 2:18 PM, Ted Kremenek <kremenek at apple.com> wrote:

> Hi Kaelyn,
>
> Sorry for the delay in responding.  I'm trying to understand the
> differences proposed between your last patch and the new one.  When I look
> at the "delta from patch set" from 2 to 3, I see changes that appear
> unrelated to this feature, which I find confusing.
>
> What would help me is if I understood more of the high-level motivation of
> the changes.  What I see is calls to "CheckArrayAccess" moved to various
> places, for example when checking the comma operator:
>
>  7578   CheckArrayAccess(lhs.get());
>  7579   CheckArrayAccess(rhs.get());
>
> This feels a little kludgy.  I'm not saying it is wrong; I just don't
> understand the motivation.  Could you provide some insight behind these
> changes other than they got the particular test cases working?
>

The changes to get everything working properly (from what I had committed to
svn and then reverted) are, except for the unary operator handling in
CheckArrayAccess, pretty much just shuffling around when CheckArrayAccess is
called. I agree that it feels a bit kludgy, but unfortunately it was the
only way I could get CheckArrayAccess called once and only once for each
array access for all the cases I could come up with (see the changes to
test/SemaCXX/array-bounds.cpp).

For example, with my original patch that had to be reverted "p = &u.a[3];",
"++foo[5];", and "return --foo[6];" from array-bounds.cpp (lines 44, 185,
and 187) do not trigger any warning. Additionally, "p == &u.a[2]" triggers a
false warning (where "a" is a 2-element array). There are a few places like
that where the warnings are triggered incorrectly, but even more places
where CheckArrayAccess is never called--such as on the right hand side of
assignments. It not being called at all in some cases is why e.g. line 43
"short *p = &u.a[2];" did not give a warning.

With just the additional changes I made within CheckArrayAccess and leaving
the CheckArrayAccess call sites alone, "p = &u.a[3];", "++foo[5];", and
"return --foo[6];" from array-bounds.cpp (lines 44, 185, and 187) still do
not trigger any warning. The noticeable change is that "*(&u.a[2]) = 1;" and
"*(&u.a[3]) = 1;" (lines 45 and 46) generate the warning twice. Likewise
just changing where CheckArrayAccess is called, without the additional
changes to CheckArrayAccess, leads to quite a few situations where the
warning is never properly emitted because CheckArrayAccess was called with
an unrecognized type of statement (15 instances in
test/SemaCXX/array-bounds.cpp spread over the entire file--first case is on
line 9 and the last on line 189).

Finally, without either my original or revised patches, there are quite a
few cases where the warning is never triggered, such as "int val = a[3];",
"p = &u.a[3];", "array[const_subscript] = 0;", "(*array_ptr)[3] = 1;", and
"++foo[5];" (lines 28, 44, 51, 67, and 185 of the version of
test/SemaCXX/array-bounds.cpp with my patch applied).

It took a few days of trial, error, tracing through call stacks, digging
through code paths, and general frustration to figure out where to place the
calls to CheckArrayAccess to get as much coverage as possible while avoiding
emitting the same warnings multiple times. Hope that helps shed some light
on the nature of my additional changes.

Cheers,
Kaelyn


> Thanks so much,
> Ted
>
> On Jul 27, 2011, at 4:18 PM, Kaelyn Uhrain wrote:
>
> Ted & Eric,
>
> After having to revert my array bounds enhancements (which had gone in as
> r136046) due to false positives in cases like &array[array_size], I've
> reworked the array bounds checking a bit and think I have it working
> correctly now. I've uploaded the full set of changes including my reverted
> changes to
> http://codereview.appspot.com/4675068 and attached a diff of the changes
> from what I'd committed and later rolled back.
>
> Cheers,
> Kaelyn
>
> On Mon, Jul 25, 2011 at 6:17 PM, Kaelyn Uhrain <rikka at google.com> wrote:
>
>> Just realized my patch missed the new unit test file. d'oh!
>>
>>
>> On Sat, Jul 23, 2011 at 11:35 AM, Ted Kremenek <kremenek at apple.com>wrote:
>>
>>> This all sounds good to me.  I think the patch can go in as is.
>>>
>>> On Jul 21, 2011, at 4:04 PM, Kaelyn Uhrain wrote:
>>>
>>> Hi Ted,
>>>
>>> I've attached a new version of my patch that moves the pointer arithmetic
>>> portion of the -Warray-bounds improvements into a separate flag,
>>> -Warray-bounds-pointer-arithmetic. Since the remaining noise in the pointer
>>> arithmetic check is more a case of undefined behavior that does the right
>>> thing with most (all?) compilers than a case of being true false positives,
>>> the plan is to have -Warray-bounds-pointer-arithmetic initially be disabled
>>> by default and in later patches to enhance the warning by e.g. providing a
>>> fixit note to add parens to exprs like "ptr + sizeof("foo") - 1" so they
>>> become "ptr + (sizeof("foo")) - 1".
>>>
>>> Cheers,
>>> Kaelyn
>>>
>>> On Thu, Jul 21, 2011 at 1:18 PM, Ted Kremenek <kremenek at apple.com>wrote:
>>>
>>>> Hi Kaelyn,
>>>>
>>>> While I think there will be some contention here, I think this is still
>>>> too high of a false positive rate for this warning to be on by default in
>>>> the compiler.  Users simply aren't going to tolerate it.  I think it is
>>>> reasonable for a codebase to adopt a strict policy for this warning, but I
>>>> don't think a 17% false positive rate (or possibly higher) is acceptable for
>>>> a default warning for all users of Clang.
>>>>
>>>> Unless you believe you think there are additional heuristics to drop the
>>>> false positive rate down below 5-10%, I think I'm fine with proceeding as
>>>> putting this in as an opt-in warning, and then refine from there.
>>>>
>>>> On Jul 20, 2011, at 11:39 AM, Kaelyn Uhrain wrote:
>>>>
>>>> I've attached an updated version of my patch that better handles cases
>>>> where pointer arithmetic is done after casting a constant-size array to a
>>>> pointer for a smaller base type (e.g. casting an int array to char*). Of the
>>>> pointer arithmetic warnings, about 24% could be considered false positives;
>>>> however, the actual number of false positives is quite small and 2/3 of them
>>>> stem from the use of a single macro--if you count those as a single warning
>>>> & false positive, the rate drops to 17%. Of the false positives most are
>>>> from semi-questionable pointer arithmetic where a constant greater than the
>>>> length of the array/pointer is being added to the pointer and some int > 1
>>>> being subtracted from it, e.g.:
>>>>
>>>> void foo(int n) {
>>>>   char x[5];
>>>>   if (n > 0) bar(x + 6 - n);
>>>> }
>>>>
>>>> Cheers,
>>>> Kaelyn
>>>>
>>>> On Tue, Jul 19, 2011 at 11:44 AM, Kaelyn Uhrain <rikka at google.com>wrote:
>>>>
>>>>> I'm still looking at the pointer arithmetic warnings to determine which
>>>>> are false positives, but I realized I screwed up my previous stats a bit as
>>>>> I forgot to account for duplicated/repeated warnings (e.g. the same header
>>>>> included in multiple compilation units and so generating the same warnings
>>>>> multiple times). For unique warnings, the stats are:
>>>>>
>>>>> - 16.5% increase in warnings from before my patch (originally reported
>>>>> a 24% increase)
>>>>> - 47% of those new warnings being about pointer arithmetic
>>>>> - 6.7% of all of the warnings emitted with my patch applied are
>>>>> concerning pointer arithmetic. (originally reported 8.3%)
>>>>> - The new pointer arithmetic warnings represent a 7.7% increase in
>>>>> warnings from before my patch, not 10%.
>>>>>
>>>>> I'll send another email once I have a feel for how noisy the pointer
>>>>> arithmetic warnings are.
>>>>>
>>>>> Cheers,
>>>>>  Kaelyn
>>>>>
>>>>>
>>>>> On Mon, Jul 18, 2011 at 5:54 PM, Kaelyn Uhrain <rikka at google.com>wrote:
>>>>>
>>>>>> Ted,
>>>>>>
>>>>>> You're welcome. I'll try to figure out what fraction of the pointer
>>>>>> arithmetic warnings are false positives (requires a bit of manual digging on
>>>>>> my part to determine if the code is indeed buggy or if it is valid /
>>>>>> intended). For the overall 24% increase in warnings, keep in mind that over
>>>>>> half of that is the existing bounds checking now being applied to cases
>>>>>> where it wasn't before, i.e.:
>>>>>>
>>>>>> char *foo[5];
>>>>>> foo[77];  // -Warray-bounds already found
>>>>>> &foo[77];  // -Warray-bounds currently misses
>>>>>> *foo[77];  // -Warray-bounds currently misses
>>>>>>
>>>>>> The function that did the bounds checking would never catch the last
>>>>>> two cases because it would see a UnaryOperator (in the above cases for the
>>>>>> '&' and the '*') and skip the expression instead of looking inside the
>>>>>> UnaryOperator expression for the array subscripting.
>>>>>>
>>>>>> The new pointer arithmetic bounds checking only represents a 10%
>>>>>> increase in warnings--and IMHO that is the only part where the number of
>>>>>> false positives introduced might be an issue.
>>>>>>
>>>>>> Cheers,
>>>>>> Kaelyn
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 18, 2011 at 5:36 PM, Ted Kremenek <kremenek at apple.com>wrote:
>>>>>>
>>>>>>> Hi Kaelyn,
>>>>>>>
>>>>>>> Thanks for the statistics.  What would be good to know is what
>>>>>>> fraction of these are false positives (i.e., are these all real bugs)?  A
>>>>>>> small random sample might be helpful.  A 24% increase in warnings is fairly
>>>>>>> substantial, and we don't want to do that unless there is a real benefit.
>>>>>>>
>>>>>>> Ted
>>>>>>>
>>>>>>> On Jul 18, 2011, at 5:18 PM, Kaelyn Uhrain wrote:
>>>>>>>
>>>>>>> Ted,
>>>>>>>
>>>>>>> On Thu, Jul 14, 2011 at 5:04 PM, Ted Kremenek <kremenek at apple.com>wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> The only other issue: should this be controlled under a separate
>>>>>>>> warning flag, at least initially so we can experiment with this new warning
>>>>>>>> and see how noisy it is?  E.g. "-Warray-bounds-pointer-arithmetic".
>>>>>>>>
>>>>>>>
>>>>>>> I've tested the -Warray-bounds changes against the Google codebase
>>>>>>> and my patch increases the number of warnings from -Warray-bounds by 24%. Of
>>>>>>> the new warnings, 57.33% are for array indexes that most likely weren't
>>>>>>> picked up before because of unary operators like & or * (11.1% of all the
>>>>>>> warnings now emitted), and the remaining 42.67% are from out-of-bounds
>>>>>>> pointer arithmetic (8.3% of all the warnings from -Warray-bounds).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Kaelyn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> <array-bounds-enhancement3.diff>
>>>>
>>>>
>>>>
>>> <array-bounds-enhancement4.diff>
>>>
>>>
>>>
>>
> <array-bounds-enhancement-changes.diff>
> _______________________________________________
>
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110801/c265cadd/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: array-bounds-enhancement-revised.diff
Type: text/x-diff
Size: 17549 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110801/c265cadd/attachment.diff>


More information about the cfe-commits mailing list