[cfe-commits] PATCH: Enhance array bounds checking
Kaelyn Uhrain
rikka at google.com
Fri Aug 5 14:35:42 PDT 2011
ping?
On Mon, Aug 1, 2011 at 3:33 PM, Kaelyn Uhrain <rikka at google.com> wrote:
> Hi Ted,
>
> I've gone ahead and attached the full patch to facilitate seeing all of the
> changes in better context.
>
>
> On Mon, Aug 1, 2011 at 2:18 PM, Ted Kremenek <kremenek at apple.com> wrote:
>
>> Hi Kaelyn,
>>
>> Sorry for the delay in responding. I'm trying to understand the
>> differences proposed between your last patch and the new one. When I look
>> at the "delta from patch set" from 2 to 3, I see changes that appear
>> unrelated to this feature, which I find confusing.
>>
>> What would help me is if I understood more of the high-level motivation of
>> the changes. What I see is calls to "CheckArrayAccess" moved to various
>> places, for example when checking the comma operator:
>>
>> 7578 CheckArrayAccess(lhs.get());
>> 7579 CheckArrayAccess(rhs.get());
>>
>> This feels a little kludgy. I'm not saying it is wrong; I just don't
>> understand the motivation. Could you provide some insight behind these
>> changes other than they got the particular test cases working?
>>
>
> The changes to get everything working properly (from what I had committed
> to svn and then reverted) are, except for the unary operator handling in
> CheckArrayAccess, pretty much just shuffling around when CheckArrayAccess is
> called. I agree that it feels a bit kludgy, but unfortunately it was the
> only way I could get CheckArrayAccess called once and only once for each
> array access for all the cases I could come up with (see the changes to
> test/SemaCXX/array-bounds.cpp).
>
> For example, with my original patch that had to be reverted "p = &u.a[3];",
> "++foo[5];", and "return --foo[6];" from array-bounds.cpp (lines 44, 185,
> and 187) do not trigger any warning. Additionally, "p == &u.a[2]" triggers a
> false warning (where "a" is a 2-element array). There are a few places like
> that where the warnings are triggered incorrectly, but even more places
> where CheckArrayAccess is never called--such as on the right hand side of
> assignments. It not being called at all in some cases is why e.g. line 43
> "short *p = &u.a[2];" did not give a warning.
>
> With just the additional changes I made within CheckArrayAccess and leaving
> the CheckArrayAccess call sites alone, "p = &u.a[3];", "++foo[5];", and
> "return --foo[6];" from array-bounds.cpp (lines 44, 185, and 187) still do
> not trigger any warning. The noticeable change is that "*(&u.a[2]) = 1;" and
> "*(&u.a[3]) = 1;" (lines 45 and 46) generate the warning twice. Likewise
> just changing where CheckArrayAccess is called, without the additional
> changes to CheckArrayAccess, leads to quite a few situations where the
> warning is never properly emitted because CheckArrayAccess was called with
> an unrecognized type of statement (15 instances in
> test/SemaCXX/array-bounds.cpp spread over the entire file--first case is on
> line 9 and the last on line 189).
>
> Finally, without either my original or revised patches, there are quite a
> few cases where the warning is never triggered, such as "int val = a[3];",
> "p = &u.a[3];", "array[const_subscript] = 0;", "(*array_ptr)[3] = 1;", and
> "++foo[5];" (lines 28, 44, 51, 67, and 185 of the version of
> test/SemaCXX/array-bounds.cpp with my patch applied).
>
> It took a few days of trial, error, tracing through call stacks, digging
> through code paths, and general frustration to figure out where to place the
> calls to CheckArrayAccess to get as much coverage as possible while avoiding
> emitting the same warnings multiple times. Hope that helps shed some light
> on the nature of my additional changes.
>
> Cheers,
> Kaelyn
>
>
>> Thanks so much,
>> Ted
>>
>> On Jul 27, 2011, at 4:18 PM, Kaelyn Uhrain wrote:
>>
>> Ted & Eric,
>>
>> After having to revert my array bounds enhancements (which had gone in as
>> r136046) due to false positives in cases like &array[array_size], I've
>> reworked the array bounds checking a bit and think I have it working
>> correctly now. I've uploaded the full set of changes including my reverted
>> changes to
>> http://codereview.appspot.com/4675068 and attached a diff of the changes
>> from what I'd committed and later rolled back.
>>
>> Cheers,
>> Kaelyn
>>
>> On Mon, Jul 25, 2011 at 6:17 PM, Kaelyn Uhrain <rikka at google.com> wrote:
>>
>>> Just realized my patch missed the new unit test file. d'oh!
>>>
>>>
>>> On Sat, Jul 23, 2011 at 11:35 AM, Ted Kremenek <kremenek at apple.com>wrote:
>>>
>>>> This all sounds good to me. I think the patch can go in as is.
>>>>
>>>> On Jul 21, 2011, at 4:04 PM, Kaelyn Uhrain wrote:
>>>>
>>>> Hi Ted,
>>>>
>>>> I've attached a new version of my patch that moves the pointer
>>>> arithmetic portion of the -Warray-bounds improvements into a separate flag,
>>>> -Warray-bounds-pointer-arithmetic. Since the remaining noise in the pointer
>>>> arithmetic check is more a case of undefined behavior that does the right
>>>> thing with most (all?) compilers than a case of being true false positives,
>>>> the plan is to have -Warray-bounds-pointer-arithmetic initially be disabled
>>>> by default and in later patches to enhance the warning by e.g. providing a
>>>> fixit note to add parens to exprs like "ptr + sizeof("foo") - 1" so they
>>>> become "ptr + (sizeof("foo")) - 1".
>>>>
>>>> Cheers,
>>>> Kaelyn
>>>>
>>>> On Thu, Jul 21, 2011 at 1:18 PM, Ted Kremenek <kremenek at apple.com>wrote:
>>>>
>>>>> Hi Kaelyn,
>>>>>
>>>>> While I think there will be some contention here, I think this is still
>>>>> too high of a false positive rate for this warning to be on by default in
>>>>> the compiler. Users simply aren't going to tolerate it. I think it is
>>>>> reasonable for a codebase to adopt a strict policy for this warning, but I
>>>>> don't think a 17% false positive rate (or possibly higher) is acceptable for
>>>>> a default warning for all users of Clang.
>>>>>
>>>>> Unless you believe you think there are additional heuristics to drop
>>>>> the false positive rate down below 5-10%, I think I'm fine with proceeding
>>>>> as putting this in as an opt-in warning, and then refine from there.
>>>>>
>>>>> On Jul 20, 2011, at 11:39 AM, Kaelyn Uhrain wrote:
>>>>>
>>>>> I've attached an updated version of my patch that better handles cases
>>>>> where pointer arithmetic is done after casting a constant-size array to a
>>>>> pointer for a smaller base type (e.g. casting an int array to char*). Of the
>>>>> pointer arithmetic warnings, about 24% could be considered false positives;
>>>>> however, the actual number of false positives is quite small and 2/3 of them
>>>>> stem from the use of a single macro--if you count those as a single warning
>>>>> & false positive, the rate drops to 17%. Of the false positives most are
>>>>> from semi-questionable pointer arithmetic where a constant greater than the
>>>>> length of the array/pointer is being added to the pointer and some int > 1
>>>>> being subtracted from it, e.g.:
>>>>>
>>>>> void foo(int n) {
>>>>> char x[5];
>>>>> if (n > 0) bar(x + 6 - n);
>>>>> }
>>>>>
>>>>> Cheers,
>>>>> Kaelyn
>>>>>
>>>>> On Tue, Jul 19, 2011 at 11:44 AM, Kaelyn Uhrain <rikka at google.com>wrote:
>>>>>
>>>>>> I'm still looking at the pointer arithmetic warnings to determine
>>>>>> which are false positives, but I realized I screwed up my previous stats a
>>>>>> bit as I forgot to account for duplicated/repeated warnings (e.g. the same
>>>>>> header included in multiple compilation units and so generating the same
>>>>>> warnings multiple times). For unique warnings, the stats are:
>>>>>>
>>>>>> - 16.5% increase in warnings from before my patch (originally reported
>>>>>> a 24% increase)
>>>>>> - 47% of those new warnings being about pointer arithmetic
>>>>>> - 6.7% of all of the warnings emitted with my patch applied are
>>>>>> concerning pointer arithmetic. (originally reported 8.3%)
>>>>>> - The new pointer arithmetic warnings represent a 7.7% increase in
>>>>>> warnings from before my patch, not 10%.
>>>>>>
>>>>>> I'll send another email once I have a feel for how noisy the pointer
>>>>>> arithmetic warnings are.
>>>>>>
>>>>>> Cheers,
>>>>>> Kaelyn
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 18, 2011 at 5:54 PM, Kaelyn Uhrain <rikka at google.com>wrote:
>>>>>>
>>>>>>> Ted,
>>>>>>>
>>>>>>> You're welcome. I'll try to figure out what fraction of the pointer
>>>>>>> arithmetic warnings are false positives (requires a bit of manual digging on
>>>>>>> my part to determine if the code is indeed buggy or if it is valid /
>>>>>>> intended). For the overall 24% increase in warnings, keep in mind that over
>>>>>>> half of that is the existing bounds checking now being applied to cases
>>>>>>> where it wasn't before, i.e.:
>>>>>>>
>>>>>>> char *foo[5];
>>>>>>> foo[77]; // -Warray-bounds already found
>>>>>>> &foo[77]; // -Warray-bounds currently misses
>>>>>>> *foo[77]; // -Warray-bounds currently misses
>>>>>>>
>>>>>>> The function that did the bounds checking would never catch the last
>>>>>>> two cases because it would see a UnaryOperator (in the above cases for the
>>>>>>> '&' and the '*') and skip the expression instead of looking inside the
>>>>>>> UnaryOperator expression for the array subscripting.
>>>>>>>
>>>>>>> The new pointer arithmetic bounds checking only represents a 10%
>>>>>>> increase in warnings--and IMHO that is the only part where the number of
>>>>>>> false positives introduced might be an issue.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Kaelyn
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 18, 2011 at 5:36 PM, Ted Kremenek <kremenek at apple.com>wrote:
>>>>>>>
>>>>>>>> Hi Kaelyn,
>>>>>>>>
>>>>>>>> Thanks for the statistics. What would be good to know is what
>>>>>>>> fraction of these are false positives (i.e., are these all real bugs)? A
>>>>>>>> small random sample might be helpful. A 24% increase in warnings is fairly
>>>>>>>> substantial, and we don't want to do that unless there is a real benefit.
>>>>>>>>
>>>>>>>> Ted
>>>>>>>>
>>>>>>>> On Jul 18, 2011, at 5:18 PM, Kaelyn Uhrain wrote:
>>>>>>>>
>>>>>>>> Ted,
>>>>>>>>
>>>>>>>> On Thu, Jul 14, 2011 at 5:04 PM, Ted Kremenek <kremenek at apple.com>wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The only other issue: should this be controlled under a separate
>>>>>>>>> warning flag, at least initially so we can experiment with this new warning
>>>>>>>>> and see how noisy it is? E.g. "-Warray-bounds-pointer-arithmetic".
>>>>>>>>>
>>>>>>>>
>>>>>>>> I've tested the -Warray-bounds changes against the Google codebase
>>>>>>>> and my patch increases the number of warnings from -Warray-bounds by 24%. Of
>>>>>>>> the new warnings, 57.33% are for array indexes that most likely weren't
>>>>>>>> picked up before because of unary operators like & or * (11.1% of all the
>>>>>>>> warnings now emitted), and the remaining 42.67% are from out-of-bounds
>>>>>>>> pointer arithmetic (8.3% of all the warnings from -Warray-bounds).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Kaelyn
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> <array-bounds-enhancement3.diff>
>>>>>
>>>>>
>>>>>
>>>> <array-bounds-enhancement4.diff>
>>>>
>>>>
>>>>
>>>
>> <array-bounds-enhancement-changes.diff>
>> _______________________________________________
>>
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110805/09fe7aa5/attachment.html>
More information about the cfe-commits
mailing list