[cfe-dev] [analyzer] Tracking values for generating bug reports

Kristóf Umann via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 8 14:05:17 PDT 2019


I got another round! The following new run names were added:

.* + all non-control dependency changes: [1, 2] applied
.*AFTER tracking conditions + all non-control dependency changes: [1, 2]
applied, "track-conditions" enabled
.*AFTER tracking conditions FINAL: "track-conditions" enabled, [1-8]
applied (also with ~10 or so NFC patches)
.*AFTER tracking conditions FINAL + DEBUG notes: "track-conditions" and
"track-conditions-debug" enabled, [1-8] applied (also with ~10 or so NFC
patches)

[1] https://reviews.llvm.org/D64232 [analyzer] Prune calls to functions
with linear CFGs that return a non-zero constrained value
[2] https://reviews.llvm.org/D64287 [analyzer] Track the right hand side of
the last store regardless of its value
[3] https://reviews.llvm.org/D64270 [analyzer][NFC] Prepare visitors for
different tracking kinds

[4] https://reviews.llvm.org/D64272 [analyzer] Note last writes to a
condition only in a nested stackframe
[5] https://reviews.llvm.org/D65287 [analyzer][CFG] Don't track the
condition of asserts
[6] https://reviews.llvm.org/D65575 [analyzer] Mention whether an event is
about a condition in a bug report part 1
[7] https://reviews.llvm.org/D65724 [analyzer] Don't make
ConditionBRVisitor events prunable when the condition is an interesting
field
[8] https://reviews.llvm.org/D65725 [analyzer] Mention whether an event is
about a condition in a bug report part 2

Conclusions:

Shockingly, no condition tracking related changes were made in the FINAL
run for either Bitcoin or Xerces, so we only have LLVM/Clang to talk about.
This time, I added comments, which, after clicking on a report, can be
retrieved by clicking on 'Comments' in the upper right corner. These
comments are far more precise than the notes I left previously.

http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL

The negatives:
1. Extra notes to functions following this pattern were very-very common:

bool a(int b) {
  return b == 0;
  // Assuming 'b' is equal to 0
  // Returning the value 1, which will be (a part of a) condition
}

void f(int b) {
  int *x = 0;
  if (a(b)) // call to 'a', returning from 'a'
    *x = 5; // nullptr dereference
}

These notes are totally meaningless, and they aren't pruned out by [1]
because the assumption notes are non-prunable (they explain a variable, b
in this case, that is marked as interesting, as the return value is tracked
by ReturnVisitor). A possible solution would be to prune linear functions
in ConditionBRVisitor as well similarly to [1], but then we'd lose the
assumption note. Instead, moving the non-prunable assumption note to the
actual collapse point would be better, but a quick look at the generated
ExplodedGraph reveals that this no longer an bug report construction issue:
the collapse point is in fact at the return statement, rather then the
condition.

So, I think the bug report should always explain how the analyzer thinks.
Ideally, this should be identical to how the bug could be reproduced, but
if it is not (in this case the assumption should be made when a(b) is the
condition), maybe the logic of the analysis should be changed instead.

Example: Notes 19-23,
http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=8c536e75492d2588898e03178a704efb&report=12672&subtab=8c536e75492d2588898e03178a704efb

2. I didn't come across this too often, but it happened:

int getInt(); int getInt2();

bool a() {
  return getInt2() && getInt();
  // Returning value, which will be (a part of a) condition
}

void f() {
  int *x = 0;
  if (a()) // call to 'a', returning from 'a'
    *x = 5; // nullptr dereference
}

This note wasn't pruned because a() has in fact more than 3 basic blocks,
but it really doesn't change anything. I guess we *could* prune it, but I'm
leaning on accepting it as a sacrifice.

Example: Notes 65-68,
http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=8cc2c55ab79efc36fcd2b1725648c2c3&report=12678&subtab=8cc2c55ab79efc36fcd2b1725648c2c3

3. Some function calls are lost due to retrieving information from nested
stack frames only, rather than different stack frames. I only encountered
this when the condition variable was invalidated: The "AFTER tracking
conditions WITH improved moderation" runs display the path to the
invalidation point, yet the FINAL runs do not.

Example: Displaying the invalidation point on note 9:
http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=1d6a5c6fcca79b12b760b41924c9c2c3&report=7774&subtab=1d6a5c6fcca79b12b760b41924c9c2c3

4. Calls to operator!= in foreach loops -- Are these necessary? Is there
any, even articial cases where this is actually meaningful? This is pretty
much the only non-constructive addition of notes that are introduced only
because of condition tracking.

Example: Notes 13-17, 22-26, 37-41:
http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=50cc2bed223dc4ba62f46d0665346a5e&report=12743&subtab=50cc2bed223dc4ba62f46d0665346a5e

Positives:

In general, the wording as the analyses were run may be imperfect, but they
indicate what the message is about very precisely. The path of execution is
much better explained, and even if the extra function call isn't
meaningful, the note messages state clearly that we're only talking about a
condition, I can skip them without being bothered by it too much. I am
given extra information without the cost of polluting the rest of the
report.

Here is a collection of my favorites, though I found most of the extra
notes nice:

1. Here, we encounter a nullpointer dereference issue. Op0 is assumed to be
equal to Op1, but only due to condition tracking do we explain that Op1 is
believed to be null, and THAT is why this assumption (Op1 == Op0) is so
important!

Notes 23-31:
http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=6b5f3e43213b64eaef4a25a249c59baf&report=12955&subtab=6b5f3e43213b64eaef4a25a249c59baf

2. The seldom addition of extra "Returning value, which will be (a part of
a) condition" notes, love them!

Example: Note 13 of
http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=5086ed6620709edc2d9a5ed4609e186d&report=12888&subtab=5086ed6620709edc2d9a5ed4609e186d

3. This one is weird, I'm not yet sure how this note came about, but yet
again, it isn't clear AT ALL in the original report why CountS will be a
source of a null dereference error. The extra note tells us that CountS is
non-null, but its pointee is.

Note 19:
http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=356d88cb062474fdf17584bf1451bffc&report=13412&subtab=356d88cb062474fdf17584bf1451bffc

Also, adding the actual mail that was lost ;)

On Wed, 24 Jul 2019 at 13:57, Kristóf Umann <dkszelethus at gmail.com> wrote:

>
>
> On Wed, 24 Jul 2019 at 04:29, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>
>> Xerces - DOMRangeImpl.cpp: Ugh, are those "Value assigned to..." notes
>> triggered by invalidation? I have a plan of my own to wipe such notes out
>> (or at least formulate them more carefully), so i guess it's not an action
>> item for you (nor much of a regression), but still ugh. Another thing to
>> think about for those is to suppress the warning entirely when a region
>> that stores is important condition is first assumed to contain false, then
>> invalidated, then assumed to contain true.
>>
>
> Totally agreed.
>
>
>> Xerces - XMLString.hpp: Mmm, i guess passing unconstrained values as
>> parameters is also not very meaningful. It doesn't look like it's a *big*
>> problem, but i guess it's worth considering fixing eventually.
>>
>
> In this particular case, partly due to me not being familiar with the
> code, I found the note helpful, but in LLVM there were plenty of reports
> where I didn't feel this way, so I'll get that fixed.
>
>
>> > Assert conditions are not yet ignored, I have a neat solution locally
>>
>> Just to make sure - do you have a test case with control flow within the
>> condition, eg. `assert((a || b) && "the universe is collapsing")`?
>>
>
> I don't :) Thanks!
>
>
>> On 7/23/19 12:49 PM, Kristóf Umann wrote:
>>
>> I'm happy to delight you all with some more results! I was far more
>> strict on my judgement this time around, marking a report good when I
>> genuinely believed that the note made the report better.
>>
>> All the runs were made on monorepo commit
>> db4e335b8643056f5afa2a0d8d175d95b1fb7e28  with the following patches
>> applied:
>>
>> BEFORE tracking condition:  Without any changes being made, default
>> CodeChecker settings.
>> AFTER tracking conditions: Without any changes being made,
>> track-conditions set to true.
>> AFTER tracking conditions + DEBUG notes:  Without any changes being made,
>> track-conditions and track-conditions-debug set to true.
>> AFTER linear pruning (or similar run name): [1] applied, track-conditions
>> set to true
>> AFTER tracking conditions WITH improved moderation: [1, 2, 3, 4] applied,
>> track-conditions set to true (stupid name but it's gonna stay like this for
>> now :) )
>> AFTER tracking conditions WITH improved moderation: [1, 2, 3, 4] applied,
>> track-conditions and track-conditions-debug  set to true
>> [1] https://reviews.llvm.org/D64232 [analyzer] Prune calls to functions
>> with linear CFGs that return a non-zero constrained value
>> [2] https://reviews.llvm.org/D64287 [analyzer] Track the right hand side
>> of the last store regardless of its value
>>
>> [3] https://reviews.llvm.org/D64270 [analyzer][NFC] Prepare visitors for
>> different tracking kinds
>>
>> [4] https://reviews.llvm.org/D64272 [analyzer] Note last writes to a
>> condition only in a nested stackframe
>>
>> LLVM/Clang (no longer analyzing clang-tools-extra):
>> http://cc.elte.hu:15001/GSoC2019-July-5/#run=LLVM%2FClang%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>> Bitcoin:
>> http://cc.elte.hu:15001/GSoC2019-July-5/#run=Bitcoint%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=Bitcoint%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>> Xerces:
>> http://cc.elte.hu:15001/GSoC2019-July-5/#run=Xerces%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=Xerces%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>> Some conclusions:
>>
>>    - [4] Actually notes last writes in *different*, not *nested* stack
>>    frames. Gotta fix that.
>>    - [1] Returning an unconstrained value when the function is linear is
>>    *never* meaningful. Not for conditions, not for anything else. Gotta
>>    fix that too. It's worth talking about whether we should prune these for
>>    conditions even if the function isn't linear.
>>    - [1] This one gets rid of almost all the operator bool calls. Maybe
>>    in the rare case it doesn't, we should preserve them? I always planned to
>>    get rid of calls to operator bool, but now I'm hesitant.
>>    - Assert conditions are not yet ignored, I have a neat solution
>>    locally, but I haven't published or tested it yet.
>>
>>
> One more thing: If the condition is a binary operator, like (B || A) is
> assumed true because A is true, or (A && B) is assumed false because A is
> false, we should only track A.
>
>
>>
>>
>> There honestly isn't that much I have on my mind, I think after the next
>> round of analyses we should be good! I really should've made an analysis
>> with all the non-condition tracking changes to get an even better
>> perspective, so I'll do that next time.
>>
>> On Tue, 25 Jun 2019 at 04:00, Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>>> Yup, looks much better! Let's actually dig through other projects a bit,
>>> but results on LLVM look fairly perfect to me.
>>>
>>>
>>> > ASTDiagnostic.cpp
>>>
>>> Why wasn't the extra tracking removed in the moderate mode? There
>>> doesn't seem to be an obvious overwrite for `kind`. Is it reacting to an
>>> invalidation?
>>>
>>
>> The report is now untouched!
>>
>> > DominanceFrontierImpl.h
>>>
>>> Why does it track so much more expressions in the moderate mode (even if
>>> not producing any actual nodes for them)? Is it because you changed the
>>> de-duplication method from by-expr to by-node in between?
>>>
>>
>> I'll make a testcase for that. Honestly, I'm thinking of reimplementing
>> the visitor as well, so that I'll collect the control dependencies in a set
>> when the visitor object is created, and track them at BlockExit locations,
>> that should limit it, because yeah, I believe they originate from the
>> de-duplication.
>>
>> > CGObjCGNU.cpp
>>>
>>> I'd like to think more about this one. It doesn't look to me that the
>>> new notes are helpful here, for the reason that the problem is pretty much
>>> entirely between the definition of OID and the use, while all of the notes
>>> are outside that range. The report wouldn't become less valid (for a
>>> certain definition of valid) if we simply omit everything except the last
>>> three notes:
>>>   - 'OID' is initialized to a null pointer value
>>>   - Loop body skipped when range is empty
>>>   - Called ++ object pointer is null
>>> I don't see any immediate solutions here; i guess the right thing to do
>>> would be to simply learn how to chop off the irrelevant beginning of the
>>> path.
>>>
>>
>> This time around, BodyFarm.cpp is in the same boat. I really just made a
>> fundamentally bad (like, really bad) report even worse.
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190808/7c50a304/attachment.html>


More information about the cfe-dev mailing list